Monday, December 14, 2015

Large System Management

I've recently been involved in taming a large-scale system that wasn't really planned, designed consistently, or even maintained properly. What I found was a chaotic system, and the implementation practices in use were leading it toward more chaos.

It wasn't just the system that I had to observe and learn; it was just as important to observe how personnel were interacting with it. My initial changes involved automating some data collection and reporting that was being done manually in spreadsheets. Needless to say, the data was not accurate, and maintaining the spreadsheets was overwhelmingly burdensome.

Another problem area was the Java clients, which were ridiculously slow and presented incomplete information. I reverse engineered the Java client and found that it was just getting device data via SNMP. A simple Perl script replaced the Java GUI status screen and ran 40x faster: I could literally collect the data from 40 different devices in the time the Java GUI took to present incomplete data for a single device. I also store the retrieved data in a database for later use.

Today, these Perl scripts run from a cron job, so we always have a fresh data set based on what the equipment tells us rather than a poorly maintained spreadsheet.
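
The scripts themselves are not reproduced in this post. As a rough sketch of the same idea, written in Python rather than Perl, the collector below polls each device over SNMP and stores the answers in a database. It assumes the Net-SNMP `snmpget` command is installed; the OIDs, the `public` community string, and the `device_status` table are illustrative assumptions, not the actual setup.

```python
import sqlite3
import subprocess
import time

# Standard MIB-II OIDs, chosen for illustration only; the post does not
# say which values the real scripts poll.
OIDS = {
    "sysName": "1.3.6.1.2.1.1.5.0",
    "sysUpTime": "1.3.6.1.2.1.1.3.0",
}


def snmp_get(host, oid, community="public"):
    """Fetch one value by shelling out to Net-SNMP's snmpget (assumed installed)."""
    result = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid],
        capture_output=True, text=True, timeout=5, check=True,
    )
    return result.stdout.strip()


def collect(hosts, conn, fetch=snmp_get, now=time.time):
    """Poll every host and store one row per (host, metric).

    Returns the number of rows stored. A device that fails to answer is
    reported and skipped so one dead box does not stop the whole run.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS device_status ("
        "host TEXT, metric TEXT, value TEXT, polled_at REAL)"
    )
    stored = 0
    for host in hosts:
        for metric, oid in OIDS.items():
            try:
                value = fetch(host, oid)
            except Exception as exc:  # timeout, bad community, missing binary
                print(f"{host} {metric}: {exc}")
                continue
            conn.execute(
                "INSERT INTO device_status VALUES (?, ?, ?, ?)",
                (host, metric, value, now()),
            )
            stored += 1
    conn.commit()
    return stored
```

A small wrapper script could call `collect(hosts, sqlite3.connect("inventory.db"))` with the device list, and a crontab entry such as `*/5 * * * *` followed by the command to run that script would refresh the data every five minutes. The file name and the interval are placeholders.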

Some of the other changes involved standardizing the way certain items were named and packaged. Items were being named in a way that made it more work to identify them and their usage. I'm still renaming about a decade's worth of ill-named items, but the new convention is in place, and it is already turning an unnecessary complexity into something trivial and maintainable.

As the Chinese proverb says: A journey of 1000 miles begins with a single step.

I'd add to this proverb... A step in the correct direction.

