Wednesday, November 21, 2012

Processing a Moving Average using a Window Function with a Perl script

I typically collect lots of data from a wide variety of devices using SNMP and store it in a flat file, which a cron job then parses and plots with gnuplot. This lets me visualize the data and spot trends or anomalies. Sometimes you want to see an average over a specific window size. I use the following Perl script to parse the data and compute it. The script consolidates the data points inside each "window" into a single value, which is the average of those points. The windows do not overlap: the script advances by a full window each time, so with a window of 3 the output has roughly one third as many rows as the input, and any leftover rows that do not fill a window are dropped.

The following command would read inputfile, compute the average for a window of size 3, then write the result to outputfile.

aveWinCalc.pl -i inputfile -o outputfile -w 3

example input datafile (separated by \t)
2012-11-01:08:00:00    1200
2012-11-01:09:00:00    1225
2012-11-01:10:00:00    1312
2012-11-01:11:00:00    1355
...
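
As a rough illustration of what the consolidation does to the four sample rows above with a window of 3, here is a minimal sketch. The sub name block_averages is made up for this example and is not part of aveWinCalc.pl; it assumes each row is a "timestamp\tvalue" string.

```perl
use strict;
use warnings;

# Hypothetical helper: average consecutive, non-overlapping windows of
# $w rows. Each result is [timestamp of the last row in the window, average].
# A trailing partial window is dropped.
sub block_averages {
    my ($w, @rows) = @_;
    my @out;
    for (my $i = 0; $i + $w <= @rows; $i += $w) {
        my @win   = @rows[$i .. $i + $w - 1];
        my $total = 0;
        for my $row (@win) {
            my (undef, $val) = split /\t/, $row;
            $total += $val;
        }
        my ($date) = split /\t/, $win[-1];
        push @out, [$date, $total / $w];
    }
    return @out;
}

my @rows = (
    "2012-11-01:08:00:00\t1200",
    "2012-11-01:09:00:00\t1225",
    "2012-11-01:10:00:00\t1312",
    "2012-11-01:11:00:00\t1355",
);

# (1200 + 1225 + 1312) / 3 = 1245.67; the fourth row does not fill a
# second window, so it produces no output.
printf "%s\t%.2f\n", @$_ for block_averages(3, @rows);
```

The single output row carries the timestamp of the last point in the window, which matches how the script below labels its results.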

------------------------------------------------------------------------


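#!/usr/bin/perl

# file: aveWinCalc.pl
# Written by: Stephen B. Johnson
# 2012-11-20
# Step through the data in consecutive, non-overlapping windows of the
# specified size and write the average of each window.
#
# my input datafile format: (separated by \t)
# YYYY-MM-DD:hh:mm:ss value
#

use strict;
use warnings;
use Getopt::Std;

my %opt;
getopts("i:o:w:", \%opt)
   or die "usage: $0 -i inputfile -o outputfile [-w windowsize]\n";

die "no input file...\n"  unless defined $opt{i};
die "no output file...\n" unless defined $opt{o};

# Prompt for the window size if -w was not given.
if (!$opt{w}) {
   print "must specify a window size\n-> ";
   $opt{w} = <STDIN>;
   die "no window size given\n" unless defined $opt{w};
   chomp $opt{w};
}
die "window size must be a positive integer\n"
   unless $opt{w} =~ /^[1-9]\d*$/;

open (my $in,  '<', $opt{i}) or die "cannot read $opt{i}: $!\n";
open (my $out, '>', $opt{o}) or die "cannot write $opt{o}: $!\n";

my @data = <$in>;
chomp @data;

# Each pass consumes one full window; a trailing partial window is dropped.
my $i = 0;
while ($i + $opt{w} <= scalar(@data)) {
   my $wintotal = 0;
   my ($date, $val);
   for (my $j = $i; $j < ($i + $opt{w}); $j++) {
      ($date, $val) = split(/\t/, $data[$j]);
      $wintotal += $val;
   }
   # $date now holds the timestamp of the last point in the window.
   my $ave = $wintotal / $opt{w};
   print $out "$date\t$ave\n";

   $i += $opt{w};
}

close $in;
close $out or die "cannot close $opt{o}: $!\n";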
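
The script above advances by a full window on each pass, so it produces one value per block of data. If you want a conventional moving average that emits a value at every position once the window is full, one possible variation is sketched below. This is not the script above: the sub name sliding_averages is made up, and it takes plain numbers rather than file rows to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical helper: overlapping (sliding) moving average.
# Keeps a running total so each step adds the newest value and
# subtracts the one that just left the window.
sub sliding_averages {
    my ($w, @vals) = @_;
    my @out;
    my $total = 0;
    for my $i (0 .. $#vals) {
        $total += $vals[$i];
        $total -= $vals[$i - $w] if $i >= $w;
        push @out, $total / $w if $i >= $w - 1;
    }
    return @out;
}

# Four samples with a window of 3 give two averages:
# (1200 + 1225 + 1312) / 3 and (1225 + 1312 + 1355) / 3
printf "%.2f\n", $_ for sliding_averages(3, 1200, 1225, 1312, 1355);
```

The trade-off is output size: the sliding form keeps nearly every row and smooths the curve, while the block form shrinks the data set, which is handy when plotting long collection periods.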



Sunday, October 21, 2012

Systems

If you ever have to work with systems from several different vendors, you quickly learn the pain of having to understand the nuances of each vendor's equipment and the interactions with its support organization. It is mostly a frustrating experience, as most vendors have little insight into how their equipment behaves in a large-scale deployment where upgrading or changing software or hardware components on a whim just isn't feasible.

I'm always amused by the request to perform a service-impacting operation to collect information. Another humorous request is for direct access to production equipment. Of course this can be done with VPN access, but it means allowing an external support team into your production environment without knowledge of the overall network/system operation. We actually saw a vendor change a setting in our production environment in the middle of the day to collect some data, something that we would only touch in a maintenance window as it could have adverse effects. The follow-up meetings surrounding this incident were not fun.

So you learn over time who in the vendor's support organization is most familiar with your implementation, and you hope that the bounds of access are strictly honored and that no changes are made without prior consent.


Saturday, October 20, 2012

Monitoring and Alerting

You cannot appreciate the ability to collect vital data and receive alerts when something malfunctions until you have hundreds or thousands of devices spread across the world. Collecting and trending data on devices provides a simple way to gain insight into when an anomaly occurs. If you are not collecting and trending the data, then there is no baseline to compare against for expected behavior.

Recently, we quickly identified a network anomaly that was introduced by a customer premises device setting which increased messaging from devices twelvefold. Luckily, we caught this with only 15% of the devices changed and quickly addressed the situation. Failure to recognize this could have caused huge outages for our customers. How did we notice the anomaly? Simply reading and plotting the CPU% and memory% of our Session Border Controllers gave us the first clue that something was happening. This drew our attention to the network traffic, where we discovered a huge increase in the number of SIP Notify messages.
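
We spotted this one by eye on the plots, but the same idea can be roughed out in code: compare each new sample against the average of the samples just before it. The sketch below is only an illustration. The sub name flag_spikes, the window of 4, the 1.5 factor, and the CPU% readings are all made up for the example and would need tuning against real data.

```perl
use strict;
use warnings;

# Hypothetical helper: return the indexes of samples that exceed
# $factor times the average of the $w samples before them.
sub flag_spikes {
    my ($w, $factor, @vals) = @_;
    my @flagged;
    for my $i ($w .. $#vals) {
        my $total = 0;
        $total += $_ for @vals[$i - $w .. $i - 1];
        my $baseline = $total / $w;
        push @flagged, $i if $vals[$i] > $factor * $baseline;
    }
    return @flagged;
}

# Made-up CPU% readings: steady around 20, then a jump.
my @cpu = (20, 22, 19, 21, 20, 48, 55);
print "sample $_ looks unusual: $cpu[$_]%\n" for flag_spikes(4, 1.5, @cpu);
```

Note that once a spike enters the trailing window it raises the baseline, so a sustained shift stops being flagged after a few samples. That is fine for a first alert, but it is no substitute for looking at the plot.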

Another practical example occurs when you need to purchase licenses for equipment. From my experience, trending the number of licenses in use or devices in use is the best way to identify the number of licenses you'll need. Otherwise, you're at the mercy of some marketing or sales group that typically does not have a clue or any basis in actual data. So if nothing else, it provides a good data point to validate information from other sources before it's too late.

Collecting this data can be very simple with some Perl scripts and SNMP.  Once there's a simple implementation, you can invest the time and effort as needed to enhance the data collection and analysis.
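
As a starting point, the storage half of such a collector can be sketched in a few lines. Everything named here is an assumption for illustration: append_sample and poll_device are made-up subs, and poll_device is only a placeholder that returns a fixed number where a real script would query the device, for example with the Net::SNMP module from CPAN or the snmpget command from net-snmp.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: append one "timestamp\tvalue" row to a flat file,
# using the YYYY-MM-DD:hh:mm:ss layout from the moving-average post.
sub append_sample {
    my ($file, $epoch, $value) = @_;
    my $stamp = strftime("%Y-%m-%d:%H:%M:%S", localtime($epoch));
    open(my $fh, '>>', $file) or die "cannot append to $file: $!\n";
    print $fh "$stamp\t$value\n";
    close $fh or die "cannot close $file: $!\n";
    return $stamp;
}

# Placeholder for the real poll; returns a fixed, made-up value.
sub poll_device { return 1200 }

# Only write when a file name is given on the command line.
append_sample($ARGV[0], time, poll_device()) if @ARGV;
```

Run from cron with the flat file's path as the only argument, this appends one row per run, which is the kind of file the averaging script expects as input.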