Wednesday, November 21, 2012

Processing a Moving Average using a Window Function with a Perl script

I typically collect lots of data from a wide variety of devices using snmp and store them in a flat file where I'll parse through it with a cron job and plot the data using gnuplot.  This allows me to visualize the data and spot trends or anomalies.  Sometimes you may want to see a moving average for a specific window size.  I'll use the following Perl script to parse the data and generate the desired data operation.  This essentially consolidates a number of data points inside the "window" into a single value which is the average of the data points within the "window."

The following command would read inputfile, compute the average for a window of size 3, then write the result to outputfile.

aveWinCalc.pl -i inputfile -o outputfile -w 3

example input datafile (seperated by \t)
2012-11-01:08:00:00    1200
2012-11-01:09:00:00    1225
2012-11-01-10:00:00    1312
2012-11-01-11:00:00    1355
...

------------------------------------------------------------------------


#!/usr/bin/perl

# file: aveWinCalc.pl
# Written by: Stephen B. Johnson
# 2012-11-20
# Step through every element and compute the average for
# the specified window size.
#
# my input datafile format: (separated by \t)
# YYYY-MM-DD:hh:mm:ss value
#



use Getopt::Std;

getopts("i:o:w:");

if (!$opt_i) {
   print "no input file...\n";
   exit;
}

if (!$opt_o) {
   print "no output file...\n";
   exit;
}

if (!$opt_w) {
   print "must specify a window size\n-> ";
   $opt_w = <>;
}


open (INFILE, "< $opt_i") or die $!;
open (OUTFILE, "> $opt_o") or die $!;

@data = <INFILE>;

my $i = 0;
while ($i < scalar(@data) - ($opt_w - 1)) {
   $wintotal = 0;
   for ($j = $i; $j < ($i + $opt_w); $j++) {
      ($date, $val) = split(/\t/, $data[$j]);
      $wintotal += $val;
      $ave = $wintotal / $opt_w;
   }
#   print "$date \t $ave\n";
   print OUTFILE "$date\t$ave\n";

   $i+=$opt_w;
}

close INFILE;
close OUTFILE;



3 comments:

johnny said...

Thanks for posting this,

Using $i+=$windowsize skips the entries so a window of 3 on the following data

A 3
B 1
C 6
D 4
E 2
F 5

generates
c 3.33
F 3.66

We can use
$i++;

to avoid this and get a sliding window for each entry to give


C 3.33 #(av from A,B and C)
D 3.66 #(av from B,C and D)
E 4 #(av from C,D and E)
F 3.66

Which is ok, but is there a way to have the window average from the middle of the window? e.g. the average for C is taken from B,C,D?

to give

B 3.33 #(av. from A,B and C)
C 4.33 #(av from B,C and D)
D 4 #(av from C,D and E)
E 3.66
etc...

Thanks,
john

Stephen B. Johnson said...

I think the following does what you want. I'm not sure how you're getting AVE(B,C,D) = 4.33



The following produces:


2014-01-01:03:00:00 3.33333333333333
2014-01-01:04:00:00 3.66666666666667
2014-01-01:05:00:00 4
2014-01-01:06:00:00 3.66666666666667




#!/usr/bin/perl

use Getopt::Std;

getopts("i:o:w:");

if (!$opt_i) {
print "no input file...\n";
exit;
}

if (!$opt_o) {
print "no output file...\n";
exit;
}

if (!$opt_w) {
print "must specify a window size\n-> ";
$opt_w = <>;
}


open (INFILE, "< $opt_i") or die $!;
open (OUTFILE, "> $opt_o") or die $!;

@data = ;

my $i = 1;
while ($i < scalar(@data) - ($opt_w - 1)) {
$wintotal = 0;
for ($j = $i; $j < ($i + $opt_w); $j++) {
($date, $val) = split(/\t/, $data[$j-1]);
$wintotal += $val;
$ave = $wintotal / $opt_w;
}

# print "$date \t $ave\n";
print OUTFILE "$date\t$ave\n";

$i++;
}

close INFILE;
close OUTFILE;

johnny said...


Ahh, superb. ignore the last post. that works nicely with the conditions

my $i=1;

and "$j-1" in
($date, $val) = split(/\t/, $data[$j-1]);

thanks,