The following command would read inputfile, compute the average for a window of size 3, then write the result to outputfile.
aveWinCalc.pl -i inputfile -o outputfile -w 3
example input datafile (seperated by \t)
2012-11-01:08:00:00 1200
2012-11-01:09:00:00 1225
2012-11-01-10:00:00 1312
2012-11-01-11:00:00 1355
...
------------------------------------------------------------------------
#!/usr/bin/perl
# file: aveWinCalc.pl
# Written by: Stephen B. Johnson
# 2012-11-20
# Step through every element and compute the average for
# the specified window size.
#
# my input datafile format: (separated by \t)
# YYYY-MM-DD:hh:mm:ss value
#
use Getopt::Std;
getopts("i:o:w:");
if (!$opt_i) {
print "no input file...\n";
exit;
}
if (!$opt_o) {
print "no output file...\n";
exit;
}
if (!$opt_w) {
print "must specify a window size\n-> ";
$opt_w = <>;
}
open (INFILE, "< $opt_i") or die $!;
open (OUTFILE, "> $opt_o") or die $!;
@data = <INFILE>;
my $i = 0;
while ($i < scalar(@data) - ($opt_w - 1)) {
$wintotal = 0;
for ($j = $i; $j < ($i + $opt_w); $j++) {
($date, $val) = split(/\t/, $data[$j]);
$wintotal += $val;
$ave = $wintotal / $opt_w;
}
# print "$date \t $ave\n";
print OUTFILE "$date\t$ave\n";
$i+=$opt_w;
}
close INFILE;
close OUTFILE;
3 comments:
Thanks for posting this,
Using $i+=$windowsize skips the entries so a window of 3 on the following data
A 3
B 1
C 6
D 4
E 2
F 5
generates
c 3.33
F 3.66
We can use
$i++;
to avoid this and get a sliding window for each entry to give
C 3.33 #(av from A,B and C)
D 3.66 #(av from B,C and D)
E 4 #(av from C,D and E)
F 3.66
Which is ok, but is there a way to have the window average from the middle of the window? e.g. the average for C is taken from B,C,D?
to give
B 3.33 #(av. from A,B and C)
C 4.33 #(av from B,C and D)
D 4 #(av from C,D and E)
E 3.66
etc...
Thanks,
john
I think the following does what you want. I'm not sure how you're getting AVE(B,C,D) = 4.33
The following produces:
2014-01-01:03:00:00 3.33333333333333
2014-01-01:04:00:00 3.66666666666667
2014-01-01:05:00:00 4
2014-01-01:06:00:00 3.66666666666667
#!/usr/bin/perl
use Getopt::Std;
getopts("i:o:w:");
if (!$opt_i) {
print "no input file...\n";
exit;
}
if (!$opt_o) {
print "no output file...\n";
exit;
}
if (!$opt_w) {
print "must specify a window size\n-> ";
$opt_w = <>;
}
open (INFILE, "< $opt_i") or die $!;
open (OUTFILE, "> $opt_o") or die $!;
@data = ;
my $i = 1;
while ($i < scalar(@data) - ($opt_w - 1)) {
$wintotal = 0;
for ($j = $i; $j < ($i + $opt_w); $j++) {
($date, $val) = split(/\t/, $data[$j-1]);
$wintotal += $val;
$ave = $wintotal / $opt_w;
}
# print "$date \t $ave\n";
print OUTFILE "$date\t$ave\n";
$i++;
}
close INFILE;
close OUTFILE;
Ahh, superb. ignore the last post. that works nicely with the conditions
my $i=1;
and "$j-1" in
($date, $val) = split(/\t/, $data[$j-1]);
thanks,
Post a Comment