Last week I found myself trying to deal with a ton of tide data retrieved from a tide gauge installed in Castine, ME. The gauge takes 4 seconds to capture a data point: raw (unconverted) water level is recorded first, followed by raw temperature 4 seconds later, and a reference value 4 seconds after that. A full record is therefore generated every 12 seconds, which adds up to a lot of data.
To make this gauge data easier to compare against a NOAA primary station, I wanted to automate the task of picking out the higher highs and lower lows. My friend Val developed a six-minute averaging scheme in Matlab to generate a six-minute record similar to the NOAA format. That still left us with squared-off peaks, so I decided to apply a filter to the data.
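Val's Matlab code isn't reproduced here, but the basic idea of a six-minute block average is easy to sketch. Here it is in Python with numpy (the 12-second sample spacing comes from the record cadence above; the synthetic tide signal is just an illustrative stand-in):

```python
import numpy as np

# One water-level sample every 12 s, so 30 samples per six-minute bin.
samples_per_bin = 30

t = np.arange(0, 12 * 3600, 12)               # 12 hours of timestamps (s)
level = 2.0 * np.sin(2 * np.pi * t / 44712)   # synthetic semidiurnal tide (m)

# Trim to a whole number of bins, then average each bin.
n_bins = level.size // samples_per_bin
six_min = level[:n_bins * samples_per_bin].reshape(n_bins, samples_per_bin).mean(axis=1)
six_min_t = t[:n_bins * samples_per_bin].reshape(n_bins, samples_per_bin).mean(axis=1)

print(six_min.shape)  # (120,) -- one averaged value per six minutes
```

The averaged record is smoother than the raw one, but the peaks still look squared off, which is what motivated the filtering below.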
Problem: I wanted to use filtfilt, which filters the data in the forward direction and then again in reverse, resulting in zero phase distortion. I tested filtfilt on the full 12-second water level data and everything seemed hunky dory. The problem came when I tried it on the 6-minute averaged data. The filter ran with no errors, but the result was all NaN (no data) values. At first I thought the zeros in my vector data were to blame, so I changed every 0.00 to 0.01; that did not solve the problem. Even more frustrating, filtfilt worked on every other column coming out of the 6-min-averaging scheme (time, std, etc.) except water level. What is up with that?
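For what it's worth, one thing that can produce exactly this all-NaN symptom is a single stray NaN hiding in the input column: because filtfilt runs an IIR filter forward and then backward, one NaN contaminates the filter state in both directions and wipes out the entire output. A small sketch using scipy's filtfilt (assumed analogous to Matlab's; the Butterworth design is just an illustrative choice):

```python
import numpy as np
from scipy.signal import butter, filtfilt

b, a = butter(4, 0.1)                 # 4th-order low-pass, normalized cutoff 0.1
x = np.sin(np.linspace(0, 20, 500))   # a clean test signal

clean = filtfilt(b, a, x)             # no NaNs in, no NaNs out
print(np.isnan(clean).any())          # False

x_bad = x.copy()
x_bad[250] = np.nan                   # one missing sample
dirty = filtfilt(b, a, x_bad)
print(np.isnan(dirty).all())          # True: the NaN propagates everywhere
```

This is only a guess at the cause, but it would explain why the other columns (time, std, etc.) filtered fine while water level alone came back as all NaN.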
Workaround: I used Matlab's convn function to apply a convolution to the data. This kind of filter can cause a phase distortion; however, by specifying the shape as 'same', the central part of the convolution is returned and the distortion is minimized.
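The equivalent in Python is numpy's convolve with mode='same', which behaves like convn(x, win, 'same') for a vector. The 5-point moving-average window here is a hypothetical choice; the actual window I used isn't shown in the post:

```python
import numpy as np

# Hypothetical smoothing kernel: a 5-point moving average.
win = np.ones(5) / 5.0

x = np.sin(np.linspace(0, 4 * np.pi, 100))

# mode='same' keeps only the central part of the convolution,
# so the smoothed output lines up sample-for-sample with the input.
smooth = np.convolve(x, win, mode='same')

print(smooth.shape)  # (100,) -- same length as the input
```

A symmetric window like this one is actually zero-phase in the interior of the record; the distortion shows up mainly near the edges, where the 'same' truncation pads with implicit zeros.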
For the comparison of our preliminary gauge to the NOAA primary gauge in Bar Harbor, a slight phase distortion will not affect the results. To be strictly correct, however, a filter that guarantees zero phase distortion should have been used. If anyone has any ideas why filtfilt would not work on a running-averaged dataset, I am all ears; even one of my profs could not figure it out.
I must say, though, the results of my code using convn look pretty nice on first inspection: rounded peaks and automatically picked highs and lows (using the extrema function, modified so that returns are in linear time order rather than descending order of magnitude):
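The modified extrema function itself isn't shown in the post, but the time-ordered peak picking it does can be sketched like this in Python (a simple stand-in, not the actual Matlab code):

```python
import numpy as np

def extrema_in_time_order(x):
    """Return (indices, values) of the local maxima and minima of x,
    ordered by position in the series rather than by magnitude."""
    interior = np.arange(1, len(x) - 1)
    is_max = (x[interior] > x[interior - 1]) & (x[interior] > x[interior + 1])
    is_min = (x[interior] < x[interior - 1]) & (x[interior] < x[interior + 1])
    idx = np.sort(interior[is_max | is_min])   # linear time order
    return idx, x[idx]

# Two full sine cycles have two highs and two lows.
t = np.linspace(0, 4 * np.pi, 200)
idx, vals = extrema_in_time_order(np.sin(t))
print(len(idx))  # 4
```

Running this on the smoothed six-minute record gives the alternating highs and lows in the order they occurred, which is exactly what the higher-high / lower-low comparison needs.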