Post date: 02-Mar-2014 11:48:33
Computational fabric needs access to the moving average of multiple high rate streams of highly variable data. I have spent a little time developing an algorithm to meet this need, and outline it here in the hope that it may be useful to someone.
I needed the algorithm to be very fast (efficient), thread-safe, and to give sensible results in the face spurious outliers. The term medyn was coined to resemble dynamic-mean and/or dynamic-median. The algorithm more closely represents a dynamic-median than a dynamic-mean, and this is important to limit the impact of very large spurious values. The distinction between mean and median is marginal for my data, and the dynamics press this to irrelevance. A fast algorithm was more important than technical precision in my need. Also in the interests of speed, I have chosen to work with 32bit integer values, and avoid floating-point arithmetic.
The medyn algorithm makes an incremental adjustment to the medyn value on receipt of every data point. If the prior adjustment was positive, and the new data point is greater than the medyn, then the adjustment is double the prior adjustment. Conversely, if the prior adjustment was positive, and the new data point is less than the medyn, then the adjustment is half the prior adjustment. In this way, the medyn accelerates increasingly rapidly towards a run of data points which are consistently above the medyn value, and decelerates when the run ends. To avoid overshoot, the adjustment is limited to some fraction of the gap between the new data point and the medyn value.
The rate of acceleration/deceleration, and the limit on velocity may be tuned to suit the requirements to hand. Increasing the acceleration from doubling to 3x, 4x, 5x etc makes the medyn's response to new data more twitchy. Increasing the limit on velocity, to a large fraction of the remaining gap, allows the medyn to respond well to a step-change in the data points.
The charts shows the medyn of high variance data with large spurious values, and a noisy square wave function with different limits on acceleration and velocity.
A simple rolling average specifies a fixed number of data points to include in the calculation. An exponential moving average considers all data points, but progressively weighted for the more recent points. The medyn calculation considers only the prior median, the prior adjustment and a new data point. The medyn relies on the limits on acceleration and velocity to maintain an echo of prior data points.
A Java implementation is below: