medyn

Post date: 02-Mar-2014 11:48:33

Computational fabric needs access to the moving average of multiple high rate streams of highly variable data. I have spent a little time developing an algorithm to meet this need, and outline it here in the hope that it may be useful to someone.

I needed the algorithm to be very fast (efficient), thread-safe, and to give sensible results in the face spurious outliers. The term medyn was coined to resemble dynamic-mean and/or dynamic-median. The algorithm more closely represents a dynamic-median than a dynamic-mean, and this is important to limit the impact of very large spurious values. The distinction between mean and median is marginal for my data, and the dynamics press this to irrelevance. A fast algorithm was more important than technical precision in my need. Also in the interests of speed, I have chosen to work with 32bit integer values, and avoid floating-point arithmetic.

The medyn algorithm makes an incremental adjustment to the medyn value on receipt of every data point. If the prior adjustment was positive, and the new data point is greater than the medyn, then the adjustment is double the prior adjustment. Conversely, if the prior adjustment was positive, and the new data point is less than the medyn, then the adjustment is half the prior adjustment. In this way, the medyn accelerates increasingly rapidly towards a run of data points which are consistently above the medyn value, and decelerates when the run ends. To avoid overshoot, the adjustment is limited to some fraction of the gap between the new data point and the medyn value.

The rate of acceleration/deceleration, and the limit on velocity may be tuned to suit the requirements to hand. Increasing the acceleration from doubling to 3x, 4x, 5x etc makes the medyn's response to new data more twitchy. Increasing the limit on velocity, to a large fraction of the remaining gap, allows the medyn to respond well to a step-change in the data points.

The charts shows the medyn of high variance data with large spurious values, and a noisy square wave function with different limits on acceleration and velocity.

A simple rolling average specifies a fixed number of data points to include in the calculation. An exponential moving average considers all data points, but progressively weighted for the more recent points. The medyn calculation considers only the prior median, the prior adjustment and a new data point. The medyn relies on the limits on acceleration and velocity to maintain an echo of prior data points.

A Java implementation is below:

/**

* Copyright (C) 2014 John Andrew Brown.

*

* This program is free software: you can redistribute it and/or modify it under

* the terms of the GNU General Public License as published by the Free Software

* Foundation, either version 3 of the License, or (at your option) any later

* version.

*

* This program is distributed in the hope that it will be useful, but WITHOUT

* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS

* FOR A PARTICULAR PURPOSE. See the GNU General Public License for more

* details.

*

* You should have received a copy of the GNU General Public License along with

* this program. If not, see <http://www.gnu.org/licenses/>.

*

* john@nhoj.info

*/

package grow.util;


import java.io.Serializable;


/**

* A medyn is a dynamic value, similar to a mean or median, computed real-time

* on a stream of data samples.

*

* <p>

* The term <b>medyn</b> was coined to resemble <b>dyn</b>amic-<b>me</b>an

* and/or <b>dyn</b>amic-<b>me</b>dian.

* </p>

*

* <p>

* The medyn algorithm makes an incremental adjustment to the medyn value on

* receipt of every data point. If the prior adjustment was positive, and the

* new data point is greater than the medyn, then the adjustment is double the

* prior adjustment. Conversely, if the prior adjustment was positive, and the

* new data point is less than the medyn, then the adjustment is half the prior

* adjustment. In this way, the medyn accelerates increasingly rapidly towards a

* run of data points which are consistently above the medyn value, and

* decelerates when the run ends. To avoid overshoot, the adjustment is limited

* to some fraction of the gap between the new data point and the medyn

* value.</p>

*

* <p>

* The rate of acceleration/deceleration, and the limit on velocity may be tuned

* to suit the requirements to hand. Increasing the acceleration from doubling

* to 3x, 4x, 5x etc makes the medyn's response to new data more twitchy.

* Increasing the limit on velocity, to a large fraction of the remaining gap,

* allows the medyn to repond well to a step-change in the data points. </p>

*

* <p>

* Medyn is Thread-safe.</p>

*

* <ul>

* <li>2013-04-27 version 3.0 Improved velocity limit.</li>

* <li>2013-03-04 version 2.0 Simplified velocity limit.</li>

* <li>2014-03-02 version 1.0</li>

* </ul>

*

* @author nworbnhoj

*/

public class Medyn

implements Cloneable, Serializable {


private static final long serialVersionUID = 0x6D6564796E_4A4142L;

/**

* The default value for acceleration.

*/

public static final int DEFAULT_ACCELERATION = 2;

/**

* The default limit on velocity.

*/

public static final int DEFAULT_VELOCITY_LIMIT = 4;

/**

* The default initial value of medyn.

*/

public static final int DEFAULT_INITIAL_MEDYN = 0;

/**

* The acceleration a ∈ Z/{-1 0 1} controls the rate at which the medyn

* accelerates towards a run of samples above(or below) the current medyn.

* For Z+ adjustment = a*adj. For Z- adjustment = adj+(a*adj)

*/

public final int ACCELERATION;

/**

* A limit to the velocity v ∈ Z+ which the medyn can move towards the most

* recent sample; stated in terms of a fraction of the gap between the

* sample and the medyn. max_adjustment = medyn/v.

*/

public int VELOCITY_LIMIT;

/**

* The current medyn value.

*/

private volatile int medyn = 10;

/**

* The prior adjustment to the medyn.

*/

private transient volatile int adjustment = 1;


/**

* A constructor allowing all aspects of the medyn to be customized.

*

* @param acceleration A small value a ∈ Z/{-1 0 1} fixing the rate which

* the medyn accelerates towards recent samples. For Z+ adjustment = a*adj.

* For Z- adjustment = adj+(a*adj)

* @param velocity_limit A small value v ∈ Z+ limiting the maximum change in

* medyn due to a single sample; where max_adjustment = medyn/v.

* @param initial An initial value of medyn.

*/

public Medyn(int acceleration, int velocity_limit, int initial) {

this.ACCELERATION = (acceleration > 1) ? acceleration : 2;

this.VELOCITY_LIMIT = (velocity_limit > 0) ? velocity_limit : 1;

this.medyn = initial;

}


/**

* A constructor with default initial medyn=0.

*

* @param acceleration A small value a ∈ Z/{-1 0 1} fixing the rate which

* the medyn accelerates towards recent samples. For Z+ adjustment = a*adj.

* For Z- adjustment = adj+(a*adj)

* @param velocity_limit A small value v ∈ Z+ limiting the maximum change in

* medyn due to a single sample; where max_adjustment = medyn/v.

*/

public Medyn(int acceleration, int velocity_limit) {

this(

acceleration,

velocity_limit,

DEFAULT_INITIAL_MEDYN);

}


/**

* A constructor with default

* {@link Medyn.DEFAULT_ACCELERATION acceleration} and default

* {@link Medyn.DEFAULT_VELOCITY_LIMIT velocity_limit}.

*

* @param initial An initial value of medyn.

*/

public Medyn(int initial) {

this(

DEFAULT_ACCELERATION,

DEFAULT_VELOCITY_LIMIT,

initial);

}


/**

* A constructor with default values default

* {@link Medyn.DEFAULT_ACCELERATION acceleration} and default

* {@link Medyn.DEFAULT_VELOCITY_LIMIT velocity_limit} default medyn=0.

*/

public Medyn() {

this(

DEFAULT_ACCELERATION,

DEFAULT_VELOCITY_LIMIT,

DEFAULT_INITIAL_MEDYN);

}


/**

* copy constructor.

*

* @param original Medyn to be copied.

*/

public Medyn(Medyn original) {

this(original.ACCELERATION, original.VELOCITY_LIMIT, original.get());

this.adjustment = original.adjustment;

}


/**

* The current value of Medyn.

*

* @return

*/

@Override

public String toString() {

return String.valueOf(medyn);

}


/**

* Override and set the current medyn value.

*

* @param medyn

* @return

*/

public int set(int medyn) {

int hold = this.medyn;

this.medyn = medyn;

return hold;

}


/**

* get the current medyn value.

*

* @return the medyn value

*/

public int get() {

return medyn;

}


/**

* Adjusts the value of medyn to include the supplied data sample.

* <p>

* The algorithm is designed to respond rapidly to significant change, to

* avoid overshoot, with minimal computational overhead. The <b>gap</b>

* between the most recent sample and the current medyn is the basis for the

* adjustment to the medyn.</li>

* <ul>

* <li>If the <b>gap</b> is in the same direction as the prior adjustment,

* then the adjustment is multiplied by acceleration; limited to

* <b>gap</b> / velocity_limit; and reapplied.</li>

* <li>If the <b>gap</b> is in the opposite direction to the prior

* adjustment (ie overshoot), then the adjustment is divided by

* acceleration; limited to median / velocity_limit; and reapplied.</li>

* </ul>

* </p>

*

* @param sample

*/

public void sample(final int sample) {


final int med = medyn;

final int gap = sample - med;

final int sign = Integer.signum(gap);


int adj = adjustment; // local variable to avoid thread interference

int maxAdj; // max velocity


if ((gap ^ adj) < 0) { // if of different sign

adj = (ACCELERATION > 0) // decelerate to avoid overshoot

? adj / ACCELERATION // heavy deceleration

: adj + (adj / ACCELERATION); // light deceleration

maxAdj = sign + (med / VELOCITY_LIMIT); // max velocity

} else { // if of the same sign

adj = (ACCELERATION > 0) // accelerate to chase change

? adj * ACCELERATION // heavy acceleration

: adj - (adj / ACCELERATION); // light acceleration

maxAdj = sign + (gap / VELOCITY_LIMIT); // max velocity

}


// bound the adjustment (velocity)

if ((Math.abs(adj) > Math.abs(maxAdj))) {

adj = maxAdj;

}


medyn = med + adj; // adjust the median

adjustment = (adj != 0) ? adj : sign; // record the adjustment

}


}