Moving Average Examples

When you want to compute a moving average on your data, the main tools at your disposal will be mutate and roll. Let's go over a few examples of how to use them together.

Moving Average#

Given input data:

Input Data

[{"value": 1}, {"value": 2}, {"value": 3}, {"value": 7}, {"value": 5}]

If we want to compute a 3-item moving average of the value key, we can do so as follows:

tidy(
data,
mutateWithSummary({
movingAvg: roll(3, mean('value')),
})
)

Here we're providing a value of 3 for the window width argument of roll and specifying the aggregation to run as the summarizer function mean. This produces the following output:

Output Data

[
{"value": 1},
{"value": 2},
{"value": 3, "movingAvg": 2},
{"value": 7, "movingAvg": 4},
{"value": 5, "movingAvg": 5}
]

Note that by default roll will only compute the aggregate value if the full window width of data is available. Since the window sizes for the first and second elements are 1 and 2, which are less than our width argument of 3, their movingAvg value is undefined.

To compute the aggregate value even when the window width is less than our desired width, we can pass in an options argument with partial set to true:

tidy(
data,
mutateWithSummary({
movingAvg: roll(3, mean('value'), { partial: true }),
})
)

With this change, the output becomes:

Output Data with Partial

[
{"value": 1, "movingAvg": 1},
{"value": 2, "movingAvg": 1.5},
{"value": 3, "movingAvg": 2},
{"value": 7, "movingAvg": 4},
{"value": 5, "movingAvg": 5}
]

And that's all there is to doing simple moving averages with tidy.

Moving Average of Rates#

Working with rates is generally a bit more challenging than plain ol' numbers since we want to sum the numerator and denominator separately in each window as opposed to averaging the sum of rates from each item. Here's where the meanRate summarizer comes in handy.

Input Data

[
{"numer": 1, "denom": 2},
{"numer": 2, "denom": 6},
{"numer": 3, "denom": 7},
{"numer": 7, "denom": 12},
{"numer": 5, "denom": 6}
]

If we want to compute a 3-item moving average of the rate defined by numer / denom, we can do so as follows:

tidy(
data,
mutateWithSummary({
movingAvg: roll(3, meanRate('numer', 'denom')),
})
)

Output Data

[
{"numer": 1, "denom": 2},
{"numer": 2, "denom": 6},
{"numer": 3, "denom": 7, "movingAvg": 0.4},
{"numer": 7, "denom": 12, "movingAvg": 0.48},
{"numer": 5, "denom": 6, "movingAvg": 0.6}
]

Note that the incorrect way of doing it would involve averaging the rates:

tidy(
data,
mutate({
rate: rate('numer', 'denom'),
}),
mutateWithSummary({
movingAvgWrong: roll(3, mean('rate')),
movingAvgRight: roll(3, meanRate('numer', 'denom')),
})
)

Output Data with Wrong

numerdenomratemovingAvgWrongmovingAvgRight
1.002.000.500
2.006.000.333
3.007.000.4290.4210.400
7.0012.00.5830.4480.480
5.006.000.8330.6150.600

Moving Average by Date#

When working with time series data, we often want be able to re-aggregate the data to a variety of different granularities: e.g., a 7-day moving average (7DMA), 45DMA, 5-week moving average (5WMA), daily, weekly, and so on. For this purpose, we can use the summarizeMomentGranularity tidy function. Note this requires our dates to be stored as moment objects and that moment is installed as a dependency in our project.

If projects are not using moment but would like this functionality, please get in touch and we can figure out a reusable solution.

Let's look at an example where we compute a 2-week moving average of value.

Input Data

datevalue
"2020-01-01T00:00:00.000Z"3.00
"2020-01-02T00:00:00.000Z"1.00
"2020-01-03T00:00:00.000Z"4.00
"2020-01-04T00:00:00.000Z"5.00
"2020-01-05T00:00:00.000Z"2.00
"2020-01-06T00:00:00.000Z"4.00
"2020-01-07T00:00:00.000Z"3.00
"2020-01-08T00:00:00.000Z"1.00
1 of 4

The first step is to rollup our data from being at a daily granularity to say a weekly granularity before we compute our moving average.

tidy(
data,
summarizeMomentGranularity('w', { value: sum('value') })
);

At this point, our data has been summed up per week.

Output Weekly Data

valuedatetimestamp
15.02019-12-30T00:00:00.000Z2019-12-30T00:00:00.000Z
21.02020-01-06T00:00:00.000Z2020-01-06T00:00:00.000Z
20.02020-01-13T00:00:00.000Z2020-01-13T00:00:00.000Z
19.02020-01-20T00:00:00.000Z2020-01-20T00:00:00.000Z

We are now ready to compute our moving average:

tidy(
data,
summarizeMomentGranularity('w', { value: sum('value') }),
mutateWithSummary({
movingAvg: roll(2, mean('value'))
})
);

Output Data

valuedatetimestampmovingAvg
15.02019-12-23T00:00:00.000Z2019-12-23T00:00:00.000Z
21.02019-12-30T00:00:00.000Z2019-12-30T00:00:00.000Z18.0
20.02020-01-06T00:00:00.000Z2020-01-06T00:00:00.000Z20.5
19.02020-01-13T00:00:00.000Z2020-01-13T00:00:00.000Z19.5

And we're done, in two steps we've moved from daily values in our input data to weekly values which include a 2-week moving average.

Moving Average by Date with Zerofill#

We often end up with data that is missing values for some dates, which throws off the windows used to compute moving averages. To get around this, we can use complete to expand our data to include zeroes (or other values) where data is missing in a sequence.

Consider the following data:

Input Data

datevalue
"2020-01-06T00:00:00.000Z"4.00
"2020-01-07T00:00:00.000Z"3.00
"2020-01-08T00:00:00.000Z"1.00
"2020-01-11T00:00:00.000Z"2.00
"2020-01-12T00:00:00.000Z"5.00
"2020-01-13T00:00:00.000Z"6.00
"2020-01-14T00:00:00.000Z"1.00
"2020-01-17T00:00:00.000Z"3.00
1 of 2

It's missing values for January 9, 10, 15, 16, and 18. If we run our tidy flow on it to compute a 7-day moving average without accounting for these missing values we get

tidy(
data,
mutateWithSummary({
movingAvg: roll(7, mean('value'), { partial: true })
})
);

Output Data

datevaluemovingAvg
2020-01-06T00:00:00.000Z4.004.00
2020-01-07T00:00:00.000Z3.003.50
2020-01-08T00:00:00.000Z1.002.67
2020-01-11T00:00:00.000Z2.002.50
2020-01-12T00:00:00.000Z5.003.00
2020-01-13T00:00:00.000Z6.003.50
2020-01-14T00:00:00.000Z1.003.14
2020-01-17T00:00:00.000Z3.003.00
1 of 2

As you can see, the moving average doesn't account for the missing days.

To get around this, we can use complete on the data to expand it to include rows for when there is data missing from the expected day-by-day sequence. Since our date column uses moments, we will have to convert it to an ISO timestamp string first so we can use fullSeqDateISOString on it. This can be done via summarizeMomentGranularity as shown above or with a simple mutate since we're going to work with daily data.

tidy(
data,
mutate({ timestamp: d => d.date.toISOString() }),
complete(
{ timestamp: fullSeqDateISOString('timestamp', 'day') },
{ value: 0 }
),
mutateWithSummary({
movingAvg: roll(7, mean('value'), { partial: true })
})
);

Alternative example with summarizeMomentGranularity โ€“ granularity should match in fullSeqDateISOString and summarizeMomentGranularity

tidy(
data,
summarizeMomentGranularity('day', { value: sum('value') }),
complete(
{ timestamp: fullSeqDateISOString('timestamp', 'day') },
{ value: 0 }
),
mutateWithSummary({
movingAvg: roll(7, mean('value'), { partial: true })
})
);

Output Data

timestampdatevaluemovingAvg
2020-01-06T00:00:00.000Z2020-01-06T00:00:00.000Z4.004.00
2020-01-07T00:00:00.000Z2020-01-07T00:00:00.000Z3.003.50
2020-01-08T00:00:00.000Z2020-01-08T00:00:00.000Z1.002.67
2020-01-09T00:00:00.000Z0.002.00
2020-01-10T00:00:00.000Z0.001.60
2020-01-11T00:00:00.000Z2020-01-11T00:00:00.000Z2.001.67
2020-01-12T00:00:00.000Z2020-01-12T00:00:00.000Z5.002.14
2020-01-13T00:00:00.000Z2020-01-13T00:00:00.000Z6.002.43
1 of 2

Now we can see that rows have been added with value: 0 for the missing dates in the sequence, which enables our moving average to be computed correctly.

Last updated on by Peter Beshai