As stated earlier, just like the Lag 1 And Cycles, Trend and Cyclic has been built to analyze certain phenomenon in the data.
Each observation is characterized by 2 variables, a time and a signal value.
In this step, you will analyze the data with and without White Noise at the same time.
Visualize the data
Let’s have a look at the data using the following SQL:
select
tcnn.time
, tcnn.signal as signal
, tcwn.signal as signal_wn
, tc4n.signal as signal_4n
, tcnn.signal - tcwn.signal as delta_wn
, tcnn.signal - tc4n.signal as delta_4n
from
forecast_trend_and_cyclic tcnn
join forecast_trend_and_cyclic_and_wn tcwn on tcnn.time = tcwn.time
join forecast_trend_and_cyclic_and_4wn tc4n on tcnn.time = tc4n.time
And here is the result in a graph:
As you can visually notice:
- the data set without white noise (in blue) is following a sine wave
- the data set with white noise (in red) tend to follow a sine wave too but with some irregularities
- the data set with 4 time white noise (in yellow) tend to follow a sine wave too but with stronger irregularities
- they all have a positive (increasing) trend
Dates & intervals
As the ozone reading value is provided for a certain date, let’s have a look at date values using the following SQL:
select 'max' as indicator, to_varchar(max(time)) as value
from forecast_trend_and_cyclic union all
select 'min' , to_varchar(min(time))
from forecast_trend_and_cyclic union all
select 'delta days' , to_varchar(days_between(min(time), max(time)))
from forecast_trend_and_cyclic union all
select 'count' , to_varchar(count(1))
from forecast_trend_and_cyclic
indicator |
value |
max |
2002-05-15 |
min |
2001-01-01 |
delta days |
499 |
count |
500 |
As you can notice, you have 500 data points spread across 499 days. This implies that data is available on a daily basis.
The same analysis is applicable to the dataset with white noise.
Generic statistics
Now, let’s have a look at some additional statistical elements using the following SQL:
with data as (
select l1cnn.signal as value_nn, l1cwn.signal as value_wn
from forecast_lag_1_and_cycles l1cnn join forecast_lag_1_and_cycles_and_wn l1cwn on l1cnn.time = l1cwn.time
)
select 'max' as indicator , round(max(value_nn), 2) as value_nn
, round(max(value_wn), 2) as value_wn from data union all
select 'min' , round(min(value_nn), 2)
, round(min(value_wn), 2) from data union all
select 'delta min/max' , round(max(value_nn) - min(value_nn), 2)
, round(max(value_wn) - min(value_wn), 2) from data union all
select 'avg' , round(avg(value_nn), 2)
, round(avg(value_wn), 2) from data union all
select 'median' , round(median(value_nn), 2)
, round(median(value_wn), 2) from data union all
select 'stddev' , round(stddev(value_nn), 2)
, round(stddev(value_wn), 2) from data
The result should be:
indicator |
value without White Noise |
value with White Noise |
value with 4 x White Noise |
max |
159.84 |
161.12 |
167.49 |
min |
0.83 |
-0.54 |
-3.28 |
delta min/max |
159.01 |
161.66 |
170.76 |
avg |
75.47 |
75.48 |
75.5 |
median |
81.76 |
80.61 |
78.96 |
stddev |
43.21 |
43.33 |
43.78 |
As you can notice the average and median values are all in the same range of values for each datasets.
Here is a graph which can help you visualize the signal values in ascending order:
Data Distribution
Now let’s have a look at the data distribution using the NTILE function.
The following SQL will partition the data into 8 groups and get the same generic statistics as before but for each group:
with data as (
select ntile(8) over (order by signal asc) as tile, signal
from forecast_trend_and_cyclic
where signal is not null
)
select tile
, round(max(signal), 2) as max
, round(min(signal), 2) as min
, round(max(signal) - min(signal), 2) as "delta min/max"
, round(avg(signal), 2) as avg
, round(median(signal), 2) as median
, round(abs(avg(signal) - median(signal)), 2) as "delta avg/median"
, round(stddev(signal), 2) as stddev
from data
group by tile
The reason you are using 8 tiles here is because the signal has 4 waves. Each wave is made of a declining and rising part, so 8 parts in total. And our goal here is to assess if these parts are more or less with the same shape.
The result should be:
tile |
max |
min |
delta |
avg |
median |
delta |
stddev |
1 |
18.14 |
0.83 |
17.31 |
13.69 |
15.33 |
1.64 |
4.56 |
2 |
41.71 |
18.27 |
23.43 |
24.68 |
20.52 |
4.17 |
7.06 |
3 |
54.4 |
42.49 |
11.91 |
51.03 |
51.35 |
0.32 |
2.73 |
4 |
82.79 |
54.55 |
28.24 |
63.37 |
58.84 |
4.53 |
9.25 |
5 |
90.61 |
83.45 |
7.16 |
87.7 |
87.51 |
0.19 |
1.69 |
6 |
121.59 |
90.64 |
30.95 |
101.74 |
98.68 |
3.05 |
10.63 |
7 |
126.5 |
121.6 |
4.9 |
123.78 |
123.61 |
0.16 |
1.62 |
8 |
159.84 |
126.64 |
33.2 |
140.18 |
138.62 |
1.56 |
11.66 |
As you can notice, the deltas between the average and median are either really small or large. The small values correspond to slowly rising phases whereas the larger values relates to steeper phases of the curve.
Now let’s do it for the data set with white noise.
with data as (
select ntile(8) over (order by signal asc) as tile, signal
from forecast_trend_and_cyclic_and_wn
where signal is not null
)
select tile
, round(max(signal), 2) as max
, round(min(signal), 2) as min
, round(max(signal) - min(signal), 2) as "delta min/max"
, round(avg(signal), 2) as avg
, round(median(signal), 2) as median
, round(abs(avg(signal) - median(signal)), 2) as "delta avg/median"
, round(stddev(signal), 2) as stddev
from data
group by tile
Provide an answer to the question below then click on Validate.