solarforecastarbiter.reference_forecasts.persistence.persistence_probabilistic

solarforecastarbiter.reference_forecasts.persistence.persistence_probabilistic(observation, data_start, data_end, forecast_start, forecast_end, interval_length, interval_label, load_data, axis, constant_values)

Make a probabilistic persistence forecast using the observation from data_start to data_end. In the forecast literature, this method is typically referred to as Persistence Ensemble (PeEn). [1] [2] [3]

The function handles forecasting either constant variable values or constant percentiles. In the examples below, we use GHI to be concrete but the concepts also apply to other variables (AC power, net load, etc.).

If forecasting constant variable values (e.g. forecast the probability of GHI being less than or equal to 500 W/m^2), the persistence forecast is:

\[\begin{aligned} F_n(x) &= \mathrm{ECDF}(GHI_{t_{start}}, \ldots, GHI_{t_{end}}) \\ \mathrm{Prob}(GHI_{t_f} \leq 500\ \mathrm{W/m^2}) &= F_n(500\ \mathrm{W/m^2}) \end{aligned}\]

where \(t_f\) is a forecast time and \(F_n\) is the empirical CDF (ECDF) function computed from the n observations between \(t_{start}\) = data_start and \(t_{end}\) = data_end, which maps from variable values to probabilities.
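As a minimal sketch of the constant-value case, the ECDF can be evaluated by counting the observations at or below a threshold. The helper name `ecdf` and the GHI sample values are hypothetical and chosen only for illustration; this is not necessarily how the library computes the forecast internally.

```python
def ecdf(observations):
    """Return F_n, the empirical CDF of the given observations."""
    values = list(observations)
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")

    def f_n(x):
        # Fraction of observations less than or equal to x.
        return sum(1 for v in values if v <= x) / n

    return f_n


# Hypothetical GHI observations (W/m^2) between data_start and data_end.
ghi = [120.0, 340.0, 480.0, 510.0, 650.0, 700.0, 820.0, 905.0]
f_n = ecdf(ghi)

# Prob(GHI_{t_f} <= 500 W/m^2) = F_n(500 W/m^2), expressed as a percentage.
prob_le_500 = 100.0 * f_n(500.0)
print(prob_le_500)  # 37.5 (3 of the 8 observations are <= 500)
```

The same probability is then issued for every forecast time \(t_f\), which is what makes this a persistence forecast.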

If forecasting constant probabilities (e.g. forecast the GHI value that has a 50% probability of not being exceeded), the persistence forecast is:

\[\begin{aligned} F_n(x) &= \mathrm{ECDF}(GHI_{t_{start}}, \ldots, GHI_{t_{end}}) \\ Q_n(p) &= \inf \{ x \in \mathbb{R} : p \leq F_n(x) \} \\ GHI_{t_f} &= Q_n(50\%) \end{aligned}\]

where \(Q_n\) is the quantile function based on the n observations between \(t_{start}\) = data_start and \(t_{end}\) = data_end, which maps from probabilities to variable values.
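The quantile definition above can be sketched directly from sorted observations: the smallest x with p <= F_n(x) is the ceil(n * p)-th smallest observation. The helper name `quantile` and the sample data are hypothetical. Note that this follows the infimum definition literally and may differ from interpolating estimators such as the default method of `numpy.percentile`.

```python
import math


def quantile(observations, p):
    """Return Q_n(p) = inf{x : p <= F_n(x)} for 0 < p <= 1."""
    values = sorted(observations)
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # F_n evaluated at the k-th smallest value is at least k / n, so the
    # smallest x with p <= F_n(x) is the ceil(n * p)-th order statistic
    # (1-indexed).
    k = math.ceil(n * p)
    return values[k - 1]


# Hypothetical GHI observations (W/m^2) between data_start and data_end.
ghi = [120.0, 340.0, 480.0, 510.0, 650.0, 700.0, 820.0, 905.0]

# GHI_{t_f} = Q_n(50%): the value issued for every forecast time.
median_ghi = quantile(ghi, 0.5)
print(median_ghi)  # 510.0
```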

Parameters:
  • observation (datamodel.Observation) –
  • data_start (pd.Timestamp) – Observation data start. Data is inclusive of this instant if observation.interval_label is beginning or instant.
  • data_end (pd.Timestamp) – Observation data end. Data is inclusive of this instant if observation.interval_label is ending or instant.
  • forecast_start (pd.Timestamp) – Forecast start. Forecast is inclusive of this instant if interval_label is beginning or instant.
  • forecast_end (pd.Timestamp) – Forecast end. Forecast is inclusive of this instant if interval_label is ending or instant.
  • interval_length (pd.Timedelta) – Forecast interval length
  • interval_label (str) – instant, beginning, or ending
  • load_data (function) – A function that loads the observation data. Must have the signature load_data(observation, data_start, data_end) and properly account for observation interval label.
  • axis ({'x', 'y'}) – The axis on which the constant values of the CDF are specified. The axis can be either x (constant variable values) or y (constant percentiles).
  • constant_values (array_like) – The variable values or percentiles.
Returns:

forecasts (list of pd.Series) – The persistence forecasts, returned in the same order as constant_values. If axis is x, the forecast values are percentiles (e.g. 25%). If instead axis is y, the forecast values have the same units as the observation data (e.g. MW).

Raises:

ValueError – If the axis parameter is invalid.
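The following standard-library sketch illustrates how the parameters, return value, and error described above fit together. It is an illustrative reading of this page, not the library's implementation: `forecast_times` and `persistence_sketch` are hypothetical helpers, plain lists of (time, value) pairs stand in for pd.Series, and the sample data are made up. It assumes that an endpoint not listed as inclusive for a given interval_label is excluded.

```python
import math
from datetime import datetime, timedelta


def forecast_times(forecast_start, forecast_end, interval_length, interval_label):
    """Hypothetical helper: timestamps a forecast covers for a given label."""
    if interval_label not in ("instant", "beginning", "ending"):
        raise ValueError(f"invalid interval_label: {interval_label!r}")
    times = []
    t = forecast_start
    while t <= forecast_end:
        times.append(t)
        t += interval_length
    # "beginning" drops forecast_end; "ending" drops forecast_start.
    if interval_label == "beginning" and times and times[-1] == forecast_end:
        times.pop()
    elif interval_label == "ending" and times and times[0] == forecast_start:
        times.pop(0)
    return times


def persistence_sketch(obs, times, axis, constant_values):
    """Hypothetical stand-in: one constant series per entry of constant_values."""
    if not obs:
        raise ValueError("need at least one observation")
    n = len(obs)
    if axis == "x":
        # Constant variable values -> forecast percentiles via the ECDF.
        values = [100.0 * sum(1 for v in obs if v <= c) / n
                  for c in constant_values]
    elif axis == "y":
        # Constant percentiles -> forecast variable values via Q_n.
        ordered = sorted(obs)
        values = [ordered[max(math.ceil(n * c / 100.0) - 1, 0)]
                  for c in constant_values]
    else:
        raise ValueError(f"invalid axis: {axis!r}")
    # Same order as constant_values; each value persists over all times.
    return [[(t, v) for t in times] for v in values]


ghi = [120.0, 340.0, 480.0, 510.0, 650.0, 700.0, 820.0, 905.0]
times = forecast_times(datetime(2020, 1, 1, 12), datetime(2020, 1, 1, 13),
                       timedelta(minutes=30), "beginning")
by_value = persistence_sketch(ghi, times, "x", [500.0, 1000.0])
by_percentile = persistence_sketch(ghi, times, "y", [25.0, 50.0])
```

With these inputs, `by_value` holds percentiles (37.5 and 100.0) and `by_percentile` holds GHI values (340.0 and 510.0), each repeated at 12:00 and 12:30.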

References

[1] Alessandrini et al. (2015) “An analog ensemble for short-term probabilistic solar power forecast”, Appl. Energy 157, pp. 95-110. doi: 10.1016/j.apenergy.2015.08.011
[2] Yang (2019) “A universal benchmarking method for probabilistic solar irradiance forecasting”, Solar Energy 184, pp. 410-416. doi: 10.1016/j.solener.2019.04.018
[3] Doubleday, Van Scyoc Hernandez and Hodge (2020) “Benchmark probabilistic solar forecasts: characteristics and recommendations”, Solar Energy 206, pp. 52-67. doi: 10.1016/j.solener.2020.05.051