Forecast Evaluation Overview Why Evaluate Point Forecasts vs
Forecast Evaluation
LSE CATS Operational Weather Risk Meeting @ GaTech

Overview
Evaluating forecast distributions
Combining forecast distributions
Disseminating results
The DIME website
Further perspectives

Why Evaluate
Ensemble weather forecasts appear to be invaluable to weather-dependent business.
Methods for handling and valuing ensemble forecasts are the subject of ongoing research.
The ultimate criterion is their usefulness to end users.

Point Forecasts vs Ensemble Forecasts
A point forecast is a single number. London temperature tomorrow: 25°
An ensemble forecast is a set of numbers. London temperatures tomorrow: 23°, 26°, 27°, 29°, 31°

Temperature at London Heathrow – Point Forecast
[Figure: point forecast of temperature at London Heathrow]

Temperature at London Heathrow – Ensemble Forecast
[Figure: ensemble forecast of temperature at London Heathrow]

Evaluating Point Forecasts
For classical point forecasts we can evaluate the error:
London temperature tomorrow: 25°
Reality turns out to be: 28°
Error: 3°
A point forecast is considered good if the error is small on average.

Enhancing Point Forecasts – Method of Dressing
Dressing point forecasts adds uncertainty information.
[Figure: probability against temperature, with a kernel centred on the point forecast]
Kernel width needs adjustment. Many possible ways to do that!

Dressing Ensemble Forecasts
[Figure: dressed ensemble forecast, probability against temperature, with a kernel function placed on each ensemble member]
How do we combine the probability densities?
www.dime.lse.ac.uk

Combining Forecast Distributions
There exist many different methods to combine forecast distributions.
Take into account that the different forecasts have different skills.
Determine the optimal combination by looking at the skill, e.g. the ignorance.

Evaluating Forecast Distributions
To quantify the potential usefulness of different forecast distributions we need to evaluate their skill.
How do we compare reality (a single number) with the forecast (a distribution)?

Different Problems Require Different Skill Scores!
A forecast can be
Specific (electricity output from a wind farm)
General (wind speed)
These require different measures of skill.

What DIME does
DIME aims to investigate the skill of NWPs and dressing techniques using various skill scores.
DIME disseminates background information.
What is the best dressing method for your application?

Example of an Assessment of Skill
Evaluating various skill scores of operational NWPs over a period of time.

MODELS     Ignorance Skill Score    Brier Skill Score
MODEL 1    4.5                      0.25

What do these skill scores mean?

Ignorance And Weather Roulette
Bet on the outcome of tomorrow's weather.
[Figure: bets placed on bins of tomorrow's temperature, axis from 4 to 12; one panel annotated "Spread wealth for safety", the other "Restrict spread to make money"]

When Is a Forecast Distribution Good in Weather Roulette?
A good forecast distribution balances between spread and accuracy.
Criterion: a forecast distribution is better the more money it yields in weather roulette.
The ignorance reflects the expected rate of wealth growth.

Skill Scores for Binary Event Forecasts – The Brier Score
Binary (yes/no) events are e.g.: Will it freeze? Will precipitation exceed a threshold? etc.
A forecast for a binary event:
It freezes with probability p
It doesn't with probability (1 - p)
The Brier score reflects the quality of binary event forecasts.

Comparing Forecasts
Two different forecasts can be compared by means of their ignorance.

NWP schemes    Ignorance Skill Score    Brier Skill Score    …
MODEL 1        4.5                      0.25                 …
MODEL 2        3.8                      0.3                  …

Skill evaluation done properly
Skill is a statistical quantity and needs error bars.

NWP schemes    Ignorance Skill Score    Brier Skill Score    …
MODEL 1        4.5 ± 0.2                0.25 ± 0.03          …
MODEL 2        3.8 ± 0.1                0.3 ± 0.02           …

(Error bars will be suppressed from now on.)

Comparing Forecasts
Dressing allows us to compare point forecasts and ensemble forecasts.

MODELS                      Ignorance Skill Score    Brier Skill Score    …
MODEL 1 (Ens.)              4.5                      0.25                 …
MODEL 2 (Ens.)              3.8                      0.3                  …
MODEL 3 (Point Forecast)    5.0                      0.6                  …

Combining Forecasts
The table grows again…

MODELS                      Ignorance Skill Score    Brier Skill Score    …
MODEL 1 (Ens.)              4.5                      0.25                 …
MODEL 2 (Ens.)              3.8                      0.3                  …
MODEL 3 (Point Forecast)    5.0                      0.6                  …
MODEL 1 and MODEL 2         3.2                      0.2                  …

DIME Objectives
Objectively compare operational NWP model ensemble forecasts by their skill.
Suggest schemes to combine ensemble forecasts and evaluate the skill of these schemes.
Provide actual weather forecasts for specific locations using a selection of our methods.

Dissemination of Results
The medium to disseminate DIME results is the internet.
The web allows users to interactively request the products they are most interested in.
www.dime.lse.ac.uk
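
Illustration: dressing and ignorance
The dressing slides can be made concrete with a small sketch. It assumes Gaussian kernels of equal width and the usual definition of ignorance as minus the base-2 logarithm of the forecast density at the verifying value; neither the kernel shape nor the formula is stated on the slides. The temperatures are the London examples from the slides. The kernel width of 1.5 degrees and the function names are assumptions made only for illustration.

```python
import math

def dressed_density(x, members, width):
    """Density at x of a forecast dressed with Gaussian kernels.

    One kernel of standard deviation `width` is placed on each member;
    a point forecast is simply a one-member list.
    """
    norm = 1.0 / (width * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - m) / width) ** 2) for m in members)
    return norm * total / len(members)

def ignorance(density_at_outcome):
    """Ignorance in bits: -log2 of the density assigned to what happened."""
    return -math.log2(density_at_outcome)

ensemble = [23.0, 26.0, 27.0, 29.0, 31.0]  # ensemble forecast from the slides
point = [25.0]                             # point forecast from the slides
outcome = 28.0                             # "reality" from the slides
width = 1.5                                # assumed kernel width, not from the slides

ign_ens = ignorance(dressed_density(outcome, ensemble, width))
ign_point = ignorance(dressed_density(outcome, point, width))
print(f"dressed ensemble: {ign_ens:.2f} bits, dressed point forecast: {ign_point:.2f} bits")
```

Lower ignorance is better. On this single day the dressed ensemble puts more probability near 28° than the dressed point forecast does, but a single day says little; skill has to be averaged over many forecasts.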
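
Illustration: weather roulette
The weather roulette slides can be sketched as a betting game. The sketch assumes that the player stakes all wealth each day, split across temperature bins in proportion to the forecast probabilities, and that the house pays odds equal to the inverse of its own probabilities. The bins 4 to 12 follow the axis of the roulette figure. The two forecasts, the uniform house odds and the sequence of outcomes are hypothetical numbers chosen for illustration.

```python
import math

bins = list(range(4, 13))                 # temperature bins 4..12
house = [1.0 / len(bins)] * len(bins)     # hypothetical house: uniform odds

# Hypothetical forecasts over the nine bins (each sums to 1).
wide = [0.05, 0.05, 0.10, 0.15, 0.30, 0.15, 0.10, 0.05, 0.05]
sharp = [0.0, 0.0, 0.0, 0.10, 0.80, 0.10, 0.0, 0.0, 0.0]

outcomes = [8, 7, 8, 9, 8, 6]             # hypothetical verifying temperatures

def final_wealth(forecast, odds_probs, outcomes, start=1.0):
    """Wealth after betting fraction forecast[k] on bin k every day."""
    wealth = start
    for t in outcomes:
        k = t - bins[0]
        wealth *= forecast[k] / odds_probs[k]
    return wealth

wealth_wide = final_wealth(wide, house, outcomes)
wealth_sharp = final_wealth(sharp, house, outcomes)

# Average growth in doublings per day equals the ignorance of the house
# minus the ignorance of the forecast, averaged over the same days.
growth_rate = math.log2(wealth_wide) / len(outcomes)
ign_gap = sum(
    -math.log2(house[t - bins[0]]) + math.log2(wide[t - bins[0]]) for t in outcomes
) / len(outcomes)

print(wealth_wide, wealth_sharp, growth_rate, ign_gap)
```

The sharp forecast earns more on the days it is right but is ruined by the one outcome it ruled out, which is the trade-off between "Spread wealth for safety" and "Restrict spread to make money".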
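
Illustration: Brier score
For the binary event slide, the following is a minimal sketch of the Brier score in its usual form: the mean squared difference between the forecast probability p and the outcome (1 if the event happened, 0 if not). The slides do not give the formula, so this is the standard definition rather than anything taken from DIME. The skill score relative to a reference forecast is likewise the conventional one. The probabilities and outcomes are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs, reference_probs, outcomes):
    """1 - BS / BS_ref; positive values mean the forecast beats the reference."""
    return 1.0 - brier_score(probs, outcomes) / brier_score(reference_probs, outcomes)

# Hypothetical "will it freeze?" forecasts and what actually happened.
probs = [0.9, 0.1, 0.7, 0.2]
froze = [1, 0, 1, 0]
reference = [0.5] * len(froze)   # hypothetical uninformative reference forecast

bs = brier_score(probs, froze)
bss = brier_skill_score(probs, reference, froze)
print(bs, bss)
```

A Brier score of 0 is a perfect forecast and lower is better; whether the scores tabulated on the slides use this exact convention is not stated there.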
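
Illustration: combining two forecasts by their ignorance
The slide on combining forecast distributions suggests choosing the combination by its skill. One simple way to do that, sketched here as an assumption and not as the DIME method, is a weighted mixture of two dressed densities, with the weight chosen by a grid search to minimise the mean ignorance over an archive of past forecasts. The archive below is hypothetical apart from its first entry, which reuses the London numbers from the slides. The kernel width is again an assumed value.

```python
import math

def dressed_density(x, members, width):
    """Gaussian-kernel-dressed density at x (one kernel per member)."""
    norm = 1.0 / (width * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - m) / width) ** 2) for m in members)
    return norm * total / len(members)

def mean_ignorance(weight, archive, width):
    """Mean ignorance (bits) of weight * forecast A + (1 - weight) * forecast B."""
    total = 0.0
    for members_a, members_b, outcome in archive:
        density = (weight * dressed_density(outcome, members_a, width)
                   + (1.0 - weight) * dressed_density(outcome, members_b, width))
        total += -math.log2(density)
    return total / len(archive)

# (ensemble members, point forecast, verifying value); hypothetical archive.
archive = [
    ([23.0, 26.0, 27.0, 29.0, 31.0], [25.0], 28.0),
    ([18.0, 19.0, 21.0, 22.0, 24.0], [20.0], 20.5),
    ([10.0, 12.0, 13.0, 15.0, 16.0], [14.0], 11.0),
    ([20.0, 21.0, 22.0, 23.0, 25.0], [22.0], 22.5),
]
width = 1.5                                   # assumed kernel width
weights = [i / 20 for i in range(21)]         # candidate weights 0.0, 0.05, ..., 1.0

best_weight = min(weights, key=lambda w: mean_ignorance(w, archive, width))
best_ign = mean_ignorance(best_weight, archive, width)
print(best_weight, best_ign)
```

Because the weight is fitted and scored on the same archive, the resulting ignorance is optimistic; a fair comparison would score the combination on forecasts not used for fitting.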