Sampled data in detail

This page gives more information about when it is appropriate to use sampled data, some things to be aware of when using them and details about how the data show uncertainties in the projections. 

  • What information does it provide?

UKCP09 sampled data provides 10,000 equally plausible climate projections for each 25 km grid square and aggregated region, 30-year time period and emission scenario.

At a given location, each of the 10,000 variants contains projected change values for a number of variables for all temporal averages (monthly, seasonal and annual). As such, each variant captures inter-variable and temporal dependencies. However, note that the data is not spatially coherent; this means that you cannot aggregate adjacent grid squares to form custom areas (see the Climate change projections report, Annex 2 for more details).

The sampled data provides values between the 1% and 99% probability levels. Since the projections are statistically less robust in the tails of their distributions, the sampled data is clipped beyond the 1% and 99% probability levels. That is, for each variable, location, time period, averaging period and emission scenario, values below the 1% probability level are set to the value of the 1% probability level, and values above the 99% probability level are set to the value of the 99% probability level. For example, if the 1% probability level is 2.0°C, then all values in the original sampled data of less than 2.0°C will be set to 2.0°C.
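
The clipping rule described above can be illustrated with a minimal sketch. It runs on synthetic values rather than UKCP09 data, and the linear-interpolation percentile is an assumption; the exact method used to derive the UKCP09 probability levels is not stated on this page. The function names are hypothetical.

```python
import random


def percentile(sorted_values, p):
    """Percentile p (0-100) of an already sorted list, by linear interpolation.

    Assumption: the interpolation rule is illustrative only.
    """
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def clip_to_probability_levels(values, low=1.0, high=99.0):
    """Set values below the low level to that level, and likewise above the high level."""
    ordered = sorted(values)
    low_value = percentile(ordered, low)
    high_value = percentile(ordered, high)
    return [min(max(v, low_value), high_value) for v in values]


# Synthetic stand-in for 10,000 projected temperature changes (not real data).
rng = random.Random(0)
raw_changes = [rng.gauss(3.0, 1.0) for _ in range(10_000)]
clipped_changes = clip_to_probability_levels(raw_changes)
```

After clipping, roughly 100 of the 10,000 values share the lowest value and roughly 100 share the highest, which is why the extreme ends of the sampled data appear as repeated values.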

All variables in the sampled data were clipped at the 1st and 99th percentile. A more significant level of clipping was applied to the following variables:

Variable          Low percentile   High percentile   Long name
precip variance   5                95                Variance of daily precipitation rate
temp variance     5                95                Variance of mean air temperature at 1.5 metres
precip skewness   5                95                Skewness of daily precipitation rate

Please note that the three variables clipped at the 5th and 95th percentiles are not available through the User Interface. They are only used as inputs to generate change factors when running the Weather Generator.

The method used to generate the sampled data incorporates a multivariate analysis that, due to computational limits, can only process a certain number of variables simultaneously. This limitation means that the variables have been processed in two separate batches (labelled in UKCP09 as Batch 1 and Batch 2).

The sampled data contains 10,000 projections (variants). You can sub-sample the data using the UKCP09 User Interface. Four methods of sub-sampling are available:

  1. Select All: Users wishing to work with all 10,000 available variants can select this option.
  2. Random sampling: A random sample of between 100 and 9,999 variants with repetition allowed so that a single variant may be randomly selected more than once.
  3. Select a specific set of model variants: A specific variant can be specified. Each of the 10,000 variants in the sampled data has a unique ID number 0-9999. These IDs can be used within the User Interface to identify and select specific variants within the sampled data that are of particular interest. Selecting by ID allows users to re-use the same variants in different requests. Users can also select model variants by their unique ID number when sampling probabilistic projections to condition the UKCP09 Weather Generator.
  4. Sampling a particular subset of the probabilities: You can select a variable at a given probability level for a given averaging period (for example, maximum temperature in the summer at the 90% probability level). You can select variants defined by a single climate variable or by two variables (within a single batch).
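
For users post-processing a full download themselves, methods 2 and 3 can be sketched as follows. This is an illustration of random sampling with repetition and selection by ID, not the User Interface's own implementation; the function names and the in-memory list of variants are hypothetical.

```python
import random

N_VARIANTS = 10_000  # variant IDs run from 0 to 9999


def random_sample_ids(n, seed=None):
    """Draw n variant IDs at random, with repetition allowed (method 2)."""
    if not 100 <= n <= 9_999:
        raise ValueError("sample size must be between 100 and 9,999")
    rng = random.Random(seed)
    return rng.choices(range(N_VARIANTS), k=n)


def select_by_id(variants, ids):
    """Pick specific variants by ID (method 3); variants[i] holds the variant with ID i."""
    return [variants[i] for i in ids]


# Fixing the seed makes the same IDs reusable across different requests.
sample_ids = random_sample_ids(100, seed=42)
```

Recording the selected IDs alongside any results makes it possible to repeat an analysis with exactly the same variants later.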


  • What should I use it for?

The sampled data can be used to examine the climate projections in a number of different ways. You may consider using sampled data for:

  • Initial sensitivity assessments of the implications of the probabilistic climate and climate change projections (e.g. impacts and risk assessments)

In order to undertake a risk, impacts or adaptation assessment, you should have an understanding of the climate under which those vulnerabilities, impacts and adaptation options will be experienced.

To this end, you could use the sampled data to:

  1. Explore the probabilities associated with exceeding various thresholds or combination of thresholds (e.g. temperature and precipitation thresholds).
  2. Explore the implications of the probabilistic projections related to acceptable levels of risk.
  3. Explore the sensitivity of these thresholds or levels of acceptable risks within the probabilistic projections.
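
As a rough sketch of the first item, the probability of exceeding a threshold, or a combination of thresholds, can be estimated as the fraction of variants that exceed it. The variable names and values below are hypothetical, and any variables combined this way are assumed to come from the same batch.

```python
def exceedance_probability(variants, thresholds):
    """Fraction of variants in which every listed variable exceeds its threshold.

    variants:   list of dicts mapping variable name -> projected change
    thresholds: dict mapping variable name -> threshold value
    """
    hits = sum(
        all(variant[name] > limit for name, limit in thresholds.items())
        for variant in variants
    )
    return hits / len(variants)


# Four made-up variants (a real analysis would use at least 100).
example_variants = [
    {"temp_change": 1.0, "precip_change": -5.0},
    {"temp_change": 3.0, "precip_change": -20.0},
    {"temp_change": 4.0, "precip_change": 10.0},
    {"temp_change": 2.5, "precip_change": -15.0},
]

p_temp = exceedance_probability(example_variants, {"temp_change": 2.0})
p_joint = exceedance_probability(
    example_variants, {"temp_change": 2.0, "precip_change": 0.0}
)
```

Because each variant carries all of its variables together, the joint estimate respects the inter-variable dependencies in the sampled data. Estimates for thresholds that fall in the clipped tails should be treated with caution.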

You could use the results of these initial investigations to refine your search to more specific selections of climate projections that can support a detailed risk and impacts analysis and further assessments of adaptation options.

  • Providing projections for input into risk, impacts and adaptation assessments

This is the intended purpose of the sampled data. Using the User Interface, you can select the sampled data on which to base your adaptation assessments. You can select data based on location (25 km grid square or aggregated area), variables, emissions scenario, 30-year future time period and temporal averaging period.

When inputting sampled data into impact models, bear in mind that the selected data should all come from the same batch.

  • Developing customised ways of visualising probabilistic climate projections

Within UKCP09, visual presentations of data are developed primarily using the CDF data. A limited number of images based on the sampled data are available from the UKCP09 User Interface. In addition, users wishing to develop their own customised images can do so using the sampled data.

  • Exploring implications across sequential 30-year time periods

You can also use sampled data to establish projections that allow you to explore implications across sequential 30-year time periods. However, this cannot be achieved in a single User Interface request, as the User Interface does not allow you to access more than one 30-year time period at a time.

In order to create multiple 30-year time period projections the ".csv" data files from each request require stitching together. You should take care when interpreting and using such a stitched data set: you are not creating a transient (temporally continuous) projection but rather stitching together two or more 30-year averages. Take care to ensure time periods are stitched together in the correct order.
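
A minimal sketch of the stitching step is given below. The column names, period labels and values are hypothetical, since the layout of the downloaded ".csv" files is not described on this page; the point is only that rows are tagged with their time period and assembled in chronological order.

```python
import csv
import io


def stitch_periods(csv_text_by_period):
    """Concatenate rows from several requests in order of period start year.

    csv_text_by_period maps a label such as "2040-2069" to the CSV text of
    one request. Each row is tagged with its period so that the 30-year
    averages stay distinguishable after stitching.
    """
    stitched = []
    ordered_labels = sorted(csv_text_by_period, key=lambda label: int(label.split("-")[0]))
    for period in ordered_labels:
        for row in csv.DictReader(io.StringIO(csv_text_by_period[period])):
            row["period"] = period
            stitched.append(row)
    return stitched


# Made-up file contents, deliberately supplied out of order.
requests = {
    "2040-2069": "variant_id,temp_change\n0,2.9\n1,3.4\n",
    "2010-2039": "variant_id,temp_change\n0,1.2\n1,1.6\n",
}
rows = stitch_periods(requests)
```

The result is a set of 30-year averages placed side by side, not a continuous time series, so values should not be interpolated between periods as if they were annual data.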

There are several things to bear in mind when using the sampled data:

  • You need to sample at least 100 variants

100 variants are considered the smallest number of samples that are needed to maintain the probabilistic representativeness of the original sampled data. Even at this minimum level, there is the possibility that the representation can significantly diverge from that of the full population of the 10,000 variants. We advise that you explore the sensitivity of your selection before using it as the basis for decision-making. 
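
One way to explore this sensitivity is to draw many 100-variant subsamples and see how much a statistic of interest varies between them. The sketch below uses synthetic values in place of real sampled data, and the choice of the 90% level and of 200 repeats is arbitrary.

```python
import random
import statistics


def percentile_90(values):
    """90th percentile: the 90th of the 99 cut points returned by statistics.quantiles."""
    return statistics.quantiles(values, n=100)[89]


rng = random.Random(1)

# Synthetic stand-in for the full set of 10,000 variants (not real data).
full_sample = [rng.gauss(3.0, 1.0) for _ in range(10_000)]
full_p90 = percentile_90(full_sample)

# Repeat the 100-variant draw (with repetition allowed) 200 times.
subsample_p90s = [
    percentile_90(rng.choices(full_sample, k=100)) for _ in range(200)
]
spread = max(subsample_p90s) - min(subsample_p90s)
```

If `spread` is large relative to the differences that matter for the decision at hand, a larger subsample, or the full set of variants, is likely to be the safer choice.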

  • Sampled data is produced by location

As sampled data is produced by location, you should not use sampled data from different locations (grid squares or aggregated regions) to explore consistency of changes across those different locations. Should you require climate projections that are consistent across different locations then see the Spatially coherent projections pages.

Sampled data is also available for two sets of aggregated areas, namely administrative regions and river basins. You should not attempt to aggregate the sampled data to create projections for your own self-defined areas.

  • Take care when using probabilities at the tails of the distribution

When selecting from the sampled data by variable or pair of variables, you should take care when using and interpreting results in the tails of the distribution (defined here as probabilities of less than 10% or greater than 90%). Data falling outside of the 10%-90% range are less robust, and values outside the 1%-99% range have been clipped, as explained above.

  • You can only use joint probabilities from the same batch

Users can only explore joint probabilities using variables from the same batch. Examining joint probabilities between variables in different batches is not appropriate and is not permitted within the User Interface.

  • Take care when using a subset of samples

When selecting from a particular subset of the sampled data, take care to ensure that you understand the nature of the part of the probabilistic projections you have produced. Selecting a specific variant requires careful consideration and justification and could lead to a biased decision if used incorrectly.

Specific care should be taken when using and interpreting results in the tails - probabilities less than 10% or greater than 90% - of the projections. You should also understand the implications of the size of the selected subset.


  • Uncertainties

Emissions uncertainty is explored through the use of three sets of sampled data each developed using one of the three UKCP09 emission scenarios:

  • Low emissions scenario (IPCC SRES B1)
  • Medium emissions scenario (IPCC SRES A1B)
  • High emissions scenario (IPCC SRES A1FI)

A single sampled data file provides variants associated with a single, user-specified emissions scenario. Multiple sampled data files will need to be integrated to explore the implications for the results (risk, impacts and adaptation assessments) of emissions uncertainty.
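
As one possible way of bringing several files together, the sketch below summarises each scenario separately and places the results side by side. It assumes the values for each scenario have already been read from their own files; the function name and the choice of the 10%, 50% and 90% levels are illustrative. The scenarios are kept apart rather than pooled, which is a design choice of this sketch.

```python
import statistics


def summarise_by_scenario(changes_by_scenario):
    """Return the 10%, 50% and 90% levels of the variants for each scenario.

    changes_by_scenario maps a scenario name to a list of projected changes.
    """
    summary = {}
    for scenario, values in changes_by_scenario.items():
        cuts = statistics.quantiles(values, n=10)  # nine cut points: 10%, ..., 90%
        summary[scenario] = {"p10": cuts[0], "p50": cuts[4], "p90": cuts[8]}
    return summary


# Made-up values standing in for two scenario files.
example = {
    "low": list(range(1, 100)),
    "high": [value + 1 for value in range(1, 100)],
}
scenario_summary = summarise_by_scenario(example)
```

Comparing the rows of `scenario_summary` shows how far the results of an assessment depend on the choice of emissions scenario.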

For more information see the Handling uncertainty page.
