Kriging and epistemic uncertainty: a critical discussion

Kevin Loquin and Didier Dubois
Abstract Geostatistics is a branch of statistics dealing with spatial phenomena modelled by random functions. In particular, it is assumed that, under some well-chosen simplifying hypotheses of stationarity, this probabilistic model, i.e. the random function describing spatial dependencies, can be completely assessed from the dataset by the experts. Kriging is a method for estimating or predicting the spatial phenomenon at non-sampled locations from this estimated random function. In the usual kriging approach, the data are precise and the assessment of the random function is mostly made at a glance by the experts (i.e. geostatisticians) from a thorough descriptive analysis of the dataset. However, it seems more realistic to assume that spatial data are tainted with imprecision due to measurement errors and that information is lacking to properly assess a unique random function model. It would thus be natural to handle the epistemic uncertainty appearing in both the data specification and the random function estimation steps of the kriging methodology. Epistemic uncertainty consists of some meta-knowledge about the lack of information on data precision or on model variability. The aim of this paper is to discuss the relevance of the usual random function approach to modelling uncertainty in geostatistics, to survey the existing attempts to introduce epistemic uncertainty into geostatistics, and to propose some perspectives for developing new tractable methods that may handle this kind of uncertainty.
Key words: geostatistics; kriging; variogram; random function; epistemic uncertainty; fuzzy subset; possibility theory
Kevin Loquin
IRIT, Université Paul Sabatier, 118 Route de Narbonne, F-31062 Toulouse Cedex 9. e-mail: [email protected]
Didier Dubois
IRIT, Université Paul Sabatier, 118 Route de Narbonne, F-31062 Toulouse Cedex 9. e-mail: [email protected]
1 Introduction
Geostatistics is the application of the formalism of random functions to the reconnaissance
and estimation of natural phenomena.
This is how Georges Matheron [40] explained the term geostatistics in 1962, to describe a scientific approach to estimation problems in geology and mining. The development of geostatistics in the 1960s resulted from the industrial and economic need for a methodology to assess the recoverable reserves in mining deposits. Naturally, the necessity of taking uncertainty into account in such methods appeared. That is the reason why statisticians were needed by geologists and the mining industry to perform ore assessment consistently with the available information.
Today, geostatistics is no longer restricted to this kind of application. It is applied in disciplines such as hydrology, meteorology, oceanography, geography, forestry, environmental monitoring, landscape ecology and agriculture, or for the geographical and dynamic study of ecosystems.
Underlying each geostatistical method is the notion of random function [12]. A random function describes a given spatial phenomenon over a domain. It consists of a set of random variables, each of which describes the phenomenon at some location of the domain. By analogy with a random process, which is a set of random variables indexed by time, a random function is a set of random variables indexed by locations. When little information is available about the spatial phenomenon, a random function is only specified by the set of means associated with its random variables over the domain and by its covariance structure for all pairs of random variables induced by this random function. These parameters describe, respectively, the spatial trend and the spatial dependencies of the underlying phenomenon. The structural dependence assumption underlying most geostatistical methods is based on the intuitive idea that the closer the regions of interest, the more similar the phenomenon in these areas. In most geostatistical methods, the dependencies between the random variables are preferably described by a variogram instead of a covariance structure. The variogram depicts the variance of the increments of the quantity of interest as a function of the distance between sites.
The spatial trend and the spatial dependence structure of this model are commonly supposed to be of a given form (typically, linear for the trend and spherical, power exponential, or rational quadratic for the covariance or variogram structure) with a small number of unknown parameters. From the specification of these moments,
many methods can be derived in geostatistics. By far, kriging is the most popular
one. Suppose a spatial phenomenon is partially observed at selected sites. The aim
of kriging is to predict the phenomenon at unobserved sites. This is the problem of
spatial estimation, sometimes called spatial prediction. Examples of estimated spatial phenomena are soil nutrient or pollutant concentrations over a field observed on a survey grid, hydrologic variables over an aquifer observed at well locations, and air quality measurements over an air basin observed at monitoring sites.
The term kriging was coined by Matheron in honor of D.G. Krige who published
an early account of this technique [37] with applications to the estimation of a mineral ore body. In its simplest form, a kriging estimate of the field at an unobserved
location is an optimized linear combination of the data at observed locations. The
method has close links to Wiener optimal linear filtering in the theory of random
functions, spatial splines and generalized least squares estimation in a spatial context.
A full application of a kriging method by a geostatistician involves different
steps:
1. An important structural analysis is performed: usual statistical tools like histograms and empirical cumulative distributions can be used in conjunction with an analysis of the sample variogram.
2. In place of the sample variogram, which does not satisfy the required mathematical properties, a theoretical variogram is chosen. The fitting of the theoretical variogram model to the sample variogram, informed by the structural analysis, is then performed.
3. Finally, from this variogram specification (which is an estimate of the dependence structure of the model), the kriging estimate is computed at the location of
interest by solving a system of linear equations of the least squares type.
Kriging methods have been studied and applied extensively since the 1970s, and have later been adapted, extended, and generalized. Georges Matheron, who founded the “Centre de Géostatistiques et de Morphologie Mathématique de l’école des Mines de Paris” in Fontainebleau, proposed the first systematic approach to kriging [40]. Many of his students and collaborators followed in his footsteps and worked on the development and dissemination of geostatistics worldwide. We can mention here Jean-Paul Chilès, Pierre Delfiner [10] or André G. Journel [32], among others. All of them worked on extending, in many directions, the kriging methodology.
However, very few scholars discussed the nature of the uncertainty that underlies standard Matheronian geostatistics, except G. Matheron himself [43], and even fewer considered alternative theories to probability theory that could more reliably handle epistemic uncertainty in geostatistics. Epistemic uncertainty is uncertainty that stems from a lack of knowledge, from insufficient available information, about a phenomenon. It differs from uncertainty due to the variability of the phenomenon. Typically, intervals or fuzzy sets are meant to handle epistemic uncertainty, while probability distributions are meant to quantify variability. More generally, imprecise probability theories, like possibility theory [21], belief functions [49] or imprecise previsions [53], are supposed to jointly handle those two kinds of uncertainty. Consider the didactic example of a die toss where you have more or less information about the number of faces of the die. When you know that the die has 6 faces, you can easily evaluate the variability of the toss: 1 chance in 6 for each face from 1 to 6. But now suppose that you miss some information about the number of faces and just know that the die has either 6 or 12 faces. You cannot propose a unique model of the variability of the toss; you can only propose two: in the first case, 1 chance in 6 for each face from 1 to 6 and no chance for each face from 7 to 12; in the second case, 1 chance in 12 for each face from 1 to 12. This example enables the following simple conceptual extrapolation: when you are facing a lack of knowledge or insufficient available information on the studied phenomenon, it is safer to work with a family of probability distributions, i.e. to work with sets of probability measures, to model uncertainty. Such models are generically called imprecise probability models.
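To make this concrete, here is a minimal sketch (in Python; the function and variable names are our own) of the two candidate models and of the lower and upper probabilities they induce for an event:

```python
from fractions import Fraction

# The two candidate variability models: a fair die with 6 or with 12 faces.
six = {f: Fraction(1, 6) if f <= 6 else Fraction(0) for f in range(1, 13)}
twelve = {f: Fraction(1, 12) for f in range(1, 13)}
family = [six, twelve]

def prob_bounds(event, models):
    """Lower and upper probability of an event over a family of models."""
    probs = [sum(m[f] for f in event) for m in models]
    return min(probs), max(probs)

# Event "the outcome exceeds 6": impossible under one model, 1/2 under the other.
print(prob_bounds({7, 8, 9, 10, 11, 12}, family))   # (0, 1/2)
```

The pair of bounds, rather than a single number, is what an imprecise probability model retains.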
Bayesian methods address the problem by attaching prior probabilities to each potential model. However, this kind of uncertainty is of purely epistemic origin, and using a single subjective probability to describe it is debatable, since it represents much more information than what is actually available. In our die toss example, choosing a probability value for the occurrence of each possible model, even a uniform one, i.e. a probability of 1/2 for each possible model, conveys much more information than is actually available about the occurrence of the possible models. Besides, it is not clear that subjective and objective probabilities can be multiplied, as they represent information of a very different nature.
This paper proposes a discussion of the standard approach to kriging in relation to the epistemic uncertainty pervading the data or the choice of a variogram. In the first part of the paper, the basics of kriging theory are recalled and the underlying assumptions are discussed. Then, a survey of some existing intervallist or fuzzy extensions of kriging is offered. Finally, a preliminary discussion of the role novel uncertainty theories could play in this topic is provided.
2 Some basic concepts in probabilistic geostatistics
Geostatistics is commonly viewed as the application of the “Theory of Regionalized Variables” to the study of spatially distributed data. This theory is not new, and borrows most of its models and tools from the concept of stationary random function and from techniques of generalized least squares prediction.
Let D be a compact subset of R^ℓ and let Z = {Z(x), x ∈ D} denote a real-valued random function. A random function (or, equivalently, a random field) is made up of a set of random variables Z(x), for each x ∈ D. In other words, Z is a set of random variables Z(x) indexed by x. Each Z(x) takes its values in some real interval Γ ⊆ R. In this approach, Z is the probabilistic representation of a deterministic function z : D → Γ.

The data consist of n observations Z_n = {z(x_i), i = 1, . . . , n}, understood as a realization of the n random variables {Z(x_i), i = 1, . . . , n} located at the n known distinct sampling positions {x_1, . . . , x_n} in D. Z_n is the only available objective information about Z on D.
2.1 Structuring assumptions
Different structuring assumptions of a random field have been proposed. They
mainly aim at making the model easy to use in practice. Results of geostatistical
methods highly depend on the choice of those assumptions.
2.1.1 The second-order stationary model
In geostatistics, the spatial dependence between two random variables Z(x) and Z(x′), located at different positions x, x′ ∈ D, is considered an essential aspect of the model. All geostatistical models strive to capture such spatial dependence, in order to provide information about the influence of the neighborhood of a point x on the random variable Z(x).

A random function Z is said to be second-order stationary if any two random variables Z(x) and Z(x′) have equal mean values and their covariance only depends on the separation h = x − x′. Formally, ∀x, x′ ∈ D, there exist a constant m ∈ R and a positive definite covariance function C : D → R such that

\[
\begin{cases}
E[Z(x)] = m,\\
E\bigl[(Z(x) - m)(Z(x') - m)\bigr] = C(x - x') = C(h).
\end{cases} \tag{1}
\]

Such a model implies that the variance of the random variables Z(x) is constant all over the domain D. Indeed, for any x ∈ D, V(Z(x)) = C(0). In the simplest case, the random function is supposed to be Gaussian, and the correlation function isotropic, i.e. not depending on the direction of the vector x − x′, so that h reduces to a positive distance value h = ‖x − x′‖_ℓ.
A second-order stationary random function will be denoted by SRF in the rest of
the paper.
2.1.2 The intrinsic model
This model is slightly more general than the previous one: it only assumes that the increments Y_h(x) = Z(x + h) − Z(x), and not necessarily the random function Z itself, form a second-order stationary random function Y_h, for every vector h. More precisely, for each location x ∈ D, Y_h(x) is supposed to have a zero mean and a variance depending only on h, denoted by 2γ(h). In that case, Z is called an intrinsic random function, denoted by IRF in the rest of the paper, and characterized by:

\[
\begin{cases}
E[Y_h(x)] = E[Z(x+h) - Z(x)] = 0,\\
V[Y_h(x)] = V[Z(x+h) - Z(x)] = 2\gamma(h).
\end{cases} \tag{2}
\]

γ(h) is the variogram. The variogram is a key concept of geostatistics. It is supposed to measure the dependence between locations, as a function of their distance.
Every SRF is an IRF; the converse is not true in general. Indeed, from the covariance function of any SRF, we can derive an associated variogram as:

\[
\gamma(h) = C(0) - C(h). \tag{3}
\]

Indeed,

\[
\begin{aligned}
\gamma(h) &= \tfrac{1}{2}\, V[Z(x+h) - Z(x)],\\
&= \tfrac{1}{2}\bigl( V[Z(x+h)] + V[Z(x)] - 2\,\mathrm{Cov}(Z(x+h), Z(x)) \bigr),\\
&= \tfrac{1}{2}\bigl( 2C(0) - 2C(h) \bigr).
\end{aligned}
\]

In the opposite direction, the covariance function of an IRF is generally not of the form C(h) and cannot be derived from its variogram γ(h). Indeed, the step from the second to the third line of the above derivation shows that equality (3) only holds if the variance of the random function is constant on the domain D. This is the case for an SRF but not for an IRF. For example, unbounded variograms have no associated covariance function. This does not mean that the covariance between Z(x) and Z(x + h), when Z is an IRF, does not exist, but it is not, in general, a function of the separation h alone. The variogram is thus a more general structuring tool than a covariance function of the form C(h).
2.2 Simple kriging
Kriging boils down to spatially interpolating the data set Z_n by means of a linear combination of the values observed at the measurement locations. The interpolation weights depend on the interpolation location and on the available data over the domain of interest. In such a method, the dependence structure of the random function is used to estimate its value at an unobserved site. Once the variogram is estimated, the kriging equations are obtained by least squares minimization.
Consider a second-order stationary random function Z, i.e. satisfying (1), informed by the data set Z_n = {z(x_i), i = 1, . . . , n}. Any particular unknown value Z(x_0), x_0 ∈ D, is estimated by a linear combination of the n collected data points {z(x_i), i = 1, . . . , n}. This estimate, denoted by z∗(x_0), is given by:

\[
z^*(x_0) = \sum_{i=1}^{n} \lambda_i(x_0)\, z(x_i). \tag{4}
\]
The computation of z∗(x_0) depends on the estimation of the kriging weights Λ_n(x_0) = {λ_i(x_0), i = 1, . . . , n} at location x_0. In the kriging paradigm, each weight λ_i(x_0) corresponds to the influence of the value z(x_i) in the computation of z∗(x_0). More precisely, z∗(x_0) is the linear combination of the data set Z_n = {z(x_i), i = 1, . . . , n}, weighted by the set of influence weights Λ_n(x_0).
Kriging weights are computed by solving a system of equations induced by a least squares optimization method. It is deduced from the minimization of the variance of the estimation error Z(x_0) − Z∗(x_0), where Z(x_0) is the random variable underlying the SRF Z at location x_0 and Z∗(x_0) = ∑_{i=1}^n λ_i(x_0) Z(x_i) is the “randomized” counterpart of the kriging estimate (4). The minimization of V[Z(x_0) − Z∗(x_0)] is carried out under the unbiasedness condition E[Z(x_0)] = E[Z∗(x_0)]. This unbiasedness condition has a twofold consequence. First, it induces the following condition on the kriging weights:

\[
\sum_{i=1}^{n} \lambda_i(x_0) = 1.
\]

Indeed, due to the stationarity of the mean (1), E[Z∗(x_0)] = E[Z(x_0)] ⇒ ∑_{i=1}^n λ_i(x_0) m = m ⇒ ∑_{i=1}^n λ_i(x_0) = 1.

Second, it implies that minimizing the variance amounts to minimizing a mean squared error. Indeed,

\[
V[Z(x_0) - Z^*(x_0)] = E\bigl[(Z(x_0) - Z^*(x_0))^2\bigr] - \bigl(E[Z(x_0) - Z^*(x_0)]\bigr)^2,
\]

and the second term is zero. Thus,

\[
V[Z(x_0) - Z^*(x_0)] = E\bigl[(Z(x_0) - Z^*(x_0))^2\bigr].
\]

Thus, the kriging problem comes down to finding the least squares estimate of Z at location x_0 under the constraint ∑_{i=1}^n λ_i(x_0) = 1.
To obtain the kriging equations, the variance V[Z(x_0) − Z∗(x_0)] is rewritten as follows:

\[
\sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_i(x_0)\lambda_j(x_0)\, C(x_i - x_j) - 2\sum_{j=1}^{n} \lambda_j(x_0)\, C(x_0 - x_j) + V[Z(x_0)], \tag{5}
\]

where V[Z(x_0)] = C(0), so that the kriging weights only depend on the covariance function. In order to minimize the above mean squared error, the derivative with respect to each kriging weight λ_i(x_0) is computed:

\[
\frac{\partial}{\partial \lambda_i(x_0)} V[Z(x_0) - Z^*(x_0)] = 2\sum_{j=1}^{n} \lambda_j(x_0)\, C(x_i - x_j) - 2\, C(x_0 - x_i), \quad \forall i = 1, \ldots, n.
\]

The equations providing the kriging weights are obtained by letting these partial derivatives vanish. The simple kriging equations are thus of the form:

\[
C(x_0 - x_i) = \sum_{j=1}^{n} \lambda_j(x_0)\, C(x_i - x_j), \quad \forall i = 1, \ldots, n. \tag{6}
\]
The similarity between equations (4) and (6) is striking. The influence weights in the simple kriging method are the same weights as the ones that express, for all locations {x_i, i = 1, . . . , n}, the dependence between Z(x_0) and Z(x_i), quantified by C(x_0 − x_i), as the weighted average of the covariances C(x_i − x_j) between Z(x_i) and the random variables {Z(x_j), j = 1, . . . , n}. This can be summarized as follows: the influence weights of the kriging estimate are the same as the influence weights of the dependence evaluations. It is clear that proper dependence assessments should be the basis for any sensible interpolation of the observations. However, there seems to be no direct intuitive explanation of why the observations should be combined (by means of (4)) just like the dependencies (by means of (6)).
In the case of kriging with an unknown mean based on the intrinsic model, the covariance function is replaced by the variogram in the kriging equations (6). Moreover, there is an additional parameter value to be found, namely the Lagrange parameter needed to ensure the unbiasedness condition (which does not always reduce to requiring the kriging weights to sum to 1); see [10], Section 3.4.
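As an illustration of equations (4) and (6), here is a minimal sketch of simple kriging in one dimension; the exponential covariance model and all numerical values are placeholders of our own choosing:

```python
import numpy as np

def exp_cov(h, sill=1.0, rng=10.0):
    """Isotropic exponential covariance C(h) = sill * exp(-h / range)."""
    return sill * np.exp(-np.asarray(h, dtype=float) / rng)

def simple_kriging_weights(xs, x0, cov=exp_cov):
    """Solve the simple kriging system (6):
    C(x0 - xi) = sum_j lambda_j(x0) C(xi - xj), for i = 1..n."""
    xs = np.asarray(xs, dtype=float)
    K = cov(np.abs(xs[:, None] - xs[None, :]))   # matrix of C(xi - xj)
    k0 = cov(np.abs(xs - x0))                    # vector of C(x0 - xi)
    return np.linalg.solve(K, k0)

# Five 1-D observations and a prediction at x0 = 3.2, following (4)
xs = [0.0, 1.0, 2.5, 6.0, 7.5]
zs = np.array([1.2, 1.5, 1.1, 0.7, 0.9])
lam = simple_kriging_weights(xs, 3.2)
z_star = lam @ zs
```

Note that the system (6) is solved as such, without the unit-sum constraint; enforcing that constraint would introduce the Lagrange parameter mentioned above.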
3 Variogram or covariance function estimation
In kriging, the dependence information between observations is taken into account to interpolate the set of points {(x_i, Z(x_i)), i = 1, . . . , n}. The most popular tool to model these dependencies is the variogram rather than the covariance function, because the covariance function estimate is biased by the mean. Indeed, if the mean is unknown, which is generally the case, its estimation affects the covariance function estimation. Geostatisticians proposed different functional models of variogram to comply
with the observations and with the physical characteristics of a spatial domain [10].
In the first part of this section, we present the characteristics of the most popular
variogram models.
Choosing one model, or even combining several models to propose a new one, is a subjective task requiring the geostatistician's expertise and some prior descriptive analysis of the dataset Z_n. The data are explicitly used only when a regression analysis is performed to fit the variogram model parameters to the empirical variogram.
An empirical variogram, i.e. a variogram explicitly obtained from the dataset Zn
and not by some regression on a functional model, is called a sample variogram in
the literature. In this section, we will see that a sample variogram does not fulfil (in
its general expression) the conditional negative definiteness requirement imposed on
a variogram model. We will briefly discuss this point, which explains why a sample
variogram is never used by geostatisticians to carry out an interpolation by kriging.
Figure 1 Qualitative properties of standard variogram models
3.1 Theoretical models of variogram or covariance functions
For the sake of clarity, we restrict this presentation of variogram models to isotropic models. An isotropic variogram is invariant to the direction of the separation x − x′. Thus an isotropic variogram is a function γ(h), defined for h ≥ 0, such that h = ‖x − x′‖_ℓ.
Under the isotropy assumption, variogram models have the following common behavior: they increase with h and, for most models, when h → ∞, they stabilize at a certain level. A non-stabilized variogram models a phenomenon whose variability has no limit at large distances. If, conversely, the variogram converges to a limiting value called the sill, it means that there is a distance, called the range, beyond which Z(x) and Z(x + h) are uncorrelated. In some sense, the range gives some meaning to the concept of area of influence. Another parameter of a variogram that can be physically interpreted is the nugget effect: it is the value taken by the variogram when h tends to 0. A discontinuity at the origin is generally due to geological discontinuities, measurement noise or positioning errors. Figure 1 shows a qualitative standard variogram graph where the sill, the range and the nugget effect are represented.
Beyond this standard shape, other physical phenomena can be modelled in a variogram. For instance, the hole effect, understood as the tendency for high values to be surrounded by low values, is modelled by bumps on the variogram (or holes in the covariance function). Periodicity, which is a special case of the hole effect, can appear in the variogram. Explicit formulations of many popular variogram and covariance function models can be found in [10].
Usual variogram models do not perfectly match the dependence structure corresponding to the geostatistician's physical intuition and sample variogram analysis. Generally, a linear combination of variograms is used, in order to obtain a more satisfying fit of the theoretical variogram to both the sample variogram and the
geostatistician’s intuition. Such a variogram is obtained by:

\[
\gamma(h) = \sum_{j=1}^{J} \gamma_j(h).
\]

The main reason is that such linear combinations preserve the negative definiteness conditions required for variograms, as seen in the next subsection.
Moreover, when the variogram varies with the direction of the separation x − x′, it is said to be anisotropic. Some particular anisotropic variograms can be derived from marginal models. The simplest procedure to construct an anisotropic variogram on R^ℓ is to compute the product of its marginal variograms, assuming the separability of the anisotropic variogram.
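As an illustration, here is a minimal sketch of an isotropic spherical variogram model exhibiting a nugget, a sill and a range, together with a nested model obtained as a sum of admissible variograms; all parameter values are placeholders:

```python
import numpy as np

def spherical_variogram(h, nugget=0.1, sill=1.0, rng=5.0):
    """Spherical model: gamma(0) = 0, a jump to the nugget as h -> 0,
    then a rise to the sill, which is reached exactly at the range."""
    h = np.asarray(h, dtype=float)
    rising = nugget + (sill - nugget) * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
    return np.where(h == 0.0, 0.0, np.where(h < rng, rising, sill))

# A linear combination of admissible variograms is still admissible:
def nested(h):
    return spherical_variogram(h, 0.05, 0.4, 2.0) + spherical_variogram(h, 0.0, 0.6, 8.0)
```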
3.2 Definiteness properties of covariance and variogram functions
Mathematically, variograms and covariance functions are strongly constrained. Being extensions of the variance, some of its properties carry over to the mathematical definitions of the covariance and the variogram. In particular, the positive definiteness of the covariance function and, similarly, the conditional negative definiteness of the variogram are inherited from the positivity of variances.
The variance of a linear combination ∑_{i=1}^p µ_i Z(x_i) of random variables {Z(x_i), i = 1, . . . , p} could become negative if the chosen covariance function model were not positive definite or, similarly, if the chosen variogram model were not conditionally negative definite [2].
When considering an SRF, the variance of a linear combination of random variables {Z(x_i), i = 1, . . . , p} is expressed, in terms of the covariance function of the form C(h), by

\[
V\Bigl[\sum_{i=1}^{p} \mu_i Z(x_i)\Bigr] = \sum_{i=1}^{p}\sum_{j=1}^{p} \mu_i \mu_j\, C(x_j - x_i). \tag{7}
\]
Since the variance is positive, the covariance function C should be positive definite in the sense of the following definition:

Definition 1 (Positive definite function) A real function C(h), defined for any h ∈ R^ℓ, is positive definite if, for any natural integer p, any set of real ℓ-tuples {x_i, i = 1, . . . , p} and any real coefficients {µ_i, i = 1, . . . , p},

\[
\sum_{i=1}^{p}\sum_{j=1}^{p} \mu_i \mu_j\, C(x_j - x_i) \ge 0.
\]
Now in the case of a general IRF, i.e. an IRF with no covariance function of the form C(h), it can be shown [10] that the variance of any linear combination of increments ∑_{i=1}^p µ_i (Z(x_i) − Z(x_0)) can be expressed, under the condition that ∑_{i=1}^p µ_i = 0, by

\[
V\Bigl[\sum_{i=1}^{p} \mu_i \bigl(Z(x_i) - Z(x_0)\bigr)\Bigr] = V\Bigl[\sum_{i=1}^{p} \mu_i Z(x_i)\Bigr] = -\sum_{i=1}^{p}\sum_{j=1}^{p} \mu_i \mu_j\, \gamma(x_j - x_i). \tag{8}
\]

Let us remark that for an SRF, under the condition ∑_{i=1}^p µ_i = 0, expressions (7) and (8) can easily be interchanged by means of relation (3).
Since the variance is positive, the variogram γ should be conditionally negative definite in the sense of the following definition:

Definition 2 (Conditionally negative definite function) A function γ(h), defined for any h ∈ R^ℓ, is conditionally negative definite if, for any choice of p, {x_i, i = 1, . . . , p} and {µ_i, i = 1, . . . , p}, conditionally to the fact that ∑_{i=1}^p µ_i = 0,

\[
\sum_{i=1}^{p}\sum_{j=1}^{p} \mu_i \mu_j\, \gamma(x_j - x_i) \le 0.
\]
From expression (7), the covariance function of any SRF is necessarily positive definite. Moreover, it can be shown that, for any positive definite covariance function, there exists a Gaussian random function having this covariance function. But some types of covariance functions are incompatible with some classes of random functions [1]. Note that the same problem holds for variograms and conditional negative definiteness for IRFs. This problem, which is not solved yet, was dubbed “internal consistency of models” by Matheron [44, 45].
Since the covariance function of any SRF is necessarily positive definite, it means
that any function that is not positive definite (resp. conditionally negative definite)
cannot be the covariance of an SRF (resp. the variogram of an IRF).
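As an illustration, a necessary condition can be checked numerically on any finite configuration of sites: the matrix [C(x_j − x_i)] must be positive semi-definite, i.e. have no negative eigenvalue. A minimal sketch (the function names and the two test models are ours):

```python
import numpy as np

def passes_pd_check(cov, xs, tol=1e-10):
    """Necessary positive definiteness check on a finite set of 1-D sites:
    the matrix [C(xj - xi)] must have no eigenvalue below -tol."""
    xs = np.asarray(xs, dtype=float)
    K = cov(np.abs(xs[:, None] - xs[None, :]))
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

sites = np.linspace(0.0, 10.0, 25)
print(passes_pd_check(lambda h: np.exp(-h / 3.0), sites))          # exponential: admissible
print(passes_pd_check(lambda h: (h < 3.0).astype(float), sites))   # boxcar: expected to fail
```

Passing such a check on one configuration does not prove admissibility, but failing it disproves it.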
3.3 Why not use the sample variogram?
The estimation of spatial dependencies by means of the variogram or the covariance function is the key to any kriging method. The intuition underlying spatial dependencies is that points x ∼ y that are close together should have close values Z(x) ∼ Z(y), because the physical conditions are similar at those locations.

In order to make this idea more concrete, it is interesting to plot the increments |z(x_i) − z(x_j)|, quantifying the closeness z(x_i) ∼ z(x_j), as a function of the distance r_ij = ‖x_i − x_j‖_ℓ, which measures the closeness x_i ∼ x_j.
The variogram cloud is among the most popular visualization tools used by geostatisticians. It plots the empirical distances r_ij on the x-axis against the halved squared increments v_ij = ½ (z(x_i) − z(x_j))² on the y-axis. The choice of the halved squared increments is due to the definition of the variogram of an IRF (2).
Figure 2 Variogram cloud and sample variogram

Figure 2 shows the variogram cloud (in blue) obtained with observations taken from the Jura dataset available at http://goovaerts.pierre.googlepages.com/. This dataset is a benchmark used throughout Goovaerts' book [26]. It contains concentrations of seven pollutants (cadmium, cobalt, chromium, copper, nickel, lead and zinc) measured in the Swiss Jura region. In Figure 2, the distance is the Euclidean distance in R² and the variogram cloud has been computed from cadmium concentrations at 100 locations.
From the variogram cloud it is possible to extract the sample variogram. It is obtained by computing the mean value of the halved squared increments v in classes of distance. The sample variogram can be defined by:

\[
\hat{\gamma}(h) = \frac{1}{2\,|V_\Delta^h|} \sum_{i, j \in V_\Delta^h} \bigl(z(x_i) - z(x_j)\bigr)^2,
\]

where V_Δ^h is the set of pairs of locations such that ‖x_i − x_j‖_ℓ ∈ [h − Δ, h + Δ] and |V_Δ^h| is the cardinality of V_Δ^h, i.e. the number of pairs in V_Δ^h.
Figure 2 also shows the sample variogram associated with the plotted variogram cloud. It has been computed at 12 values of the distance h, with a class radius Δ equal to half the sampling distance.
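A minimal sketch of this computation (with synthetic data; the layout of the Jura file is not assumed here):

```python
import numpy as np

def sample_variogram(xs, zs, lags, delta):
    """Sample variogram: mean of the halved squared increments over all
    pairs whose distance falls in the class [h - delta, h + delta]."""
    xs, zs = np.asarray(xs, dtype=float), np.asarray(zs, dtype=float)
    d = np.linalg.norm(xs[:, None, :] - xs[None, :, :], axis=-1)  # pairwise distances
    v = 0.5 * (zs[:, None] - zs[None, :]) ** 2                    # halved squared increments
    iu = np.triu_indices(len(zs), k=1)                            # count each pair once
    d, v = d[iu], v[iu]
    return np.array([v[(d >= h - delta) & (d <= h + delta)].mean() for h in lags])

rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 10.0, size=(100, 2))              # 100 sites in a 2-D domain
zs = np.sin(xs[:, 0]) + 0.1 * rng.standard_normal(100)  # synthetic observations
gamma_hat = sample_variogram(xs, zs, lags=np.linspace(0.5, 5.0, 12), delta=0.25)
```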
As seen in the previous sections, geostatistics relies on sophisticated statistical models but, in practice, geostatisticians eventually quantify the dependencies by means of a subjectively chosen theoretical variogram. Why do they not use the empirical variogram to quantify the influence of the neighborhood of a point on the value at that point? It turns out that these empirical tools (variogram cloud or sample variogram) generally do not fulfil the conditional negative definiteness requirement. In order to overcome this difficulty, two methods are generally considered: either an automated fitting (by means of a regression analysis on the parameters of a variogram model) or a manual fitting made at a glance. Empirical variograms are considered by geostatisticians only as visualization or preliminary guiding tools.
Figure 3 Kriging with a short-ranged variogram
3.4 Sensitivity of kriging to variogram parameters
The variogram parameters, i.e. the range, the sill and the nugget effect, affect the results of kriging in various ways. For one thing, while the kriging weights sum to 1, they are not necessarily all positive. In particular, the choice of the range of the variogram will affect the sign of the kriging weights.
In Figures 3 and 4 we consider a set of data points that form two clearly separated clusters: there are many data points between abscissae 0 and 5, with an increasing trend, as well as between 10 and 15, with a decreasing trend, but none between 5 and 10. Figure 3 is the result of kriging with a short-ranged variogram that only covers the area inside each cluster of points. Figure 4 is the result of kriging with a long-ranged variogram covering the two clusters. In the first case, the range of the variogram does not cover the gap between the clusters. At locations far away from the data points, the kriged values get closer to the mean value of the data. This effect creates a hollow at the center of the gap between the clusters. The kriging weights are then all positive. On the contrary, in the second case, the general trend of the data suggests a hill, which is accounted for by the results of kriging, and can only be achieved through negative kriging weights between the clusters of data points.
A positive nugget effect may prevent the kriged surface from coinciding with the data points. The effect of changing the sill is less significant. Nevertheless, it is clear that the choice of the theoretical variogram parameters has a non-negligible impact on the kriged surface.
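The sign effect can be reproduced with the simple kriging sketch of Section 2.2; the configuration below mimics the two-cluster layout of Figures 3 and 4 (all numerical values are placeholders):

```python
import numpy as np

def spherical_cov(h, sill=1.0, rng=3.0):
    """Covariance C(h) = sill - gamma(h) associated with a spherical variogram."""
    h = np.asarray(h, dtype=float)
    g = np.where(h < rng, sill * (1.5 * h / rng - 0.5 * (h / rng) ** 3), sill)
    return sill - g

def sk_weights(xs, x0, rng):
    xs = np.asarray(xs, dtype=float)
    K = spherical_cov(np.abs(xs[:, None] - xs[None, :]), rng=rng)
    return np.linalg.solve(K, spherical_cov(np.abs(xs - x0), rng=rng))

# Two clusters of sites with a gap, prediction in the middle of the gap
xs = [0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15]
print(np.round(sk_weights(xs, 7.5, rng=3.0), 3))    # short range
print(np.round(sk_weights(xs, 7.5, rng=20.0), 3))   # long range: watch for negative weights
```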
Figure 4 Kriging with a long-ranged variogram
4 Epistemic uncertainty in kriging
The traditional kriging methodology is idealized in the sense that it assumes more information than is actually available. The stochastic environment of the kriging approach is in some sense too heavy compared to the actual available data, which are scarce. Indeed, the actual data consist of a single realization of the presupposed random function. This issue has been addressed in critiques of the usual kriging methodology. In the kriging estimation procedure, epistemic uncertainty clearly lies at two places in the process: the knowledge of the data points and the choice of the mathematical variogram. One source of global uncertainty is the lack of knowledge about the ideal variogram that is used at all the estimation locations of a kriging application. Such uncertainty is global, in the sense that it affects the random function model over the whole kriging domain. This kind of global uncertainty, to which Bayesian approaches can be applied, contrasts with the local uncertainty that may pervade the observations. In the usual approaches (Bayesian or not), these observations are supposed to be perfect, because they are modelled as precise values. However, in the 1980s, some authors were concerned with the fact that epistemic uncertainty also pervades the available data, which are then modelled by means of intervals or fuzzy intervals.
Besides, the impact of epistemic uncertainty on the kriged surface should not be confused with the measure of precision obtained via the kriging variance V[Z(x_0) − Z∗(x_0)]. This measure of precision just reflects the lack of statistical validity of kriging estimates at locations far from the data, under the assumption that the real spatial phenomenon is faithfully captured by a random function (which is not the case). The fact that the kriging variance does not depend on the measured data in a direct way makes it totally inappropriate to account for epistemic uncertainty on measurements. Moreover, epistemic uncertainty on the variogram parameters leads to uncertainty about the kriging variance itself.
4.1 Imprecision in the variogram
Sample variograms (see for instance Figure 2) are generally far from the ideal theoretical variogram models (see for instance Figure 1) fulfilling the conditional negative definiteness condition. Whether the fitting is automatic (by means of a regression analysis on the parameters of a model) or manual and made at a glance, an important epistemic transfer can be noticed. Indeed, whatever the method, the geostatistician tries to summarize some objective information (the sample variogram) by means of a unique subjectively chosen dependence model, the theoretical variogram. As pointed out by A. G. Journel [34]:
Any serious practitioner of geostatistics would expect to spend a good half of his or her
time looking at all faces of a data set, relating them to various geological interpretations,
prior to any kriging.
Except in [5, 6], this fundamental step of the kriging method is hardly ever discussed in terms of the epistemic uncertainty it creates. Intuitively, however, there is a lack of information to properly assess a single variogram. This lack of information is a source of epistemic uncertainty, by definition [30]. As the variogram model plays a critical role in the computed reliability of a kriging estimation, the epistemic uncertainty on the fitted theoretical variogram should not be neglected. Neglecting the epistemic uncertainty in the variogram parameters, as propagated to the kriging estimate, may result in underestimated risks and false confidence in the results.
4.2 Kriging in the Bayesian framework
The Bayesian kriging approach is supposed to handle this subjective uncertainty about features of the theoretical variogram, as known by experts. In practice, the structural (random function) model is not exactly known beforehand and is usually estimated from the very same data from which the predictions are made. The aim of Bayesian kriging is to incorporate epistemic uncertainty into the model estimation and thus into the associated prediction.
In Omre [46], the user has a prior guess about the non-stationary random function Z. This guess is given by a random function Y on the domain D whose moments are known and given by, ∀x, x + h ∈ D,

\[
\begin{cases}
E[Y(x)] = m_Y,\\
\mathrm{Cov}[Y(x), Y(x+h)] = C_Y(h).
\end{cases} \tag{9}
\]

From the knowledge of C_Y(h), the variogram can also be used, thanks to the relation γ_Y(h) = C_Y(0) − C_Y(h).
The random function Y, and more precisely the functions m_Y, C_Y and γ_Y, constitutes the available prior subjective information about the random function Z, whose value must be predicted at location x_0. In the Bayesian updating procedure, the way uncertainty about Y is transferred to Z is modelled by the law that handles the uncertainty on Z conditionally on Y, i.e. the law of Z|Y. In our context, the covariance function or the variogram of the updating law has to be estimated. They are defined by:

\[
\begin{cases}
C_{Z|Y}(h) = \mathrm{Cov}[Z(x), Z(x+h) \mid Y(x');\ x' \in D],\\[2pt]
\gamma_{Z|Y}(h) = \tfrac{1}{2}\, V[Z(x) - Z(x+h) \mid Y(x');\ x' \in D].
\end{cases} \tag{10}
\]
From standard works on Bayesian statistical methods [7, 29], Omre extracts the Bayes updating rules for the bivariate characteristic functions of random functions, namely the variogram and the covariance function. These Bayesian updating rules enable the computation of the posterior uncertainty on Z from the prior uncertainty on Y provided by (9). The updating law (10) is equivalently obtained by:

\[
\begin{cases}
m_Z = a_0 + m_Y,\\
C_Z(h) = C_{Z|Y}(h) + C_Y(h),\\
\gamma_Z(h) = \gamma_{Z|Y}(h) + \gamma_Y(h),
\end{cases}
\]

where a_0 is an unknown constant, which is (according to Omre) introduced to make the guess less sensitive to the actual level specified, i.e. less sensitive to the assessment of m_Y.
From this updating procedure of the moments, one can retrieve the moments of Z needed for kriging. What is still missing in this procedure is the covariance function or the variogram of Z|Y defined by (10). Omre proposes a usual fitting procedure to estimate these functions. As a preliminary, we can observe that

\[
\begin{aligned}
\gamma_{Z|Y}(h) &= \gamma_Z(h) - \gamma_Y(h),\\
&= \tfrac{1}{2}\, V[Z(x) - Z(x+h)] - \gamma_Y(h),\\
&= \tfrac{1}{2}\, E\bigl[(Z(x) - Z(x+h))^2\bigr] - \tfrac{1}{2}\bigl(m_Y(x) - m_Y(x+h)\bigr)^2 - \gamma_Y(h).
\end{aligned}
\]
A sample variogram is thus defined by

\[
\hat{\gamma}_{Z|Y}(h) = \frac{1}{2\,|V_\Delta^h|} \sum_{i, j \in V_\Delta^h} \Bigl[\bigl(z(x_i) - z(x_j)\bigr)^2 - \bigl(m_Y(x_i) - m_Y(x_j)\bigr)^2 - 2\gamma_Y(h)\Bigr],
\]

where V_Δ^h is the set of pairs of locations such that ‖x_i − x_j‖_ℓ ∈ [h − Δ, h + Δ] and |V_Δ^h| is the cardinality of V_Δ^h, i.e. the number of pairs in V_Δ^h.
Eventually, the Bayesian kriging system is given by

\[
C_{Z|Y}(x_0 - x_i) + C_Y(x_0 - x_i) = \sum_{j=1}^{n} \lambda_j(x_0)\bigl[C_{Z|Y}(x_i - x_j) + C_Y(x_i - x_j)\bigr], \quad \forall i = 1, \ldots, n.
\]
Note that the rationale for this approach is not that the variogram of Z|Y is easier to estimate than that of Z. The above procedure tries to account for the epistemic uncertainty about the variogram; the Bayesian approach does so by correcting a prior guess on the random function via a conditional term. Another Bayesian approach is proposed in the paper of Handcock and Stein [27]. It shows that ordinary kriging with a Gaussian stationary random function and unknown mean m can be interpreted in terms of Bayesian analysis with a prior distribution locally uniform on the mean parameter m. More generally, they propose a systematic Bayesian analysis of the kriging methodology for different mean and variogram parametric models. Other authors developed this approach [14, 25, 9]. It is supposed to take epistemic uncertainty into account in the sense that it is supposed to handle the lack of knowledge about the model parameters by assigning a prior probability distribution to these parameters.
In our view, a unique prior distribution, even if claimed to be non-informative in the case of plain ignorance, is not the proper representation to capture epistemic uncertainty about the model. A unique prior models the supposedly known variability of the considered parameter, not ignorance about it. In fact, it is not clear that such parameters are subject to variability at all. As a more consistent approach, a robust Bayesian analysis of kriging could be performed. Robust Bayesian analysis consists of working with a family of priors in order to lay bare the sensitivity of estimators to epistemic uncertainty on the model's parameters [8, 48].
4.3 Imprecision in the data
Because available information can be of various types and qualities, ranging from measurement data to human geological experience, the treatment of uncertainty in data should reflect this diversity of origin. Moreover, there is only one observation made at each location, and this value is in essence deterministic. However, one may challenge the precision or accuracy of such measurements. In particular, geological measurements are often highly imprecise.
Let us take a simple example: the measurement of permeability in an aquifer. It results from the interpretation of a pumping test: when pumping water from a well, the water level decreases in that well and also in neighboring wells. The local permeability is obtained by fitting theoretical draw-down curves to the experimental ones. There is obviously some imprecision in such a fitting, which is based on approximations of reality (e.g., a homogeneous medium). Epistemic uncertainty due to measurement imperfections should thus pervade the measured permeability data. For the inexact (imprecise) information resulting from unique assessments of deterministic values, a non-frequentist or subjective approach reflecting imprecision could be used.
Epistemic uncertainty about such deterministic numerical values naturally takes the form of intervals. Asserting z(x) ∈ [a, b] comes down to claiming that the actual value of the quantity z(x) lies between a and b. Note that while z(x) is an objective quantity, the nature of the interval [a, b] is epistemic: it represents expert knowledge about z(x) and has no existence per se. The interval [a, b] is a set of mutually exclusive values, one of which is the right one: the natural interpretation of the interval is that z(x) ∉ [a, b] is considered impossible.
A fuzzy subset F [21, 54] is a richer representation of the available knowledge in
the sense that the membership degree F(r) is a gradual estimation of the conformity
of the value z(x) = r to the expert knowledge. In most approaches, fuzzy sets are
representations of knowledge about underlying precise data. The membership grade
F(r) is interpreted as a degree of possibility of z(x) = r according to the expert [55].
In this setting, membership functions are interpreted as possibility distributions that
handle epistemic uncertainty due to imprecision on the data.
Possibility distributions can often be viewed as nested sets of confidence intervals [18]. Let F_α = {r ∈ R : F(r) ≥ α} be called an α-cut. F is called a fuzzy interval if and only if ∀α, 0 < α ≤ 1, F_α is an interval. When α = 1, F_1 is called the mode of F if it reduces to a singleton. If the membership function is continuous, the degree of certainty of z(x) ∈ F_α is equal to 1 − α, in the sense that any value outside F_α has possibility degree at most α. So it is sure that z(x) ∈ S(F) = lim_{α→0} F_α (the support of F), while there is no certainty that the most plausible values in F_1 contain the actual value. Note that the membership function can be retrieved from its α-cuts by means of the relation:

\[
F(r) = \sup_{\{\alpha\,:\, r \in F_\alpha\}} \alpha.
\]
Therefore, suppose that the available knowledge supplied by an expert comes in the form of nested confidence intervals {I_k, k = 1, . . . , K} such that I_1 ⊂ I_2 ⊂ · · · ⊂ I_K, with increasing confidence levels c_k > c_{k′} if k > k′. The possibility distribution defined by

\[
F(r) = \min_{k=1,\ldots,K} \max\bigl(1 - c_k,\ I_k(r)\bigr),
\]

where I_k(r) denotes the characteristic function of I_k, is a faithful representation of the supplied information. Viewing a possibility degree as an upper probability bound [53], F is an encoding of the probability family {P : P(I_k) ≥ c_k}. If c_K = 1, then the support of this fuzzy interval is I_K.
If an expert only provides a mode c and a support [a, b], it makes sense to represent this information as the triangular fuzzy interval with mode c and support [a, b]
[19]. Indeed F then encodes a family of (subjective) probability distributions containing all the unimodal ones with support included in [a, b].
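A minimal sketch of this construction (the intervals and confidence levels are placeholders):

```python
def possibility_from_nested(intervals, confidences):
    """Build F(r) = min_k max(1 - c_k, I_k(r)) from nested intervals
    I_1 ⊂ ... ⊂ I_K with increasing confidence levels c_k."""
    def F(r):
        return min(max(1.0 - c, 1.0 if a <= r <= b else 0.0)
                   for (a, b), c in zip(intervals, confidences))
    return F

# An expert supplies three nested intervals with growing confidence
F = possibility_from_nested([(4.8, 5.2), (4.5, 5.5), (4.0, 6.0)], [0.5, 0.8, 1.0])
print([F(r) for r in (3.9, 4.2, 4.7, 5.0)])   # [0.0, 0.2, 0.5, 1.0]
```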
5 Intervallist kriging approaches
This section and the next one refer to works done in the 1980s. Even if some of them can be considered obsolete, their interest lies in their being early attempts to handle some form of epistemic uncertainty in geostatistics. While some of the proposed procedures look questionable, it is useful to understand their merits and limitations in order to avoid pitfalls and propose a well-founded methodology to that effect. Since then, virtually no new approaches seem to have been proposed, even if some of the problems posed more than 20 years ago have now received more efficient solutions, for instance the solving of interval problems via Gibbs sampling [23].
5.1 The quadratic programming approach
In [22, 36], the authors propose to estimate z∗(x_0) from imprecise information available as a set of constraints on the observations. Such constraints can also be seen as inequality-type data, i.e. the observation located at position x_i is of the form z(x_i) ≥ a(x_i) and/or z(x_i) ≤ b(x_i).
This approach also assumes a global constraint: whatever the position x_0 ∈ D, the kriging estimate z∗(x_0) is bounded, which can be written

\[
\forall x_0 \in D, \quad z^*(x_0) \in [a, b]. \tag{11}
\]

For instance, any ore mineral grade is necessarily a value within [0, 100%].
Any kind of data, i.e. precise or inequality-type, can always be expressed in terms of an interval constraint:

\[
z(x_i) \in [a(x_i), b(x_i)], \quad \forall i = 1, \ldots, n. \tag{12}
\]

Indeed, precise data can be modelled by constrained data (12) with equal upper and lower bounds, and inequality-type data z(x_i) ≥ a(x_i) (resp. z(x_i) ≤ b(x_i)) can be expressed as [a(x_i), b] (resp. [a, b(x_i)]). Thus the data set is now given by Z̄_n = {z̄(x_i) = [a(x_i), b(x_i)], i = 1, . . . , n}.
As mentioned by A. Journel [35], this formulation of the problem makes it possible to cope with the recurring question of the positiveness of the kriging weights, which the basic kriging approaches cannot ensure. Negative weights are generally seen as being “evil”, due to the fact that the measured spatial quantity is positive while a linear combination (4) with some negative weights could lead to a negative kriging estimate. More generally, nothing prevents the kriged values from violating range constraints induced by practical considerations on the studied quantity. Hence, one is tempted by the incorrect conclusion that all kriging weights should be positive. Actually, having some negative kriging weights is quite useful, since it allows a kriging estimate to fall outside the range [min_i z(x_i), max_i z(x_i)]. Instead of forcing the weights to be positive, the constraint-based approach forces the estimate to respect its bounds by adding a constraint on the estimate to the least squares optimization problem. More generally, the global constraint (11) solves the problem of getting meaningful kriging estimates.
In [39], J.L. Mallet proposes a particular solution to this problem of constrained optimization by means of quadratic programming, i.e. to the problem of minimizing (or maximizing) a quadratic form (the error variance) under the constraint that the solution of this optimization program lies inside the range [a, b].

The dual expression [10] of the kriging estimate (4) is given by:

\[
z^*(x_0) = \sum_{i=1}^{n} \nu_i\, C(x_i - x_0). \tag{13}
\]
This expression is obtained by incorporating into (4) the kriging weights that are the solutions of the kriging equations (6). Thus the dual kriging weights {ν_i, i = 1, . . . , n} now reflect both the dependencies between observations, {C(x_i − x_j), i, j = 1, . . . , n}, and the observations {z(x_i), i = 1, . . . , n} themselves.¹
Building on Mallet's approach [39], Dubrule and Kostov [22, 36] proposed a solution to this interpolation problem that takes the form (13), where the dual kriging weights {ν_i, i = 1, . . . , n} are obtained by means of the quadratic program minimizing

\[
\sum_{i=1}^{n}\sum_{j=1}^{n} \nu_i \nu_j\, C(x_i - x_j), \tag{14}
\]

subject to the n constraints

\[
a(x_i) \le \sum_{j=1}^{n} \nu_j\, C(x_j - x_i) \le b(x_i)
\]

induced by the dataset Z̄_n = {z̄(x_i) = [a(x_i), b(x_i)], i = 1, . . . , n}. When only precise observations are present (i.e. when there is no inequality-type constraint), the system reduces to a standard simple kriging system.
However, the ensuing treatment of these constraints is ad hoc. Indeed, the authors propose to select one bound among a(x_i), b(x_i) for each constraint, namely the one supposed to affect the kriging estimate. They thus select a precise data set made of the selected bounds. The choice of this data set is only influenced by the judgment of the geostatistician facing the raw data and by some preliminary kriging steps performed from the available precise data (if any). These limitations can nowadays be tackled using Gibbs sampling methods [23].
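As an illustration, the quadratic program (14) with the interval constraints can be handed to a generic solver. A minimal sketch using scipy.optimize.minimize; the covariance model, the site positions and the bounds are placeholders:

```python
import numpy as np
from scipy.optimize import minimize

def cov(h, sill=1.0, rng=4.0):
    return sill * np.exp(-np.abs(h) / rng)      # placeholder exponential covariance

xs = np.array([0.0, 2.0, 5.0, 9.0])
lo = np.array([1.0, 1.2, 0.5, 0.0])             # a(x_i): lower bounds
hi = np.array([1.4, 1.2, 1.5, 0.6])             # b(x_i): upper bounds (x_2 is precise)
K = cov(xs[:, None] - xs[None, :])              # matrix of C(xi - xj)

# Minimize (14) subject to a(xi) <= sum_j nu_j C(xj - xi) <= b(xi)
res = minimize(
    lambda nu: nu @ K @ nu,
    x0=np.zeros(len(xs)),
    constraints=[{"type": "ineq", "fun": lambda nu: K @ nu - lo},
                 {"type": "ineq", "fun": lambda nu: hi - K @ nu}],
)
nu = res.x
z_star = lambda x0: nu @ cov(xs - x0)           # dual kriging estimate (13)
```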
5.2 The soft kriging approach
Methodology
In 1986, A. Journel [35] studied the same problem of adapting the kriging methodology in order to deal with what he called “soft” information. According to him, “soft” information consists of imprecise data z̃(x_i), especially intervals, encoded by cumulative distribution functions (cdf) F_{x_i}.

¹ It can be noted that, in the precise framework, the dual formalism of kriging is computationally interesting. Indeed, the kriging system to be solved is obtained by minimization of (14), whatever the estimation position x_0: the kriging system has to be solved only once to provide an interpolation over the whole domain. However, this system is difficult to solve and badly conditioned, whereas the non-dual systems, whose matrices are generally sparse, are more tractable. Therefore, the dual kriging system should be preferred in the case of a large number of estimation points with a small dataset, and the usual kriging system in the case of a small number of estimation points with a large dataset.

Figure 5 Prior information on the observations
The cumulative distribution function F_{x_i} attached to a precise value z(x_i) = a_i = b_i can be modelled by a step-function cdf with parameter a_i = b_i, i.e.:

\[
F_{x_i}(s) =
\begin{cases}
1, & \text{if } s \ge a(x_i) = b(x_i),\\
0, & \text{otherwise}
\end{cases}
\]

(cf. Figure 5.(a)). At each location x_i where a constraint interval z̄(x_i) of the form (12) is present, the associated cdf F_{x_i} is only known outside the constraint interval, where it is either 0 or 1, i.e.:

\[
F_{x_i}(s) =
\begin{cases}
1, & \text{if } s \ge b(x_i),\\
0, & \text{if } s \le a(x_i),\\
?, & \text{otherwise}
\end{cases} \tag{15}
\]
(cf. Figure 5.(c)). If the expert is unable to decide where, within an interval z̄(x_i) = [a(x_i), b(x_i)], the value z(x_i) may lie, the non-informative prior cdf (15) should be used rather than a uniform cdf within that interval (as the principle of maximum entropy would suggest), since the latter is not equivalent to a lack of information.

In addition to the constraint intervals z̄(x_i) of Dubrule and Kostov [22, 36], some prior information may allow quantifying the likelihood of the value z(x_i) within such an interval. The corresponding cumulative distribution function F_{x_i} (cf. Figure 5.(b)) is then completed with prior subjective probabilities.

At any other location, a minimal interval constraint exists (cf. (11) and Figure 5.(d)): z∗(x) ∈ [a, b]. This constraint, as in the quadratic programming approach of Dubrule and Kostov, enables the problem of negative weights to be addressed.
From this set of heterogeneous prior pieces of information, which we will denote by Z̃_n = {z̃(x_i) = F_{x_i}, i = 1, . . . , n}, Journel [35] proposes to construct a “posterior” cdf at the kriging estimation location x_0, denoted by

\[
F_{x_0 | \tilde{Z}_n}(s) = P(Z(x_0) \ge s \mid \tilde{Z}_n).
\]

In its simplest version, the so-called “soft” kriging estimate of the “posterior” cdf F_{x_0|Z̃_n} is defined as a linear combination of the prior cdf data, for a given threshold value s ∈ [a, b], i.e.

\[
F_{x_0 | \tilde{Z}_n}(s) = \sum_{i=1}^{n} \lambda_i(x_0, s)\, F_{x_i}(s), \tag{16}
\]

where the kriging weights, for a given threshold s ∈ [a, b], are obtained by means of usual kriging based on the random function Y(x) = F_x(s) at location x.
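A minimal sketch of formula (16); the prior cdfs, the value filling the unknown part of (15), and the weights are placeholders:

```python
import numpy as np

def soft_kriging_cdf(prior_cdfs, weights_for, thresholds):
    """Equation (16): for each threshold s, combine the prior cdf values
    F_{x_i}(s) with the kriging weights computed for that threshold."""
    return np.array([np.dot(weights_for(s), [F(s) for F in prior_cdfs])
                     for s in thresholds])

# Two precise data (step cdfs) and one interval datum whose unknown
# part in (15) is filled here with an assumed prior value of 0.5.
priors = [lambda s: float(s >= 1.0),
          lambda s: float(s >= 3.0),
          lambda s: 0.0 if s <= 1.5 else (1.0 if s >= 2.5 else 0.5)]
weights_for = lambda s: np.array([0.4, 0.35, 0.25])   # placeholder, threshold-independent
print(soft_kriging_cdf(priors, weights_for, np.linspace(0.0, 4.0, 9)))
```

With all weights positive the result is monotone; the inconsistency discussed below arises when some weights are negative.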
Despite its interest, there are some debatable aspects of this approach:
1. The use of Bayesian semantics. Journel proposes to use the terminology of Bayesian statistics, the term prior qualifying the probabilistic information attached to each piece of data and the term posterior qualifying the probabilistic information at the estimation point. However, in his approach, the computation of the posterior cdf is not made by means of the Bayesian updating procedure. He probably made this terminological choice because of the subjectivist nature of the information. However, this choice is not consistent with Bayesian statistics.
2. The choice of a linear combination of the cdfs to compute the uncertain estimate. A more technical criticism of this approach concerns the definition of the kriged “posterior” cdf (16). The appropriateness of this definition supposes that the cdf of a linear combination of random variables is the linear combination of the cdfs of these random variables. However, this is not correct. Propagating uncertainty on the parameters of an operation is not as simple as just replacing the parameters by their cdfs in the operation. Indeed, the cdf of Z∗(x_0) in (4), when {Z(x_i), i = 1, . . . , n} are random variables with cdfs given by {F_{x_i}, i = 1, . . . , n}, is not given by (16); it is obtained via a convolution, which could be approximated by means of a Monte Carlo method (see the sketch below). If we assume complete dependence between measurements of Z(x_i), one may also construct the cdf of Z∗(x_0) as a weighted sum of their quantile functions (the inverses of the cdfs).
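A minimal Monte Carlo sketch for the independent case, illustrating that the cdf of the weighted sum differs from the weighted sum (16) of the cdfs; the two uniform data models are placeholders:

```python
import numpy as np

def mc_cdf_of_weighted_sum(inv_cdfs, weights, s, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(Z*(x0) <= s) for Z*(x0) = sum_i w_i Z(x_i),
    with independent Z(x_i) sampled by inverse-transform from their cdfs."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=(n_draws, len(weights)))
    samples = sum(w * q(u[:, i]) for i, (q, w) in enumerate(zip(inv_cdfs, weights)))
    return float(np.mean(samples <= s))

# Two uniform data models, on [1, 2] and [3, 4], combined with weights 0.5 / 0.5
inv = [lambda u: 1.0 + u, lambda u: 3.0 + u]              # vectorized inverse cdfs
print(mc_cdf_of_weighted_sum(inv, [0.5, 0.5], s=2.25))    # ~0.125, while (16) would give 0.5
```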
These defects make this approach theoretically unclear, with an interpretation neither in the Bayesian framework nor in the frequentist framework. Note that the author [35] already acknowledged a strong inconsistency of his method, namely the fact that the “posterior” cdf (16) does not respect the monotonicity property inherent to the definition of a cumulative distribution function. Indeed, when some kriging weights are negative, it is not guaranteed that F_{x_0|Z̃_n}(s) > F_{x_0|Z̃_n}(s′) for s > s′. He proposes an ad hoc correction of the kriging estimates, replacing the decreasing parts of F_{x_0|Z̃_n} by flat parts.
In spite of these criticisms of the well-foundedness of Journel's approach, a basic idea for handling epistemic uncertainty in the data appears in his paper. Indeed, the way Journel proposes to encode the dataset is, to our knowledge, the first attempt by geostatisticians to handle incomplete information (or epistemic uncertainty) in kriging. The question mark in the encoding of an intervallist datum (15) is the first modelling of ignorance in geostatistics. This method tends to confuse subjective, Bayesian, and epistemic uncertainty. This confusion can now be removed in the light of recent epistemic uncertainty theories. Interestingly, their emergence [21, 53] occurred when the confusion between subjectivism (de Finetti's school of probability [13]) and Bayesianism began to be clarified.
6 Fuzzy kriging
There are two main fuzzy set counterparts of statistical methods: the first one extends statistical principles like error minimisation, unbiasedness or stationarity to fuzzy set-valued realisations. Such an adaptation of prediction by kriging to triangular fuzzy data was suggested by Diamond [17]. The second one applies the extension principle to the kriging estimate [4, 5, 6], in the spirit of sensitivity analysis.
6.1 Diamond’s fuzzy kriging
In the late 1980s, Phil Diamond was the first to extend Matheronian statistics to the fuzzy set setting, with a view to handling imprecise data. The idea was to exploit the notion of fuzzy random variables, which had emerged a few years earlier in the work of several authors (see [11] for a bibliography). Diamond's approach relies on the Puri and Ralescu version of fuzzy random variables [47], which is influenced by the theory of random sets developed in the seventies by Matheron himself [42]. Diamond also proposed an approach to fuzzy least squares in the same spirit [15].
6.1.1 Methodology
The data used by Diamond [17] are modelled by triangular fuzzy numbers, because of both their convenience and their applicability in most practical cases. A triangular fuzzy number $\hat{T}$ is defined by its mode $T^m$ and the left and right bounds of its support, $T^-$ and $T^+$; it is denoted by $\hat{T} = (T^m; T^-, T^+)$. The set of all triangular fuzzy numbers is denoted by $\mathcal{T}$.
Diamond proposes to work with a distance $D_2$ on $\mathcal{T}$ that makes the metric space $(\mathcal{T}, D_2)$ complete [17]:

$$\forall \hat{A}, \hat{B} \in \mathcal{T}, \qquad D_2(\hat{A}, \hat{B}) = (A^m - B^m)^2 + (A^- - B^-)^2 + (A^+ - B^+)^2.$$
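As a small illustration, the following Python sketch (the names Tri and d2 are ours) implements this distance on triples $(T^m; T^-, T^+)$:

```python
from typing import NamedTuple

class Tri(NamedTuple):
    """Triangular fuzzy number (mode, left support bound, right support bound)."""
    m: float   # mode T^m
    lo: float  # left support bound T^-
    hi: float  # right support bound T^+

def d2(a: Tri, b: Tri) -> float:
    """Diamond's squared distance D2 on triangular fuzzy numbers, as defined above."""
    return (a.m - b.m) ** 2 + (a.lo - b.lo) ** 2 + (a.hi - b.hi) ** 2

# Example: distance between (5; 4, 7) and (6; 5, 6)
print(d2(Tri(5, 4, 7), Tri(6, 5, 6)))  # 1 + 1 + 1 = 3
```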
A Borel σ-algebra $\mathcal{B}$ can be constructed on this complete metric space. This allows the definition of fuzzy random variables [47], viewed as mappings from a probability space to a specific measurable space, namely the space $(\mathcal{T}, \mathcal{B})$ of triangular fuzzy numbers.
The expectation of a triangular fuzzy random number X̂ is obtained by extending
the concept of Aumann integral [3], defined for random sets, to all α-cuts of X̂.
Definition 3 Let $\hat{X}$ be a triangular fuzzy random number, i.e. a $\mathcal{T}$-valued random variable. The α-cuts of its expectation, denoted by $\hat{E}[\hat{X}]$, are given by:

$$\forall \alpha \in [0, 1], \qquad \left(\hat{E}[\hat{X}]\right)^{\alpha} = E_{\mathrm{Aumann}}[\hat{X}^{\alpha}].$$
It can be shown that the expected value of a triangular fuzzy random number $\hat{X}$ is a triangular fuzzy number, which will be denoted by $\hat{E}[\hat{X}] = (E[X]^m; E[X]^-, E[X]^+)$.
From those definitions, Diamond proposes to extend the concept of random function to triangular fuzzy random functions, which are T -valued random functions.
He proposes to work with second-order stationary triangular fuzzy random functions Ẑ, that verify, ∀x, x + h ∈ D,
$$\begin{cases}
\hat{E}[\hat{Z}(x)] = (M^m; M^-, M^+) = \hat{M}, \\
\mathrm{Cov}(\hat{Z}(x), \hat{Z}(x+h)) = (C^m(h); C^-(h), C^+(h)) = \hat{C}(h),
\end{cases}$$
where the triangular fuzzy expected value is constant on D and the triangular fuzzy covariance function is defined by:

$$\begin{cases}
C^m(h) = E[Z^m(x)\, Z^m(x+h)] - (M^m)^2 \\
C^-(h) = E[Z^-(x)\, Z^-(x+h)] - (M^-)^2 \\
C^+(h) = E[Z^+(x)\, Z^+(x+h)] - (M^+)^2
\end{cases} \qquad (17)$$
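As an illustration, the componentwise character of (17) can be sketched as follows in Python (the data, the names and the pooled estimation of the fuzzy mean are ours, purely for illustration):

```python
import numpy as np

# Minimal sketch of the componentwise fuzzy covariance (17): each of C^m, C^-, C^+
# is an ordinary covariance computed on one component (mode / support bounds) of
# the triangular fuzzy observations. Rows below are illustrative pairs Z(x), Z(x+h).
zm  = np.array([[10., 11.], [12., 13.], [9., 10.]])   # modes at x and x+h
zlo = zm - 1.0                                         # left support bounds
zhi = zm + 2.0                                         # right support bounds

def comp_cov(z):
    mean = z.mean()                   # pooled estimate of one component of M-hat
    return float(np.mean(z[:, 0] * z[:, 1]) - mean ** 2)

C_hat = (comp_cov(zm), comp_cov(zlo), comp_cov(zhi))   # (C^m(h), C^-(h), C^+(h))
print(C_hat)
```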
Now, from this definition of fuzzy covariance function, the problem is to predict
the value of the regionalized triangular fuzzy random variable Ẑ(x0 ) at x0 . For this
prediction the following linear estimator is used
$$\hat{z}^*(x_0) = \bigoplus_{i=1}^{n} \lambda_i(x_0)\, \hat{z}(x_i),$$

where $\{\hat{z}(x_i), i = 1, \ldots, n\}$ are fuzzy data located at precise locations $\{x_i, i = 1, \ldots, n\}$ and $\bigoplus$ denotes the extension of the Minkowski addition of intervals to triangular fuzzy numbers. The set of precise kriging weights $\{\lambda_i(x_0), i = 1, \ldots, n\}$ is obtained by minimization of the precise mean squared error $D = E\left[D_2(\hat{Z}^*(x_0), \hat{Z}(x_0))\right]$. The unbiasedness condition, extended to fuzzy quantities, induces the usual condition $\sum_{i=1}^{n} \lambda_i(x_0) = 1$ on the kriging weights.
Due to the form of the distance $D_2$, the expression to be minimized can, along the same lines as in simple kriging, be expressed by:

$$D = \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i(x_0)\lambda_j(x_0)\, C(x_i - x_j) - 2 \sum_{j=1}^{n} \lambda_j(x_0)\, C(x_0 - x_j) + C(x_0 - x_0), \qquad (18)$$
with $C(x_i - x_j) = C^m(x_i - x_j) + C^-(x_i - x_j) + C^+(x_i - x_j)$, $\forall i, j = 0, \ldots, n$. The minimization of the error (18) leads to the following kriging system:
$$\begin{cases}
\displaystyle\sum_{j=1}^{n} \lambda_j(x_0)\, C(x_i - x_j) - C(x_0 - x_i) - \theta - L_i = 0, \quad \forall i = 1, \ldots, n \\[4pt]
\displaystyle\sum_{i=1}^{n} \lambda_i(x_0) = 1 \\[4pt]
\displaystyle\sum_{i=1}^{n} L_i\, \lambda_i(x_0) = 0 \\[4pt]
L_i,\ \lambda_i(x_0) \ge 0, \quad \forall i = 1, \ldots, n,
\end{cases}$$
where $L_1, L_2, \ldots, L_n$ and $\theta$ are Lagrange multipliers which allow, under Kuhn-Tucker conditions, solving the optimization program for finding the set of kriging weights $\{\lambda_i(x_0), i = 1, \ldots, n\}$ minimizing the error $D$. It should be noted that in 1988, i.e. one year before the publication of his fuzzy kriging article, Philip Diamond published the same approach restricted to interval data [16].
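As an illustration, the weights solving this constrained program can also be obtained numerically with a generic quadratic programming routine, instead of writing out the Kuhn-Tucker system explicitly. Here is a minimal Python sketch (the aggregated covariance model and all numerical values are ours, purely for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Minimal sketch of solving Diamond's constrained least-squares problem numerically.
# We assume the aggregated covariance C = C^m + C^- + C^+ has already been
# evaluated; `C_data` (n x n) and `c0` (n,) are illustrative stand-ins for
# C(x_i - x_j) and C(x_0 - x_i), and `c00` for C(x_0 - x_0).
n = 4
rng = np.random.default_rng(1)
pts = rng.random((n, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
cov = lambda h: 3.0 * np.exp(-3.0 * h)        # assumed aggregated covariance model
C_data = cov(dist)
c0 = cov(np.linalg.norm(pts - np.array([0.5, 0.5]), axis=1))
c00 = cov(0.0)

err = lambda lam: lam @ C_data @ lam - 2 * lam @ c0 + c00   # expression (18)
res = minimize(err, np.full(n, 1 / n), method="SLSQP",
               bounds=[(0, None)] * n,                       # lambda_i >= 0
               constraints={"type": "eq", "fun": lambda lam: lam.sum() - 1})
print(res.x)  # kriging weights, nonnegative and summing to 1
```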
6.1.2 Discussion
Despite its mathematical rigor, there are several aspects of this approach that are
debatable:
1. the shift from a random function to a fuzzy valued random function,
2. the choice of a scalar distance D2 between fuzzy quantities,
3. the use of a Hukuhara difference in the computation of fuzzy covariance (17).
1. The first point presupposes a strict adherence to the Matheron school of geostatistics. However, it makes the framework, at both the conceptual and practical levels, even more difficult to grasp. The metaphor of a fuzzy random field looks like an elusive artefact. The fuzzy random function is a mere substitute for a random function, and leads to a mathematical model with more parameters than the standard kriging technique. The key question is then: does it properly handle epistemic uncertainty?
2. The choice of a precise distance between fuzzy intervals is in agreement with
the use of a precise variogram and it leads to a questionable way of posing the least
square problem.
First, a precise distance is used to measure the variance of the difference between
the triangular fuzzy random variables Ẑ(x0 ) and Ẑ ∗ (x0 ). This is in contradiction
with using a fuzzy-valued covariance when defining the stationarity of the triangular
fuzzy random function Ẑ(x). Why not then define the covariance between the fuzzy
random variables Ẑ(x) and Ẑ(x + h) as E[D2 (Ẑ(x), M̂)D2 (Ẑ(x + h), M̂)], i.e. like
the variance of Ẑ(x0 ) − Ẑ ∗ (x0 )? Stationarity should then be expressed as C(h) =
E[D2 (Ẑ(x), M̂)D2 (Ẑ(x + h), M̂)].
However, insofar as fuzzy sets represent epistemic uncertainty, the fuzzy random
function might represent a fuzzy set of possible standard random functions, one of
which is the right one. Then the scalar variance of a fuzzy random variable based on distance $D_2$ evaluates the precise variability of the (fuzzy) membership functions representing the knowledge about ill-known crisp realizations. However, it does not evaluate the imprecise knowledge about the variability of the underlying precise realizations [11].
The meaning of extracting a precise variogram from fuzzy data and of the problem
of minimizing the scalar variance of the membership functions (18) remains unclear.
In our opinion, the approach of Diamond is not cogent for handling epistemic uncertainty. A survey of possible notions of variance of fuzzy random variables, with discussions of their significance in the scope of epistemic uncertainty, is proposed in [11]. It is argued that if a fuzzy random variable represents epistemic uncertainty,
its variance should be imprecise or fuzzy as well.
3. The definition of second-order stationarity for triangular fuzzy random functions is highly questionable. The fuzzy covariance function Ĉ(h) (17) proposed by
Diamond is supposed to reflect the epistemic uncertainty on the covariance between
Ẑ(x) and Ẑ(x + h), which finds its source in the epistemic uncertainty conveyed by
$\hat{Z}$. In his definition (17) of $\hat{C}(h)$, Diamond uses the Hukuhara difference [31] between the supports of the triangular fuzzy numbers
$$(E[Z^m(x)Z^m(x+h)];\; E[Z^-(x)Z^-(x+h)],\; E[Z^+(x)Z^+(x+h)])$$
and $\hat{M}^2$. The Hukuhara difference between two intervals is of the form $[a, b] \ominus [c, d] = [a - c, b - d]$. Note that the result may be such that $a - c > b - d$, i.e. not an interval. So it is not clear that the inequalities $C^-(h) \le C^m(h) \le C^+(h)$ always hold when computing $E[\hat{Z}(x)\hat{Z}(x+h)] \ominus \hat{M}^2$.
The Hukuhara difference [31] between intervals is defined such that
$$[a, b] \ominus [c, d] = [u, v] \iff [a, b] = [c, d] \oplus [u, v] = [c + u, d + v],$$
where $\oplus$ is the usual Minkowski addition of intervals.
This property of the Hukuhara difference allows interpreting the epistemic transfer induced by this difference in Diamond's covariance definition. In the standard case, the identity $E[(Z(x) - m)(Z(x+h) - m)] = E[Z(x)Z(x+h)] - m^2 = C(h)$ holds. When extending it to the fuzzy case in Diamond's method, it is assumed that:
• $\hat{Z}(x)\hat{Z}(x+h)$ and $\hat{M}^2$ are triangular fuzzy intervals when $\hat{Z}(x)$ and $\hat{M}$ are such. This is only a coarse approximation.
• $[E[Z^-(x)Z^-(x+h)],\, E[Z^+(x)Z^+(x+h)]] = [C^-(h), C^+(h)] \oplus [(M^-)^2, (M^+)^2]$, so that the imperfect knowledge about $\hat{C}(h) \oplus \hat{M}^2$ is identified with the imperfect knowledge about $\hat{E}[\hat{Z}(x)\hat{Z}(x+h)]$. An alternative definition is to let
$$[E[Z^-(x)Z^-(x+h)],\, E[Z^+(x)Z^+(x+h)]] - [(M^-)^2, (M^+)^2] = [C^-(h), C^+(h)],$$
using Minkowski difference of fuzzy intervals instead of Hukuhara difference
in equation (17). It would ensure that the resulting fuzzy covariance is always a
fuzzy interval, but it would be more imprecise. Choosing between both expressions requires some assumption about the origin of epistemic uncertainty in this
calculation.
• Besides, stating the fuzzy set equality $\hat{C}(h) = \hat{E}[(\hat{Z}(x) - \hat{E}(\hat{Z}(x)))(\hat{Z}(x+h) - \hat{E}(\hat{Z}(x+h)))]$ does not enforce the equality of the underlying quantities on each side.
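As a small illustration of the contrast between the two interval differences discussed above, here is a Python sketch (the encoding of intervals as pairs and all names are ours):

```python
def hukuhara(a, b):
    """Hukuhara difference [a1,a2] (-)H [b1,b2] = [a1-b1, a2-b2] (may be ill-defined)."""
    lo, hi = a[0] - b[0], a[1] - b[1]
    return (lo, hi) if lo <= hi else None   # None: result is not an interval

def minkowski_diff(a, b):
    """Minkowski difference [a1,a2] - [b1,b2] = [a1-b2, a2-b1]: always an interval."""
    return (a[0] - b[1], a[1] - b[0])

print(hukuhara((1, 5), (0, 1)))        # (1, 4): well defined
print(hukuhara((1, 2), (0, 3)))        # None: the subtrahend is wider
print(minkowski_diff((1, 2), (0, 3)))  # (-2, 2): wider, i.e. more imprecise, but valid
```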
Finally, the Diamond approach precisely interpolates between fuzzy observations
at various locations. Hence, the method does not propagate the epistemic uncertainty bearing on the variogram. Albeit fuzzy kriging provides a fuzzy interval estimate ẑ∗ (x0 ), it is difficult to interpret this fuzzy estimate as picturing our knowledge
about the actual z∗ (x0 ) one would have obtained via kriging if the data had been
precise. Indeed, the scalar influence coefficients in Diamond method reflect both the
spatial variability of Z and the variability of the epistemic uncertainty of observations. This way of handling intervals or fuzzy intervals as “real” data is in fact much
influenced by Matheron’s random sets where set realizations are understood as real
objects (geographical areas), not as imprecise information about precise locations.
The latter view of sets as epistemic constructs is more in line with Shafer’s theory of
evidence [49], which also uses the formalism of random sets, albeit with the purpose
of grasping incomplete information.
Overall, from the point of view of epistemic uncertainty, this approach to kriging looks questionable at both the philosophical and computational levels. Nevertheless, the technique has been used in practical applications [52] by Taboada et al., in the context of the evaluation of reserves in an ornamental granite deposit in Galicia, Spain.
6.2 Bardossy’s fuzzy kriging
Not only may the epistemic uncertainty about the data z(xi ) be modelled by intervals
or fuzzy intervals, but one may argue that the variogram itself in its mathematical
version should be a parametric function with interval-valued or fuzzy set-valued parameters. While Diamond was proposing a highly mathematical approach to fuzzy
kriging, Bardossy et al. [4, 5, 6] between 1988 and 1990 also worked on this issue of
extending kriging to epistemic uncertainty caused by fuzzy data. Beyond this adaptation of the kriging methodology to fuzzy data, they also propose in their method
to handle epistemic uncertainty on the theoretical variogram model.
In their approach, the variogram is tainted with epistemic uncertainty because the
parameters of the theoretical variogram model are supposed to be fuzzy subsets. The
epistemic uncertainty of geostatisticians regarding these parameters is then propagated to the variogram by means of the extension principle. Introduced by Lotfi
Zadeh [54], it provides a general method for extending non-fuzzy models or functions in order to deal with fuzzy parameters. For instance, fuzzy set arithmetic [21], which generalizes interval arithmetic, has been developed by applying the extension principle to the classical arithmetic operations like addition, subtraction, etc.
Definition 4 Let $U$, $V$ and $W$ be sets, and $f$ a mapping from $U \times V$ to $W$. Let $A$ be a fuzzy subset of $U$ with membership function $\mu_A$, and likewise $B$ a fuzzy subset of $V$. The image of $(A, B)$ in $W$ under the mapping $f$ is the fuzzy subset $C$ of $W$ whose membership function is given by:

$$\mu_C(w) = \sup_{(u,v) \in U \times V \,:\, w = f(u,v)} \min(\mu_A(u), \mu_B(v)).$$
In terms of possibility theory, it comes down to computing the degree of possibility $\Pi(f^{-1}(w))$, $w \in W$. Actually, in their approach, Bardossy et al. do not directly use such a fuzzy variogram model in the kriging process. Their approach is, in a sense, more global, since they propose to apply the extension principle not only to the variogram model, but to the entire inverted kriging system and to the obtained kriging estimate $z^*(x_0)$, because it is a function of the observations $\{z(x_i), i = 1, \ldots, n\}$,
of the parameters of the variogram model {a j , j = 1, . . . , p} and of the estimation
position x0 . In other words, they express the kriging estimate as
$$z^*(x_0) = f(z(x_1), \ldots, z(x_n), a_1, \ldots, a_p, x_0),$$
and they apply the extension principle to propagate the epistemic uncertainty of the
fuzzy observations {ẑ(xi ), i = 1, . . . , n} and of the fuzzy parameters of the variogram
model {â j , j = 1, . . . , p} to the kriging estimate ẑ∗ (x0 ). They propose to numerically
solve the optimisation problem induced by their approach, without providing details.
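As an illustration of the kind of computation this involves, here is a minimal Python sketch of the α-cut-wise optimisation suggested by the extension principle (the toy kriging function, the triangular fuzzy inputs and all names are ours; a real implementation would optimise over the full inverted kriging system):

```python
import numpy as np
from scipy.optimize import minimize

def kriging_estimate(z, a, x0=0.5):
    # illustrative stand-in for z*(x0) as a function of data z and variogram
    # parameters a; not the actual kriging system
    w = np.exp(-a[0] * np.abs(np.linspace(0, 1, len(z)) - x0))
    return float(w @ z / w.sum())

def alpha_cut(tri, alpha):
    m, lo, hi = tri   # triangular fuzzy number (mode; support bounds)
    return lo + alpha * (m - lo), hi - alpha * (hi - m)

fuzzy_z = [(10, 9, 12), (12, 11, 13), (9, 8, 10)]   # fuzzy observations
fuzzy_a = [(3.0, 2.0, 5.0)]                          # fuzzy variogram parameter

# For each alpha-level, bracket z*(x0) by minimizing and maximizing f over the
# alpha-cuts of all fuzzy inputs, as the extension principle dictates.
for alpha in (0.0, 0.5, 1.0):
    bounds = [alpha_cut(t, alpha) for t in fuzzy_z + fuzzy_a]
    x_init = np.array([(lo + hi) / 2 for lo, hi in bounds])
    f = lambda v: kriging_estimate(v[:3], v[3:])
    lo = minimize(f, x_init, bounds=bounds).fun
    hi = -minimize(lambda v: -f(v), x_init, bounds=bounds).fun
    print(f"alpha={alpha}: z*(x0) in [{lo:.3f}, {hi:.3f}]")
```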
This approach is more consistent with the epistemic uncertainty involved in the
kriging methodology than Diamond's method. However, there does not seem to
be a tractable solution that can be applied to large datasets because of the costly
optimisation involving fuzzy data. The question of whether the epistemic uncertainty
conveyed in an imprecise variogram is connected or not to the epistemic uncertainty
about the data is worth considering. However, even in the presence of a precise
dataset, one may argue that the chosen variogram is tainted with epistemic uncertainty that only the expert, who chooses it, could estimate.
7 Uncertainty in kriging: a prospective discussion
The extensions of kriging studied above may lead to a natural questioning about the
nature of the uncertainty that pervades this interpolation method. Indeed, taking into
account this kind of imperfect knowledge suggests, in the first instance, that the usual approach does not properly handle the available information. Being aware that information is partially lacking is in itself a piece of (meta-)information. Questioning the proper handling of uncertainty in kriging leads to examining two issues:
• Is the random function model proposed by Matheron and followers cogent in
spatial prediction?
• How to adapt the kriging method to epistemic uncertainty without making the
problem intractable?
These questions seem to require a reassessment of the role of probabilistic modeling
in the kriging task, supposed to be of an interpolative nature, while it heavily relies
on the use of least squares methods that are more central to regression techniques
than to interpolation per se.
7.1 Spatial vs. fictitious variability
It is commonly mentioned that probabilistic models are natural representations of
phenomena displaying some form of variability. Repeatability is the central feature
of the idea of probability as pointed out by Shafer and Vovk [50]. This is embodied
by the use of probability trees, Markov chains and the notion of sample space. A
random variable V (ω) is a mapping from a sample space Ω to the real line, and
variability is captured by binding the value of V to the repeated choices of ω ∈ Ω ,
and the probability measure that equips Ω summarizes the repeatability pattern.
In the case of the random function approach to geostatistics, the role of this scenario is not quite clear. Geostatistics is supposed to handle spatial variability of a
numerical quantity z(x) over some geographical area D. Taken at face value, spatial variability means that when the location x ∈ D changes, so does z(x). However,
when x is fixed, z(x) is a precise deterministic value. Strictly speaking, these considerations would lead us to identify the sample space with D, equipped with the
Lebesgue measure.
However, the classical geostatistics approach after Matheron is at odds with this
simple intuition. It postulates the presence of a probability space Ω such that the
quantity z depends on both x and ω ∈ Ω . z is taken as a random function: for each
x, the actual value z(x) is substituted with a random variable Z(x) from a sample
space Ω to the real line. The probability distribution of Z(x) is thus attached to
the quantity of interest z(x) at a location x. It implicitly means that this quantity of
interest is variable (across ω) and that one can quantify the variability of this value.
In the spatial interpolation problem solved by kriging, this kind of postulated
variability at each location x of a spatial domain D corresponds to no actual phenomenon. As Chilès and Delfiner [10] (p. 24) acknowledge,
The statement “z(x) is a realization of a random function Z(x)” or even “of a stationary
random function,” has no objective meaning.
Indeed, the quantity of interest at an estimation site x is deterministic and a single
observation z(xi ) for a finite set of locations xi is available. It does not look sufficient
to determine a probability distribution at each location x even if each Z(x) were
actually tainted with variability.
In fact, geostatisticians consider random functions not as reflecting randomness
or variability actually present in natural phenomena, but as a pure mathematical
model whose interest lies in the quality of predictions it can deliver. As Matheron
said:
Il n’y a pas de probabilité en soi, il y a des modèles probabilistes (there is no probability in itself, there are probabilistic models; cited by J.-P. Chilès).
The great generality of the framework, whereby a deterministic spatial phenomenon is viewed as a (unique) realisation of a random function, is considered
to be non-constraining because it cannot be refuted by reality, and is not directly
viewed as an assumption about the phenomenon under study. The spatial ergodicity
assumption on the random function Z(x) is instrumental to relate its fictitious variability at each location of the domain to the spatial variability of the deterministic
quantity z(x). While this assumption is easy to interpret in the temporal domain, it is
less obvious in the spatial domain. The role of spatial ergodicity and stationarity assumptions is mainly to offer theoretical underpinnings to the least square technique
used in practice. In other words, the random function approach is to be taken as a
formal black-box model for data-based interpolation, and has no pretence to represent any real or epistemic phenomenon (beyond the observed data z(xi )). Probability
in geostatistics is neither objective nor subjective: it is mathematical.
7.2 A deterministic justification of simple kriging
One way of interpreting random functions in terms of actual (spatial) randomness
is to replace pointwise locations by subareas (“blocks”) over which average estimations can be computed. Such blocks must be small enough for ensuring a meaningful spatial resolution but large enough to contain a statistically significant number
of measurements. This is called the trade-off between objectivity and spatial resolution. At the limit, using a single huge block, the random function is the same at each
point and reflects the variability of the whole domain. On the contrary, if the block
is very small, only a single observation is available, and an ill-known deterministic
function is obtained.
Some authors claim the deterministic nature of the kriging problem should be acknowledged. Journel [33] explains how to retrieve all equations of kriging without
resorting to the concept of a random function. This view is close to what Matheron
calls the transitive model. The first step is to define experimental mean m̂, standard deviation σ̂ and variogram γ̂ from the set of observation points {(xi , z(xi )), i =
1, . . . , n} in a block A. The first two quantities are supposed to be good enough approximations of the actual mean mA and standard deviation σA of z(x) in block A, viewed as a random variable with sample space A (and no longer a fictitious random variable with an elusive sample space Ω). The sample variogram value γ̂(h) approximates the quantity:
$$\gamma_A(h) = \frac{\int_{A_h} (z(x+h) - z(x))^2\, dx}{2|A_h|},$$
taken over the set Ah formed by intersecting A and its translate by −h. In fact
γA (h) = γA (−h) and the variogram value γA (h) applies to the domain Ah ∪ A−h .
For h small enough it is representative of A itself.
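As an illustration, a discrete counterpart of γA(h) can be computed from scattered observations as follows (a Python sketch; the tolerance band and the synthetic data are ours):

```python
import numpy as np

# Minimal sketch of a sample variogram: average half the squared increments of z
# over all pairs of sample points whose separation falls in a tolerance band
# around h. Locations and data below are illustrative.
rng = np.random.default_rng(2)
x = np.sort(rng.random(50))                      # sample locations in a 1-D block A
z = np.sin(4 * x) + 0.1 * rng.standard_normal(50)

def sample_variogram(x, z, h, tol=0.02):
    d = np.abs(x[:, None] - x[None, :])          # all pairwise distances
    mask = np.abs(d - h) < tol                   # pairs roughly at separation h
    if not mask.any():
        return np.nan
    return 0.5 * np.mean((z[:, None] - z[None, :])[mask] ** 2)

for h in (0.05, 0.1, 0.2, 0.4):
    print(f"gamma({h}) ~ {sample_variogram(x, z, h):.4f}")
```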
Journel [33] shows that there exists a stationary random function ZA (x) having
such empirical characteristics: mA , σA and γA .
Thus, if we define $z^*(x) = \sum_{i=1}^{n} \lambda_i(x) z(x_i)$, the estimation variance (under the unbiasedness condition), defined by $V[Z_A(x) - Z^*(x)] = E[(Z_A(x) - Z^*(x))^2]$, where $Z^*(x)$ is the “randomized” kriging estimate of $Z_A(x)$, coincides with the spatial integral
$$\frac{\int_A (z(x) - z^*(x))^2\, dx}{|A|}.$$
Hence, ordinary kriging is basically the process of minimizing a spatially averaged squared error over the domain A on the basis of the available observations.
The following assumption is made:
$$A \simeq \{x : x \in A,\; x + h_i \in A,\; i = 1, \ldots, n\},$$
where hi = x0 − xi . It means that we restrict the kriging to the vicinity of the sample
points xi and that this estimation area is well within A. This makes it possible to retrieve the kriging equations.
The unbiasedness assumption of the stochastic kriging is replaced by requiring a
zero average error over A that no longer depends on x0:
$$e_A(x_0) = \frac{\int_A (z(x) - z^*(x))\, dx}{|A|} = 0.$$
Note that $\int_A z^*(x)\, dx = \sum_{i=1}^{n} \lambda_i(x_0) \int_A z(x + h_i)\, dx$, and that, due to the above assumption, $\int_A z(x + h_i)\, dx = \int_A z(x)\, dx$. So $\int_A z(x)\, dx = \sum_{i=1}^{n} \lambda_i(x_0) \int_A z(x)\, dx$ and therefore $\sum_{i=1}^{n} \lambda_i(x_0) = 1$.
Then the squared error can be developed as
$$(z(x) - z^*(x))^2 = z(x)^2 - 2 \sum_{i=1}^{n} \lambda_i(x)\, z(x) z(x + h_i) + \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i(x)\lambda_j(x)\, z(x + h_i) z(x + h_j).$$
The spatially averaged squared error is obtained by integrating this expression over
A . If we introduce the counterpart of a covariance in the form
$$C(h) = \frac{\int_A z(x) z(x+h)\, dx}{|A|} - m_A^2 = \sigma_A^2 - \gamma_A(h),$$
it can be shown that we recognize, in the above mean squared error, the expression (5) of the simple kriging variance based on stationary random functions. Of
course the obtained linear system of equations is also the same and requires positive
definiteness of the covariance matrix, hence the use of a proper variogram model
fitted from the sample variogram. However, under the purely deterministic spatial approach, this positiveness condition appears as a property needed to properly solve the least square equations. It is no longer related to the covariance of a random function. Failure of this condition on the sample variogram may indicate an ill-conditioning of the measured data that precludes the possibility of a sensible least
square interpolation.
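As an illustration, here is a minimal Python sketch of this deterministic reading of simple kriging (the exponential variogram model and all numerical values are ours, purely for illustration):

```python
import numpy as np

# Minimal sketch: with the deterministic covariance C(h) = sigma_A^2 - gamma_A(h),
# the weights solve the usual linear kriging system. The exponential variogram
# below is an assumed, positive-definite model fitted to the sample variogram.
sigma2, range_par = 1.0, 0.3
gamma = lambda h: sigma2 * (1 - np.exp(-np.abs(h) / range_par))
C = lambda h: sigma2 - gamma(h)

xi = np.array([0.1, 0.4, 0.7, 0.9])      # measurement locations
zi = np.array([1.2, 0.7, 1.5, 1.1])      # measured values
x0 = 0.55                                 # prediction location

K = C(xi[:, None] - xi[None, :])          # n x n covariance matrix (positive definite)
k0 = C(x0 - xi)                           # covariances to the prediction point
lam = np.linalg.solve(K, k0)              # simple kriging weights
print(lam, lam @ zi)                      # weights and kriged value z*(x0)
```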
In summary, the whole kriging method can be explained without requiring the
postulate of a random function over D, which may appear as an elusive artefact.
There is no random function Z a realization of which is the phenomenon under study, but rather, on each block A, a random variable whose sample space is the block itself, which we can bind to a stationary random function ZA on the block. While
this remark will not affect the kriging practice (since both the deterministic and the
stochastic settings lead to the same equations in the end), it becomes important
when epistemic uncertainty enters the picture, as it sounds more direct to introduce
it in the concrete deterministic approach than in the abstract stochastic setting. It
also suggests that teaching the kriging method may obviate the need for deep, but
non-refutable, stochastic concepts like ergodicity and stationarity.
7.3 Towards integrating epistemic uncertainty in spatial
interpolative prediction
The above considerations lead us to a difficult task if epistemic uncertainty is to
be inserted into the kriging method. Generalizing the random function framework
to fuzzy random functions, whose mathematical framework is now well developed,
looks hopeless. Indeed, it certainly would not help provide a tractable approach,
since the simplest form of kriging already requires a serious computational effort.
Adding interval uncertainty to simple kriging would also be mathematically
tricky. It has been shown above that the method proposed by Diamond is not quite
cogent, as it handles intervals or fuzzy intervals as objective values to which a scalar
distance can be applied. The approach of Bardossy looks more convincing, even if
the use of interval arithmetic is questionable. Computing an interval-valued sample
variogram via optimisation is a very difficult task. Indeed, the computation of an
interval-valued sample variance is an NP-hard problem [24].
The extension of the least squares method to interval-valued functions, if done
properly, is also a challenging task as it comes down to inverting a matrix having
interval-valued coefficients. In this respect, the fuzzy least squares approach of Diamond [15], based on a scalar distance between fuzzy intervals, is also problematic. It is not clear what the result tells us about the uncertainty concerning all least
squares estimates that can be found from choosing precise original data inside the
input intervals.
Diamond's kriging approach produces a scalar variogram, hence scalar influence coefficients, which does not sound natural, as one may on the contrary expect that the more uncertain the data, the more uncertain the ensuing variogram.
On the other hand, extending the least square method to fuzzy data in a meaningful way, that is, by letting the imprecision of the variogram impact the influence
coefficients looks computationally challenging. One may think of a method dual
to Diamond's approach, which would be based on precise data plus an imprecise
variogram, thus leading to imprecise interpolation between precise data. Such an
imprecise variogram would be seen as a family of theoretical variograms induced
by the sample variogram. Even if we could compute fuzzy influence coefficients in
an efficient way from such imprecise or fuzzy variograms, it is not correct to apply
interval or fuzzy interval arithmetic to the linear combination of fuzzy data when
the influence coefficients are fuzzy, even if their uncertainty were independent from
the uncertainty pervading the data, due to the normalisation constraint [20]. But the
epistemic uncertainty of influence coefficients partially depends on the quality of
the data (especially if an automatic fitting procedure is used for choosing the variogram). So it is very difficult to use data uncertainty in a non-redundant way in the
resulting fuzzy kriging estimates.
As far as epistemic uncertainty is concerned, there is a paradox in kriging that is
also present in interpolation techniques if considered as prediction tools: the kriging
result is precise. However, intuitively, the farther x0 is from the known points xi,
the less we know about z(x0 ). A cogent approach to estimating the loss of information when moving away from the known locations is needed. Of course, within
the kriging approach, one can resort to using the kriging variance as an uncertainty
indicator, but it is known not to depend on the data values z(xi ), and again relies
on assumptions on the underlying fictitious random function that is the theoretical
underpinning of kriging. It is acknowledged [51] that kriging variance is not estimation variance but rather some index of data configuration. Thus, it seems obvious
that techniques more advanced than the usual kriging variance are required for producing a useful estimation of the kriging error or imprecision.
So, a rigorous handling of epistemic uncertainty in kriging looks like a nontrivial task. Is it worth the effort? In fact, kriging is a global interpolation method
that does not take into account local specificities of terrain since the variogram relies
on averages of differences of measured values at pairs of points located at a given
distance from each other. Indeed, the parameters of the variogram are estimated globally.
This critique can be found repeatedly in the literature. This point emphasizes the
need to use other kinds of possibly imprecise knowledge about the terrain than the
measured points.
Overall, the handling of epistemic uncertainty in spatial prediction (independently of the problem of the local validity of the kriging estimates) could be carried
out using one of the following methodologies:
1. Replace the kriging approach by techniques that would be mathematically simpler, more local, and where the relationship between interpolation coefficients
and local dependence information would be more direct. For instance we could
consider interpolation techniques taking into account local gradient estimates
from neighboring points (even interpolating between locally computed slopes).
This would express a more explicit impact of epistemic uncertainty, present in the
measured data and in the knowledge of local variations of the ill-known spatial
function, on the interpolation expression, obviating the need for reconsidering a
more genuine fuzzy least square method from scratch. This move requires further investigations of the state of the art in the interpolation area so as to find a
suitable spatial prediction technique.
2. Use probabilistic methods (such as Monte-Carlo or Gibbs sampling) to propagate uncertainty taking the form of epistemic possibility distributions (intervals
or fuzzy intervals) on variogram parameters and/or observed data. Such an idea
is at work for instance in the transformation method of Michael Hanss [28] for
mechanical engineering computations under uncertainty modelled by fuzzy sets.
The idea is to sample a probability distribution so as to explore the values of a
complex function over an uncertainty domain. In such a method the probability
distribution is just a tool for guiding the computation process. The set of obtained
results (scenarios) should not be turned into a histogram but into a range of possible outputs. The use of fuzzy sets would come down to exploring a family of
nested confidence domains with various confidence values, thus yielding a fuzzy
set of possible outputs (e.g. a kriged value). The merit of this approach, recently
developed by the authors [38], is to encapsulate already existing kriging methods within a stochastic simulation scheme, the only difference with other similar
stochastic methods being the non-probabilistic exploitation of the results.
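As an illustration of this second methodology, here is a minimal Python sketch of the levelwise sampling scheme (the toy kriging routine, the fuzzy inputs and the sample sizes are ours, purely for illustration):

```python
import numpy as np

# Minimal sketch of levelwise sampling: for each alpha, draw parameter values
# uniformly inside the alpha-cuts of the fuzzy inputs, run the existing (here:
# toy) kriging routine unchanged, and keep only the range of outputs per level,
# yielding a fuzzy (nested-interval) kriged value.
rng = np.random.default_rng(3)

def toy_kriging(z, a, x0=0.5):
    # stand-in for any existing crisp kriging implementation
    w = np.exp(-a * np.abs(np.linspace(0, 1, len(z)) - x0))
    return float(w @ z / w.sum())

def cut(tri, alpha):
    m, lo, hi = tri   # triangular fuzzy number (mode; support bounds)
    return lo + alpha * (m - lo), hi - alpha * (hi - m)

fuzzy_z = [(10, 9, 12), (12, 11, 13), (9, 8, 10)]   # fuzzy observations
fuzzy_a = (3.0, 2.0, 5.0)                            # fuzzy variogram parameter

for alpha in (0.0, 0.5, 1.0):
    outs = []
    for _ in range(2000):
        z = np.array([rng.uniform(*cut(t, alpha)) for t in fuzzy_z])
        a = rng.uniform(*cut(fuzzy_a, alpha))
        outs.append(toy_kriging(z, a))
    # exploit the range (not a histogram) of the outputs at this level
    print(f"alpha={alpha}: [{min(outs):.3f}, {max(outs):.3f}]")
```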
8 Conclusion
The stochastic framework of geostatistics and the ensuing kriging methodology are
criticized in the literature for three reasons:
• The purely mathematical nature of the random function setting and the attached
assumptions of stationarity and ergodicity, which are acknowledged to be non-refutable;
• the questionable legitimacy, for local predictions, of a global index of spatial
dependence such as the variogram, which averages out local trends; of course, the
use of selected neighborhoods of measured values that change with each kriged
location point can address this issue, albeit at the expense of a loss of continuity
of the kriged surface.
• the computational burden of the kriging interpolation method and the poor interpretability of its influence coefficients.
On the first point, it seems that the choice of modeling a deterministic quantity
by a random variable does not respect the principle of parsimony. If a deterministic
model yields the same equations as the stochastic one, and moreover seems to coincide with our perception of the underlying phenomenon, the simpler model should
be preferred (this is the case with simple kriging, as shown above). Moreover, the practical test of best prediction should be weighed against the complexity of the modeling framework used.
On the second point, a variogram represents global information about a domain.
Here, we do face a major difficulty common to all statistical approaches. Even if
the set of observations is large over the whole domain, local predictions will have a
very poor validity if the number of observations in the vicinity of the predicted value
location is too small. This conflict between the requested precision of predicted
values and the necessity of large observation samples is pointed out by the advocates
of kriging too.
The computational burden of kriging, even if not actually so high in the simpler
versions, may pose a difficulty if epistemic uncertainty must be taken into account.
As shown in section 4, available methods that try to introduce epistemic uncertainty
into this technique seem to make it even more complex, and sometimes mathematically debatable, while by construction, they are supposed to provide imprecise
outputs. Besides, it is not so easy to relate the form of the variogram and the expressions of the kriging coefficients, and to figure out how they affect the derivatives
of the interpolated function, while one may have some prior information on such
derivatives from geological knowledge of a prescribed terrain.
Devising a spatial prediction method that could be simple enough to remain
tractable under epistemic uncertainty, and realistic enough to provide faithful information about a given terrain where some measurements are available remains a
challenging task, and an open research problem. Three lines of research have been
explored so far:
• Treating fuzzy observations like complex crisp observations in a suitable metric
space: this approach is not really treating epistemic uncertainty, as discussed in
section 6.1.
• Applying fuzzy arithmetic. This is also used by Diamond when computing
the interpolation step. However, it cannot be used throughout the whole kriging
method, because there is no explicit expression of the influence weights in terms
of the variogram parameters. And even if there were one, replacing scalar arithmetic
operations by fuzzy ones would lead to a considerable loss of precision.
• Using optimisation techniques, as is popular in the interval analysis area. This was suggested very early by Bardossy in the fuzzy case, and by Dubrule and Kostov in the
interval case. But it looks already computationally intractable to study the sensitivity of the kriging estimates to variogram parameters lying in intervals via
optimisation.
The most promising line of research is to adapt the stochastic simulation methods to
the handling of fuzzy interval analysis [38]. Indeed, it would enable existing kriging methods and stochastic exploration techniques to be exploited as such. The only
difference is that the input data would be specified as representing epistemic uncertainty by nested sets of confidence intervals, and that the results of the computation
would not be interpreted as a probability distribution, but exploited levelwise to
form the fuzzy kriged values.
Acknowledgements
This work is supported by the French Research National Agency (ANR) through the
CO2 program (project CRISCO2 ANR-06-CO2-003). The issue of handling epistemic uncertainty in geostatistics was raised by Dominique Guyonnet. The authors
also wish to thank Jean-Paul Chilès and Nicolas Desassis for their comments on a
first draft of this paper and their support during the project.
References
1. Armstrong M (1990) Positive Definiteness is Not Enough. Math. Geol. 24:135–143.
2. Armstrong M, Jabin R (1981) Variogram Models Must Be Positive Definite. Math. Geol.
13:455–459.
3. Aumann RJ (1965) Integrals of set-valued functions. J. Math. Anal. Appl. 12:1–12.
4. Bardossy A, Bogardi I, Kelly WE (1988) Imprecise (fuzzy) information in geostatistics. Math.
Geol. 20:287–311.
5. Bardossy A, Bogardi I, Kelly WE (1990) Kriging with imprecise (fuzzy) variograms. I: Theory. Math. Geol. 22:63–79.
6. Bardossy A, Bogardi I, Kelly WE (1990) Kriging with imprecise (fuzzy) variograms. II: Application. Math. Geol. 22:81–94.
7. Berger, JO (1980) Statistical decision theory. Springer-Verlag, Berlin.
8. Berger J (1994) An overview of robust Bayesian analysis [with Discussion]. Test 3:5–124.
9. Berger JO, de Oliveira V, Sanso B (2001) Objective Bayesian analysis of spatially correlated
data. Journal of the American Statistical Association 96:1361–1374.
10. Chilès JP, Delfiner P (1999) Geostatistics, Modeling Spatial Uncertainty. Wiley, New York,
N.Y.
11. Couso I, Dubois D (2009) On the variability of the concept of variance for fuzzy random variables. IEEE Transactions on Fuzzy Systems 17(5):1070–1080.
12. Cressie NAC (1993) Statistics for Spatial Data, Revised Edition. John Wiley & Sons, New York, N.Y.
13. de Finetti B (1974) Theory of probability: a critical introductory treatment. John Wiley &
Sons, New York, N.Y.
14. de Oliveira V, Kedem B, Short DA (1997) Bayesian prediction of transformed Gaussian random fields. Journal of the American Statistical Association 92:1422–1433.
15. Diamond P (1988) Fuzzy least squares. Information Sciences 46:141–157.
16. Diamond P (1988) Interval-valued random functions and the kriging of intervals. Math. Geol.
20:145–165.
17. Diamond P (1989) Fuzzy kriging. Fuzzy Sets and Systems 33:315–332.
18. Dubois D (2006) Possibility theory and statistical reasoning. Computational Statistics & Data
Analysis 51:47–69.
19. Dubois D, Foulloy L, Mauris G, Prade H (2004) Probability-possibility transformations, triangular fuzzy sets, and probabilistic inequalities. Reliable Computing, 10:273–297.
20. Dubois D, Prade H (1981) Additions of interactive fuzzy numbers. IEEE Trans. on Automatic Control 26(4):926–936.
21. Dubois D, Prade H (1988) Possibility Theory. Plenum Press, New York.
22. Dubrule O, Kostov C (1986) An interpolation method taking into account inequality constraints: I. Methodology. Math. Geol. 18:33–51.
23. Emery X (2003) Disjunctive kriging with hard and imprecise data. Math. Geol. 35:699–718.
24. Ferson S, Ginzburg L, Kreinovich V, Longpré L, Aviles M (2002) Computing variance for
interval data is NP-hard. SIGACT News 33:108–118.
25. Gaudard M, Karson M, Linder E, Sinha D (1999) Bayesian spatial prediction. Environmental
and Ecological Statistics 6:147–171.
26. Goovaerts P (1997) Geostatistics for Natural Resources Evaluation. Oxford Univ. Press, New York.
27. Handcock MS, Stein ML (1993) A Bayesian analysis of kriging. Technometrics 35:403–410.
28. Hanss M (2002) The transformation method for the simulation and analysis of systems with
uncertain parameters. Fuzzy Sets and Systems 130:277–289.
29. Hartigan JA (1969) Linear Bayesian methods. J. Royal Stat. Soc. Ser. B 31:446–454.
30. Helton JC, Oberkampf WL (2004) Alternative representations of epistemic uncertainty. Reliability Engineering and System Safety 85:1–10.
31. Hukuhara M (1967) Intégration des applications mesurables dont la valeur est un compact convexe. Funkcialaj Ekvacioj 10:205–223.
32. Journel AG, Huijbregts CJ (1978) Mining Geostatistics. New York: Academic Press.
33. Journel AG (1985) The deterministic side of geostatistics. Math. Geol. 17:1–15.
34. Journel AG (1986) Geostatistics: Models and Tools for the Earth Sciences. Math. Geol.
18:119–140.
35. Journel AG (1986) Constrained interpolation and qualitative information - The soft kriging
approach. Math. Geol. 18:269–286.
36. Kostov C, Dubrule O (1986) An interpolation method taking into account inequality constraints: II. Practical approach. Math. Geol. 18:53–73.
37. Krige DG (1951) A statistical approach to some basic mine valuation problems on the Witwatersrand. Journal of the Chemical, Metallurgical and Mining Society of South Africa 52:119–
139.
38. Loquin K, Dubois D, A fuzzy interval analysis approach to kriging with ill-known variogram
and data, in preparation.
39. Mallet JL (1980) Régression sous contraintes linéaires : application au codage des variables aléatoires. Revue de Statistique Appliquée 28:57–68.
40. Matheron G, Blondel F (1962) Traité de géostatistique appliquée. Editions Technip, Paris.
41. Matheron G (1969) Le krigeage universel. Cahiers du Centre de Morphologie Mathématique
de Fontainebleau, Fasc. 1, École des Mines de Paris.
42. Matheron G (1975) Random Sets and Integral Geometry, John Wiley & Sons, New-York,
N.Y.
43. Matheron G (1978) Estimer et choisir: essai sur la pratique des probabilités. École des Mines
de Paris.
44. Matheron G (1987) Suffit-il pour une covariance d’être de type positif? Études Géostatistiques
V, Séminaire CFSG sur la Géostatistique, Fontainebleau, Sciences de la Terre Informatiques.
45. Matheron G (1989) The Internal Consistency of Models in Geostatistics. In M. Armstrong
(Ed.), Geostatistics, Proceedings of the Third International Geostatistics, Avignon, Kluwer
Academic Publishers, Dordrecht, 21–38.
46. Omre H (1987) Bayesian Kriging - merging observations and qualified guesses in kriging.
Math. Geol. 19:25–39.
47. Puri ML, Ralescu DA (1986) Fuzzy random variables. J. Math. Anal. Appl. 114:409–422.
48. Rios Insua D, Ruggieri F (2000) Robust Bayesian Analysis. Springer, Berlin.
49. Shafer G (1976) A mathematical theory of evidence. Princeton university press Princeton,
N.J.
50. Shafer G, Vovk V (2001) Probability and Finance: It’s Only a Game! New York: Wiley.
51. Srivastava R M (1986) Philip and Watson–Quo vadunt? Math. Geol. 18:141–146.
52. Taboada J, Rivas T, Saavedra A, Ordóñez C, Bastante F, Giráldez E (2008) Evaluation of the
reserve of a granite deposit by fuzzy kriging. Engineering Geol. 99:23–30.
53. Walley P (1991) Statistical reasoning with imprecise probabilities. Chapman and Hall, London.
54. Zadeh LA (1965) Fuzzy sets, Inf. Control, 8:338–353.
55. Zadeh LA (1978) Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems,
1:3–28.