ON CONTROLLED SAMPLING
SCHEMES WITH APPLICATIONS TO
SAMPLE SURVEYS
THESIS
SUBMITTED TO THE
KUMAUN UNIVERSITY, NAINITAL
BY
ILA PANT
FOR THE AWARD OF THE DEGREE OF
DOCTOR OF PHILOSOPHY
IN
STATISTICS
UNDER THE SUPERVISION OF
Dr. NEERAJ TIWARI
READER AND CAMPUS HEAD
DEPARTMENT OF STATISTICS
KUMAUN UNIVERSITY, S.S.J. CAMPUS, ALMORA-263601
UTTARAKHAND (INDIA)
2008
CERTIFICATE
This is to certify that the thesis entitled “ON CONTROLLED
SAMPLING SCHEMES WITH APPLICATIONS TO SAMPLE
SURVEYS” submitted to the Kumaun University, Nainital for the
degree of DOCTOR OF PHILOSOPHY IN STATISTICS is a record
of bona fide work carried out by Mrs. ILA PANT, under my guidance
and supervision. I hereby certify that she has completed the research
work for the full period as required in the ordinance 6. She has put in
the required attendance in the Department and signed in the
prescribed register during the period. I also certify that no part of this
thesis has been submitted for any other degree or diploma.
(Dr. Neeraj Tiwari)
Reader and Campus Head,
Department of Statistics
Soban Singh Jeena Campus
Almora (Uttarakhand)
ACKNOWLEDGEMENT
I am very grateful to my supervisor Dr. Neeraj Tiwari,
Reader and Campus Head, Department of Statistics, Soban Singh
Jeena Campus, Almora, for his excellent guidance and constant
support throughout my research work on “ON CONTROLLED
SAMPLING SCHEMES WITH APPLICATIONS TO SAMPLE
SURVEYS”. I have learned a lot while working with him.
I have no words to express my feelings for the constant
encouragement and help I received from my loving husband Mr. Raj
Kishore Bisht.
I would like to express my gratitude to all faculty members of
the Department of Statistics, S. S. J. Campus Almora.
I am very thankful to Mr. Girish Kandpal, Mr. Virendra Joshi,
Mrs. Ruchi Tiwari, Mr. Girja Pandey, Mr. Lalit Joshi and Mr. C. M.
S. Adhikari for their co-operation in completing my research work.
I convey my sincere gratitude to the staff of the Department of
Economics and Statistics, Udham Singh Nagar, for their support in
the completion of my research work.
I find no words to express my feelings for the constant
encouragement, blessings and inspiration I received from my father
Mr. H. C. Pant, mother Mrs. Bimla Pant, father-in-law Mr. D. K.
Bisht, elder sister Mrs. Himanshi Gunwant and younger brother Mr.
Bhaskar Pant.
Help received from Banaras Hindu University (BHU) and
IASRI, Pusa New Delhi is also gratefully acknowledged.
Last but not least, I am deeply grateful to God for giving me
the strength, patience and a very supportive family.
Date:
(ILA PANT)
CONTENTS

CHAPTER I: INTRODUCTION
1.1 Historical background of Controlled Selection
1.2 Controlled Selection: Concept and definition
1.3 Review of literature
1.4 Estimates of the variances
1.5 Frame-work of the thesis

CHAPTER II: ON AN OPTIMAL CONTROLLED NEAREST PROPORTIONAL TO SIZE SAMPLING SCHEME
2.1 Introduction
2.2 The optimal controlled sampling design
2.3 Examples

CHAPTER III: TWO DIMENSIONAL OPTIMAL CONTROLLED NEAREST PROPORTIONAL TO SIZE SAMPLING DESIGN USING QUADRATIC PROGRAMMING
3.1 Introduction
3.2 The two dimensional optimal controlled nearest proportional to size sampling design
3.3 Examples
3.4 Variance estimation for the proposed plan

CHAPTER IV: ON STATISTICAL DISCLOSURE CONTROL USING RANDOM ROUNDING AND CELL PERTURBATION TECHNIQUES
4.1 Introduction
4.2 Controlled cell perturbation: The proposed methodology
4.3 Examples

CHAPTER V: OPTIMAL CONTROLLED SELECTION PROCEDURE FOR SAMPLE CO-ORDINATION PROBLEM USING LINEAR PROGRAMMING
5.1 Introduction
5.2 The optimal controlled procedure
5.3 Examples

CHAPTER VI: THE APPLICATION OF FUZZY LOGIC TO THE SAMPLING SCHEME
6.1 Introduction
6.2 Fuzzy logic approach
6.3 The proposed procedure

CHAPTER VII: SUMMARY

REFERENCES
CHAPTER I
INTRODUCTION
1.1 HISTORICAL BACKGROUND OF CONTROLLED SELECTION
In most practical situations, it is not possible to collect
information about each and every unit of the population from which
we have to draw conclusions, because doing so is very costly and
time consuming. In such situations a device known as the sample
survey is used to draw inferences about the given population. A
sample survey consists of selecting a part of a finite population,
followed by making inferences about the entire population on the
basis of the selected part. The purpose of sampling theory is to
develop methods of sample selection and of estimation that provide
estimates precise enough to draw conclusions about the given
population.
Many sampling procedures exist in the literature. The simplest
is simple random sampling (SRS), in which each and every unit of the
population has an equal chance of being included in the sample, i.e.
there is no restriction on the selection of the sampling units. The
procedure of simple random sampling may therefore result in the
selection of a sample which is not desirable. For example, suppose we
have to conduct a sample survey using the procedure of SRS. It may
happen that the selected sampling units are not important from the
point of view of the character under study, or that the selected units
are geographically spread out, thereby not only increasing the
expenditure on travel and the wastage of time but also adversely
affecting the supervision of the fieldwork. All these factors would
seriously affect the quality of the data collected and the precision of
the estimate of the parameter. Hence there arises the need for a
suitable sampling methodology in which some controls are imposed
to reduce the risk of the above mentioned factors.
The concept of introducing controls in sampling procedures
originated with the procedure of stratified sampling, which provides
the opportunity to represent all the homogeneous sub-groups within a
heterogeneous population. For example, suppose we want to conduct
a sample survey on the status of women in Uttarakhand and have to
select a sample of size 4 from 13 districts. If we use SRS to select a
sample of size 4, all 13 districts have the same chance of being
selected. It may then happen that the selected sample consists entirely
of districts from the hill areas or entirely of districts from the plains,
and thus fails to fulfil the purpose of the survey, because such a
sample would represent the status of women belonging only to the
hill areas or only to the plains, respectively. Stratified sampling
would be advantageous in this situation, as we can form two strata of
the whole population, one consisting of districts belonging to the hill
areas and the other of districts belonging to the plains. Two districts
can then be selected from each stratum, and the selected sample
would represent the status of women in Uttarakhand more precisely.
In this example we observe that a sample which consists of districts
from both the hill areas and the plains appears to be the most precise
and desirable. Such samples are termed “preferred” or desirable
samples, and all other samples are termed “non-preferred” or
undesirable samples. Thus in this example stratified sampling has
been used to impose a restriction or control (in the form of
representing both the hill and plain regions of the state) on the
selection of the sample. Systematic sampling can also be used for
achieving controls: in systematic sampling only $k$ preferred samples,
each with selection probability $1/k$, remain, and the other
$\binom{N}{n} - k$ non-preferred samples have zero probability of
selection. Besides stratified sampling and systematic sampling, there
also exist sampling procedures which impose one or more restrictions
on the selection of the sampling units. In fact, any departure from
SRS can be considered a control, which increases the probability of
selection of preferred combinations of units and consequently
decreases the selection probability of non-preferred combinations of
units.
We observe that in all these sampling procedures the selection
of the sample is only partially controlled. In many cases, however,
situations may arise in which it is necessary to control the selection
of the sampling units further. In these situations the above sampling
procedures may not be sufficient, and the sampler has to use another
method of sampling with some additional restrictions (controls) on
the selection of the sampling units. Hence there arises the need for a
suitable sampling procedure which reduces the risk of getting a
non-preferred sample from the population and increases the selection
probability of the preferred sample combinations. Sampling designs
which eliminate, or assign very small selection probabilities to, the
non-preferred samples by imposing controls while selecting samples
are called controlled sampling designs, and the procedure of selecting
samples using these designs is known as “Controlled Selection” or
“Controlled Sampling”.
The term “Controlled Selection” or “Controlled Sampling” is
rather uncommon in the field of sample surveys; however, the need
for this technique in sampling was long felt. The problem of imposing
controls while selecting National samples of counties in the U.S. was
discussed by Frankel and Stock (1942) and Goodman and Kish
(1950). A slightly modified version of controlled selection was
adopted for more general use by Hess, Reidel and Fitzpatrick (1961)
in the selection of hospitals and patients from the 1961 universe of
nonfederal, short-term general medical hospitals in the United States.
To further examine the relative advantages of controlled selection,
Waterton (1983) applied the technique to data from a postal survey of
Scottish school leavers carried out in 1977. In recent years there has
been a great deal of work in the field of controlled selection due to its
practical importance; this is discussed in Section 1.3.
1.2 CONTROLLED SELECTION: CONCEPT AND DEFINITION
“Controlled Selection” or “Controlled Sampling”, as the name
suggests, is a method of selecting samples from a finite population by
imposing certain restrictions or controls on the selection of the
population units in the sample. While selecting the sampling units,
the sampler has to keep in mind certain facts about the survey, such
as its cost, the time taken in its completion and other related factors.
It may happen that the selected sample consists of units which are
very costly and do not fit into the budget of the survey, or of units
spread out so far apart that the completion time of the survey
increases. The sampler therefore wants to avoid these types of sample
combinations and, for this purpose, has to impose certain restrictions
on the selection of the sample. These restrictions may be of different
kinds, depending upon the requirements of the particular sampling
plan. Due to these restrictions, some combinations of units become
preferable and other combinations become non-preferable.
The technique of controlled selection is used in
sampling to minimize, as far as possible, the probability of selecting
the non-preferred samples, while conforming strictly to the
requirements of probability sampling. Although the concept of
controlled selection has been used by statisticians for a long period of
time, it has received considerable attention in recent years due to its
practical importance.
Controlled selection has been defined by various authors in the
following manner:
According to Goodman and Kish (1950), controlled
selection is, “Any process of selection in which, while maintaining the
assigned probability for each unit, the probabilities of selection for
some or all preferred combinations of n out of N units are larger than
in stratified random sampling (and the corresponding probabilities of
selection for at least some non-preferred combinations are smaller
than in stratified random sampling)”.
Wilkerson (1960) describes controlled selection as, “The
probability selection of a sample pattern from a set of patterns, which
have been purposively established, so that, taken as a group, they give
to each primary sampling unit its proper chance of appearing in the
final sample. Each pattern is set up in accordance with controls, which
may be as rigid as desired to ensure that it satisfies selected criteria of
proper distribution”.
According to Hess and Srikantan (1966), “Controlled selection,
a technique of sampling from finite universe, permits multiple
stratification beyond what is possible by stratified random sampling,
while conforming strictly to the requirements of probability
sampling”.
Controlled selection is applicable in many other areas. Some of
the applications of controlled selection have been given in the
following subsection.
1.2.1 Applications of Controlled Selection to Statistical Problems:
The concept of controlled selection is applicable not only in the
field of sampling; it is related to many other fields and can be used
for many purposes. The concepts of controlled selection and
controlled rounding are closely related to each other, as the controlled
rounding procedure can be used to solve the problem of controlled
selection. Statistical Disclosure Control (SDC) is one of the areas
where controlled rounding can be used, and hence the concept of
controlled selection can also be used for the purpose of SDC. SDC
can be defined as a technique for the generation and dissemination of
statistical information concerned with managing the risks of
disclosing information about the respondents in a table. Since not all
the cells in a table are confidential, we have to protect only those
cells which contain confidential information; these cells are called
sensitive cells. Thus in the technique of SDC we have to impose
certain restrictions on the publication of the sensitive cells so that no
one can obtain the confidential information. Here we see that we are
imposing certain restrictions on the publication of sensitive cells, and
thus we are using the concept of controlled selection in a different
way.
Another area where controlled selection can be used is the
procedure of overlap of sampling units in two or more different
surveys. In the overlap procedure we either take as many units as
possible in common between the different surveys (called
maximization of overlap of sampling units) or try to avoid common
units in the different surveys (called minimization of overlap of
sampling units). Since in both procedures we have to impose certain
restrictions or controls while conducting the surveys, so that we get
the maximum or minimum number of units in common, we can say
that we are again using the concept of controlled selection.
Again, if we have some prior knowledge about the population
for which we have to conduct the sample survey, we can use this
information to improve the efficiency of the survey. We know that in
probability proportional to size (PPS) sampling we assign
probabilities to the population units according to their size, but in
some situations auxiliary information related to the population units
may also be available. This information can also be utilized while
assigning the selection probabilities to the population units, to
increase the efficiency of the survey. For utilizing the auxiliary
information, we use the concept of fuzzy logic. More information
about the population units allows us to impose more restrictions in
assigning the probabilities to the population units, making the
sampling procedure more efficient. Here we are again imposing
controls while assigning the initial selection probabilities to the
population units, and thus we are using the concept of controlled
selection. In the following paragraphs, we briefly give the basic ideas
of fuzzy logic and fuzzy inference systems.
(a) Fuzzy Logic: Fuzzy logic is a logic which deals with values that
are approximate rather than exact. Classical logic relies on something
being either true or false: a true element is usually assigned a value of
1 and a false element a value of 0, so that something either
completely belongs to a set or is completely excluded from it. Fuzzy
logic broadens this definition of classical logic. The basis of fuzzy
logic is the fuzzy set. Unlike in classical sets, where membership is
all or none, an object is allowed to belong only partly to a set. The
membership of an object in a particular set is described by a real
value lying between 0 and 1. Thus, for instance, an element can have
a membership value of 0.5, which describes a 50% membership in
the given set. Such logic allows a much easier treatment of many
problems that cannot be easily handled using the classical approach.
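To make the idea of partial membership concrete, the following small Python sketch (a hypothetical illustration, not part of the original thesis; the triangular shape and all numeric values are arbitrary choices) evaluates a graded membership function:

```python
def triangular_membership(x, a, b, c):
    """Degree in [0, 1] to which x belongs to a fuzzy set shaped as a
    triangle with feet at a and c and peak (full membership) at b."""
    if x <= a or x >= c:
        return 0.0                      # completely outside the set
    if x <= b:
        return (x - a) / (b - a)        # rising edge: partial membership
    return (c - x) / (c - b)            # falling edge: partial membership

# A fuzzy set "medium-sized unit", peaking at size 50:
for size in (10, 35, 50, 65, 90):
    print(size, round(triangular_membership(size, 20, 50, 80), 2))
# size 35 belongs to the set to degree 0.5; size 50 belongs fully (1.0).
```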
(b) Fuzzy Inference System: Using a fuzzy inference system, we can
utilize all the auxiliary information to arrive at the final results. The
MATLAB Fuzzy Logic Toolbox provides the facility for constructing
fuzzy inference systems. For this purpose, one needs to choose the
baseline model, i.e. the input variables, output variables, implication
method, aggregation method and defuzzification method. The
construction of rules is an important part of a fuzzy inference system;
rules can be defined from common knowledge about the required
inference procedure.
1.3 REVIEW OF LITERATURE
In this section we describe the earlier work done on different
aspects of controlled selection and its related topics. In subsection
1.3.1 we first describe the work done on controlled selection in one
and two dimensions. In subsection 1.3.2 we describe the work done
on controlled rounding and disclosure control, and in subsection 1.3.3
the work done on the sample co-ordination problem. In the last
subsection, 1.3.4, we briefly describe the work done on fuzzy logic
and the other areas in which fuzzy logic has been applied.
1.3.1 Controlled Selection in One and Two-Dimensions:
The technique of controlled selection was originally formulated
by Goodman and Kish (1950). They applied the technique to the
specific problem of selecting twenty-one primary sampling units to
represent the North Central States and found that, by the use of this
technique, the between first-stage unit components of the variance
were reduced by 11% to 32% below the corresponding components
under stratified random sampling.
Hess and Srikantan (1966) used data for the 1961 universe of
non-federal, short-term general medical hospitals in the United States
to illustrate the applications of estimation and variance formulae for
controlled selection. They pointed out some advantages that can be
expected from controlled selection over one-way stratified random
sampling. These were:
1. Controls may be imposed to secure a proper distribution,
geographically or otherwise, and to ensure adequate sample sizes for
subgroups that are domains of study.
2. To secure a moderate reduction in the sampling errors of a
multiplicity of characters simultaneously.
3. To secure a significant reduction of the sampling error in the global
estimates of specified key variables.
Following the study of Hess and Srikantan (1966), Waterton (1983)
used data available from a postal survey of Scottish school leavers
carried out in 1977 to describe the advantages of controlled selection
and to compare the efficiency of controlled selection with multiple
proportionate stratified random sampling.
Different approaches have been given by various authors to
implement controlled selection. These may be broadly classified into
three categories, namely:
1. The method of typical experimental design configurations.
2. The method of emptying boxes.
3. The method of linear programming.
We shall now briefly describe some of the work done by various
authors on each of these three approaches.
First, let us consider the method of typical experimental design
configurations.
Chakrabarti (1963) was the first to use a balanced incomplete
block design (BIBD) with parameters $v = N$, $b < \binom{N}{n}$,
$k = n$, $r$ and $\lambda$ to construct controlled simple random
sampling without replacement designs, where $N$ is the population
size, $n > 2$ is the sample size and the parameters $v, b, r, k, \lambda$
have their usual meanings. The method discussed by Chakrabarti has
certain limitations, as a BIBD does not exist for many combinations
of $v$ and $k$. For instance, no BIBD exists for $v = 8$, $k = 3$ with
$b < \binom{8}{3} = 56$ blocks. To overcome this drawback, BIBDs
with repeated blocks were used by Wynn (1977) and Foody and
Hedayat (1977).
Avadhani and Sukhatme (1973) also worked on this method,
using BIBD, to minimize the probability of selecting a non-preferred
sample.
Gupta, Nigam and Kumar (1982) extended the idea of
experimental design configurations to obtain controlled sampling
designs with inclusion probability proportional to size (IPPS) by using
BIBD.
Nigam, Kumar and Gupta (1984) used typical configurations of
different types of experimental designs, such as BIBDs with or
without repeated blocks, supplemented block designs, partially
balanced incomplete block designs and cyclic designs, for obtaining
controlled IPPS sampling plans with the property
$c\pi_i\pi_j \le \pi_{ij} \le \pi_i\pi_j$ for all $i \ne j = 1, 2, \ldots, N$,
where $c$ is some positive constant, $0 < c < 1$.
Gupta, Srivastava and Reddy (1989) used binary incomplete
connected block designs to construct controlled IPPS sampling
designs.
Srivastava and Saleh (1985) and Mukhopadhyay and Vijayan
(1996) suggested the use of ‘t-designs’ to replace simple random
sampling without replacement (SRSWOR) designs to construct
controlled sampling designs.
The work on two dimensional controlled selection problems
was mainly due to Patterson (1954), Yates (1960) and Jessen
(1969, 1970, 1973, 1975 and 1978), under the titles ‘lattice sampling’,
‘two way stratification’ and ‘multi-stratification’.
Patterson (1954) and Yates (1960) examined the case in which
the cells are of equal size and the marginal constraints are integers,
all being equal in the case of squares. They also discussed the case of
a rectangle where the marginal constraints of the rows/columns would
be a multiple of those for the other. They considered the method of
selection as well as the properties of such samples. Yates suggested
the name ‘lattice sampling’ for his schemes.
Jessen (1969) discussed four methods of selecting PNR
(probability non-replacement) samples and analyzed their properties.
These methods provide samples in which the probability of including
the $i$th element in the sample is proportional to the size of the
element. His ‘Method 2’ is superior to ‘Method 1’ in the sense that it
involves fewer steps. ‘Method 3’ provides positive $\pi_{ij}$’s for all
element pairs, although it is complex in nature. ‘Method 4’ is limited
to size $n = 2$, whereas the other methods can be used for any $n$.
Jessen (1970) considered the general problem of sampling
from a multidimensional universe with the objective of selecting
samples that are representative of the universe in each of the sample’s
dimensions as well as jointly.
Jessen (1973) examined some properties of a simple two way
probability lattice sampling, i.e. selecting a set of cells of unequal
sizes where the probabilities of selection of cells are proportional to
the size of the cells and sample sizes along rows and columns are
fixed.
Jessen (1975) discussed the construction and the related
estimation problems for square and cubic lattices in the case of
‘Random Lattices’ and ‘Probability Lattices’.
Jessen (1978) summarized his earlier works and extended them
to more general situations.
The second approach, known as ‘the method of emptying
boxes’, was proposed by Hedayat and Lin (1980). They proposed the
method of emptying boxes to construct controlled IPPS sampling
plans satisfying $\pi_{ij} > 0$ and $\pi_{ij} < \pi_i\pi_j$ for all
$i \ne j = 1, 2, \ldots, N$. This method is quite close to the
decremental method of Jessen (1969).
The third approach which is used extensively in recent years is
the ‘Linear programming approach’.
Causey, Cox and Ernst (1985) were the first to use the
transportation model to solve two and higher dimensional controlled
selection problems. Using the transportation model, they developed
an algorithm for controlled selection which completely solves the
two-dimensional problem, and they showed with the help of an
example that a solution to the three-dimensional controlled selection
problem does not always exist. They also provided a method for
maximizing and minimizing the overlap of sampling units in two
different surveys.
Rao and Nigam (1990, 1992) used the simplex method in linear
programming to solve one dimensional controlled selection problems
for two different situations, namely (1) controlled sampling designs
for specified $\pi_{ij}$’s (and hence $\pi_i$’s) and (2) controlled
sampling designs for specified $\pi_i$’s, and hence $\pi_{ij}$’s, subject
to the constraints $c\pi_i\pi_j \le \pi_{ij} \le \pi_i\pi_j$ for all
$i \ne j = 1, 2, \ldots, N$, where $c$ is some positive constant,
$0 < c < 1$. Their approach provides optimal solutions and is superior
to all previous work done in this direction; however, they did not
consider two and higher dimensional controlled selection problems.
Sitter and Skinner (1994) also proposed the linear programming
approach, applying the ideas of Rao and Nigam to multiway
stratification. However, they did not consider controls beyond
stratification, and the computations required by their procedure
increase rapidly as the number of cells in the multiway classification
increases.
Tiwari and Nigam (1998) suggested a method for two
dimensional controlled selection using the simplex method in linear
programming. Their method derives its inspiration from the optimal
controlled sampling designs of Rao and Nigam (1990, 1992). They
also proposed an alternative variance estimator for controlled
selection designs, as the Horvitz-Thompson estimator could not be
applied to their plan due to the non-fulfilment of the condition
$\pi_{ij} \le \pi_i\pi_j$.
Lu and Sitter (2002) developed some methods to reduce the
amount of computation so that very large problems became feasible
using the linear programming approach.
Tiwari et al. (2007) proposed an optimal controlled sampling
design for one dimensional controlled selection problems, using
quadratic programming to obtain the design. This design ensures that
the probability of selecting non-preferred samples is exactly equal to
zero, rather than merely minimizing it, without sacrificing the
efficiency of the Horvitz-Thompson estimator based on an associated
uncontrolled IPPS sampling plan. The idea of ‘nearest proportional to
size sampling designs’, introduced by Gabler (1987), is used to
construct the proposed design. For variance estimation the
Yates-Grundy form of the Horvitz-Thompson variance estimator can
be used, as the proposed procedure satisfies the necessary and
sufficient conditions required for its use.
1.3.2 Controlled Rounding/Disclosure Control:
Rounding techniques involve the replacement of the original
data by multiples of a given rounding base. Rounding methods are
used for many purposes, such as improving the readability of data
values, controlling statistical disclosure in tables, solving the problem
of iterative proportional fitting (or raking) in two-way tables, and
controlled selection. Statistical disclosure control is one of the areas
in which rounding methods are widely used. In the following
paragraphs, we first discuss the work done on controlled rounding and
then describe some work done by different authors on disclosure
control.
Fellegi (1975) proposed a random rounding technique for one
dimensional tables. This technique rounds the cell values unbiasedly
and also maintains the additivity of the rounded table. The drawback
of the method is that it can be applied only to one dimensional tables.
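The unbiasedness idea can be sketched in a few lines of Python (a hypothetical illustration of unbiased random rounding in general, not of Fellegi's exact algorithm; unlike Fellegi's procedure, this independent cell-by-cell rounding does not by itself preserve the additivity of the table):

```python
import random

def random_round(value, base=5):
    """Unbiased random rounding of one entry to a multiple of `base`:
    round up with probability r/base, where r is the remainder, so that
    E[rounded value] = value."""
    q, r = divmod(value, base)
    return (q + 1) * base if random.random() < r / base else q * base

# Each cell rounded to a multiple of 5; the expected value of every
# rounded cell equals the original cell value.
print([random_round(c) for c in [3, 7, 12, 18]])
```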
Cox and Ernst (1982) proposed a method based on the
transportation model to solve completely the controlled rounding
problem, i.e. the problem of optimally rounding the real valued entries
in a two-way tabular array to adjacent integer values in a manner that
preserves the tabular (additive) structure of the array. The method
consists of the replacement of a real number $a$ by an adjacent
integer value $R(a)$, where $R(a)$ equals either $\lfloor a \rfloor$ or
$\lfloor a \rfloor + 1$, with $\lfloor a \rfloor$ denoting the integer part
of $a$. Here the rounding base is 1; any problem with rounding base
$B$ can be reduced to rounding base 1 by dividing all entries by $B$.
If $A$ denotes the tabular array, then $R(A)$ is an optimal controlled
rounding of $A$ if the $p$th root of the sum of the $p$th powers of the
absolute values of the differences between the entries of $A$ and
$R(A)$ is minimized. If $|R(a) - a| < 1$ for every entry, the rounding
is referred to as “zero restricted controlled rounding”.
Cox (1987) presented a constructive algorithm for achieving
unbiased controlled rounding in two and three dimensions, which is
simple to implement by hand for small to medium sized tables.
According to Nargundkar and Saveland (1972), a rounding procedure
is said to be unbiased if $E(R(a)) = a$, i.e. if the expected value of
each rounded entry equals the corresponding original (unrounded)
entry of the given table. The procedure of Cox (1987) is based on the
concept of an ‘alternating row-column path’ in an array. Its main
drawback is that it is somewhat arbitrary and sometimes needs a large
number of iterations to reach a solution. Thus, without changing the
basic concept of the method of Cox (1987), Tiwari and Nigam (1993)
introduced a method for unbiased controlled rounding which
terminates in fewer steps.
As discussed earlier, one of the methods of achieving statistical
disclosure control is controlled rounding. All the methods of
controlled rounding discussed above can be used for SDC, but there
also exist other methods in the literature, such as cell suppression,
partial cell suppression and cell perturbation. Cell suppression can be
defined as a method in which sensitive cells are not published, i.e.
they are suppressed. To make sure that the suppressed cells cannot be
derived by subtraction from the published marginal totals, additional
cells are selected for suppression; these cells are known as
complementary suppressions. In the method of cell suppression one
has to find the complementary suppressions in such a way that the
loss of information is minimum. Different methods of cell
suppression have been discussed by various authors, such as Cox
(1980), Sande (1984), Carvalho et al. (1994) and Fischetti and
Salazar (2000). The method of partial cell suppression was discussed
by Fischetti and Salazar (2003). In partial cell suppression, instead of
wholly suppressing primary and complementary cells, some intervals
obtained with the help of a mathematical model are published for
these cell entries. The loss of information in partial cell suppression is
smaller in comparison to complete cell suppression.
In order to reduce the amount of data loss that occurs from cell
suppression, Salazar (2005) proposed an improved method, termed
“cell perturbation”. This method is closely related to the classical
controlled rounding methods and has the advantage that it also
ensures the protection of the sensitive cells to a specified level, while
minimizing the loss of information. However, the method also has
some disadvantages. Firstly, it perturbs all the cell values, resulting in
a large amount of data loss. Secondly, the marginal cell values of the
resultant tables are not preserved, thereby disturbing marginals which
are non-sensitive and expected to be published in their original form.
1.3.3 Sample Co-Ordination Problem:
The problem of co-ordination of sampling units has been a
topic of interest for more than fifty years, and different methods have
been proposed by various authors to solve the sample co-ordination
problem. Some of these methods are described in the following
paragraph.
The first approach to the sample co-ordination problem was
given by Keyfitz (1951), who proposed an optimum procedure for
selecting one unit per stratum designs when the initial and new
designs have identical stratification, with only a change in the
selection probabilities. Fellegi (1963, 1966), Gray and Platek (1963)
and Kish (1963) also proposed methods for the sample co-ordination
problem, but these methods are in general restricted either to two
successive samples or to small sample sizes. In order to solve the
problem in the context of a larger sample size, Kish and Scott (1971)
proposed a method for the sample co-ordination problem. Brewer
et al. (1972) introduced the concept of permanent random numbers
(PRN) for solving the sample co-ordination problem. The linear
programming approach to the sample co-ordination problem was first
discussed by Causey et al. (1985).
Causey et al. (1985) proposed an optimum linear programming
procedure for maximizing the expected number of sampling units
common to the two designs, when the two sets of sample units are
chosen sequentially. Ernst and Ikeda (1995) also presented a linear
programming procedure for overlap maximization under very general
conditions. Ernst (1996) developed a procedure for the sample
co-ordination problem with one unit per stratum designs, where the
two designs may have different stratifications. Ernst (1998) proposed
a procedure for the sample co-ordination problem with no restriction
on the number of sample units per stratum, but where the
stratification must be identical. Both of the procedures proposed by
Ernst (1996, 1998) use the controlled selection algorithm of Causey,
Cox and Ernst (1985) and can be used for simultaneous as well as
sequential sample surveys. Ernst and Paben (2002) proposed a new
methodology for the sample co-ordination problem, based on the
procedures of Ernst (1996, 1998); it places no restriction on the
number of sample units selected per stratum and does not require the
two designs to have identical stratification. Recently, Matei and Tillé
(2006) proposed a methodology for the sample co-ordination problem
for two sequential sample surveys. They proposed an algorithm,
based on iterative proportional fitting (IPF), to compute the
probability distribution of a bi-design. Their method can be applied to
any sampling design for which it is possible to compute the
probability distribution of both samples.
1.3.4 Fuzzy Logic Approach:
The concept of fuzzy logic was introduced in 1965 by Lotfi
Zadeh, who published his seminal work “Fuzzy Sets”, in which he
described the mathematics of fuzzy set theory and, by extension,
fuzzy logic. In fuzzy set theory, Zadeh proposed making the
membership function (or the values false and true) operate over the
range of real numbers [0.0, 1.0]. Zadeh (1965) described that if A is a
fuzzy set and x is a relevant object, then the proposition “x is a
member of A” is not necessarily either true or false, as required by
classical logic; it may be true only to some degree, the degree to
which x is actually a member of A.
Albert (1978) defined some basic concepts of the algebra of
fuzzy logic. Frühwirth-Schnatter (1992) used fuzzy data in statistical
inference and applied it to descriptive statistics, and
Frühwirth-Schnatter (1993) again used the concept of fuzzy logic in
Bayesian inference. Doherty, Driankov and Hellendoorn (1993)
described fuzzy if-then-unless rules and their implementation in fuzzy
logic. Azmi (1993) used some statistical and mathematical tools to
define the fuzzy approach in operations research. Various other
authors have used the concept of fuzzy logic in different areas, such
as Bellman and Zadeh (1970), Dockery and Murray (1987), Biswal
(1992) and Bit, Biswal and Alam (1992).
1.4 ESTIMATES OF THE VARIANCES
One of the problems which needs attention when dealing with
two or more stratification variables is variance estimation. Estimating
the variance of the estimator is necessary for practical purposes.
Various authors have proposed different procedures for obtaining
estimates of the variances for controlled selection designs. We
discuss some of these in the following paragraphs.
To demonstrate the utility of controlled selection in reducing
the variance of the key estimates, Goodman and Kish (1950) drew
100 samples of 17 units each, using the method of controlled
selection, for the population of the North Central States of the U.S.A.
The individual units within the selected groups (samples) were
chosen with PPS sampling. The mean of each of the 100 samples was
calculated and the variance among those 100 means obtained. The
variance of the stratified random selection was calculated with the
help of the standard formula for stratified sampling. Thus the
‘between’ components of the variances were obtained. The ‘within’
components were obtained using the appropriate formula for a simple
random sample of n cases within each of the 17 first-stage units,
under the assumption that the variance within each of these units is
the same. Out of the 8 items considered by the authors, for the first
four items (items 1, 2, 3 and 4) the between component is a crucial
part of the total variance, while for the next four items (items 5, 6, 7
and 8) the between component is relatively unimportant. The authors
showed that for the first four items the use of controlled selection
resulted in significant reductions in the between variances and hence
a significant reduction in the total variances, whereas for the next four
items, for which the between components are unimportant, the
reductions in the total variances were marginal. Thus, from Goodman
and Kish’s point of view, it may be concluded that for items for
which the between component of the variance plays an important
role, the use of controlled selection is highly justified.
Goodman and Kish (1950) proposed procedures for
uncontrolled high entropy plans (high entropy meaning the absence of
any detectable pattern or ordering in the selected sample units). The
expression for the variance of $\hat{\bar{Y}}_{HT}$, correct to
$O(N^{-2})$, using the procedure of Goodman and Kish (1950) is
given as

$$V(\hat{\bar{Y}}_{HT})_{GK} = \frac{1}{nN^2}\left[\sum_{i \in S} p_i A_i^2 - (n-1)\sum_{i \in S} p_i^2 A_i^2\right] - \frac{n-1}{nN^2}\left[2\sum_{i \in S} p_i^3 A_i^2 - \sum_{i \in S} p_i^2 \sum_{i \in S} p_i^2 A_i^2 - 2\left(\sum_{i \in S} p_i^2 A_i\right)^2\right],$$

where $A_i = \dfrac{Y_i}{p_i} - Y$ and $Y = \displaystyle\sum_{i=1}^{N} Y_i$.
Jessen (1970) proposed the use of the Horvitz-Thompson
estimator, together with the expressions given by Yates and Grundy
(1953) for calculating the variance of the estimator and for unbiased
estimation of the variance, in the case of probability sampling with
marginal constraints.
The sampling procedure with unequal probabilities and without
replacement in which the inclusion probability of the $i$th unit in a
sample of size $n$ is $\pi_i = np_i$ ($p_i$ being the probability of
selecting the $i$th unit of the population at the first draw) is known as
IPPS (inclusion probability proportional to size) sampling. The
estimator used in such situations is due to Horvitz and Thompson
(1952) and, as such, is known as the Horvitz-Thompson estimator
(H-T estimator).
To estimate the population mean ($\bar{Y}$) based on a sample
$s$ of size $n$, the unbiased H-T estimator is defined as

$$\hat{\bar{Y}}_{HT} = \sum_{i \in s} \frac{Y_i}{N\pi_i},$$

where $Y_i$ is the value of the $i$th sample unit and $N$ denotes the
total number of population units.
Sen (1953) and Yates and Grundy (1953) showed independently
that, for fixed size sampling designs, $\hat{\bar{Y}}_{HT}$ has the
variance

$$V(\hat{\bar{Y}}_{HT}) = \frac{1}{N^2}\sum_{i<j=1}^{N} (\pi_i \pi_j - \pi_{ij}) \left(\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\right)^2,$$

and an unbiased estimator of $V(\hat{\bar{Y}}_{HT})$ is given as

$$\hat{V}(\hat{\bar{Y}}_{HT}) = \frac{1}{N^2}\sum_{i<j=1}^{n} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left(\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\right)^2,$$

where $\pi_{ij}$ denotes the joint inclusion probability of units $i$ and $j$.
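To make these two expressions concrete, the following Python sketch (with invented values and inclusion probabilities, not taken from the thesis) computes the H-T estimate of the mean and the Yates-Grundy estimate of its variance for a single selected sample:

```python
from itertools import combinations

N = 6                                   # population size (assumed)
y     = {1: 12.0, 3: 20.0, 5: 31.0}     # observed values of the sampled units
pi    = {1: 0.30, 3: 0.50, 5: 0.70}     # first-order inclusion probabilities
pi_ij = {(1, 3): 0.12, (1, 5): 0.18, (3, 5): 0.33}  # joint inclusion probs

# Horvitz-Thompson estimate of the population mean
ht_mean = sum(y[i] / (N * pi[i]) for i in y)

# Yates-Grundy estimate of its variance (requires all pi_ij > 0; it is
# non-negative whenever pi_ij <= pi_i * pi_j for every sampled pair)
yg_var = sum(
    (pi[i] * pi[j] - pi_ij[(i, j)]) / pi_ij[(i, j)]
    * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    for i, j in combinations(sorted(y), 2)
) / N**2

print(ht_mean, yg_var)
```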
Jessen (1973) examined the use of both the H-T and
Yates-Grundy forms of the variance estimator and found that these
estimators suffer from the drawback of providing negative estimates.
Also, in certain non-trivial circumstances the variance of these two
variance estimators is rather high. Moreover, they provide unbiased
estimates of the variance only if $\pi_{ij} > 0$ for all $(i, j)$ in the
population.
To overcome these difficulties, Jessen (1973) suggested the use
of a ‘split sample estimator’. To illustrate the expression for this
estimator, suppose that two units are selected from each row and
column of a two-way table, so that the resultant sample can be split
into two parts, each containing one unit in every row and column. Let
these two parts be denoted by A and B. Then the population total can
be estimated by each half sample. Thus, from each half sample, we
obtain

$$\hat{Y}_A = \sum_{i=1}^{n/2} \frac{y_i}{\pi_i / 2},$$

where $n$ is the sample size and $\pi_i/2$ is the probability of
including the $i$th unit in the given half-sample. Then a combined
estimator of $Y$, known as the split sample estimator, is given by

$$\hat{Y} = (\hat{Y}_A + \hat{Y}_B)/2,$$

and its variance is estimated by

$$\widehat{Var}(\hat{Y}) = \left(1 - n\sum_{i=1}^{N} p_i^2\right) \left[(\hat{Y}_A - \hat{Y}_B)/2\right]^2,$$

where $\left(1 - n\sum_{i=1}^{N} p_i^2\right)$ is an approximate finite
population correction factor.
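A minimal sketch of the split sample computation (hypothetical data; each pair holds an observation $y_i$ and its half-sample inclusion probability $\pi_i/2$):

```python
def split_sample_estimate(half_a, half_b, p):
    """Jessen's split sample estimator of a population total.
    half_a, half_b: lists of (y_i, pi_i/2) pairs, one per half-sample;
    p: single-draw selection probabilities of all N population units."""
    y_a = sum(y / pi_half for y, pi_half in half_a)
    y_b = sum(y / pi_half for y, pi_half in half_b)
    n = len(half_a) + len(half_b)
    fpc = 1 - n * sum(pi ** 2 for pi in p)        # approximate fpc
    total = (y_a + y_b) / 2
    var = fpc * ((y_a - y_b) / 2) ** 2
    return total, var

# Two half-samples of two units each (n = 4) from a population of 8 units:
p = [0.125] * 8
print(split_sample_estimate([(10, 0.25), (14, 0.25)],
                            [(12, 0.25), (15, 0.25)], p))
```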
The split sample estimator of Jessen (1973) is useful in
situations where the stability condition of the H-T estimator or the
non-negativity condition of the Yates-Grundy form of the H-T
variance estimator is not satisfied. However, Jessen’s split sample
estimator is negatively biased, and the biases are found to be quite
high.
Tiwari and Nigam (1998) have proposed a method for variance
estimation for two dimensional controlled selection problems. To
describe their method, suppose two units are selected from each row
and column of an $L \times L$ array. Denote by $y_{i1}, y_{i2}$,
$i = 1, 2, \ldots, L$, the two observations from the $i$th row and let
$p_{i1}, p_{i2}$ be the corresponding probabilities of selection.
Similarly, let $y_{1i}, y_{2i}$, $i = 1, 2, \ldots, L$, be the observations
from the $i$th column and $p_{1i}, p_{2i}$ their corresponding
probabilities of selection. An unbiased estimator of the population
total is given by

$$\hat{Y} = \sum_{i=1}^{L} \left(\frac{y_{i1}}{2p_{i1}} + \frac{y_{i2}}{2p_{i2}} + \frac{y_{1i}}{2p_{1i}} + \frac{y_{2i}}{2p_{2i}}\right),$$

and its variance is estimated by

$$\widehat{Var}(\hat{Y}) = \left(1 - n\sum_{i=1}^{N} p_i^2\right) \frac{1}{4} \sum_{i=1}^{L} \left[\left(\frac{y_{i1}}{p_{i1}} - \frac{y_{i2}}{p_{i2}}\right)^2 + \left(\frac{y_{1i}}{p_{1i}} - \frac{y_{2i}}{p_{2i}}\right)^2\right],$$

where $N = L^2$.
This estimator of the variance was found to be positively
biased, but the biases were quite low in comparison to the split
sample estimator of Jessen (1973).
Recently, Brewer and Donadio (2003) derived a $\pi_{ij}$-free
formula for the high entropy variance of the H-T estimator. They
showed that the performance of this variance estimator, under
conditions of high entropy, was reasonably good for all populations.
Their expression for the variance of the H-T estimator is given by

$$V(\hat{\bar{Y}}_{HT})_{BD} = \frac{1}{N^2}\sum_{i \in S} \pi_i^{-1}(1 - c_i \pi_i)\left(\frac{Y_i}{\pi_i} - \frac{Y}{n}\right)^2,$$

where $c_i$ is taken from formula (18) of Brewer and Donadio (2003),
as this value of $c_i$ appears to perform better than the other values
of $c_i$ suggested by Brewer and Donadio.
1.5 FRAME-WORK OF THE THESIS
The present thesis consists of seven chapters, including the one
on introduction.
In chapter 2, using quadratic programming and the concept of
the nearest proportional to size sampling design of Gabler (1987), we
have defined an optimal controlled sampling procedure for
one-dimensional controlled selection problems. The proposed
procedure ensures that the probabilities of selecting the non-preferred
samples are exactly equal to zero, rather than merely minimizing
them. Variance estimation for the proposed optimal controlled
sampling design using the Yates-Grundy form of the
Horvitz-Thompson estimator is also discussed.
In chapter 3, we have extended the procedure discussed in
chapter 2 to multi-dimensional controlled selection problems. Since it
is difficult to satisfy the non-negativity condition of the H-T estimator
for multi-dimensional controlled selection problems, we have defined
an estimator for estimating the variance in two-dimensional controlled
selection problems. A random group method has also been suggested
for variance estimation in two dimensional controlled selection
problems.
In chapter 4, a method is suggested for the problem of
disclosure control. Using the technique of random rounding, we have
introduced a new methodology for protecting the confidential
information of tabular data with minimum loss of information. The
tables obtained through the proposed method consist of unbiasedly
rounded values, are additive and have a specified level of
confidentiality protection.
In chapter 5, we have proposed a new methodology for the
sample co-ordination problem. The proposed methodology not only
selects the sample in a controlled way but also maximizes or
minimizes the overlap of sampling units for the two sample surveys,
which may be conducted simultaneously or sequentially. Variance
estimation is also possible with the proposed procedure, as it satisfies
the non-negativity condition of the Horvitz-Thompson (H-T) variance
estimator; in those situations where the non-negativity condition is
not satisfied, an alternative method of variance estimation can be
used.
In chapter 6, using the fuzzy logic approach, we have defined a
new methodology for assigning the initial selection probabilities to
the different population units. The proposed methodology utilizes all
the auxiliary information related to the population units in assigning
the probabilities to these units. The superiority of the proposed
procedure over PPS sampling is also discussed.
In chapter 7, a brief summary of the work done in the
preceding chapters is given.
CHAPTER II
ON AN OPTIMAL CONTROLLED NEAREST PROPORTIONAL TO SIZE SAMPLING SCHEME
2.1 INTRODUCTION
In many field situations, all the possible samples are not
equally preferable from the operational point of view, as some
samples may be undesirable due to factors such as administrative
inconvenience, long distances, similarity of units and cost
considerations. Such samples are termed non-preferred samples, and
the technique for avoiding these samples, as far as possible, is known
as ‘controlled selection’ or ‘controlled sampling’. This technique,
originated by Goodman and Kish (1950), has received considerable
attention in recent years due to its practical importance.
The technique of controlled sampling is most appropriate for
sampling situations where financial or other considerations make it
necessary to select a small number of large first stage units, such as
hospitals, firms, schools etc., for inclusion in the study. The main
purpose of controlled selection is to increase the probability of
sampling a preferred combination beyond that possible with stratified
sampling, whilst simultaneously maintaining the initial selection
probabilities of each unit of the population, thus preserving the
property of a probability sample. This situation generally arises in
field experiments where practical considerations make some units
undesirable but theoretical compulsions make it necessary to follow
probability sampling. Controls may be imposed to secure a proper
distribution, geographically or otherwise, and to ensure adequate
sample sizes for some domains (subgroups) of the population.
Goodman and Kish (1950) considered the reduction of the sampling
variances of the key estimates as the principal objective of controlled
selection, but they also cautioned that this might not always be
attained. Besides the aspects of long distances, administrative
inconvenience, similarity of units and cost considerations, the need
for controls may also arise because various kinds of information may
be desired from the same survey.
A real problem emphasizing the need for controls beyond
stratification was also discussed by Goodman and Kish (1950, p. 354),
with the objective of selecting 21 primary sampling units to represent
the North Central States. Hess and Srikantan (1966) used data for the
1961 universe of nonfederal, short-term general medical hospitals in
the United States to illustrate the applications of estimation and
variance formulae for controlled selection. In his study, Waterton
(1983) used the data available from a postal survey of Scottish school
leavers carried out in 1977, to describe the advantages of controlled
selection and compare the efficiency of controlled selection with
multiple proportionate stratified random sampling.
Three different approaches have been advanced in the recent
literature to implement controlled sampling. These are (i) using
typical experimental design configurations, (ii) the method of
emptying boxes and (iii) using linear programming approaches.
While some researchers have used simple random sampling designs
to construct controlled sampling designs, one of the more popular
strategies is the use of IPPS (inclusion probability proportional to
size) sampling designs in conjunction with the Horvitz-Thompson
(1952) estimator. To introduce this strategy, we assume that a known
positive quantity $x_i$ is associated with the $i$th unit of the
population ($y_i$), and there is reason to believe that the $y_i$’s are
approximately proportional to the $x_i$’s. Here $x_i$ is assumed to be
known for all units in the population, while $y_i$ is collected only for
the sampled units. In IPPS sampling designs $\pi_i$, the probability of
including the $i$th unit in a sample of size $n$, is $np_i$, where $p_i$
is the probability of selecting the $i$th unit in the population, given by

$$p_i = x_i \Big/ \sum_{j=1}^{N} x_j, \qquad i = 1, 2, \ldots, N.$$

To construct controlled simple random sampling designs,
Chakrabarti (1963) and Avadhani and Sukhatme (1973) proposed the
use of balanced incomplete block (BIB) designs with parameters
$v = N$, $k = n$ and $\lambda$. Wynn (1977) and Foody and Hedayat
(1977) used BIB designs with repeated blocks for situations where
non-trivial BIB designs do not exist. Gupta et al. (1982) studied
controlled sampling designs with inclusion probabilities proportional
to size and used balanced incomplete block designs in conjunction
with the Horvitz-Thompson estimator of the population total $Y$.
Nigam et al. (1984) used some configurations of different types of
experimental designs, including BIB designs, to obtain controlled
IPPS sampling plans with the additional property
$c\pi_i\pi_j \le \pi_{ij} \le \pi_i\pi_j$ for all $i \ne j = 1, \ldots, N$ and
some positive constant $c$ such that $0 < c < 1$, where $\pi_i$ and
$\pi_{ij}$ denote the first and second order inclusion probabilities,
respectively. Hedayat and Lin (1980) and Hedayat et al. (1989) used
the method of ‘emptying boxes’ to construct controlled IPPS
sampling designs with the additional property
$0 < \pi_{ij} \le \pi_i\pi_j$, $i < j = 1, \ldots, N$. Srivastava and Saleh
(1985) and Mukhopadhyay and Vijayan (1996) suggested the use of
‘t-designs’ to replace simple random sampling without replacement
(SRSWOR) designs in the construction of controlled sampling
designs.
All the methods of controlled sampling discussed in the
previous paragraph may be carried out by hand with varying degrees
of laboriousness, but none takes advantage of the power of modern
computing. Using the simplex method in linear programming, Rao
and Nigam (1990, 1992) proposed optimal controlled sampling
designs that minimize the probability of selecting the non-preferred
samples, while retaining certain properties of an associated
uncontrolled plan. Utilizing the approach of Rao and Nigam (1990,
1992), Sitter and Skinner (1994) and Tiwari and Nigam (1998) used
the simplex method in linear programming to solve multi-way
stratification problems with ‘controls beyond stratification’.
In the present chapter, we use quadratic programming to
propose an optimal controlled sampling design that ensures that the
probability of selecting non-preferred samples is exactly equal to
zero, rather than minimizing it, without sacrificing the efficiency of
the Horvitz-Thompson estimator based on an associated uncontrolled
IPPS sampling plan. The idea of ‘nearest proportional to size
sampling designs’, introduced by Gabler (1987), is used to construct
the proposed design. The Microsoft Excel Solver of the Microsoft
Office 2000 package has been used to solve the quadratic
programming problem. The applicability of the Horvitz-Thompson
estimator to the proposed design is discussed. The variance of the
estimate for the proposed design is compared with the variances of
the alternative optimal controlled designs of Rao and Nigam (1990,
1992) and the uncontrolled high entropy selection procedures of
Goodman and Kish (1950) and Brewer and Donadio (2003). In
Section 2.3, some examples are considered to demonstrate the utility
of the proposed procedure by comparing the probabilities of
non-preferred samples and the variances of the estimates.
2.2 THE OPTIMAL CONTROLLED SAMPLING DESIGN
In this section, we use the concept of ‘nearest proportional to
size sampling designs’ to propose an optimal controlled IPPS
sampling design that matches the original $\pi_i$ values, satisfies the
sufficient condition $\pi_{ij} \le \pi_i\pi_j$ for the non-negativity of the
Yates-Grundy (1953) form of the Horvitz-Thompson (1952) estimator
of the variance, and also ensures that the probability of selecting
non-preferred samples is exactly equal to zero.
Consider a population of $N$ units and suppose a sample of
size $n$ is to be selected from this population. Let $S$ and $S_1$
denote, respectively, the set of all possible samples and the set of
non-preferred samples. Denoting by $p_i$ the initial probabilities
associated with the population units, that is, the single draw selection
probabilities, we obtain an IPPS design $p(s)$ appropriate for the set
of initial probabilities under consideration. In the present discussion,
we begin with the Midzuno-Sen (1952, 1953) IPPS design to
demonstrate our procedure, as it is relatively easy to compute the
probability of drawing every potential sample under this scheme.
However, if the conditions of the Midzuno-Sen scheme are not
satisfied, we demonstrate that other IPPS sampling without
replacement procedures, such as the Sampford (1967) procedure, may
also be used to obtain the initial IPPS design. In what follows, we
first describe the Midzuno-Sen IPPS scheme and then discuss
Sampford’s design for obtaining the original IPPS design $p(s)$.
2.2.1 The Midzuno-Sen and Sampford IPPS Designs:
The Midzuno-Sen (MS) (1952, 1953) scheme has the restriction
that the initial probabilities ($p_i$’s) must satisfy the condition

$$\frac{1}{n}\cdot\frac{n-1}{N-1} \;\le\; p_i \;\le\; \frac{1}{n}, \qquad \forall\, i. \tag{1}$$

If (1) is satisfied for the sampling plan under consideration, we apply
the MS scheme to get an IPPS plan with the revised normed size
measures $p_i^*$ given by

$$p_i^* = \frac{N-1}{N-n}\, n p_i - \frac{n-1}{N-n}, \qquad i = 1, 2, \ldots, N. \tag{2}$$

Now, supposing that the $s$th sample consists of the units
$i_1, i_2, \ldots, i_n$, the probability of including these units in the
$s$th sample under the MS scheme is given by

$$p(s) = \pi_{i_1 i_2 \ldots i_n} = \frac{1}{\binom{N-1}{n-1}}\left(p_{i_1}^* + p_{i_2}^* + \cdots + p_{i_n}^*\right). \tag{3}$$

However, due to the restriction (1), the MS plan limits the
applicability of the method to units that are rather similar in size.
Therefore, when the initial probabilities do not satisfy the condition
of the MS plan, we suggest the use of the Sampford (1967) plan to
obtain the initial IPPS design $p(s)$.
Using Sampford’s scheme, the probability of including $n$
units in the $s$th sample is given by

$$p(s) = \pi_{i_1 i_2 \ldots i_n} = n K_n \lambda_{i_1}\lambda_{i_2}\cdots\lambda_{i_n}\left(1 - \sum_{u=1}^{n} p_{i_u}\right), \tag{4}$$

where $K_n = \left(\sum_{t=1}^{n} \dfrac{t\, L_{n-t}}{n^{t}}\right)^{-1}$,
$\lambda_i = p_i/(1 - p_i)$ and, for a set $S(m)$ of $m \le N$ different
units $i_1, i_2, \ldots, i_m$, $L_m$ is defined as

$$L_0 = 1, \qquad L_m = \sum_{S(m)} \lambda_{i_1}\lambda_{i_2}\cdots\lambda_{i_m} \quad (1 \le m \le N).$$
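The Midzuno-Sen computations in (1)-(3) are easy to verify numerically. The sketch below (with made-up single-draw probabilities) checks condition (1), forms the revised measures of (2), computes $p(s)$ from (3) for every sample and confirms that the resulting first order inclusion probabilities equal $np_i$:

```python
from itertools import combinations
from math import comb

N, n = 5, 2
p = [0.15, 0.18, 0.20, 0.22, 0.25]     # single-draw probabilities, sum to 1

# Condition (1): (1/n)*(n-1)/(N-1) <= p_i <= 1/n for every unit
lo, hi = (n - 1) / (n * (N - 1)), 1 / n
assert all(lo <= pi <= hi for pi in p), "MS scheme not applicable"

# Revised normed size measures (2)
p_star = [((N - 1) * n * pi - (n - 1)) / (N - n) for pi in p]

# Sample probability (3) for every possible sample of size n
p_s = {s: sum(p_star[i] for i in s) / comb(N - 1, n - 1)
       for s in combinations(range(N), n)}

# Check the IPPS property: sum of p(s) over samples containing unit i
# equals n * p_i for every unit i.
for i in range(N):
    incl = sum(prob for s, prob in p_s.items() if i in s)
    print(i, round(incl, 6), round(n * p[i], 6))
```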
2.2.2 The Proposed Plan:
The idea behind the proposed plan is to get rid of the
non-preferred samples $S_1$ by confining ourselves to the set
$S - S_1$, introducing a new design $p_0(s)$ which assigns zero
probability of selection to each of the non-preferred samples
belonging to $S_1$, given by

$$p_0(s) = \begin{cases} \dfrac{p(s)}{1 - \sum_{s \in S_1} p(s)}, & \text{for } s \in S - S_1, \\[2ex] 0, & \text{otherwise,} \end{cases} \tag{5}$$

where $p(s)$ is the initial uncontrolled IPPS sampling plan.
Consequently, p0(s) is no longer an IPPS design. So, applying
the idea of Gabler (1987), we are interested for the ‘nearest
proportional to size sampling design’ p1(s) in the sense that p1(s)
minimizes the directed distance D from the sampling design p0(s) to
the sampling design p1(s), defined as
D(p0, p1) = Ep0 [ (p1/p0 − 1)² ] = Σs p1(s)²/p0(s) − 1,        (6)
subject to the following constraints:
(i)    p1(s) ≥ 0;
(ii)   Σs∈S−S1 p1(s) = 1;
(iii)  Σs∋i p1(s) = πi;
(iv)   Σs∋i,j p1(s) > 0;
(v)    Σs∋i,j p1(s) ≤ πi πj.                                   (7)
The ordering of the above five constraints is carried out in accordance with their necessity and desirability. Constraints (i) and (ii) are necessary for any sampling design. Constraint (iii), which requires that the first order inclusion probabilities in the old and new schemes remain unchanged, is the requirement for an IPPS design and ensures that the resultant design will be IPPS. This is a very strong constraint and it affects the convergence properties of the proposed plan to a great extent. Constraint (iv) is highly desirable because it ensures unbiased estimation of variance. Constraint (v) is desirable as it ensures the sufficient condition for non-negativity of the Yates-Grundy estimator of variance.
The solution to the above quadratic programming problem,
viz., minimizing the objective function (6) subject to the constraints
(7), provides us with the optimal controlled IPPS sampling plan that
ensures zero probability of selection for the non-preferred samples.
The proposed plan is as near as possible to the controlled design p0(s)
defined in (5) and at the same time it achieves the same set of first
order inclusion probabilities πi, as for the original uncontrolled IPPS
sampling plan. Due to the constraints (iv) and (v) in (7), the proposed
plan also ensures the conditions πij > 0 and πij ≤ πi πj for the Yates-Grundy estimator of the variance to be stable and non-negative.
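The following sketch sets up and solves this quadratic programming problem with scipy.optimize.minimize (SLSQP). This is an assumption about tooling on our part, since the thesis itself uses the Microsoft Excel Solver, but the objective (6) and the constraints (7) are the same; the function name is hypothetical.

import numpy as np
from itertools import combinations
from scipy.optimize import minimize

def nearest_ipps_design(p0, pi):
    """Minimize the distance (6) subject to the constraints (7).

    p0 : dict mapping each preferred sample (tuple of unit indices) to p0(s)
    pi : target first order inclusion probabilities, pi_i = n * p_i
    """
    samples = list(p0)
    N = len(pi)
    w = np.array([1.0 / p0[s] for s in samples])
    cons = [{'type': 'eq', 'fun': lambda x: x.sum() - 1.0}]           # (ii)
    for i in range(N):                                                # (iii)
        idx = [k for k, s in enumerate(samples) if i in s]
        cons.append({'type': 'eq',
                     'fun': lambda x, idx=idx, t=pi[i]: x[idx].sum() - t})
    for i, j in combinations(range(N), 2):                            # (v)
        idx = [k for k, s in enumerate(samples) if i in s and j in s]
        cons.append({'type': 'ineq',
                     'fun': lambda x, idx=idx, t=pi[i] * pi[j]: t - x[idx].sum()})
    x0 = np.array([p0[s] for s in samples])
    # (i) via the bounds; the strict inequality (iv) is only approximated here
    res = minimize(lambda x: w @ x**2 - 1.0, x0, method='SLSQP',
                   bounds=[(0.0, 1.0)] * len(samples), constraints=cons)
    return dict(zip(samples, res.x))

For Example 1 below, p0 would hold the thirteen preferred-sample probabilities p0(s1), ..., p0(s13) and pi = (0.42, 0.42, 0.45, 0.48, 0.66, 0.57).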
The distance measure D(p0, p1) defined in (6) is similar to the χ²-statistic often employed in related problems and was also used by Cassel and Särndal (1972) and Gabler (1987). A few other distance measures are discussed by Takeuchi et al. (1983). An alternative distance measure for the present discussion may be defined as

D(p0, p1) = Σs (p0(s) − p1(s))² / (p0(s) + p1(s)).             (8)
Minimization of (8) subject to the constraints (7) using fractional programming will provide us with the desired controlled IPPS sampling plan. However, on the basis of the different numerical problems considered by us, it was found that the distance function defined in (8) performs almost the same as (6) from both the convergence and the efficiency points of view. Therefore, in this chapter we restrict ourselves to the distance function (6) only.
It may be noted that while all the other controlled sampling
plans discussed by earlier authors attempt to minimize the selection
probabilities of the non-preferred samples, the proposed plan
completely excludes the possibility of selecting non-preferred samples
by ensuring zero probability for them and at the same time it also
ensures the non-negativity of Yates-Grundy estimator of the variance.
However, in some situations a feasible solution to the quadratic
programming problem, satisfying all the constraints in (7), may not
exist. In such situations, some of the constraints in (7) may be relaxed.
The relaxation on the constraints may be carried out on the basis of
their necessity and desirability. As discussed earlier, the first three
constraints are necessary for any IPPS design, so they cannot be
relaxed. Constraint (iv) is highly desirable for unbiased estimation of
variance whereas constraint (v) ensures the sufficient condition for
non-negativity of variance estimator. Therefore, if all the constraints
are not satisfied for a particular problem, the constraint (v) may be
relaxed. This may not guarantee the non-negativity of the Yates-Grundy form of the variance estimator. However, since the condition πij ≤ πi πj is sufficient for non-negativity of the Yates-Grundy estimator of the variance but not necessary for n > 2, as pointed out by Singh (1954), there will still be a possibility of obtaining a non-negative estimator of the variance. After relaxing the constraint (v) in (7), if the Yates-Grundy estimator of the variance comes out to be negative, an
alternative variance estimator may be used. This has been
demonstrated in Example 5 in Section 3, where the constraint (v) in
(7) has been relaxed to obtain a feasible solution of the quadratic
programming problem. If even after relaxing the constraint (v), a
feasible solution of the quadratic programming problem is not found,
the constraint (iv) may also be relaxed and consequently an alternative
variance estimator may be considered for use. The effect of relaxing
these constraints on efficiency of the proposed design is difficult to
study. For some problems the variance is reduced after relaxing the constraint (v) [e.g., in the case of Example 2 in Section 2.3], while for other problems it is increased [e.g., in the case of Example 1 in Section 2.3]. This has been discussed at the end of Section 2.3.
The proposed method may also be considered superior to the earlier methods of optimal controlled selection in the following sense. Earlier approaches associate a cost with each sample and then attempt to minimize the expected cost, which is a crude device that assigns some samples a very high cost and others a very low one; the proposed method instead sets the selection probability of the non-preferred samples exactly to zero.
One limitation of the proposed plan is that it becomes impractical when NCn is very large, as the process of enumeration of
all possible samples and formation of objective function and
constraints becomes quite tedious. This limitation also holds for the
optimum approach of Rao and Nigam (1990, 1992) and other
controlled sampling approaches discussed in Section 2.1. However,
with the advent of faster computing techniques and modern statistical
packages, there may not be much difficulty in using the proposed
procedure for moderately large populations. Moreover, the practical
importance of the proposed method is that it may be used to select a
small number of first-stage units from each of a large number of
strata. This involves a solution of a series of quadratic programming
problems, each of a reasonable size, provided the set of non-preferred
samples is specified separately in each stratum.
Some discussion on the convergence properties of the proposed
procedure may be desirable. As in the case of linear programming,
there is no guarantee of convergence of a quadratic programming
problem. Kuhn and Tucker (1951) have derived some necessary
conditions for optimum solution of a quadratic programming
algorithm but no sufficient conditions exist for convergence.
Therefore unless the Kuhn-Tucker conditions are satisfied in advance,
there is no way of verifying whether a quadratic programming
algorithm converges to an absolute (or global) or relative (or local)
optimum. Also, there is no way to predict in advance whether a solution of a quadratic programming problem exists or not.
2.2.3 Comparison of variance of the estimate:
To estimate the population mean (Ȳ) based on a sample s of size n, we use the HT estimator of Ȳ defined as

ŶHT = (1/N) Σi∈s Yi/πi.                                        (9)
Sen (1953) and Yates and Grundy (1953) showed independently that
for fixed size sampling designs ŶHT has the variance

V(ŶHT) = (1/N²) Σi<j=1..N (πi πj − πij) (Yi/πi − Yj/πj)²,      (10)
and an unbiased estimator of V(ŶHT) is given as

V̂(ŶHT) = (1/N²) Σi<j∈s ((πi πj − πij)/πij) (Yi/πi − Yj/πj)².   (11)
As discussed in Section 2.2, the proposed optimal controlled plan satisfies the sufficient condition for the Yates-Grundy estimator of variance to be non-negative. Therefore, the non-negativity of the variance estimator (11) is also ensured by the proposed plan.
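A sketch of (10), assuming a design given as a dict of sample probabilities; the inclusion probabilities πi and πij are obtained by summation over the samples:

from itertools import combinations

def yg_variance(design, y):
    """Yates-Grundy form (10) of V(Y_HT) for a fixed-size design."""
    N = len(y)
    def joint(i, j):
        return sum(q for s, q in design.items() if i in s and j in s)
    pi = [sum(q for s, q in design.items() if i in s) for i in range(N)]
    return sum((pi[i] * pi[j] - joint(i, j)) * (y[i] / pi[i] - y[j] / pi[j])**2
               for i, j in combinations(range(N), 2)) / N**2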
To demonstrate the utility of the proposed procedure in terms of the precision of the estimate, the variance of the estimate for the proposed procedure obtained through (10) is compared with the variance of the HT estimator using the optimal controlled plan of Rao and Nigam (1990,
1992). Moreover, these variances are also compared with those of two
uncontrolled high entropy (meaning the absence of any detectable
pattern or ordering in the selected sample units) procedures of
Goodman and Kish (1950) and Brewer and Donadio (2003). In what
follows, we reproduce the expressions for the variances of these two
high entropy procedures.
The expression for the variance of ŶHT correct to O(N⁻²) using the procedure of Goodman and Kish (1950) is given as

V(ŶHT)GK = (1/(nN²)) [ Σi∈S pi Ai² − (n−1) Σi∈S pi² Ai² ]
         − ((n−1)/(nN²)) [ 2 Σi∈S pi³ Ai² − Σi∈S pi² Σi∈S pi² Ai² − 2 (Σi∈S pi² Ai)² ],   (12)

where Ai = Yi/pi − Y and Y = Σi=1..N Yi.
Recently, Brewer and Donadio (2003) derived the πij-free
formula for high entropy variance of HT estimator. They showed that
the performance of this variance estimator, under conditions of high
entropy, was reasonably good for all populations. Their expression for
the variance of the HT estimator is given by
V(ŶHT)BD = (1/N²) Σi∈S πi (1 − ci πi) (Yi/πi − Y/n)²,          (13)
where ci is taken from formula (18) of Brewer and Donadio (2003) as
this value of ci appears to perform better than the other values of ci
suggested by Brewer and Donadio.
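A sketch of (13); the weights ci must be supplied by the user, since Brewer and Donadio's formula (18) for ci is not reproduced here:

def bd_variance(pi, c, y):
    """High entropy approximation (13) of Brewer and Donadio (2003).

    pi: first order inclusion probabilities; c: the ci weights, taken here
    as given (e.g., from formula (18) of Brewer and Donadio); y: the Yi values.
    """
    N, n, Y = len(y), round(sum(pi)), sum(y)
    return sum(p * (1 - ci * p) * (yi / p - Y / n)**2
               for p, ci, yi in zip(pi, c, y)) / N**2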
2.3 EXAMPLES
In this section, we consider some numerical examples to
demonstrate the utility of the proposed procedure and compare it with
the existing procedures of optimal controlled sampling. The variance
of the estimate under proposed plan is also compared with the existing
procedures of optimal controlled selection and uncontrolled high
entropy selection procedures.
Example 1: Let us consider a population consisting of six villages, borrowed from Hedayat and Lin (1980), from which a sample of 3 villages is to be drawn. The set S of all possible samples consists of the following 20 samples, each of size n = 3:

123; 124; 125; 126; 134; 135; 136; 145; 146; 156; 234; 235; 236; 245; 246; 256; 345; 346; 356; 456.
Due to considerations of travel, organization of fieldwork and cost, Rao and Nigam (1990) identified the following 7 samples as non-preferred:

123; 126; 136; 146; 234; 236; 246.
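Enumerating S and S1 is mechanical; a sketch (sample labels follow the digit convention used above):

from itertools import combinations

S = {''.join(map(str, s)) for s in combinations(range(1, 7), 3)}   # 20 samples
S1 = {'123', '126', '136', '146', '234', '236', '246'}             # non-preferred
preferred = sorted(S - S1)                                          # the 13 kept samples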
(a): Let the Yi and pi values associated with the six villages of the population be:

Yi : 12    15    17    24    17    19
pi : 0.14  0.14  0.15  0.16  0.22  0.19
Since the pi values satisfy the condition (1), we apply the MS scheme to get an IPPS plan with the revised normed size measures pi*. Using (2), the values of the pi* are:

p1* = .0333;  p2* = .0333;  p3* = .0833;  p4* = .1333;  p5* = .4333;  p6* = .2833.
Now we calculate the values of p(s), the probability of including the units i, j, k in the sample. Since we are applying the MS scheme, using (3) we get:

p(s1) = .015;   p(s2) = .020;   p(s3) = .05;    p(s4) = .035;
p(s5) = .025;   p(s6) = .055;   p(s7) = .04;    p(s8) = .060;
p(s9) = .045;   p(s10) = .075;  p(s11) = .025;  p(s12) = .055;
p(s13) = .04;   p(s14) = .06;   p(s15) = .045;  p(s16) = .075;
p(s17) = .065;  p(s18) = .05;   p(s19) = .080;  p(s20) = .085.
Now, in order to assign zero probability of selection to each of the non-preferred samples belonging to S1, we use the design p0(s) given by (5). After assigning zero probability to the non-preferred samples, the values of p0(s) for the preferred sample combinations are:

p0(s1) = .0265;   p0(s2) = .0663;   p0(s3) = .0332;   p0(s4) = .07285;
p0(s5) = .0795;   p0(s6) = .0994;   p0(s7) = .0729;   p0(s8) = .0795;
p0(s9) = .0993;   p0(s10) = .0861;  p0(s11) = .0662;  p0(s12) = .1060;
p0(s13) = .1126.
Since p0(s) is no longer an IPPS design, we apply the proposed model to minimize the directed distance D from the sampling design p0(s) to the sampling design p1(s), subject to the constraints in (7). The resulting model is:

Minimize z = 37.75*p1(s)^2+15.1*p2(s)^2+30.2*p3(s)^2+13.7*p4(s)^2+12.58*p5(s)^2+10.06*p6(s)^2+13.72*p7(s)^2+12.58*p8(s)^2+10.06*p9(s)^2+11.61*p10(s)^2+15.1*p11(s)^2+9.43*p12(s)^2+8.88*p13(s)^2 − 1
Subject to the constraints (14):
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s) = 1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s) = 0.42
3. p1(s)+p2(s)+p7(s)+p8(s)+p9(s) = 0.42
4. p3(s)+p4(s)+p7(s)+p10(s)+p11(s)+p12(s) = 0.45
5. p1(s)+p3(s)+p5(s)+p8(s)+p10(s)+p11(s)+p13(s) = 0.48
6. p2(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p12(s)+p13(s) = 0.66
7. p6(s)+p9(s)+p11(s)+p12(s)+p13(s) = 0.57
8. p1(s)+p2(s) ≤ 0.1764
9. p3(s)+p4(s) ≤ 0.189
10. p1(s)+p3(s)+p5(s) ≤ 0.2016
11. p2(s)+p4(s)+p5(s)+p6(s) ≤ 0.2772
12. p6(s) ≤ 0.2394
13. p7(s) ≤ 0.189
14. p1(s)+p8(s) ≤ 0.2016
15. p2(s)+p7(s)+p8(s)+p9(s) ≤ 0.2772
16. p9(s) ≤ 0.2394
17. p3(s)+p10(s)+p11(s) ≤ 0.216
18. p4(s)+p7(s)+p10(s)+p12(s) ≤ 0.297
19. p11(s)+p12(s) ≤ 0.2565
20. p5(s)+p8(s)+p10(s)+p13(s) ≤ 0.3168
21. p11(s)+p13(s) ≤ 0.2736
22. p6(s)+p9(s)+p12(s)+p13(s) ≤ 0.3762
23. pi(s) ≥ 0, i = 1, 2, …, 13.
24. πij ≥ 0 for all i ≠ j = 1, 2, …, 6.
After solving the above quadratic programming problem through the Microsoft Excel Solver of the Microsoft Office 2000 package, we obtain the controlled IPPS plan given in Table 1. The value of D(p0, p1) comes out to be 0.997349. This plan matches the original πi values, satisfies the condition πij ≤ πi πj and ensures that the probability of selecting non-preferred samples is exactly equal to zero. Obviously, due to the fulfillment of the condition πij ≤ πi πj, we can apply the Yates-Grundy form of the HT estimator for estimating the variance of the proposed plan.
Table 1
Optimal controlled IPPS plan corresponding to Midzuno-Sen (MS) and Sampford's (SAMP) schemes for Example 1

s     p1(s)[MS]   p1(s)[SAMP]     s     p1(s)[MS]   p1(s)[SAMP]
124   .142800     .085500         245   .029996     .123331
125   .033600     .049500         256   .126206     .139478
134   0           0               345   .018801     .058944
135   .087301     .026864         346   .197199     .104500
145   .030104     .063669         356   .059301     .057500
156   .126195     .074467         456   .061099     .164056
235   .087398     .052191
We also solved the above example using plan (3) with specified πij's taken from Sampford's plan [to be denoted by RN3] and plan (4) [to be denoted by RN4] of Rao and Nigam (1990). Using RN3, the probability of non-preferred samples (φ) comes out to be 0.155253 and, using RN4 with c = .005, φ comes out to be zero, whereas the proposed plan always ensures zero probability to non-preferred samples.
The values of V(ŶHT) for the proposed plan, RN3 plan, RN4 plan, the Randomized Systematic IPPS sampling plan of Goodman and Kish (1950) [to be denoted by GK] and the uncontrolled high entropy sampling plan of Brewer and Donadio (2003) [to be denoted by BD] are produced in the first row of Table 2. It is clear from Table 2 that the proposed plan yields almost the same value of the variance of the HT estimator as RN4. The value of V(ŶHT) for the proposed plan is slightly higher than those obtained from RN3, GK and BD. This increase in variance may be acceptable given the elimination of undesirable samples by the proposed plan.
Table 2
Values of V(ŶHT) for the Proposed, RN3, RN4, GK and BD plans

Example             RN3      RN4      GK       BD       Proposed Plan
Ex 1(a) N=6, n=3    2.9303   4.0241   3.0336   2.9186   4.0570
Ex 1(b) N=6, n=3    4.7574   5.0690   4.8945   4.1540   4.7842
Ex 2(a) N=7, n=3    4.4759   5.0085   4.6105   4.4471   3.5635
Ex 2(b) N=7, n=3    11.9668  14.5196  12.2502  11.4426  9.4890
Ex 3(a) N=8, n=3    4.8539   4.2893   4.9573   4.8364   3.9023
Ex 3(b) N=8, n=3    7.2924   8.4286   7.7350   7.3716   8.1676
Ex 4(a) N=8, n=4    3.1854   3.4631   3.2266   3.1538   3.7450
Ex 4(b) N=8, n=4    2.4094   2.5266   2.5441   2.3845   2.2545
Ex 5    N=7, n=4    3.0756   3.9294   3.1215   3.0746   5.0996
(b): Now suppose that the initial pi values for the above population of 6 units are as follows:

pi : 0.10  0.15  0.10  0.20  0.27  0.18
Since these values of pi do not satisfy the condition (1) of the Midzuno-Sen plan, we apply the Sampford (1967) plan to get the initial p(s) values. Applying the method discussed in Section 2.2 and solving the resultant quadratic programming problem, we obtain the controlled IPPS plan given in Table 1. This plan again ensures zero probability to non-preferred samples and satisfies the non-negativity condition for the Yates-Grundy form of the HT variance estimator.
This example was also solved by RN3 and RN4 plans. The
value of φ for RN3 plan comes out to be 0.064135 and the value of φ
for RN4 with c= .005 comes out to be zero. However, the proposed
plan ensures zero probability to non-preferred samples.
The values of V(ŶHT) for the proposed plan, RN3 plan, RN4 plan, GK plan and BD plan are produced in the second row of Table 2. The proposed plan appears to perform better than RN4 and GK and quite close to the other plans considered by us for this problem.
Example 2: We consider the following population, borrowed from Avadhani and Sukhatme (1973), consisting of seven villages. There are 35 possible samples, each of size n = 3. For reasons of travel and organization of fieldwork, Avadhani and Sukhatme (1973) considered the following 7 samples as non-preferred, in addition to the 7 non-preferred samples considered in Example 1:

137; 147; 167; 237; 247; 347; 467.
(a): Suppose that the values of Yi and their corresponding initial probabilities of selection associated with the seven villages of the population are as follows:

Yi : 12    15    17    24    17    19    25
pi : 0.12  0.12  0.13  0.14  0.20  0.15  0.14
Since the pi values satisfy the condition (1), we apply the MS scheme to get an IPPS plan with the revised normed size measures pi*, given by (2).
Applying the method discussed in Section 2.2, we obtain the
following controlled IPPS plan given in Table 3. This plan again
matches the original πi values, satisfies the condition πij ≤ πi πj and
ensures that the probability of selecting non-preferred samples is exactly equal to zero.
Solving the above problem using RN3, the probability of non-preferred samples (φ) comes out to be 0.064972 and, using RN4 with c = 0.5, φ comes out to be zero, whereas the proposed plan always ensures zero probability to undesirable samples.
Table 3
Optimal controlled IPPS plan corresponding to Midzuno-Sen (MS) and Sampford's (SAMP) schemes for Example 2

s     p1(s)[MS]   p1(s)[SAMP]     s     p1(s)[MS]   p1(s)[SAMP]
124   .054498     .014364         257   .024245     .028750
125   .024684     .018896         267   .076536     .042004
127   .050419     .017956         345   .024244     .044368
134   .060546     .034880         346   .079009     .076730
135   .038670     .037400         356   .017499     .104590
145   .036156     .024912         357   .053149     .052435
156   .050303     .061530         367   .078992     .094066
157   .044725     .030061         456   .064772     .074300
235   .037890     .035531         457   .058462     .036644
245   .042312     .023802         567   .033472     .088083
256   .049417     .058697
The values of V(ŶHT) for the proposed plan, RN3 plan, RN4 plan, GK plan and BD plan are produced in the third row of Table 2. The proposed plan appears to perform better than the other plans considered by us for this problem.
(b): Now consider the pi values for the 7 population units as follows:

pi : 0.08  0.08  0.16  0.11  0.24  0.20  0.13
Since these pi values do not satisfy the condition (1) of MS plan, we
apply Sampford (1967) plan to get the initial p(s) values.
Applying the method discussed in Section 2.2 and solving the resultant quadratic programming problem, we obtain the controlled IPPS plan given in Table 3. This plan again ensures zero probability to non-preferred samples and satisfies the non-negativity condition for the Yates-Grundy form of the HT variance estimator.
This example was also solved by RN3 and RN4 plans. The
value of φ for RN3 plan comes out to be 0.04511 and the value of φ
for RN4 with c = 0.5 comes out to be zero. However, the proposed
plan ensures zero probability to non-preferred samples.
The values of V(ŶHT) for the proposed plan, RN3 plan, RN4 plan, GK plan and BD plan are produced in the fourth row of Table 2. The proposed plan appears to perform better than the other plans considered by us for this problem.
Example 3: We now consider a population with N = 8 and n = 3,
borrowed from Rao and Nigam (1990). The set of all possible samples
S contains 56 samples. Based on considerations similar to those of Avadhani and Sukhatme (1973), Rao and Nigam (1990) considered the following 7 samples as non-preferred, in addition to the 14 non-preferred samples considered in Example 2:
128; 178; 248; 458; 468; 478; 578.
(a): Suppose the following Yi and pi values are associated with the eight villages of the population:

Yi : 12    15    17    24    17    19    25    18
pi : 0.10  0.10  0.11  0.12  0.18  0.13  0.12  0.14
Since the pi values satisfy the condition (1), we apply the MS scheme to get an IPPS plan with the revised normed size measures pi* computed from (2).
Table 4
Optimal controlled IPPS plan corresponding to Midzuno-Sen (MS) and Sampford's (SAMP) schemes for Example 3

s     p1(s)[MS]   p1(s)[SAMP]     s     p1(s)[MS]   p1(s)[SAMP]
124   .020748     .020605         267   .024882     .017902
125   .020145     .003443         268   .030459     .031297
127   .015713     .012351         278   .034927     .053306
134   .017626     .004485         345   .020322     .016724
135   .021201     .020702         346   .039436     .094625
138   .017288     .017661         348   .041417     .154166
145   .023252     .001291         356   .017750     .015496
148   .046374     .041119         357   .029863     .043015
156   .022315     .001099         358   .013255     .008111
157   .033000     .005024         367   .027937     .051224
158   .031268     .010069         368   .021454     .014276
168   .031070     .012151         378   .028354     .076247
235   .017266     .029730         456   .048615     .031530
238   .016833     .053539         457   .059832     .050585
245   .042378     .034869         567   .031326     0
256   .017710     0               568   .031564     .015354
257   .028683     .005300         678   .045482     .045045
258   .030255     .007658
Applying the method discussed in Section 2.2 and solving the resulting quadratic programming problem, we obtain the optimal controlled IPPS plan demonstrated in Table 4. This plan also matches the original πi values, satisfies the condition πij ≤ πi πj and excludes the possibility of selecting the non-preferred samples.
Using RN3, the probability of non-preferred samples φ comes out to be 0.121614 and, using RN4 with c = .005, φ comes out to be zero, whereas the proposed plan always ensures zero probability to non-preferred samples.
The values of V(ŶHT) for the proposed plan, RN3 plan, RN4 plan, GK plan and BD plan are produced in the fifth row of Table 2. The proposed plan appears to perform better than the other plans considered by us for this problem.
(b): Now suppose that the initial pi values for the above population of 8 units are as follows:

pi : 0.05  0.09  0.20  0.15  0.10  0.11  0.12  0.18
Since these values of pi do not satisfy the condition (1) of MS plan, we
apply Sampford (1967) plan to get the initial p(s) values.
Applying the method discussed in Section 2.2, we obtain the
controlled IPPS plan given in Table 4. This plan again ensures zero
probability to non-preferred samples and satisfies the non-negativity
condition for Yates-Grundy form of HT variance estimator.
This example was also solved by RN3 and RN4 plans. The
value of φ for RN3 plan comes out to be 0.166792 and the value of φ
for RN4 with c= .005 comes out to be zero. However, the proposed
plan again ensures zero probability to non-preferred samples.
The values of V(ŶHT) for the proposed plan, RN3 plan, RN4 plan, GK plan and BD plan are produced in the sixth row of Table 2. The proposed plan appears to perform better than RN4 and quite close to the other plans considered by us for this problem.
Example 4: Now we reconsider the population of 8 units and suppose that a sample of size n = 4 is to be selected from this population. The set of all possible samples contains 70 samples. Based on considerations similar to those of Avadhani and Sukhatme (1973), suppose that the following 28 samples are non-preferred for reasons of travel and organization of fieldwork:

1234; 1236; 1238; 1246; 1248; 1268; 1346; 1348; 1357; 1456; 1468; 1567; 1568; 1678; 2345; 2346; 2456; 2468; 2567; 2568; 2678; 3456; 3468; 3567; 3678; 4567; 4678; 5678.
(a): Suppose that the following pi values are associated with the eight villages of the population, the Yi's being the same as considered for Example 3:

pi : 0.11  0.11  0.12  0.13  0.17  0.12  0.11  0.13
As the pi values satisfy the condition (1), we apply the MS scheme to obtain the initial p(s) values. Using the method discussed in Section 2.2, the optimal controlled IPPS plan is demonstrated in Table 5. This plan again matches the original πi values, satisfies the condition πij ≤ πi πj, and reduces the probability of selecting non-preferred samples to zero.
Using RN3, the probability of non-preferred samples (φ) comes out to be 0.049625 and, using the RN4 plan with c = .005, φ comes out to be zero, whereas the proposed plan always ensures zero probability to non-preferred samples.
The values of V(ŶHT) for the proposed plan, RN3 plan, RN4 plan, GK plan and BD plan are given in the seventh row of Table 2. The value of V(ŶHT) for the proposed plan is slightly higher than for the other plans considered by us for this problem. This may be acceptable due to the elimination of non-preferred samples by the proposed plan.
Table 5
Optimal controlled IPPS plan corresponding to Midzuno-Sen (MS) and Sampford's (SAMP) schemes for Example 4

s      p1(s)[MS]   p1(s)[SAMP]     s      p1(s)[MS]   p1(s)[SAMP]
1235   .002827     0               2347   .002772     0
1237   .003892     0               2348   .010933     .019388
1245   .043684     .008430         2356   .051364     .051644
1247   .004447     0               2357   .024836     .039983
1256   .059600     .009182         2358   .017733     0
1257   .009386     .029193         2367   .020084     .057759
1258   .008253     0               2368   .035003     .023368
1267   .016110     .036769         2378   .018755     .005639
1278   .011690     .015483         2457   .012706     .015937
1345   .042009     .030784         2458   .027376     .003187
1347   .007307     0               2467   .029039     .022877
1356   .024070     .048805         2478   .002203     .003298
1358   .004769     0               2578   .027308     .017862
1367   .018232     .061419         3457   .016542     .042995
1368   .018501     .021215         3458   .014911     .010210
1378   .011333     .049737         3467   .016465     .108967
1457   .039377     .011047         3478   .043945     .069871
1458   .042688     0               3568   .046681     .024985
1467   .035251     .024208         3578   .027036     .053230
1478   .014037     0               4568   .109600     .068800
1578   .022538     .013725         4578   .004708     0
(b): Now suppose that the initial pi values for the above population of 8 units are as follows:

pi : 0.09  0.09  0.18  0.11  0.12  0.14  0.17  0.10
Since these values of pi do not satisfy the condition (1) of MS plan, we
apply Sampford (1967) plan to get the initial p(s) values.
Applying the method discussed in Section 2.2, we obtain the controlled IPPS plan given in Table 5. This plan again ensures zero probability to non-preferred samples and satisfies the non-negativity condition for the Yates-Grundy form of the HT variance estimator.
This example was also solved by RN3 and RN4 plans. The
value of φ for RN3 plan comes out to be 0.13128 and the value of φ
for RN4 plan with c = .005 comes out to be zero. However, the
proposed plan again ensures zero probability to non-preferred samples. The values of V(ŶHT) for the proposed plan, RN3 plan, RN4 plan, GK plan and BD plan are produced in the eighth row of Table 2. The proposed plan appears to perform better than the other plans considered by us for this problem.
Example 5: We now consider one more example to demonstrate the
situation where the proposed plan fails to provide a feasible solution
satisfying all the constraints in (7). In such situations, we have to drop
a constraint in (7) to obtain a feasible solution of the related quadratic
programming problem.
Consider a population of seven villages. Suppose a sample of size n = 4 is to be drawn from this population. There are 35 possible samples, out of which the following 14 are considered as non-preferred:

1234; 1236; 1246; 1346; 1357; 1456; 1567; 2345; 2346; 2456; 2567; 3456; 3567; 4567.
Suppose that the following pi values are associated with the 7 villages:

pi : 0.14  0.13  0.15  0.13  0.16  0.15  0.14
Since the pi values satisfy the condition (1), we apply the MS
plan and solve the quadratic programming problem by the method
discussed in Section 2.2. However, no feasible solution of the related
quadratic programming problem exists in this case. Consequently, we
drop the constraint (v) in (7) for this particular problem to obtain a
feasible solution of the quadratic programming problem. The
controlled IPPS plan obtained for this problem is given in Table 6.
This plan also matches the original πi values and ensures that the probability of selecting the non-preferred samples is exactly equal to zero. However, due to the non-fulfillment of the condition πij ≤ πi πj for this example, the non-negativity of the Yates-Grundy estimator of the variance is not ensured.
Table 6
Optimal Controlled IPPS Plan Corresponding to Midzuno-Sen Scheme of Example 5

s      p1(s)       s      p1(s)       s      p1(s)
1235   .02985      1347   .018881     2357   .026409
1245   .050183     1356   .104753     2367   .044308
1247   .012319     1367   .041471     2457   .047515
1256   .083296     1457   .047525     2467   .054839
1257   .01466      1467   .058928     3457   .063891
1267   .030735     2347   .021368     3467   .077151
1345   .067399     2356   .104519
Solving this problem using RN3, the probability of non-preferred samples φ comes out to be 0.297746 and, using RN4 with c = 0.5, φ comes out to be 0.1008, whereas the proposed plan ensures zero probability to non-preferred samples.
The values of V(ŶHT) for the proposed plan, RN3 plan, RN4 plan, GK plan and BD plan are produced in the last row of Table 2. The value of V(ŶHT) for the proposed plan does not appear to be satisfactory for this problem, due to the non-fulfillment of the condition πij ≤ πi πj. For this type of problem, we suggest the use of some other estimator in place of the HT estimator.
To check the effect of dropping the constraint (v) in (7) on the efficiency of the estimator, we have also solved Example 1 and Example 2 without this constraint. The values of V(ŶHT) are obtained as 4.30617 and 4.81743 for Example 1 (a) and (b) respectively, and 3.51507 and 8.82203 for Example 2 (a) and (b) respectively. This shows that while for Example 1 the values of V(ŶHT) without the constraint (v) are greater than the corresponding values with this constraint, they are lower than the corresponding values with this constraint in the case of Example 2.
APPENDIX 2.0
Example 1(b): Here we have N = 6 and n = 3. The values of Yi and pi are given as follows:

Yi : 12    15    17    24    17    19
pi : 0.10  0.15  0.10  0.20  0.27  0.18
All possible combinations are as follows:

123; 124; 125; 126; 134; 135; 136; 145; 146; 156; 234; 235; 236; 245; 246; 256; 345; 346; 356; 456.
Suppose the following 7 sample combinations are considered as non-preferred:

123; 126; 136; 146; 234; 236; 246.
To find the initial p(s) values, we apply the Sampford (1967) plan and get the following values of p(s) (for the preferred sample combinations only):

p(s1) = .0189;  p(s2) = .0469;  p(s3) = .011;    p(s4) = .0271;
p(s5) = .077;   p(s6) = .063;   p(s7) = .0469;   p(s8) = .1299;
p(s9) = .107;   p(s10) = .077;  p(s11) = .0256;  p(s12) = .063;
p(s13) = .1717.
Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the values as follows:

p0(s1) = .0218;  p0(s2) = .0542;  p0(s3) = .0124;   p0(s4) = .0313;
p0(s5) = .089;   p0(s6) = .0729;  p0(s7) = .054;    p0(s8) = .1501;
p0(s9) = .1237;  p0(s10) = .089;  p0(s11) = .0296;  p0(s12) = .0729;
p0(s13) = .1984.
After getting the values of p0(s), we apply the proposed model.
The objective function and the constraints for this example are given
as follows
Minimize z = 45.75*p1(s)^2+18.44*p2(s)^2+80.07*p3(s)^2+31.89*p4(s)^2+11.23*p5(s)^2+13.71*p6(s)^2+18.44*p7(s)^2+6.65*p8(s)^2+8.08*p9(s)^2+11.23*p10(s)^2+33.73*p11(s)^2+13.71*p12(s)^2+5.03*p13(s)^2 − 1
Subject to the constraints defined in (14), with the change in the
values of right hand side as follows
1, 0.3, 0.45, 0.3, 0.6, 0.81, 0.54, 0.135, 0.09, 0.18, 0.243, 0.162,
0.135, 0.27, 0.3645, 0.243, 0.18, 0.243, 0.162, 0.486, 0.324, 0.4374.
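These right hand sides follow mechanically from the pi's: the equality constraints use πi = n pi and the inequality constraints use the products πi πj, taken in the pair order of (14). A sketch of how the list can be generated:

from itertools import combinations

n, p = 3, [0.10, 0.15, 0.10, 0.20, 0.27, 0.18]   # Example 1(b)
pi = [n * pk for pk in p]                         # 0.3, 0.45, 0.3, 0.6, 0.81, 0.54
rhs = [1.0] + pi + [pi[i] * pi[j] for i, j in combinations(range(len(p)), 2)]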
After solving the above model, we get the desired results displayed in Table 1, with the value of D(p0, p1) as .42146.
Example 2(a): Let us consider a population consisting of 7 villages, from which a sample of 3 villages is to be drawn. The set S of all possible samples consists of the following 35 samples:

123; 124; 125; 126; 127; 134; 135; 136; 137; 145; 146; 147; 156; 157; 167; 234; 235; 236; 237; 245; 246; 247; 256; 257; 267; 345; 346; 347; 356; 357; 367; 456; 457; 467; 567.
Suppose the non-preferred sample combinations are:
123; 126; 136; 137; 146; 147; 167; 234; 236; 237; 246; 247; 347; 467;
and the values of Yi and pi are:
Yi : 12   15   17   24   17   19   25
pi : .12  .12  .13  .14  .20  .15  .14
Since the pi values satisfy the condition (1), we apply the MS scheme to get an IPPS plan with the revised normed size measures pi*. Using (2), the values of the pi* are:

p1* = .04;  p2* = .04;  p3* = .085;  p4* = .13;  p5* = .4;  p6* = .175;  p7* = .13.
The values of p(s) for the preferred sample combinations are as follows:

p(s1) = .014;   p(s2) = .032;   p(s3) = .014;   p(s4) = .017;
p(s5) = .035;   p(s6) = .038;   p(s7) = .041;   p(s8) = .038;
p(s9) = .035;   p(s10) = .038;  p(s11) = .041;  p(s12) = .038;
p(s13) = .023;  p(s14) = .041;  p(s15) = .026;  p(s16) = .044;
p(s17) = .041;  p(s18) = .026;  p(s19) = .047;  p(s20) = .044;
p(s21) = .047.
Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the values as follows:

p0(s1) = .0194;  p0(s2) = .0444;  p0(s3) = .0194;   p0(s4) = .0236;
p0(s5) = .0486;  p0(s6) = .0527;  p0(s7) = .0569;   p0(s8) = .0527;
p0(s9) = .0486;  p0(s10) = .0527; p0(s11) = .0569;  p0(s12) = .0527;
p0(s13) = .0319; p0(s14) = .0569; p0(s15) = .0361;  p0(s16) = .0611;
p0(s17) = .0569; p0(s18) = .0361; p0(s19) = .0652;  p0(s20) = .0611;
p0(s21) = .0652.
After getting the values of p0(s), we apply the proposed model. The objective function and the constraints for this example are as follows:

Minimize z = 51.42*p1(s)^2+22.5*p2(s)^2+51.42*p3(s)^2+42.35*p4(s)^2+20.57*p5(s)^2+18.94*p6(s)^2+17.56*p7(s)^2+18.94*p8(s)^2+20.57*p9(s)^2+18.94*p10(s)^2+17.56*p11(s)^2+18.94*p12(s)^2+31.3*p13(s)^2+17.56*p14(s)^2+27.69*p15(s)^2+16.36*p16(s)^2+17.56*p17(s)^2+27.69*p18(s)^2+15.31*p19(s)^2+16.36*p20(s)^2+15.31*p21(s)^2 − 1
Subject to the constraints (15):
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s) = 1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s) = 0.36
3. p1(s)+p2(s)+p3(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s) = 0.36
4. p4(s)+p5(s)+p9(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s) = 0.39
5. p1(s)+p4(s)+p6(s)+p10(s)+p14(s)+p15(s)+p19(s)+p20(s) = 0.42
6. p2(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p14(s)+p16(s)+p17(s)+p19(s)+p20(s)+p21(s) = 0.6
7. p7(s)+p11(s)+p13(s)+p15(s)+p16(s)+p18(s)+p19(s)+p21(s) = 0.45
8. p3(s)+p8(s)+p12(s)+p13(s)+p17(s)+p18(s)+p20(s)+p21(s) = 0.42
9. p1(s)+p2(s)+p3(s) ≤ 0.1296
10. p4(s)+p5(s) ≤ 0.1404
11. p1(s)+p4(s)+p6(s) ≤ 0.1512
12. p2(s)+p5(s)+p6(s)+p7(s)+p8(s) ≤ 0.216
13. p7(s) ≤ 0.162
14. p3(s)+p8(s) ≤ 0.1512
15. p9(s) ≤ 0.1404
16. p1(s)+p10(s) ≤ 0.1512
17. p2(s)+p9(s)+p10(s)+p11(s)+p12(s) ≤ 0.216
18. p11(s)+p13(s) ≤ 0.162
19. p3(s)+p12(s)+p13(s) ≤ 0.1512
20. p4(s)+p14(s)+p15(s) ≤ 0.1638
21. p5(s)+p9(s)+p14(s)+p16(s)+p17(s) ≤ 0.234
22. p15(s)+p16(s)+p18(s) ≤ 0.1755
23. p17(s)+p18(s) ≤ 0.1638
24. p6(s)+p10(s)+p14(s)+p19(s)+p20(s) ≤ 0.252
25. p15(s)+p19(s) ≤ 0.189
26. p20(s) ≤ 0.1764
27. p7(s)+p11(s)+p16(s)+p19(s)+p21(s) ≤ 0.27
28. p8(s)+p12(s)+p17(s)+p20(s)+p21(s) ≤ 0.252
29. p13(s)+p18(s)+p21(s) ≤ 0.189
30. pi(s) ≥ 0, i = 1, 2, …, 21.
31. πij ≥ 0 for all i ≠ j = 1, 2, …, 7.
After solving the above model, we get the desired results displayed in Table 3, with the value of D(p0, p1) as 0.439125.
Example 2(b): For this example the population size is N = 7 and the sample size is n = 3; thus the set S of all possible samples and the set of non-preferred samples remain the same as in part (a) of this example. The Yi and pi values associated with the 7 villages of the population are:

Yi : 12   15   17   24   17   19   25
pi : .08  .08  .16  .11  .24  .2   .13
To find the initial p(s) values, we apply the Sampford (1967) plan and get the following values of p(s) (for the preferred sample combinations only):

p(s1) = .0032;  p(s2) = .0135;  p(s3) = .0040;   p(s4) = .0082;
p(s5) = .0343;  p(s6) = .0201;  p(s7) = .0515;   p(s8) = .0251;
p(s9) = .0343;  p(s10) = .0201; p(s11) = .0515;  p(s12) = .0251;
p(s13) = .0157; p(s14) = .0504; p(s15) = .0318;  p(s16) = .1254;
p(s17) = .0628; p(s18) = .0398; p(s19) = .0753;  p(s20) = .0371;
p(s21) = .0934.
Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the values as follows:

p0(s1) = .1835;  p0(s2) = .0225;  p0(s3) = .0066;   p0(s4) = .0137;
p0(s5) = .0570;  p0(s6) = .0334;  p0(s7) = .0855;   p0(s8) = .0418;
p0(s9) = .0570;  p0(s10) = .0334; p0(s11) = .0855;  p0(s12) = .0418;
p0(s13) = .0261; p0(s14) = .0838; p0(s15) = .0529;  p0(s16) = .2084;
p0(s17) = .1044; p0(s18) = .0661; p0(s19) = .1251;  p0(s20) = .0616;
p0(s21) = .1552.
Now the objective function and the constraints are given as follows:

Minimize z = 5.45*p1(s)^2+44.42*p2(s)^2+150.98*p3(s)^2+73.23*p4(s)^2+17.53*p5(s)^2+29.98*p6(s)^2+11.69*p7(s)^2+23.93*p8(s)^2+17.53*p9(s)^2+29.98*p10(s)^2+11.69*p11(s)^2+23.93*p12(s)^2+38.25*p13(s)^2+11.93*p14(s)^2+18.91*p15(s)^2+4.8*p16(s)^2+9.58*p17(s)^2+15.14*p18(s)^2+7.99*p19(s)^2+16.23*p20(s)^2+6.45*p21(s)^2 − 1
Subject to the constraints defined in (15), with the change in the
values of right hand side as follows
1, 0.24, 0.24, 0.48, 0.33, 0.72, 0.60, 0.39, 0.0576, 0.1152, 0.0792,
0.1728, 0.144, 0.0936, 0.1152, 0.0792, 0.1728, 0.144, 0.0936, 0.1584,
0.3456, 0.288, 0.1872, 0.2376, 0.198, 0.1287, 0.432, 0.2808, 0.234
After solving the above model, we get the desired results displayed in Table 3, with the value of D(p0, p1) as 0.274255.
Example 3(a): Consider a population consisting of 8 villages, from which a sample of 3 villages is to be drawn. The set S of all possible samples consists of the following 56 samples:

123; 124; 125; 126; 127; 128; 134; 135; 136; 137; 138; 145; 146; 147; 148; 156; 157; 158; 167; 168; 178; 234; 235; 236; 237; 238; 245; 246; 247; 248; 256; 257; 258; 267; 268; 278; 345; 346; 347; 348; 356; 357; 358; 367; 368; 378; 456; 457; 458; 467; 468; 478; 567; 568; 578; 678.
The non-preferred sample combinations for this population are:

123; 126; 128; 136; 137; 146; 147; 167; 178; 234; 236; 237; 246; 247; 248; 347; 458; 467; 468; 478; 578.
The values of Yi and pi are given as follows:

Yi : 12   15   17   24   17   19   25   18
pi : .10  .10  .11  .12  .18  .13  .12  .14
Since the pi values satisfy the condition (1), we apply the MS scheme to get an IPPS plan with the revised normed size measures pi*. Using (2), the values of the pi* are:

p1* = .02;  p2* = .02;  p3* = .062;  p4* = .104;  p5* = .356;  p6* = .146;  p7* = .104;  p8* = .188.
The values of p(s) for the preferred sample combinations are as follows:

p(s1) = .0069;  p(s2) = .0189;  p(s3) = .0069;   p(s4) = .0089;
p(s5) = .0209;  p(s6) = .0129;  p(s7) = .0229;   p(s8) = .0149;
p(s9) = .0249;  p(s10) = .0229; p(s11) = .0269;  p(s12) = .0169;
p(s13) = .0209; p(s14) = .0129; p(s15) = .0229;  p(s16) = .0249;
p(s17) = .0229; p(s18) = .0269; p(s19) = .0129;  p(s20) = .0169;
p(s21) = .0149; p(s22) = .0249; p(s23) = .0149;  p(s24) = .0169;
p(s25) = .0269; p(s26) = .0249; p(s27) = .0289;  p(s28) = .0149;
p(s29) = .0189; p(s30) = .0169; p(s31) = .0289;  p(s32) = .0269;
p(s33) = .0289; p(s34) = .0329; p(s35) = .0209.
Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the values as follows:

p0(s1) = .0097;  p0(s2) = .027;   p0(s3) = .0097;   p0(s4) = .0125;
p0(s5) = .0295;  p0(s6) = .0182;  p0(s7) = .0324;   p0(s8) = .0210;
p0(s9) = .0352;  p0(s10) = .0323; p0(s11) = .0380;  p0(s12) = .0238;
p0(s13) = .0295; p0(s14) = .0182; p0(s15) = .0323;  p0(s16) = .0352;
p0(s17) = .0323; p0(s18) = .0380; p0(s19) = .0182;  p0(s20) = .0238;
p0(s21) = .0210; p0(s22) = .0352; p0(s23) = .0210;  p0(s24) = .0238;
p0(s25) = .0380; p0(s26) = .0352; p0(s27) = .0408;  p0(s28) = .0210;
p0(s29) = .026;  p0(s30) = .0238; p0(s31) = .0408;  p0(s32) = .0380;
p0(s33) = .0408; p0(s34) = .0465; p0(s35) = .0295.
Now the objective function and the constraints for this example are given as follows:

Minimize z = 102.96*p1(s)^2+37.44*p2(s)^2+102.96*p3(s)^2+79.71*p4(s)^2+33.85*p5(s)^2+54.91*p6(s)^2+30.89*p7(s)^2+47.52*p8(s)^2+28.40*p9(s)^2+30.89*p10(s)^2+26.29*p11(s)^2+41.88*p12(s)^2+33.85*p13(s)^2+54.91*p14(s)^2+30.89*p15(s)^2+28.40*p16(s)^2+30.89*p17(s)^2+26.29*p18(s)^2+54.91*p19(s)^2+41.88*p20(s)^2+47.52*p21(s)^2+28.40*p22(s)^2+47.52*p23(s)^2+41.88*p24(s)^2+26.29*p25(s)^2+28.40*p26(s)^2+24.47*p27(s)^2+47.52*p28(s)^2+37.44*p29(s)^2+41.88*p30(s)^2+24.47*p31(s)^2+26.29*p32(s)^2+24.47*p33(s)^2+2.49*p34(s)^2+33.85*p35(s)^2 − 1
Subject to the constraints (16):
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s)+p30(s)+p31(s)+p32(s)+p33(s)+p34(s)+p35(s) = 1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s) = 0.3
3. p1(s)+p2(s)+p3(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s) = 0.3
4. p4(s)+p5(s)+p6(s)+p13(s)+p14(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s)+p30(s) = 0.33
5. p1(s)+p4(s)+p7(s)+p8(s)+p15(s)+p22(s)+p23(s)+p24(s)+p31(s)+p32(s) = 0.36
6. p2(s)+p5(s)+p7(s)+p9(s)+p10(s)+p11(s)+p13(s)+p15(s)+p16(s)+p17(s)+p18(s)+p22(s)+p25(s)+p26(s)+p27(s)+p31(s)+p32(s)+p33(s)+p34(s) = 0.54
7. p9(s)+p12(s)+p16(s)+p19(s)+p20(s)+p23(s)+p25(s)+p28(s)+p29(s)+p31(s)+p33(s)+p34(s)+p35(s) = 0.39
8. p3(s)+p10(s)+p17(s)+p19(s)+p21(s)+p26(s)+p28(s)+p30(s)+p32(s)+p33(s)+p35(s) = 0.36
9. p6(s)+p8(s)+p11(s)+p12(s)+p14(s)+p18(s)+p20(s)+p21(s)+p24(s)+p27(s)+p29(s)+p30(s)+p34(s)+p35(s) = 0.42
10. p1(s)+p2(s)+p3(s) ≤ 0.09
11. p4(s)+p5(s)+p6(s) ≤ 0.099
12. p1(s)+p4(s)+p7(s)+p8(s) ≤ 0.108
13. p2(s)+p5(s)+p7(s)+p9(s)+p10(s)+p11(s) ≤ 0.162
14. p9(s)+p12(s) ≤ 0.117
15. p3(s)+p10(s) ≤ 0.108
16. p6(s)+p8(s)+p11(s)+p12(s) ≤ 0.126
17. p13(s)+p14(s) ≤ 0.099
18. p1(s)+p15(s) ≤ 0.108
19. p2(s)+p13(s)+p15(s)+p16(s)+p17(s)+p18(s) ≤ 0.162
20. p16(s)+p19(s)+p20(s) ≤ 0.117
21. p3(s)+p17(s)+p19(s)+p21(s) ≤ 0.108
22. p14(s)+p18(s)+p20(s)+p21(s) ≤ 0.126
23. p4(s)+p22(s)+p23(s)+p24(s) ≤ 0.1188
24. p5(s)+p13(s)+p22(s)+p25(s)+p26(s)+p27(s) ≤ 0.1782
25. p23(s)+p25(s)+p28(s)+p29(s) ≤ 0.1287
26. p26(s)+p28(s)+p30(s) ≤ 0.1188
27. p6(s)+p14(s)+p24(s)+p27(s)+p29(s)+p30(s) ≤ 0.1386
28. p7(s)+p15(s)+p22(s)+p31(s)+p32(s) ≤ 0.1944
29. p23(s)+p31(s) ≤ 0.1404
30. p32(s) ≤ 0.1296
31. p8(s)+p24(s) ≤ 0.1512
32. p9(s)+p16(s)+p25(s)+p31(s)+p33(s)+p34(s) ≤ 0.2106
33. p10(s)+p17(s)+p26(s)+p32(s)+p33(s) ≤ 0.1944
34. p11(s)+p18(s)+p27(s)+p34(s) ≤ 0.2268
35. p19(s)+p28(s)+p33(s)+p35(s) ≤ 0.1404
36. p12(s)+p20(s)+p29(s)+p34(s)+p35(s) ≤ 0.1638
37. p21(s)+p30(s)+p35(s) ≤ 0.1512
38. pi(s) ≥ 0, i = 1, 2, …, 35.
39. πij ≥ 0 for all i ≠ j = 1, 2, …, 8.
After solving the above model, we get the desired results displayed in Table 4, with the value of D(p0, p1) as 0.195194.
Example 3(b): The population size and the sample size for this example are the same as in part (a), i.e. N = 8 and n = 3; thus the set S of all possible samples and the set of non-preferred samples remain the same as in part (a) of this example. The Yi and pi values associated with the 8 villages of the population are:

Yi : 12   15   17   24   17   19   25   18
pi : .05  .09  .2   .15  .10  .11  .12  .18
To find the initial p(s) values, we apply the Sampford (1967) plan and get the values of p(s) (for the preferred sample combinations only) as follows:

p(s1) = .0043;  p(s2) = .0024;  p(s3) = .0031;   p(s4) = .0146;
p(s5) = .0083;  p(s6) = .0199;  p(s7) = .0049;   p(s8) = .0118;
p(s9) = .0031;  p(s10) = .0035; p(s11) = .0067;  p(s12) = .0076;
p(s13) = .0163; p(s14) = .0388; p(s15) = .0096;  p(s16) = .0061;
p(s17) = .0069; p(s18) = .0132; p(s19) = .0078;  p(s20) = .0149;
p(s21) = .0167; p(s22) = .0325; p(s23) = .0366;  p(s24) = .0761;
p(s25) = .0210; p(s26) = .0235; p(s27) = .0441;  p(s28) = .0266;
p(s29) = .0497; p(s30) = .0556; p(s31) = .0124;  p(s32) = .0139;
p(s33) = .0089; p(s34) = .0169; p(s35) = .0215.
Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the values as follows:

p0(s1) = .0065;  p0(s2) = .0036;  p0(s3) = .0046;   p0(s4) = .0221;
p0(s5) = .0125;  p0(s6) = .0301;  p0(s7) = .0073;   p0(s8) = .0178;
p0(s9) = .0046;  p0(s10) = .0052; p0(s11) = .0101;  p0(s12) = .0114;
p0(s13) = .0246; p0(s14) = .0587; p0(s15) = .0145;  p0(s16) = .0093;
p0(s17) = .0104; p0(s18) = .0199; p0(s19) = .0118;  p0(s20) = .0225;
p0(s21) = .0253; p0(s22) = .0492; p0(s23) = .0555;  p0(s24) = .1152;
p0(s25) = .0317; p0(s26) = .0356; p0(s27) = .0667;  p0(s28) = .0403;
p0(s29) = .0752; p0(s30) = .0842; p0(s31) = .0188;  p0(s32) = .0211;
p0(s33) = .0135; p0(s34) = .0257; p0(s35) = .0326.
Now the objective function and the constraints are given as follows:

Minimize z = 154.95*p1(s)^2+276.35*p2(s)^2+216.24*p3(s)^2+45.21*p4(s)^2+79.67*p5(s)^2+33.16*p6(s)^2+135.63*p7(s)^2+55.9*p8(s)^2+213.13*p9(s)^2+189.18*p10(s)^2+98.76*p11(s)^2+87.24*p12(s)^2+40.5*p13(s)^2+17.02*p14(s)^2+68.63*p15(s)^2+107.5*p16(s)^2+95.49*p17(s)^2+50.11*p18(s)^2+84.31*p19(s)^2+44.31*p20(s)^2+39.43*p21(s)^2+20.3*p22(s)^2+17.99*p23(s)^2+8.67*p24(s)^2+31.44*p25(s)^2+28.01*p26(s)^2+14.97*p27(s)^2+24.8*p28(s)^2+13.28*p29(s)^2+11.86*p30(s)^2+53.15*p31(s)^2+47.28*p32(s)^2+73.85*p33(s)^2+38.86*p34(s)^2+30.61*p35(s)^2 − 1
Subject to the constraints defined in (16), with the change in the
values of right hand side as follows
1, 0.15, 0.27, 0.6, 0.45, 0.3, 0.33, 0.36, 0.54, 0.0405, 0.09, 0.0675, 0.045, 0.0495, 0.054, 0.081, 0.162, 0.1215, 0.081, 0.0891, 0.0972, 0.1458, 0.27, 0.18, 0.198, 0.216, 0.324, 0.135, 0.1485, 0.162, 0.243, 0.099, 0.108, 0.162, 0.1188, 0.1782, 0.1944
After solving the above model, we get the desired results displayed in Table 4, with the value of D(p0, p1) as 0.441567.
Example 4(a): Consider a population consisting of 8 villages, from which a sample of 4 villages is to be drawn. The set S of all possible samples consists of the following 70 samples:

1234; 1235; 1236; 1237; 1238; 1245; 1246; 1247; 1248; 1256; 1257; 1258; 1267; 1268; 1278; 1345; 1346; 1347; 1348; 1356; 1357; 1358; 1367; 1368; 1378; 1456; 1457; 1458; 1467; 1468; 1478; 1567; 1568; 1578; 1678; 2345; 2346; 2347; 2348; 2356; 2357; 2358; 2367; 2368; 2378; 2456; 2457; 2458; 2467; 2468; 2478; 2567; 2568; 2578; 2678; 3456; 3457; 3458; 3467; 3468; 3478; 3567; 3568; 3578; 3678; 4567; 4568; 4578; 4678; 5678.
The 28 non-preferred sample combinations are as already defined in Example 4 of Section 2.3.
The values of Yi and pi are given as follows:

Yi : 12   15   17   24   17   19   25   18
pi : .11  .11  .12  .13  .17  .12  .11  .13
Since the pi values satisfy the condition (1), we apply the MS scheme to get an IPPS plan with the revised normed size measures pi*. Using (2), the values of the pi* are:

p1* = .02;  p2* = .02;  p3* = .09;  p4* = .16;  p5* = .44;  p6* = .09;  p7* = .02;  p8* = .16.
The values of p(s) for the preferred sample combinations are as follows:

p(s1) = .0163;  p(s2) = .0043;  p(s3) = .0183;   p(s4) = .0063;
p(s5) = .0163;  p(s6) = .0143;  p(s7) = .0183;   p(s8) = .0043;
p(s9) = .0063;  p(s10) = .0203; p(s11) = .0083;  p(s12) = .0183;
p(s13) = .0203; p(s14) = .0063; p(s15) = .0103;  p(s16) = .0083;
p(s17) = .0183; p(s18) = .0223; p(s19) = .0083;  p(s20) = .0103;
p(s21) = .0183; p(s22) = .0083; p(s23) = .0123;  p(s24) = .0183;
p(s25) = .0163; p(s26) = .0203; p(s27) = .0063;  p(s28) = .0103;
p(s29) = .0083; p(s30) = .0183; p(s31) = .0223;  p(s32) = .0083;
p(s33) = .0103; p(s34) = .0183; p(s35) = .0203;  p(s36) = .0243;
p(s37) = .0103; p(s38) = .0123; p(s39) = .0223;  p(s40) = .0203;
p(s41) = .0243; p(s42) = .0223.
Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the values as follows:

p0(s1) = .0268;  p0(s2) = .007;   p0(s3) = .0301;   p0(s4) = .0103;
p0(s5) = .0268;  p0(s6) = .0235;  p0(s7) = .0301;   p0(s8) = .007;
p0(s9) = .0103;  p0(s10) = .0333; p0(s11) = .0136;  p0(s12) = .0301;
p0(s13) = .0333; p0(s14) = .0103; p0(s15) = .0169;  p0(s16) = .0136;
p0(s17) = .0301; p0(s18) = .0366; p0(s19) = .0136;  p0(s20) = .0169;
p0(s21) = .0300; p0(s22) = .0136; p0(s23) = .0202;  p0(s24) = .0301;
p0(s25) = .0267; p0(s26) = .033;  p0(s27) = .0103;  p0(s28) = .0169;
p0(s29) = .0136; p0(s30) = .0301; p0(s31) = .0366;  p0(s32) = .0136;
p0(s33) = .0169; p0(s34) = .0301; p0(s35) = .0333;  p0(s36) = .0399;
p0(s37) = .0169; p0(s38) = .0202; p0(s39) = .0366;  p0(s40) = .0333;
p0(s41) = .0399; p0(s42) = .0366.
Now the objective function and the constraints for this example are given as follows:

Minimize z = 37.33*p1(s)^2+141.86*p2(s)^2+33.25*p3(s)^2+96.72*p4(s)^2+37.33*p5(s)^2+42.56*p6(s)^2+33.25*p7(s)^2+141.86*p8(s)^2+96.72*p9(s)^2+29.97*p10(s)^2+73.37*p11(s)^2+33.25*p12(s)^2+29.97*p13(s)^2+96.72*p14(s)^2+59.11*p15(s)^2+73.37*p16(s)^2+33.25*p17(s)^2+27.28*p18(s)^2+73.37*p19(s)^2+59.11*p20(s)^2+33.25*p21(s)^2+73.37*p22(s)^2+49.48*p23(s)^2+33.25*p24(s)^2+37.33*p25(s)^2+29.97*p26(s)^2+96.72*p27(s)^2+59.11*p28(s)^2+73.37*p29(s)^2+33.25*p30(s)^2+27.28*p31(s)^2+73.37*p32(s)^2+59.11*p33(s)^2+33.25*p34(s)^2+29.97*p35(s)^2+25.03*p36(s)^2+59.11*p37(s)^2+49.48*p38(s)^2+27.28*p39(s)^2+29.97*p40(s)^2+25.03*p41(s)^2+27.28*p42(s)^2 − 1
Subject to the following constraints (17):
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s)+p30(s)+p31(s)+p32(s)+p33(s)+p34(s)+p35(s)+p36(s)+p37(s)+p38(s)+p39(s)+p40(s)+p41(s)+p42(s) = 1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s) = 0.44
3. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s)+p30(s)+p31(s)+p32(s)+p33(s)+p34(s) = 0.44
4. p1(s)+p2(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s)+p35(s)+p36(s)+p37(s)+p38(s)+p39(s)+p40(s) = 0.48
5. p3(s)+p4(s)+p10(s)+p11(s)+p17(s)+p18(s)+p19(s)+p20(s)+p22(s)+p23(s)+p30(s)+p31(s)+p32(s)+p33(s)+p35(s)+p36(s)+p37(s)+p38(s)+p41(s)+p42(s) = 0.52
6. p1(s)+p3(s)+p5(s)+p6(s)+p7(s)+p10(s)+p12(s)+p13(s)+p17(s)+p18(s)+p21(s)+p24(s)+p25(s)+p26(s)+p30(s)+p31(s)+p34(s)+p35(s)+p36(s)+p39(s)+p40(s)+p41(s)+p42(s) = 0.68
7. p5(s)+p8(s)+p12(s)+p14(s)+p15(s)+p19(s)+p24(s)+p27(s)+p28(s)+p32(s)+p37(s)+p39(s)+p41(s) = 0.48
8. p2(s)+p4(s)+p6(s)+p8(s)+p9(s)+p11(s)+p14(s)+p16(s)+p17(s)+p19(s)+p20(s)+p21(s)+p22(s)+p25(s)+p27(s)+p29(s)+p30(s)+p32(s)+p33(s)+p34(s)+p35(s)+p37(s)+p38(s)+p40(s)+p42(s) = 0.44
9. p7(s)+p9(s)+p13(s)+p15(s)+p16(s)+p18(s)+p20(s)+p21(s)+p23(s)+p26(s)+p28(s)+p29(s)+p31(s)+p33(s)+p34(s)+p36(s)+p38(s)+p39(s)+p40(s)+p41(s)+p42(s) = 0.52
10. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s) ≤ 0.1936
11. p1(s)+p2(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s) ≤ 0.2112
12. p3(s)+p4(s)+p10(s)+p11(s)+p17(s)+p18(s)+p19(s)+p20(s) ≤ 0.2288
13. p1(s)+p3(s)+p5(s)+p6(s)+p7(s)+p10(s)+p12(s)+p13(s)+p17(s)+p18(s)+p21(s) ≤ 0.2992
14. p5(s)+p8(s)+p12(s)+p14(s)+p15(s)+p19(s) ≤ 0.2112
15. p2(s)+p4(s)+p6(s)+p8(s)+p9(s)+p11(s)+p14(s)+p16(s)+p17(s)+p19(s)+p20(s)+p21(s) ≤ 0.1936
16. p7(s)+p9(s)+p13(s)+p15(s)+p16(s)+p18(s)+p20(s)+p21(s) ≤ 0.2288
17. p1(s)+p2(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s) ≤ 0.2112
18. p3(s)+p4(s)+p22(s)+p23(s)+p30(s)+p31(s)+p32(s)+p33(s) ≤ 0.2288
19. p1(s)+p3(s)+p5(s)+p6(s)+p7(s)+p24(s)+p25(s)+p26(s)+p30(s)+p31(s)+p34(s) ≤ 0.2992
20. p5(s)+p8(s)+p24(s)+p27(s)+p28(s)+p32(s) ≤ 0.2112
21. p2(s)+p4(s)+p6(s)+p8(s)+p9(s)+p22(s)+p25(s)+p27(s)+p29(s)+p30(s)+p32(s)+p33(s)+p34(s) ≤ 0.1936
22. p7(s)+p9(s)+p23(s)+p26(s)+p28(s)+p29(s)+p31(s)+p33(s)+p34(s) ≤ 0.2288
23. p10(s)+p11(s)+p22(s)+p23(s)+p35(s)+p36(s)+p37(s)+p38(s) ≤ 0.2496
24. p1(s)+p10(s)+p12(s)+p13(s)+p24(s)+p25(s)+p26(s)+p35(s)+p36(s)+p39(s)+p40(s) ≤ 0.3264
25. p12(s)+p14(s)+p15(s)+p24(s)+p27(s)+p28(s)+p37(s)+p39(s) ≤ 0.2304
26. p2(s)+p11(s)+p14(s)+p16(s)+p22(s)+p25(s)+p27(s)+p29(s)+p35(s)+p37(s)+p38(s)+p40(s) ≤ 0.2112
27. p13(s)+p15(s)+p16(s)+p23(s)+p26(s)+p28(s)+p29(s)+p36(s)+p38(s)+p39(s)+p40(s) ≤ 0.2496
28. p3(s)+p10(s)+p17(s)+p18(s)+p30(s)+p31(s)+p35(s)+p36(s)+p41(s)+p42(s) ≤ 0.3536
29. p19(s)+p32(s)+p37(s)+p41(s) ≤ 0.2496
30. p4(s)+p11(s)+p17(s)+p19(s)+p20(s)+p22(s)+p30(s)+p33(s)+p35(s)+p37(s)+p38(s)+p42(s) ≤ 0.2288
31. p18(s)+p20(s)+p23(s)+p31(s)+p33(s)+p36(s)+p38(s)+p41(s)+p42(s) ≤ 0.2704
32. p5(s)+p12(s)+p24(s)+p39(s)+p41(s) ≤ 0.3264
33. p6(s)+p17(s)+p21(s)+p25(s)+p30(s)+p34(s)+p35(s)+p40(s)+p42(s) ≤ 0.2992
34. p7(s)+p13(s)+p18(s)+p21(s)+p26(s)+p31(s)+p34(s)+p36(s)+p39(s)+p40(s)+p41(s)+p42(s) ≤ 0.3536
35. p8(s)+p14(s)+p19(s)+p27(s)+p32(s)+p37(s) ≤ 0.2112
36. p15(s)+p28(s)+p39(s)+p41(s) ≤ 0.2496
37. p9(s)+p16(s)+p20(s)+p21(s)+p29(s)+p33(s)+p34(s)+p38(s)+p40(s)+p42(s) ≤ 0.2288
38. pi(s) ≥ 0, i = 1, 2, …, 42.
39. πij ≥ 0 for all i ≠ j = 1, 2, …, 8.
After solving the above model, we get the desired results displayed in Table 5, with the value of D(p0, p1) as 0.405197.
Example 4(b): Again consider the population with N = 8 and n = 4. The set S of all possible samples and the set of non-preferred samples remain the same as in part (a) of this example. The Yi and pi values associated with the 8 villages of the population are:

Yi : 12   15   17   24   17   19   25   18
pi : .09  .09  .18  .11  .12  .14  .17  .10
To find the initial p(s) values, we apply the Sampford (1967) plan and get the values of p(s) (for the preferred sample combinations only) as follows:

p(s1) = .0084;  p(s2) = .0175;  p(s3) = .0029;   p(s4) = .0061;
p(s5) = .0045;  p(s6) = .0070;  p(s7) = .0025;   p(s8) = .0093;
p(s9) = .0052;  p(s10) = .0112; p(s11) = .0233;  p(s12) = .0171;
p(s13) = .0097; p(s14) = .0353; p(s15) = .0129;  p(s16) = .0202;
p(s17) = .0095; p(s18) = .0033; p(s19) = .0125;  p(s20) = .0071;
p(s21) = .0082; p(s22) = .0233; p(s23) = .0084;  p(s24) = .0171;
p(s25) = .0268; p(s26) = .0097; p(s27) = .0353;  p(s28) = .0129;
p(s29) = .0202; p(s30) = .0095; p(s31) = .0033;  p(s32) = .0125;
p(s33) = .0071; p(s34) = .0082; p(s35) = .0357;  p(s36) = .0130;
p(s37) = .0469; p(s38) = .0270; p(s39) = .0199;  p(s40) = .0310;
p(s41) = .0070; p(s42) = .0110.
Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the values as follows:

p0(s1) = .0135;  p0(s2) = .0281;  p0(s3) = .0046;   p0(s4) = .0098;
p0(s5) = .0072;  p0(s6) = .0113;  p0(s7) = .0040;   p0(s8) = .0150;
p0(s9) = .0085;  p0(s10) = .0181; p0(s11) = .0375;  p0(s12) = .0276;
p0(s13) = .0157; p0(s14) = .0568; p0(s15) = .0208;  p0(s16) = .0326;
p0(s17) = .0152; p0(s18) = .0054; p0(s19) = .0202;  p0(s20) = .0114;
p0(s21) = .0132; p0(s22) = .0375; p0(s23) = .0136;  p0(s24) = .0276;
p0(s25) = .0431; p0(s26) = .0157; p0(s27) = .0568;  p0(s28) = .0208;
p0(s29) = .0326; p0(s30) = .0152; p0(s31) = .0054;  p0(s32) = .0202;
p0(s33) = .0114; p0(s34) = .0132; p0(s35) = .0575;  p0(s36) = .0210;
p0(s37) = .0756; p0(s38) = .0435; p0(s39) = .0320;  p0(s40) = .0500;
p0(s41) = .0112; p0(s42) = .0177.
Now the objective function and the constraints for this example are given as follows:

Minimize z = 74.02*p1(s)^2+35.57*p2(s)^2+213.54*p3(s)^2+101.34*p4(s)^2+138.88*p5(s)^2+87.89*p6(s)^2+247.47*p7(s)^2+66.24*p8(s)^2+117.27*p9(s)^2+55.11*p10(s)^2+26.6*p11(s)^2+36.19*p12(s)^2+63.68*p13(s)^2+17.59*p14(s)^2+48.07*p15(s)^2+30.67*p16(s)^2+65.39*p17(s)^2+183.27*p18(s)^2+49.36*p19(s)^2+87.12*p20(s)^2+75.58*p21(s)^2+26.6*p22(s)^2+73.38*p23(s)^2+36.19*p24(s)^2+23.15*p25(s)^2+63.68*p26(s)^2+17.59*p27(s)^2+48.07*p28(s)^2+30.67*p29(s)^2+65.39*p30(s)^2+183.27*p31(s)^2+49.36*p32(s)^2+87.12*p33(s)^2+75.58*p34(s)^2+17.36*p35(s)^2+47.45*p36(s)^2+13.22*p37(s)^2+22.95*p38(s)^2+31.2*p39(s)^2+19.99*p40(s)^2+88.64*p41(s)^2+56.27*p42(s)^2 − 1
Subject to the constraints defined in (17), with the change in the
values of right hand side as follows
1, 0.36, 0.36, 0.72, 0.44, 0.48, 0.56, 0.68, 0.4, 0.1296, 0.2592, 0.1584, 0.1728, 0.2016, 0.2448, 0.144, 0.2592, 0.1584, 0.1728, 0.2016, 0.2448, 0.144, 0.3168, 0.3456, 0.4032, 0.4896, 0.288, 0.2112, 0.2464, 0.2992, 0.176, 0.2688, 0.3264, 0.192, 0.3808, 0.224, 0.272
After solving the above model, we get the desired results displayed in Table 5, with the value of D(p0, p1) as 0.579257.
Example 5: Consider a population consisting of 7 villages, from which a sample of 4 villages is to be drawn. The set S of all possible samples consists of the following 35 samples:

1234; 1235; 1236; 1237; 1245; 1246; 1247; 1256; 1257; 1267; 1345; 1346; 1347; 1356; 1357; 1367; 1456; 1457; 1467; 1567; 2345; 2346; 2347; 2356; 2357; 2367; 2456; 2457; 2467; 2567; 3456; 3457; 3467; 3567; 4567.
The 14 non-preferred sample combinations are as already defined in Example 5 of Section 2.3.
The values of Yi and pi are given as follows:

Yi : 12   15   17   24   17   19   25
pi : .14  .13  .15  .13  .16  .15  .14
Since the pi values satisfy the condition (1), we apply the MS scheme. The values of the pi* are:

p1* = .12;  p2* = .04;  p3* = .2;  p4* = .04;  p5* = .28;  p6* = .2;  p7* = .12.
The values of p(s) for the preferred sample combinations are as follows:

p(s1) = .032;  p(s2) = .024;  p(s3) = .024;   p(s4) = .016;
p(s5) = .032;  p(s6) = .028;  p(s7) = .024;   p(s8) = .032;
p(s9) = .024;  p(s10) = .04;  p(s11) = .032;  p(s12) = .028;
p(s13) = .024; p(s14) = .02;  p(s15) = .036;  p(s16) = .032;
p(s17) = .028; p(s18) = .024; p(s19) = .02;   p(s20) = .032;
p(s21) = .02.
Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the values as follows:

p0(s1) = .0559;  p0(s2) = .0419;  p0(s3) = .0419;   p0(s4) = .0279;
p0(s5) = .0559;  p0(s6) = .0489;  p0(s7) = .0419;   p0(s8) = .0559;
p0(s9) = .0419;  p0(s10) = .0699; p0(s11) = .0559;  p0(s12) = .0489;
p0(s13) = .0419; p0(s14) = .0349; p0(s15) = .0629;  p0(s16) = .0559;
p0(s17) = .0489; p0(s18) = .0419; p0(s19) = .0349;  p0(s20) = .0559;
p0(s21) = .0349.
Now the objective function and the constraints for this example
are given as follows.
Minimize z = 18.13*p1(s)^2+24.17*p2(s)^2+24.17*p3(s)^2+36.25*p4(s)^2+18.13*p5(s)^2+20.71*p6(s)^2+24.17*p7(s)^2+18.13*p8(s)^2+24.17*p9(s)^2+14.5*p10(s)^2+18.13*p11(s)^2+20.71*p12(s)^2+24.17*p13(s)^2+29*p14(s)^2+16.11*p15(s)^2+18.13*p16(s)^2+20.71*p17(s)^2+24.17*p18(s)^2+29*p19(s)^2+18.13*p20(s)^2+20.71*p21(s)^2-1
Subject to the following constraints
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s) = 1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s) = 0.56
3. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s) = 0.52
4. p1(s)+p2(s)+p8(s)+p9(s)+p10(s)+p11(s)+p14(s)+p15(s)+p16(s)+p17(s)+p20(s)+p21(s) = 0.6
5. p3(s)+p4(s)+p8(s)+p9(s)+p12(s)+p13(s)+p14(s)+p18(s)+p19(s)+p20(s)+p21(s) = 0.52
6. p1(s)+p3(s)+p5(s)+p6(s)+p8(s)+p10(s)+p12(s)+p15(s)+p16(s)+p18(s)+p20(s) = 0.64
7. p5(s)+p7(s)+p10(s)+p11(s)+p13(s)+p15(s)+p17(s)+p19(s)+p21(s) = 0.6
8. p2(s)+p4(s)+p6(s)+p7(s)+p9(s)+p11(s)+p12(s)+p13(s)+p14(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s) = 0.56
9. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s) ≤ 0.2912
10. p1(s)+p2(s)+p8(s)+p9(s)+p10(s)+p11(s) ≤ 0.336
11. p3(s)+p4(s)+p8(s)+p9(s)+p12(s)+p13(s) ≤ 0.2912
12. p1(s)+p3(s)+p5(s)+p6(s)+p8(s)+p10(s)+p12(s) ≤ 0.3584
13. p5(s)+p7(s)+p10(s)+p11(s)+p13(s) ≤ 0.336
14. p2(s)+p4(s)+p6(s)+p7(s)+p9(s)+p11(s)+p12(s)+p13(s) ≤ 0.3136
15. p1(s)+p2(s)+p14(s)+p15(s)+p16(s)+p17(s) ≤ 0.312
16. p3(s)+p4(s)+p14(s)+p18(s)+p19(s) ≤ 0.2704
17. p1(s)+p3(s)+p5(s)+p6(s)+p15(s)+p16(s)+p18(s) ≤ 0.3328
18. p5(s)+p7(s)+p15(s)+p17(s)+p19(s) ≤ 0.312
19. p2(s)+p4(s)+p6(s)+p7(s)+p14(s)+p16(s)+p17(s)+p18(s)+p19(s) ≤ 0.2912
20. p8(s)+p9(s)+p14(s)+p20(s)+p21(s) ≤ 0.312
21. p8(s)+p10(s)+p15(s)+p16(s)+p20(s) ≤ 0.384
22. p10(s)+p11(s)+p15(s)+p17(s)+p21(s) ≤ 0.36
23. p2(s)+p9(s)+p11(s)+p14(s)+p16(s)+p17(s)+p20(s)+p21(s) ≤ 0.336
24. p3(s)+p8(s)+p12(s)+p18(s)+p20(s) ≤ 0.3328
25. p13(s)+p19(s)+p21(s) ≤ 0.312
26. p4(s)+p9(s)+p12(s)+p13(s)+p14(s)+p18(s)+p19(s)+p20(s)+p21(s) ≤ 0.2912
27. p5(s)+p10(s)+p15(s) ≤ 0.384
28. p6(s)+p12(s)+p16(s)+p18(s)+p20(s) ≤ 0.3584
29. p7(s)+p11(s)+p13(s)+p17(s)+p19(s)+p21(s) ≤ 0.336
30. pi(s) ≥ 0, i = 1,2,…,21.
31. πij ≥ 0 for all i ≠ j = 1,2,…,21.
After solving the above model, we get the desired results displayed in Table 6, with the value of D(p0, p1) as 0.228878.
CHAPTER III
TWO DIMENSIONAL OPTIMAL CONTROLLED
NEAREST PROPORTIONAL TO SIZE SAMPLING
DESIGN USING QUADRATIC PROGRAMMING
3.1 INTRODUCTION
Controlled selection, originated by Goodman and Kish (1950),
may be described as a method of sampling from a finite universe which permits multiple stratification beyond what is possible by stratified random sampling, while conforming strictly to the requirements of probability sampling. In many practical situations, some combinations of units may be too expensive, less prominent or even undesirable to include in the sample. The samples containing these undesirable combinations of units are termed non-preferred samples. Controlled selection in such cases either excludes the possibility of including such combinations of units or assigns them a low probability of selection. Controls may be imposed to secure a proper geographical or other distribution and to ensure an adequate sample size for some domains of the population. In fact, any departure from simple random sampling may be regarded as a control, which increases the selection probability of preferred combinations by eliminating or reducing the non-preferred combinations. This situation generally arises in field experiments, where practical considerations make some units undesirable, but theoretical compulsions make it necessary to follow probability sampling.
An important area where controlled selection can be effectively
used is sampling in two or more dimensions. Multi-dimensional
sampling problems often arise in social research dealing with highly
variable populations requiring stratification in several directions.
Bryant (1961), Hess and Srikantan (1966), Moore et al. (1974) and
Jessen (1975) demonstrated the need for multi-way stratification in
different real life situations. This multiple stratification often leads to
more strata cells than can be accommodated in a sampling design. For
example, in Jessen’s (1975) study with 12 geographic areas and 12
income classes, there will be 144 strata cells but the funds were
available to sample only 24 of the cells. Similarly in Bryant’s (1961)
study, with 5 locations, two times of a day, 4 seasons and two types of
days, there will be 80 strata cells but only 46 cells could be covered
within the budget of the study. This leads to the need for stratification techniques which permit fewer cells to be sampled than the total number of strata cells, without sacrificing the requirements of probability sampling. Controlled selection has been effectively used by different researchers to deal with such situations. Multi-dimensional controlled selection is highly useful when the number of strata cells exceeds the permissible sample size.
Another situation that needs attention arises when stratification
cannot fully exploit the gains of controls, leading to the need for
‘controls beyond stratification’. Controls beyond stratification further
enhance the probabilities of preferred combinations, by eliminating or
reducing the undesirable combinations because they violate defined
control classes. A real problem emphasizing the need for ‘controls
beyond stratification’ was discussed by Goodman and Kish (1950, p.
354). Tiwari and Nigam (1998, p. 92) also demonstrated the utility of 'controls beyond stratification' with the help of an example borrowed from Jessen (1978).
Two-way and multi-way stratification have been discussed by Jessen (1970, 1973, 1975, 1978) under the titles of 'lattice sampling' and 'multi-stratification'. Goodman and Kish (1950), Hess and Srikantan (1966) and Waterton (1983) also discussed multi-dimensional controlled selection and proposed different methods for achieving the controls. All these methods of multi-dimensional controlled selection are quite arbitrary, involving a lot of trial and error for selecting the samples, and in many situations they even fail to produce a solution.
Causey et al. (1985) were the first to use transportation theory to solve the two-dimensional controlled selection problem. Their method is efficient but complex to implement. Rao and Nigam (1990, 1992) used the simplex method of linear programming to solve the one-dimensional controlled selection problem. Taking inspiration from Rao and Nigam (1990, 1992), Sitter and Skinner (1994) extended the linear programming approach to multi-way stratification. Tiwari and Nigam (1998) also used the simplex method of linear programming to solve the two-dimensional optimal controlled selection problem with 'controls beyond stratification' and considered the related estimation problems. The plan of Tiwari and Nigam (1998) is best suited to problems with integer marginals, whereas the method of Sitter and Skinner is best suited to non-integer marginals. Here, 'marginals' are the totals corresponding to each row, each column and the grand total in a two-dimensional table. Extending the linear programming approach of Sitter and Skinner (1994), Lu and Sitter (2002) developed some methods to reduce the amount of computation, so that very large problems became feasible using the linear programming approach. Recently, using quadratic programming, Tiwari et al. (2007) applied the idea of the 'nearest proportional to size sampling design', originated by Gabler (1987), to one-dimensional optimal controlled selection designs which fully exclude the non-preferred combinations of units from the selected samples.
In this chapter we extend the idea of Tiwari et al. (2007) to multi-dimensional controlled selection problems with 'controls beyond stratification'. The proposed plan appears to be superior to the earlier two-dimensional controlled selection plans, as it ensures zero probability to non-preferred samples. The greatest difficulty with multi-dimensional controlled selection problems is that, owing to the increased magnitude and complexity of the problem, the process of enumerating all possible samples becomes quite tedious. The methodological modification of the multi-dimensional approach over the one-dimensional approach is that, instead of taking all the NCn combinations as the set of all possible samples, we consider only the sub-set of the NCn combinations which satisfies the marginal constraints of the given multi-dimensional problem. With multi-dimensional controlled selection problems, a potential technical difficulty lies in the fact that the non-negativity condition of the Yates-Grundy (1953) form of the Horvitz-Thompson (1952) variance estimator is not satisfied. This leads to the omission of this constraint from the plan and the introduction of an alternative variance estimator.
Another problem that needs attention is that of variance estimation in multi-dimensional controlled selection problems. As also pointed out by Tiwari and Nigam (1998), a practical difficulty while dealing with multi-dimensional controlled selection problems is that they generally do not satisfy the non-negativity condition of the Yates-Grundy (1953) form of the Horvitz-Thompson (1952) variance estimator. To overcome this difficulty, a random group method for variance estimation in two-dimensional controlled selection problems has been suggested. The proposed method appears to perform better than the 'split sample' method of Jessen (1975) and the 'half-sample' method of Tiwari and Nigam (1998).
In Section 3.2, the proposed design has been discussed. In
Section 3.3, the proposed design has been illustrated with the help of
some numerical examples. In Section 3.4, we suggest a random group
method for variance estimation in two-dimensional controlled
selection problems and demonstrate its utility with the help of
examples.
3.2 THE TWO DIMENSIONAL OPTIMAL CONTROLLED
NEAREST PROPORTIONAL TO SIZE SAMPLING
DESIGN
In what follows, we use the idea of ‘nearest proportional to size
sampling designs’, originated by Gabler (1987), to propose a two
dimensional optimal controlled IPPS sampling design that matches
the original inclusion probabilities ( π i ’s) of each unit in the
population and ensures zero probability to non-preferred samples.
Let us consider a two-dimensional population of y's consisting of N elements and let the x's be their measures of size. The selection probabilities of the N units of the population (pi's) are known and are given by pi = xi/X, where X = ∑_{i=1}^{N} xi. Suppose a sample of size n is to be drawn from this population. We denote the inclusion probability of the i-th unit in the sample by πi, where πi = n pi. Let S and S1 denote, respectively, the set of all possible samples and the set of non-preferred samples.
In the proposed plan, using the given selection probabilities for the N units of the population (pi's), we first obtain an appropriate uncontrolled IPPS design p(s), such as the Sampford (1967) or Midzuno-Sen (1952, 1953) design. In this discussion, we make use of Sampford's (1967) IPPS design to obtain our initial uncontrolled IPPS design p(s), as this design imposes only one restriction on the initial probabilities (pi's), namely pi ≤ 1/n, whereas the other IPPS designs impose more stringent restrictions on the initial probabilities. For instance, the Midzuno-Sen (1952, 1953) IPPS scheme has the restriction that (n-1)/{n(N-1)} ≤ pi ≤ 1/n, which limits the applicability of the method to units that are rather similar in size.
Using Sampford's scheme, the probability of including n units in the s-th sample is given by

p(s) = π_{i1 i2 … in} = n kn λ_{i1} λ_{i2} … λ_{in} ( 1 − ∑_{u=1}^{n} p_{iu} ),        (3.2.1)

where kn = [ ∑_{t=1}^{n} t L_{n−t} / n^t ]^{−1}, λi = pi / (1 − pi), and, for a set S(m) of m ≤ N different units i1, i2, …, im, Lm is defined as L0 = 1 and Lm = ∑_{S(m)} λ_{i1} λ_{i2} … λ_{im} (1 ≤ m ≤ N).
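For illustration, (3.2.1) can be evaluated directly from these definitions; the Python sketch below (ours, not the thesis's code; the function name sampford_p_of_s is hypothetical) computes p(s) for one sample by forming the λi's, the symmetric functions Lm and the constant kn:

    from itertools import combinations
    from math import prod

    def sampford_p_of_s(p, sample):
        # p: list of selection probabilities p_i (with p_i <= 1/n);
        # sample: tuple of n distinct 0-based unit indices.
        N, n = len(p), len(sample)
        lam = [pi / (1.0 - pi) for pi in p]
        # L_0 = 1 and L_m = sum over all m-subsets of the product of their lambdas
        L = [1.0] + [sum(prod(lam[i] for i in s) for s in combinations(range(N), m))
                     for m in range(1, n)]
        k_n = 1.0 / sum(t * L[n - t] / n ** t for t in range(1, n + 1))
        return n * k_n * prod(lam[i] for i in sample) * (1.0 - sum(p[i] for i in sample))

Summing sampford_p_of_s over the margin-feasible samples of an example gives the p(s) values listed in Section 3.3 and the Appendix.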
After obtaining the initial IPPS design p(s), the idea behind the
proposed plan is to get rid of the non-preferred samples S1 by
restricting ourselves to the set S-S1. To attain this objective, following
the idea of Tiwari et al. (2007), we introduce a new design p0(s) given
by

p0(s) = p(s) / ( 1 − ∑_{s∈S1} p(s) ),   for s ∈ S − S1;
p0(s) = 0,   otherwise,        (3.2.2)

where p(s) is the initial uncontrolled IPPS sampling plan.
The design p0(s) assigns zero probability to the non-preferred samples. Due to practical considerations, one would like to implement the sampling design p0(s). However, the design p0(s) is, in general, no longer an IPPS design, and it may be desirable to have an IPPS design due to theoretical considerations. Therefore, applying the idea of Gabler (1987), we are interested in an IPPS design p1(s) which is as near as possible to the design p0(s). To achieve the design p1(s) which is as near as possible to p0(s) and also satisfies the condition of an IPPS design, we minimize the directed distance D from the sampling
design p1(s) to p0(s), defined as

D(p0, p1) = E_{p0}( p1/p0 − 1 )² = ∑_s p1²(s)/p0(s) − 1,        (3.2.3)
subject to the following constraints:
(i) p1(s) ≥ 0,
(ii) ∑_{s∈S−S1} p1(s) = 1,        (3.2.4)
(iii) ∑_{s∋i} p1(s) = πi, for p1(s) to be an IPPS design.
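The closed form on the right-hand side of (3.2.3) follows in one line from constraint (ii), since p0(s) and p1(s) each sum to one over S − S1:

    D(p_0, p_1) = \sum_{s} p_0(s)\left(\frac{p_1(s)}{p_0(s)} - 1\right)^{2}
                = \sum_{s} \frac{p_1^{2}(s)}{p_0(s)} - 2\sum_{s} p_1(s) + \sum_{s} p_0(s)
                = \sum_{s} \frac{p_1^{2}(s)}{p_0(s)} - 1 .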
The minimization of the objective function (3.2.3) subject to the constraints (3.2.4) is achieved through quadratic programming, using the Microsoft Excel Solver of the Microsoft Office 2000 package.
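The same minimization can also be carried out outside Excel; the following minimal Python sketch (ours, using scipy.optimize rather than the Solver; the function name nearest_ipps_design is hypothetical) restates (3.2.3)-(3.2.4) as a quadratic programme:

    import numpy as np
    from scipy.optimize import minimize

    def nearest_ipps_design(p0, A, pi):
        # p0: controlled design over the preferred samples (all entries > 0);
        # A[i, s] = 1 if sample s contains unit i; pi: target inclusion probabilities.
        p0 = np.asarray(p0, dtype=float)
        cons = [{"type": "eq", "fun": lambda x: np.sum(x) - 1.0},   # constraint (ii)
                {"type": "eq", "fun": lambda x: A @ x - pi}]        # constraint (iii)
        res = minimize(lambda x: np.sum(x ** 2 / p0) - 1.0,         # objective (3.2.3)
                       x0=p0, bounds=[(0.0, 1.0)] * len(p0),        # constraint (i)
                       constraints=cons, method="SLSQP")
        return res.x, res.fun   # p1(s) and the attained distance D(p0, p1)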
The constraints (i) and (ii) in (3.2.4) are necessary for any sampling design, and the constraint (iii) ensures that the resultant design p1(s) is an IPPS design. We also tried to add one more constraint, ∑_{s∋i,j} p1(s) ≤ πi πj, i < j = 1,…,N, to (3.2.4), to ensure the non-negativity of the Yates-Grundy form of the H-T variance estimator, and applied it to all the two-dimensional problems considered by us. However, in no case did it yield a solution, implying that this condition is too stringent to be satisfied in any two-dimensional controlled selection problem. Consequently, this constraint was dropped, and an alternative variance estimator for two-dimensional controlled selection is suggested in Section 3.4.
The distance measure D(p0, p1) defined in (3.2.3) is similar to the χ²-statistic often employed in related problems and was also used by Cassel and Särndal (1972) and Gabler (1987). A few other distance measures are discussed by Takeuchi et al. (1983). Two alternative distance measures for the present discussion may be defined as:

D(p0, p1) = ∑_s | p0(s) − p1(s) |        (3.2.5)

D(p0, p1) = ∑_s ( p0(s) − p1(s) )² / ( p0(s) + p1(s) ).        (3.2.6)
When these distance measures were applied to the different numerical problems considered by us, we found that (3.2.5) and (3.2.6) gave results similar to (3.2.3) in convergence and efficiency, and so we report results using (3.2.3) as the distance measure, since it is widely used for similar problems. However, the other distance measures may also be used, as per the convenience of the investigator.
While all the other two-dimensional optimal controlled selection plans discussed by earlier authors attempt to minimize the selection probabilities of the non-preferred samples, the proposed plan completely eliminates the non-preferred samples by assigning zero probabilities to them. The proposed plan is superior to the plans of Sitter and Skinner (1994) and Tiwari and Nigam (1998) in the sense that it assures zero probability to non-preferred samples and is much nearer to the controlled design p0(s), which we wished to achieve due to practical considerations. Moreover, the proposed plan also incorporates the possibility of 'controls beyond stratification', which was not considered by Sitter and Skinner (1994).
One limitation of the proposed plan is that it becomes quite time consuming for drawing samples from large populations. It can, however, work very well for sampling in small populations, particularly in field experimentation. This has been demonstrated with the help of a real life example where two-dimensional stratification is required in plot sampling in field experiments [see Example 3, Section 3.3]. The proposed method may also be used to select a small number of first-stage units from each of a large number of strata. This involves the solution of a series of quadratic programming problems, each of a reasonable size.
3.3 EXAMPLES
In this Section, we consider some examples to demonstrate the
utility of the proposed procedure over the existing optimal controlled
selection methods.
Example 1: We first demonstrate the proposed method on a
hypothetical example borrowed from Bryant et al. (1960) given in
Table 1. The desired sample size of n = 10 is less than the total
number of cells, 15.
The integer parts of the npi's are known as certainty proportions. For example, in cell (1,1) with npi = 1.0, the certainty proportion is 1, which indicates that one unit is to be selected from this cell with certainty. Similarly, in cell (4,2) with npi = 1.8, the certainty proportion is 1. The term 'certainty proportion' was introduced by Jessen (1978, p. 396) and was further used by Tiwari and Nigam (1998, p. 92). To reduce the computation, and also to satisfy the condition that each probability should lie between 0 and 1, we initially remove these certainty proportions and replace them at their original positions after the set of feasible samples is obtained. It may be noted here that the value of npi may also be greater than one for a particular cell, which indicates that more than one unit is to be selected from that cell. After removing the certainty proportions, we get the two-way array given in Table 2.
Table 1
Expected sample cell counts (npi) under proportionate stratification with n = 10

                    Type of Community
Region    Urban   Rural   Metropolitan   Total
1         1.0     0.5     0.5            2.0
2         0.2     0.3     0.5            1.0
3         0.2     0.6     1.2            2.0
4         0.6     1.8     0.6            3.0
5         1.0     0.8     0.2            2.0
Total     3.0     4.0     3.0            10.0
Table 2
Expected sample cell counts (npi) for the 5x3 population after removing certainty proportions

Expected Sample Cell Counts (npi)     ∑
0.0   0.5   0.5                       1.0
0.2   0.3   0.5                       1.0
0.2   0.6   0.2                       1.0
0.6   0.8   0.6                       2.0
0.0   0.8   0.2                       1.0
∑ 1.0  3.0  2.0                       6.0
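The passage from Table 1 to Table 2 is a simple floor-and-remainder operation, sketched below in Python (ours, not the thesis's code):

    import numpy as np

    npi = np.array([[1.0, 0.5, 0.5],      # Table 1, rows = regions 1-5
                    [0.2, 0.3, 0.5],
                    [0.2, 0.6, 1.2],
                    [0.6, 1.8, 0.6],
                    [1.0, 0.8, 0.2]])
    certainty = np.floor(npi)             # units selected with certainty (4 in all)
    remainder = npi - certainty           # the reduced array of Table 2
    print(remainder.sum())                # 6.0 units remain to be selected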
Now the problem reduces to selecting 6 units from the above array. The set of all possible samples consists of 15C6 = 5005 samples, out of which 4989 samples do not satisfy the marginal constraints of the 5x3 population. Thus, the set of preferred combinations consists of only 16 samples, demonstrated in Table 3.
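This screening step may be sketched as a direct enumeration (our illustration, not the thesis's code): since every npi in Table 2 lies below one, each cell can enter a sample at most once, so the feasible samples are the 0-1 arrays with the margins of Table 2 that avoid its zero cells:

    from itertools import product

    row_tot, col_tot = [1, 1, 1, 2, 1], [1, 3, 2]
    zero_cells = {(0, 0), (4, 0)}        # cells of Table 2 with npi = 0

    def rows(i):
        # 0-1 rows with the required row total that avoid the zero cells
        return [r for r in product((0, 1), repeat=3)
                if sum(r) == row_tot[i] and all(r[j] == 0 or (i, j) not in zero_cells
                                                for j in range(3))]

    feasible = [m for m in product(*(rows(i) for i in range(5)))
                if [sum(c) for c in zip(*m)] == col_tot]
    print(len(feasible))                 # 16 preferred samples, as in Table 3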
Case I:
To compare our procedure with that of Sitter and Skinner
(1994), we first consider that there are no controls beyond
stratification and therefore there are only 4989 non-preferred samples,
which arise due to marginal constraints of the 5x3 population. To find
out the initial p(s) values we apply the Sampford (1967) plan and get
the following values of p(s).
p(s1) =.000254;
p(s2) = .000633;
p(s3) = .003275;
p(s4) = .000080;
p(s5) = .002271;
p(s6) = .001048;
p(s7) = .000168;
p(s8) = .000080;
p(s9) = .000421;
p(s10) =.000917;
p(s11) = .004716;
p(s12) =.002271;
p(s13) = .00042;
p(s14) =.002189;
p(s15) = .001048;
p(s16) =.011527.
Now, to assign zero probability of selection to the non-preferred samples, we use the design p0(s) and get the following values:
p0(s1)=.0081;
p0(s2)=.0202;
p0(s3)=.01046;
p0(s4)=.0025;
p0(s5)=.0725;
p0(s6)=.0335;
p0(s7)=.0054;
p0(s8)=.0025;
p0(s9)=.0134;
p0(s10)=.0293;
p0(s11)=.1506;
p0(s12)=.0725;
p0(s13)=.0134;
p0(s14)=.0699;
p0(s15)=.0335;
p0(s16)=.3681.
Now we apply the proposed plan as follows.
Min z = 123.39*p1(s)^2+49.46*p2(s)^2+9.56*p3(s)^2+393.77*p4(s)^2+13.79*p5(s)^2+29.88*p6(s)^2+185.94*p7(s)^2+393.77*p8(s)^2+74.39*p9(s)^2+34.15*p10(s)^2+6.64*p11(s)^2+13.79*p12(s)^2+74.37*p13(s)^2+14.3*p14(s)^2+29.88*p15(s)^2+2.71*p16(s)^2-1
Subject to the constraints
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)=1
2. p1(s)+p2(s)+p4(s)+p5(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)=0.5
3. p3(s)+p6(s)+p13(s)+p14(s)+p15(s)+p16(s)=0.5
4. p1(s)+p2(s)+p3(s)=0.2
5. p4(s)+p6(s)+p7(s)+p8(s)+p9(s)+p13(s)+p14(s)+p15(s)=0.3
6. p5(s)+p10(s)+p11(s)+p12(s)+p16(s)=0.5
7. p4(s)+p5(s)+p6(s)=0.2
8. p1(s)+p3(s)+p7(s)+p10(s)+p11(s)+p13(s)+p14(s)+p16(s)=0.6
9. p2(s)+p8(s)+p9(s)+p12(s)+p15(s)=0.2
10. p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)=0.6
11. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p8(s)+p10(s)+p12(s)+p13(s)+p15(s)+p16(s)=0.8
12. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p9(s)+p11(s)+p14(s)=0.6
13. p2(s)+p3(s)+p5(s)+p6(s)+p9(s)+p11(s)+p12(s)+p14(s)+p15(s)+p16(s)=0.8
14. p1(s)+p4(s)+p7(s)+p8(s)+p10(s)+p13(s)=0.2
15. pi(s) ≥ 0 for i = 1,2,…,16.
After solving the above model, we get the selection probabilities of the 16 preferred samples, given in Table 3, with the value of D(p0, p1) as 0.611986.
Sitter and Skinner (1994) also solved the above 5x3
population and obtained the following results:
p(1) = 0.2;
p(2) = 0;
p(3) = 0;
p(4) =0;
p(5) = 0;
p(6) = 0.2;
p(7) = 0;
p(8) = 0;
p(9) = 0;
p(10) = 0;
p(11)=0.1;
p(12) = 0.2;
p(13) = 0;
p(14)=0.1;
p(15) = 0;
p(16)=0.2 and φ = the probability of non-preferred samples = 0.
Table 3
Selection probabilities (p1(s)) of the preferred samples for the 5x3 population using the proposed plan

S. No.:  1        2        3        4        5        6        7        8
p1(s):   0.03043  0.03931  0.13026  0.01166  0.11867  0.06967  0.01898  0.0102

S. No.:  9        10       11       12       13       14       15       16
p1(s):   0.02311  0.08411  0.08658  0.07696  0.04461  0.07133  0.050429 0.133691

(Each preferred sample s is a 5x3 array of cell counts conforming to the marginals of Table 1; the full arrays are omitted here.)
Here we find that the probability of non-preferred samples is zero for Sitter and Skinner's (1994) method also, but many of the preferred samples, such as sample numbers 2, 3, 4, etc., have been assigned zero probability by their method. Moreover, on substituting the values of the p1(s)'s obtained by the method of Sitter and Skinner (1994) into D(p0, p1), we get the distance between p0(s) and the IPPS plan proposed by Sitter and Skinner (1994) as 6.001075, which is larger than the value of D(p0, p1) for the proposed plan (that is, 0.611986). This shows that, when there are no controls beyond stratification, although both plans achieve the marginal constraints, the proposed IPPS plan appears to perform better than the plan of Sitter and Skinner (1994), as it is much nearer to the ideal controlled plan p0(s). Moreover, the proposed plan guarantees zero probability to non-preferred samples, while the plan of Sitter and Skinner (1994) only attempts to minimize it.
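The figure 6.001075 is simply the plug-in value of (3.2.3) at the Sitter-Skinner solution; a quick Python check (ours), using the rounded p0(s) values listed above, gives approximately the same value:

    import numpy as np

    p0 = np.array([.0081, .0202, .01046, .0025, .0725, .0335, .0054, .0025,
                   .0134, .0293, .1506, .0725, .0134, .0699, .0335, .3681])
    p_ss = np.array([.2, 0, 0, 0, 0, .2, 0, 0, 0, 0, .1, .2, 0, .1, 0, .2])
    print(np.sum(p_ss ** 2 / p0) - 1)   # about 6.002 with these rounded p0's
                                        # (the thesis reports 6.001075)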
Case II:
Now we consider the situation of 'controls beyond stratification'. Based on considerations similar to those of Tiwari and Nigam (1998) and Avadhani and Sukhatme (1973), suppose that if all the three units 4th, 8th and 12th, or 6th, 8th and 10th, do not appear in a sample, then the sample is a non-preferred sample. Thus the set of preferred combinations consists of only 10 samples, namely sample numbers 1, 3, 5, 7, 9, 10, 11, 13, 14 and 16. Now, applying the proposed plan to the modified problem, we get the following results:
p(1)= 0.0;
p(3)=0.2;
p(5)=0.2;
p(7)=0.0;
p(9)=0.2;
p(10)=0.1;
p(11)=0.0;
p(13)=0.1;
p(14)=0.0;
p(16)=0.2.
This ensures zero probability to non-preferred samples with
D( p 0 , p1 ) = 3.262855.
Solution of the above problem using the method of Tiwari and Nigam
(1998) gives the following results:
p(1)=0.1;
p(3)=0.1;
p(5)=0.2;
p(7)=0.0;
p(9)=0.2;
p(10)=0.0;
p(11)=0.0;
p(13)=0.1;
p(14)=0.0;
p(16)=0.3;
and the probability of non-preferred samples (φ) is zero.
After substituting the values of p1(s) obtained by the method of Tiwari and Nigam (1998) into D(p0, p1), we get the distance between p0(s) and the IPPS plan proposed by Tiwari and Nigam (1998) as D(p0, p1) = 3.882095, which is larger than that obtained for the proposed plan. This shows that the proposed IPPS plan is nearer to p0(s) and therefore appears to perform better than the IPPS plan of Tiwari and Nigam (1998). Moreover, the proposed plan always ensures zero probability to non-preferred samples, whereas the plan of Tiwari and Nigam (1998) only attempts to minimize the probability of non-preferred samples.
Example 2: We now present an example in which the probability of non-preferred samples (φ) is not equal to zero for the Tiwari and Nigam (1998) plan, while the proposed plan always gives φ = 0.
Consider the 4x3 hypothetical population given in Table 4. The desired sample size of n = 8 is less than the total number of cells, 12.
Table 4
The expected sample cell counts (npi) for the 4x3 population

Expected cell counts (npi)     Total
0.8   0.5   0.7                2.0
0.8   0.9   0.3                2.0
0.7   0.7   0.6                2.0
0.7   0.9   0.4                2.0
Total 3.0  3.0  2.0            8.0
The set of all possible samples consists of 12 samples, demonstrated in Table 5. Let the set of non-preferred samples consist of those samples that do not contain all the three units 1st, 5th and 9th, or 3rd, 5th and 7th. Thus sample numbers 6 and 9 become the non-preferred samples.
Applying the proposed plan to this population, we get the
following results:
p(1) = 0.0462;  p(2) = 0.0829;  p(3) = 0.0121;  p(4) = 0.0589;
p(5) = 0.1000;  p(6) = 0;       p(7) = 0.0417;  p(8) = 0.1583;
p(9) = 0;       p(10) = 0.1171; p(11) = 0.2538; p(12) = 0.1290;
with D(p0, p1) = 1.751808 and the probability of selecting non-preferred samples φ = p(6) + p(9) = 0.
Table 5
The set of all possible samples for the 4x3 population (each sample is shown as its four rows; 'x' marks a selected cell)

S. No. 1:  x x . | x x . | . x x | x . x
S. No. 2:  x x . | x x . | x . x | . x x
S. No. 3:  x x . | . x x | x x . | x . x
S. No. 4:  x x . | . x x | x . x | x x .
S. No. 5:  x x . | x . x | x x . | . x x
S. No. 6:  x x . | x . x | . x x | x x .
S. No. 7:  . x x | x x . | x x . | x . x
S. No. 8:  . x x | x x . | x . x | x x .
S. No. 9:  . x x | x . x | x x . | x x .
S. No. 10: x . x | x x . | x x . | . x x
S. No. 11: x . x | x x . | . x x | x x .
S. No. 12: x . x | . x x | x x . | x x .
We have also solved this problem using the method of Tiwari
and Nigam (1998) and obtained the following results:
p(1) = 0.1;
p(2) = 0.2;
p(3) = 0;
p(4) = 0;
p(5) = 0;
p(6) = 0;
p(7) = 0;
p(8) = 0.1;
p(9) = 0.1;
p(10) = 0.1;
p(11) = 0.2; p(12) = 0.2;
and (φ)= p(6) + p(9) = 0.1.
Thus for this example, the method of Tiwari and Nigam (1998)
assigns a probability of 0.1 to non-preferred samples, whereas the
proposed method always assures zero probability to non-preferred
samples.
Example 3: Now we consider a real life application where two-dimensional stratification is required in plot sampling in field experiments.
Consider the yields (in tons) of wheat given in Table 6 for an experiment involving 4 blocks (B1, B2, B3, B4) and 4 treatments (T1, T2, T3, T4).
Table 6
Yield (in tons) for the 4x4 experiment

        T1      T2      T3      T4      Total
B1      11.25   18.75   13.75   6.25    50.00
B2      16.25   3.75    6.25    23.75   50.00
B3      16.50   6.50    18.75   8.25    50.00
B4      6.00    21.00   11.25   11.75   50.00
Total   50.00   50.00   50.00   50.00   200.00
Table 7
Expected sample cell counts (npi) for the 4x4 population

Expected Sample Cell Counts (npi)     ∑
0.45   0.75   0.55   0.25             2.0
0.65   0.15   0.25   0.95             2.0
0.66   0.26   0.75   0.33             2.0
0.24   0.84   0.45   0.47             2.0
∑ 2.0   2.0   2.0   2.0               8.0
A sample of size n = 8 was selected from this population. The expected sample cell counts (npi) for this 4x4 population are demonstrated in Table 7. The set of all possible samples satisfying the marginal requirements of the 4x4 array consists of 90 samples. Suppose, due to considerations of travel and organization of fieldwork, the set of non-preferred samples consists of the samples that contain three or more diagonal elements. Thus the set of non-preferred combinations consists of 57 samples. The remaining 33 preferred samples are demonstrated in Table 8.
Table 8
The set of preferred combinations with their selection probabilities (p1(s)) using the proposed plan for the 4x4 population

S. No.:  1         2         3         4         5         6
p1(s):   0.023925  0.073749  0.051726  0.114597  0.000641  0.003188

S. No.:  7         8         9         10        11        12
p1(s):   0.000384  0.007685  0.03233   0.061156  0.0000    0.0000

S. No.:  13        14        15        16        17        18
p1(s):   0.257761  0.000339  0.0049    0.010486  0.000425  0.001588

S. No.:  19        20        21        22        23        24
p1(s):   0.001867  0.00327   0.057485  0.000456  0.077772  0.050686

S. No.:  25        26        27        28        29        30
p1(s):   0.007147  0.001349  0.046992  0.00096   0.000702  0.04646

S. No.:  31        32        33
p1(s):   0.001242  0.033256  0.025465

(Each preferred sample is a 4x4 selection pattern satisfying the marginals of Table 7 and containing fewer than three diagonal elements; the full patterns are omitted here.)
Applying the proposed plan to this 4x4 population, the selection probabilities of these 33 preferred combinations are also demonstrated in Table 8. The distance between the proposed plan p1(s) and the controlled plan p0(s) was obtained as D(p0, p1) = 5.373178.
This problem was also solved by the method of Tiwari and Nigam (1998). Incidentally, the probability of selecting non-preferred samples (φ) comes out to be zero for their method also, and the distance between their plan and the controlled plan p0(s) was obtained as 200.8496. Thus, the proposed method always guarantees zero probability to non-preferred samples and is also nearer to the desired controlled plan p0(s).
Further examples were considered to analyse the performance of the proposed plan. A detailed description of these examples is given in the Appendix. The probabilities of selecting the non-preferred samples (φ) and the distances between the controlled and the resultant IPPS plans [D(p0, p1)] for the proposed plan and the plan of Tiwari and Nigam (1998) are given in Table 9.
Table 9 shows that, while the plan of Tiwari and Nigam (1998) only attempts to minimize the probability of non-preferred samples, the proposed plan always ensures zero probability to non-preferred samples. The values of D(p0, p1) are also found to be smaller for the proposed plan in all the examples, which shows that the proposed plan is much nearer to the controlled plan which we wished to achieve due to practical considerations.
Table 9
The probabilities of selecting the non-preferred samples (φ) and the distance between the controlled and the resultant IPPS plan [D(p0, p1)] for the proposed plan and the plan of Tiwari and Nigam (1998) [T-N]

                                          Proposed Plan         T-N Plan
                                          φ     D(p0, p1)       φ     D(p0, p1)
Example 4 (3x3 population, N=9, n=6)      0     22.87           0     23.178
Example 5 (4x4 population, N=16, n=8)     0     0.70611         0     132.0628
Example 6 (8x3 population, N=24, n=10)    0     2.1929          0     9.296
Example 7 (3x3 population, N=9, n=3)      0     0.195885        0.1   0.257972
3.4 VARIANCE ESTIMATION FOR THE PROPOSED PLAN
In this section, using the idea of the random group method for variance estimation, originally propounded by Mahalanobis (1939, 1946), we propose an estimator for two-dimensional controlled selection problems which appears to perform better than the split sample estimator of Jessen (1973) and the estimator proposed by Tiwari and Nigam (1998). This is demonstrated with the help of a few examples.
As an alternative to the H-T estimator, Jessen (1973) suggested the use of a split sample estimator. The split sample estimator of Jessen (1973) is useful in situations where the stability condition of the H-T estimator or the non-negativity condition of the Y-G form of the H-T estimator is not satisfied. However, Jessen's split sample estimator is negatively biased, and the bias is found to be quite high. To overcome this difficulty, Tiwari and Nigam (1998) proposed a method of variance estimation for two-dimensional controlled selection problems. Their variance estimator was found to be positively biased, but the bias was quite low in comparison with the split sample estimator of Jessen (1973). Another limitation of the variance estimators suggested by Jessen (1973) and Tiwari and Nigam (1998) is that exactly two units from each row and column are required to obtain an estimator of the population total and the variance of the estimator. If two units from each row and column are not available, neither of the above two methods can be used. To overcome this difficulty, we suggest an alternative method of variance estimation that can be used even in situations where exactly two units are not available from each row and column.
In the proposed method of variance estimation, we first form random groups of the units selected in the sample. The random groups must not be formed in a purely arbitrary fashion; they should be formed so that each random group has essentially the same sampling design as the parent sample. There are different rules for forming the random groups for different sampling designs. For details about these rules, one may refer to Wolter (1985).
To construct the random groups from a sample of size n from a population of N units, let us suppose that each random group is of size m (m ≥ 2). Then the number of random groups to be formed will be k = n/m. To select the first random group, we draw a simple random sample without replacement (SRSWOR) of size m from the parent sample of size n. To obtain the second random group, we again draw an SRSWOR of size m from the remaining n - m units in the sample. This process is repeated until the k random groups are drawn. If n/m is not an integer, i.e. n = mk + q with 0 < q < m, then the q excess units may be left out and only the k random groups are considered for further estimation purposes. To reduce the computation, we consider random groups of size m = 2 for the present discussion.
Using the proposed method, an unbiased estimator of the population total is given by

ŶRG = ∑_{i=1}^{k} ( y_{i1}/π_{i1} + y_{i2}/π_{i2} ),        (3.4.1)

where y_{i1} and y_{i2} are the observations from the i-th random group and π_{i1} and π_{i2} are their corresponding inclusion probabilities. An estimator of the variance of Ŷ is given by

V̂ar(ŶRG) = ( 1 − n ∑_{i=1}^{N} pi² ) (1/4) ∑_{i=1}^{k} ( y_{i1}/π_{i1} − y_{i2}/π_{i2} )²,        (3.4.2)

where ( 1 − n ∑_{i=1}^{N} pi² ) is an approximate finite population correction factor and k is the number of random groups.
The proposed method of variance estimation can be used for square as well as rectangular two-way populations, and it works equally well even in situations where the numbers of units selected from each row and column are not fixed and equal.
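A minimal Python sketch of the whole procedure with m = 2, following (3.4.1) and (3.4.2), is given below (ours, not the thesis's code; the function name random_group_estimates is hypothetical):

    import random

    def random_group_estimates(y, pi_incl, p_pop, seed=None):
        # y, pi_incl: observations and inclusion probabilities of the n sampled units;
        # p_pop: selection probabilities p_i of all N population units.
        n = len(y)
        rng = random.Random(seed)
        idx = list(range(n))
        rng.shuffle(idx)                       # random formation of the groups
        k = n // 2                             # groups of size m = 2
        groups = [(idx[2 * g], idx[2 * g + 1]) for g in range(k)]  # excess unit dropped if n is odd
        Y_hat = sum(y[a] / pi_incl[a] + y[b] / pi_incl[b]
                    for a, b in groups)        # estimator (3.4.1)
        fpc = 1.0 - n * sum(p ** 2 for p in p_pop)   # approximate fpc
        V_hat = fpc * 0.25 * sum((y[a] / pi_incl[a] - y[b] / pi_incl[b]) ** 2
                                 for a, b in groups)  # estimator (3.4.2)
        return Y_hat, V_hat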
To demonstrate the utility of the proposed variance estimator and to compare it with Jessen's (1973) split sample estimator and the estimator proposed by Tiwari and Nigam (1998), we consider the following examples.
Example 4.1: Let us first consider the 3x3 population demonstrated in Table 12 of the Appendix. The values of V̂ar(Ŷ) obtained by the split sample (S-S) estimator, the estimator proposed by Tiwari and Nigam (1998) (T-N) and the proposed method are produced in Table 10.
The actual value of Y for this population is 123/20. From Table 10, we have

E(ŶRG) = ∑_{s=1}^{5} p(s) (Ŷ)s = 123/20 = Y.

Thus ŶRG is an unbiased estimate of Y. The expected value of V̂ar(ŶRG) for the proposed estimator is

E[V̂ar(ŶRG)] = ∑_{s=1}^{5} p(s) V̂ar(Ŷ)s = 0.0596319.

The true value of Var(Ŷ) for this population is 0.0581, which shows that the proposed estimator is positively biased. The bias of the proposed estimator is the lowest among the three estimators, showing that the proposed estimator performs better than Jessen's (1973) split sample estimator and the estimator proposed by Tiwari and Nigam (1998).
Table 10
Comparison of different estimators of Var(Ŷ) for the 3x3 population

                                 V̂ar(Ŷ)
Sample  p(s)  (Ŷ)s    S-S         T-N         Proposed Estimator
1       0.3   19/3    0.034445    0.03875     0.064583
2       0.1   35/6    0.008611    0.11625     0.075347
3       0.2   13/2    0.008611    0.025833    0.02368
4       0.2   6       0.137777    0.090417    0.051667
5       0.2   35/6    0.077500    0.099028    0.088264

E[V̂ar(Ŷ)]             0.0559722   0.0663056   0.0596319
Bias                   -0.0021278  0.0082056   0.0015319
Var[V̂ar(Ŷ)]           0.0022430   0.0011353   0.0004672
(Bias)²                0.0000045   0.0000673   0.0000023
MSE[V̂ar(Ŷ)]           0.0022475   0.0012025   0.0004695
(Bias)²/MSE[V̂ar(Ŷ)]   0.0020022   0.0559667   0.0048988
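The summary rows of Table 10 can be reproduced directly from the p(s) column; a minimal Python check (ours) for the proposed estimator:

    import numpy as np

    p = np.array([0.3, 0.1, 0.2, 0.2, 0.2])                          # p(s) from Table 10
    v = np.array([0.064583, 0.075347, 0.02368, 0.051667, 0.088264])  # proposed Var-hat values
    true_var = 0.0581
    e_v = float(p @ v)                    # 0.0596319 = E[Var-hat]
    bias = e_v - true_var                 # 0.0015319
    var_v = float(p @ v ** 2) - e_v ** 2  # 0.0004672
    mse = var_v + bias ** 2               # 0.0004695; (Bias)^2/mse gives 0.0048988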
Example 4.2: To further evaluate the utility of the proposed variance estimator, we consider the 4x4 population demonstrated in Table 13 of the Appendix. The values of V̂ar(Ŷ) for the three estimators are presented in Table 11.
From Table 11, we get

E(ŶRG) = ∑_{s=1}^{20} p(s) (Ŷ)s = 10 = the true value of Y.

Thus ŶRG is an unbiased estimate of Y. The expected value of V̂ar(ŶRG) for the proposed estimator is

E[V̂ar(ŶRG)] = ∑_{s=1}^{20} p(s) V̂ar(Ŷ)s = 0.255927.

The true value of Var(Ŷ) for this population is 0.24375, which again shows that the proposed estimator is positively biased, and the bias is the lowest for the proposed estimator among the three estimators considered by us.
Table 11
Comparison of the different values of V̂ar(Ŷ) for the 4x4 population

                                 V̂ar(Ŷ)
Sample  p(s)      (Ŷ)s     S-S        T-N        Proposed Estimator
1       0.028116  10.250   0.578125   0.242813   0.329531
2       0.153002  9.8750   0.052031   0.199453   0.342539
3       0.061919  9.3750   0.005781   0.199453   0.221133
4       0.054647  9.7500   0.208125   0.196563   0.167656
5       0.008153  9.7500   0.023125   0.208125   0.052031
6       0.013501  9.2500   0.023125   0.196563   0.248594
7       0.219054  9.2500   0.023125   0.23125    0.040469
8       0.026101  10.375   0.005781   0.095391   0.383008
9       0.027504  10.250   0.023125   0.254375   0.398906
10      0.008004  9.8750   0.052031   0.182109   0.319414
11      0.045592  10.250   0.208125   0.439375   0.375781
12      0.003484  10.750   0.578125   0.41625    0.387344
13      0.017874  10.250   0.023125   0.450938   0.271719
14      0.112939  10.750   0.578125   0.404688   0.352656
15      0.039660  10.375   0.283281   0.268828   0.359883
16      0.042060  9.7500   0.023125   0.23125    0.387344
17      0.003414  10.875   0.144531   0.268828   0.296289
18      0.107901  10.875   0.052031   0.291953   0.307852
19      0.001931  10.750   0.578125   0.370000   0.375781
20      0.025146  10.750   0.023125   0.346875   0.260156

E[V̂ar(Ŷ)]                  0.139939   0.263878   0.255927
Bias                        -0.103811  0.020128   0.012177
Var[V̂ar(Ŷ)]                0.038379   0.006944   0.016581
(Bias)²                     0.010777   0.000405   0.000148
MSE[V̂ar(Ŷ)]                0.049156   0.007349   0.016729
(Bias)²/MSE[V̂ar(Ŷ)]        0.219236   0.0551193  0.0088585
The results of the above two examples demonstrate that the proposed variance estimator may perform better than the split sample estimator of Jessen (1973) and the estimator proposed by Tiwari and Nigam (1998), as it provides lower biases and is also applicable in situations where these two estimators cannot be applied.
APPENDIX 3.0
Example 1, Case II: We are given the following population, with N = 15 and n = 6.
Expected Sample Cell Counts (npi)     ∑
0.0   0.5   0.5                       1.0
0.2   0.3   0.5                       1.0
0.2   0.6   0.2                       1.0
0.6   0.8   0.6                       2.0
0.0   0.8   0.2                       1.0
∑ 1.0  3.0  2.0                       6.0
The set of all possible samples and the set of preferred samples for the above population have already been defined. The set of preferred combinations consists of only 10 samples, namely sample numbers 1, 3, 5, 7, 9, 10, 11, 13, 14 and 16. Before applying the proposed plan, we have to find the values of p(s) and p0(s). These values are given as follows.
p(s1) = .000254;  p(s2) = .003275;  p(s3) = .002271;  p(s4) = .000168;  p(s5) = .000421;
p(s6) = .000917;  p(s7) = .004716;  p(s8) = .000421;  p(s9) = .002189;  p(s10) = .011527.
Values of p0(s) are given as follows:
p0(s1) = .0097;  p0(s2) = .1251;  p0(s3) = .0868;  p0(s4) = .0064;  p0(s5) = .0161;
p0(s6) = .035;   p0(s7) = .1802;  p0(s8) = .0160;  p0(s9) = .0837;  p0(s10) = .4406.
The objective function and the constraints of the proposed
model are given as follows.
Min z = 103.07*p1(s)^2+7.98*p2(s)^2+11.5*p3(s)^2+155.32*p4(s)^2+62.12*p5(s)^2+28.52*p6(s)^2+5.54*p7(s)^2+62.12*p8(s)^2+11.94*p9(s)^2+2.26*p10(s)^2-1
Subject to the constraints
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)=1
2. p1(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)=.5
3. p2(s)+p8(s)+p9(s)+p10(s)=.5
4. p1(s)+p2(s)=.2
5. p4(s)+p5(s)+p8(s)+p9(s)=.3
6. p3(s)+p6(s)+p7(s)+p10(s)=.5
7. p3(s)=.2
8. p1(s)+p2(s)+p4(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)=.6
9. p5(s)=.2
10. p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)=.6
11. p1(s)+p2(s)+p3(s)+p6(s)+p8(s)+p10(s)=.8
12. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p7(s)+p9(s)=.6
13. p2(s)+p3(s)+p5(s)+p7(s)+p9(s)+p10(s)=.8
14. p1(s)+p4(s)+p6(s)+p8(s)=.2
15. pi ( s) ≥ 0 for i = 1,2,…,10.
After solving the above model, we get the desired results, which have already been given in Example 1.
Example 2: Consider the 4x3 hypothetical population given in the following table, with N = 12 and n = 8.

Expected cell counts (npi)     Total
0.8   0.5   0.7                2.0
0.8   0.9   0.3                2.0
0.7   0.7   0.6                2.0
0.7   0.9   0.4                2.0
Total 3.0  3.0  2.0            8.0
The set of all possible samples consists of 12 samples, already demonstrated in Table 5. As considered earlier, sample numbers 6 and 9 are the non-preferred samples. The values of p(s) for the preferred samples are given as follows.
p(s1) =.000203;
p(s2) = .000721;
p(s3) = .00003;
p(s4) = .000293;
p(s5) = .00006;
p(s6) = .000184;
p(s7) = .00135;
p(s8) = .002291;
p(s9) = .004663;
p(s10) =.000941;
Values of p0(s) are given as follows
p0(s1)=.01886;
p0(s2)=.06715;
p0(s3)=.003627;
p0(s4)=.02728;
p0(s5)=.00601;
p0(s6)=.017114;
p0(s7)=.125676;
p0(s8)=.213269;
p0(s9)=.434154;
p0(s10)=.087592;
Now the proposed model can be written in the following form.
Min z = 53.1*p1(s)^2+14.9*p2(s)^2+275.91*p3(s)^2+36.68*p4(s)^2+166.5*p5(s)^2+58.47*p6(s)^2+7.96*p7(s)^2+4.69*p8(s)^2+2.3*p9(s)^2+11.42*p10(s)^2-1
Subject to the constraints
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)=1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p8(s)+p9(s)+p10(s)=.8
3. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)=.5
4. p6(s)+p7(s)+p8(s)+p9(s)+p10(s)=.7
5. p1(s)+p2(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)=.8
6. p1(s)+p2(s)+p3(s)+p4(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)=.9
7. p3(s)+p4(s)+p5(s)+p10(s)=.3
8. p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p10(s)=.7
9. p1(s)+p3(s)+p5(s)+p6(s)+p8(s)+p9(s)+p10(s)=.7
10. p1(s)+p2(s)+p4(s)+p7(s)+p9(s)=.6
11. p1(s)+p3(s)+p4(s)+p6(s)+p7(s)+p9(s)+p10(s)=.7
12. p2(s)+p4(s)+p5(s)+p7(s)+p8(s)+p9(s)+p10(s)=.9
13. p1(s)+p2(s)+p3(s)+p5(s)+p6(s)+p8(s)=.4
14. pi ( s) ≥ 0 for i = 1,2,…,10.
After solving the above model, we get the results displayed in Example 2.
Example 3: Consider the following population. We have to select a sample of size n = 8 from N = 16 population units. The set of all possible samples consists of 90 samples. The set of non-preferred samples has already been defined in Example 3 of Section 3.3. The set of preferred samples consists of the 33 samples demonstrated in Table 8.
Expected Sample Cell Counts (npi)     ∑
0.45   0.75   0.55   0.25             2.0
0.65   0.15   0.25   0.95             2.0
0.66   0.26   0.75   0.33             2.0
0.24   0.84   0.45   0.47             2.0
∑ 2.0   2.0   2.0   2.0               8.0
Values of p(s) for the preferred samples are given as follows.
p(s1) =.00003;
p(s2) = .000139;
p(s3) = .000224;
p(s4) = .001646;
p(s5) = .0000008;
p(s6) = .000003;
p(s7) = .0000002;
p(s8) = .000134;
p(s9) = .000221;
p(s10) =.000137;
p(s11) =.000986;
p(s12) = .001586;
p(s13) = .008835;
p(s14) = .00002;
p(s15) = .00003;
p(s16) = .00001;
p(s17) = .0000003;
p(s18) = .000001;
p(s19) = .000002;
p(s20) =.000151;
p(s21) =.00005;
p(s22) = .0000003;
p(s23) = .000157;
p(s24) = .00007;
p(s25) = .00009;
p(s26) = .000002;
p(s27) = .00005;
p(s28) = .0000007;
p(s29) = .0000005;
p(s30) =.002522;
p(s31) =.00001;
p(s32) = .00006;
p(s33) = .00002;
Values of p0(s) are given as follows
p0(s1)=.0018;
p0(s2)=.0081;
p0(s3)=.013;
p0(s4)=.0956;
p0(s5)=.0001;
p0(s6)=.0002;
p0(s7)=.0001;
p0(s8)=.0078;
p0(s9)=.0128;
p0(s10)=.0079;
p0(s11)=.0572;
p0(s12)=.0921;
p0(s13)=.5130;
p0(s14)=.0013;
p0(s15)=.0022;
p0(s16)=.0007;
p0(s17)=.0001;
p0(s18)=.0001;
p0(s19)=.0001;
p0(s20)=.0088;
p0(s21)=.0029;
p0(s22)=.0001;
p0(s23)=.0091;
p0(s24)=.0044;
p0(s25)=.0057;
p0(s26)=.0001;
p0(s27)=.0031;
p0(s28)=.0001;
p0(s29)=.0001;
p0(s30)=.1464;
p0(s31)=.0005;
p0(s32)=.0034;
p0(s33)=.0011;
Now the proposed model can be written in the following form
Min z = 562.5*p1(s)^2+124.2*p2(s)^2+76.8*p3(s)^2+10.5*p4(s)^2+19429.42*p5(s)^2+4892.4*p6(s)^2+61258.7*p7(s)^2+128.8*p8(s)^2+77.9*p9(s)^2+126*p10(s)^2+17.4*p11(s)^2+10.8*p12(s)^2+1.9*p13(s)^2+736.4*p14(s)^2+460.2*p15(s)^2+1366.8*p16(s)^2+54010.8*p17(s)^2+11733.6*p18(s)^2+7203.03*p19(s)^2+113.9*p20(s)^2+341.7*p21(s)^2+53461*p22(s)^2+109.84*p23(s)^2+226.6*p24(s)^2+176.3*p25(s)^2+7768.5*p26(s)^2+326.5*p27(s)^2+23495*p28(s)^2+29359*p29(s)^2+6.8*p30(s)^2+1675.6*p31(s)^2+286.8*p32(s)^2+848.8*p33(s)^2-1
Subject to the constraints
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s)+p30(s)+p31(s)+p32(s)+p33(s)=1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)=.45
3. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p29(s)+p30(s)+p31(s)+p32(s)+p33(s)=.75
4. p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)=.55
5. p17(s)+p18(s)+p19(s)+p20(s)+p21(s)+p22(s)+p23(s)+p29(s)+p30(s)+p31(s)+p32(s)+p33(s)=.25
6. p4(s)+p5(s)+p6(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p17(s)+p18(s)+p19(s)+p20(s)+p23(s)+p25(s)+p26(s)+p29(s)+p30(s)+p31(s)+p32(s)=.65
7. p6(s)+p7(s)+p16(s)+p17(s)+p18(s)+p19(s)+p21(s)+p22(s)+p27(s)+p28(s)+p29(s)+p33(s)=.15
8. p1(s)+p2(s)+p3(s)+p5(s)+p7(s)+p8(s)+p15(s)+p22(s)+p24(s)+p26(s)+p31(s)+p32(s)=.25
9. p1(s)+p2(s)+p3(s)+p4(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p16(s)+p20(s)+p21(s)+p23(s)+p24(s)+p25(s)+p27(s)+p28(s)+p30(s)+p33(s)=.95
10. p1(s)+p3(s)+p6(s)+p7(s)+p8(s)+p9(s)+p12(s)+p13(s)+p15(s)+p16(s)+p17(s)+p19(s)+p20(s)+p21(s)+p22(s)+p24(s)+p27(s)+p30(s)+p31(s)+p32(s)+p33(s)=.66
11. p1(s)+p5(s)+p9(s)+p10(s)+p14(s)+p17(s)+p20(s)+p23(s)+p24(s)+p25(s)+p26(s)+p28(s)=.26
12. p2(s)+p4(s)+p10(s)+p11(s)+p13(s)+p18(s)+p21(s)+p23(s)+p29(s)+p30(s)+p32(s)+p33(s)=.75
13. p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p11(s)+p12(s)+p14(s)+p15(s)+p16(s)+p18(s)+p19(s)+p22(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s)+p31(s)=.33
14. p2(s)+p7(s)+p8(s)+p10(s)+p11(s)+p14(s)+p16(s)+p18(s)+p21(s)+p28(s)+p29(s)+p33(s)=.24
15. p2(s)+p3(s)+p4(s)+p8(s)+p11(s)+p12(s)+p13(s)+p15(s)+p18(s)+p19(s)+p20(s)+p21(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p30(s)+p31(s)+p32(s)=.84
16. p1(s)+p3(s)+p4(s)+p5(s)+p6(s)+p9(s)+p12(s)+p14(s)+p16(s)+p17(s)+p19(s)+p20(s)+p22(s)+p23(s)+p25(s)+p27(s)+p28(s)+p29(s)+p30(s)+p31(s)+p33(s)=.45
17. p1(s)+p5(s)+p6(s)+p7(s)+p9(s)+p10(s)+p13(s)+p15(s)+p17(s)+p24(s)+p26(s)+p32(s)=.47
18. pi(s) ≥ 0 for i = 1,2,…,33.
After solving the above model, we get the results displayed in Table 8.
Example 4: Consider the population given in Table 12, consisting of 9 units in a 3x3 array, borrowed from Jessen (1970, p. 778).

Table 12
Basic data for the 3x3 population

Units (i)    πi = npi = 6pi     ∑      Yi                 ∑
1 2 3        0.8   0.5   0.7    2.0    5.6   3.0   6.3    14.9
4 5 6        0.7   0.8   0.5    2.0    4.2   3.2   4.0    11.4
7 8 9        0.5   0.7   0.8    2.0    1.5   3.5   5.6    10.6
∑            2.0   2.0   2.0    6.0    11.3  9.7   15.9   36.9
A sample of size 6 is to be drawn from this population. The set of all possible samples is given as follows (each sample is shown as its three rows; 'x' marks a selected cell):
(1) x . x | x x . | . x x
(2) x x . | . x x | x . x
(3) . x x | x . x | x x .
(4) x x . | x . x | . x x
(5) x . x | . x x | x x .
(6) . x x | x x . | x . x
Suppose the samples in which all the three units 1st, 5th and 9th do not appear are non-preferred samples. Thus sample number (4) is non-preferred. Now the values of p(s) are given as follows.
p(s1) = .062811;  p(s2) = .006922;  p(s3) = .008973;  p(s4) = .008973;  p(s5) = .008973.
Values of p0(s) are given as follows:
p0(s1) = .649866;  p0(s2) = .0716183;  p0(s3) = .09284;  p0(s4) = .09284;  p0(s5) = .09284.
The objective function and the constraints of the proposed
model for this example are given as follows.
Min z = 1.53*p1(s)^2+13.96*p2(s)^2+10.77*p3(s)^2+10.77*p4(s)^2+10.77*p5(s)^2-1
Subject to the constraints
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)=1
2. p1(s)+p2(s)+p3(s)+p4(s)=.8
3. p2(s)+p3(s)+p5(s)=.5
4. p1(s)+p4(s)+p5(s)=.7
5. p1(s)+p3(s)+p5(s)=.7
6. p1(s)+p2(s)+p4(s)+p5(s)=.8
7. p2(s)+p3(s)+p4(s)=.5
8. p2(s)+p4(s)+p5(s)=.5
9. p1(s)+p3(s)+p4(s)=.7
10. p1(s)+p2(s)+p3(s)+p5(s)=.8
11. pi(s) ≥ 0 for i = 1,2,…,5.
After solving the above model, we get the following results:
p(1) = 0.3;  p(2) = 0.1;  p(3) = 0.2;  p(4) = 0.2;  p(5) = 0.2,
with D(p0, p1) = .57.
Example 5: Now let us consider a 4x4 population consisting of 16 units, borrowed from Jessen (1978, p. 375). A sample of size 8 is to be drawn from this population. The basic data for this population are given in Table 13.

Table 13
Basic data for the 4x4 population

Units (i)      πi = npi = 8pi           ∑      Yi                         ∑
1 2 3 4        0.0   0.6   1.0   0.4    2.0    0.0   1.05   2.25   0.80   4.10
5 6 7 8        0.8   0.4   0.4   0.4    2.0    0.6   0.70   0.65   0.40   2.35
9 10 11 12     0.6   0.2   0.4   0.8    2.0    0.6   0.20   0.50   0.60   1.90
13 14 15 16    0.6   0.8   0.2   0.4    2.0    0.3   0.80   0.25   0.30   1.65
∑              2.0   2.0   2.0   2.0    8.0    1.5   2.75   3.65   2.10   10.00
After removing the certainty proportions, we get the following two-way table:

0.0   0.6   0.0   0.4       1.0
0.8   0.4   0.4   0.4       2.0
0.6   0.2   0.4   0.8       2.0
0.6   0.8   0.2   0.4       2.0
∑ 2.0  2.0  1.0  2.0        7.0
All possible combinations are given as follows: the 30 possible samples, numbered (1)-(30), are the 4x4 selection patterns ('x' marking a selected cell) that satisfy the marginals of the reduced two-way table above.
Suppose the samples that contain either all the diagonal elements, or none of the diagonal elements, are non-preferred samples. Thus sample numbers 2, 3, 4, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 25, 27 and 28 are the preferred samples. The values of p(s) are given as follows.
p(s1) =.0015;
p(s2) = .0105;
p(s3) = .0015;
p(s4) = .0038;
p(s5) = .0001;
p(s6) = .0003;
p(s7) = .0195;
p(s8) = .0015;
p(s9) = .0006;
p(s10) =.0001;
p(s11) =.0094;
p(s12) = .0001;
p(s13) = .0003;
p(s14) = .0038;
p(s15) = .0007;
p(s16) = .00071;
p(s17) = .0001;
p(s18) = .0038;
p(s19) = .0001;
p(s20) =.0007;
Values of p0(s) are given as follows:
p0(s1)=.0222;   p0(s2)=.2880;   p0(s3)=.0222;   p0(s4)=.0554;
p0(s5)=.0016;   p0(s6)=.0042;   p0(s7)=.2880;   p0(s8)=.0222;
p0(s9)=.0088;   p0(s10)=.0016;  p0(s11)=.1378;  p0(s12)=.000;
p0(s13)=.0042;  p0(s14)=.0554;  p0(s15)=.0105;  p0(s16)=.0105;
p0(s17)=.0008;  p0(s18)=.0554;  p0(s19)=.0003;  p0(s20)=.0105.
The proposed model for this example can be defined as follows.
Min z = 45.14*p1(s)^2+3.47*p2(s)^2+45.14*p3(s)^2+18.05*p4(s)^2+608.25*p5(s)^2+240.76*p6(s)^2+3.47*p7(s)^2+45.14*p8(s)^2+113.3*p9(s)^2+608.25*p10(s)^2+7.25*p11(s)^2+1300.14*p12(s)^2+240.76*p13(s)^2+18.05*p14(s)^2+95.59*p15(s)^2+95.59*p16(s)^2+1300.14*p17(s)^2+18.05*p18(s)^2+3301.95*p19(s)^2+95.59*p20(s)^2-1
Subject to the constraints
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)=1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)=0.6
3. p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)=0.4
4. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)=0.8
5. p1(s)+p8(s)+p9(s)+p11(s)+p12(s)+p13(s)+p14(s)+p17(s)+p18(s)+p19(s)+p20(s)=0.4
6. p2(s)+p3(s)+p8(s)+p10(s)+p15(s)+p17(s)+p18(s)=0.4
7. p4(s)+p5(s)+p6(s)+p7(s)+p9(s)+p10(s)+p16(s)+p19(s)+p20(s)=0.4
8. p1(s)+p2(s)+p4(s)+p5(s)+p8(s)+p9(s)+p10(s)+p12(s)+p14(s)+p15(s)+p17(s)+p18(s)+p19(s)+p20(s)=0.6
9. p3(s)+p5(s)+p6(s)+p10(s)+p12(s)+p13(s)+p15(s)+p16(s)+p17(s)+p19(s)=0.2
10. p4(s)+p6(s)+p7(s)+p11(s)+p16(s)+p20(s)=0.4
11. p1(s)+p2(s)+p3(s)+p7(s)+p8(s)+p9(s)+p11(s)+p13(s)+p14(s)+p18(s)=.8
12. p3(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p11(s)+p13(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(s)=0.6
13. p2(s)+p4(s)+p7(s)+p11(s)+p14(s)+p15(s)+p16(s)+p18(s)+p20(s)=0.8
14. p1(s)+p5(s)+p9(s)+p12(s)+p13(s)+p14(s)+p19(s)=0.2
15. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p8(s)+p10(s)+p12(s)+p15(s)+p17(s)=0.4
16. pi(s) ≥ 0 for i = 1,2,…,20.
After solving the above model, we get the following results:
p(1)=0.028116;  p(2)=0.153002;  p(3)=0.061919;  p(4)=0.0546;
p(5)=0.008153;  p(6)=0.013501;  p(7)=0.219054;  p(8)=0.026101;
p(9)=0.027504;  p(10)=0.008;    p(11)=0.04559;  p(12)=0.0034;
p(13)=0.0179;   p(14)=0.1129;   p(15)=0.0396;   p(16)=0.042;
p(17)=0.0034;   p(18)=0.1079;   p(19)=0.0019;   p(20)=0.02514;
with the value of D(p0, p1) as 0.70611.
Example 6: Now we consider an 8x3 population, borrowed from Causey et al. (1985, p. 906), consisting of 24 units; a sample of size n = 10 is to be drawn from it. The basic data for this population are reproduced in Table 14.
Table 14
Basic data for the 8x3 population

Units (i)     πi = npi = 10pi     ∑
1 2 3         0.4   2.0   0.0     2.40
4 5 6         1.2   0.0   1.0     2.20
7 8 9         0.2   0.0   0.0     0.20
10 11 12      1.2   0.4   0.2     1.80
13 14 15      1.0   0.6   0.2     1.80
16 17 18      0.0   0.4   0.4     0.80
19 20 21      0.0   0.2   0.4     0.60
22 23 24      0.0   0.0   0.2     0.20
∑             4.0   3.6   2.4     10.00
After removing the certainty proportions, we get the following two-way table:

0.4   0.0   0.0       0.4
0.2   0.0   0.0       0.2
0.2   0.0   0.0       0.2
0.2   0.4   0.2       0.8
0.0   0.6   0.2       0.8
0.0   0.4   0.4       0.8
0.0   0.2   0.4       0.6
0.0   0.0   0.2       0.2
∑ 1.0  1.6  1.4       4.0
The set of all possible combinations consists of 141 samples, out of which the samples having two consecutive units in a column are considered non-preferred. The set of preferred sample combinations is given as follows.
(The preferred combinations, numbered (1)-(78), are the 8x3 selection patterns ('x' marking a selected cell) that satisfy the marginals of the reduced two-way table above and contain no two consecutive units in any column.)
The values of p(s) are given as follows:
p(s1) = .0020;  p(s2) = .0050;  p(s3) = .0020;  p(s4) = .0008;  p(s5) = .0020;  p(s6) = .0008;
p(s7) = .0017;  p(s8) = .0042;  p(s9) = .0017;  p(s10) = .0042;  p(s11) = .0008;  p(s12) = .0042;
p(s13) = .0020;  p(s14) = .0017;  p(s15) = .0008;  p(s16) = .0003;  p(s17) = .0020;  p(s18) = .0020;
p(s19) = .0008;  p(s20) = .0008;  p(s21) = .0003;  p(s22) = .0020;  p(s23) = .0042;  p(s24) = .0008;
p(s25) = .0008;  p(s26) = .0020;  p(s27) = .0008;  p(s28) = .0003;  p(s29) = .0008;  p(s30) = .0003;
p(s31) = .0007;  p(s32) = .0017;  p(s33) = .0007;  p(s34) = .0017;  p(s35) = .0003;  p(s36) = .0017;
p(s37) = .0008;  p(s38) = .0007;  p(s39) = .0003;  p(s40) = .0001;  p(s41) = .0008;  p(s42) = .0008;
p(s43) = .0003;  p(s44) = .0003;  p(s45) = .0001;  p(s46) = .0008;  p(s47) = .0017;  p(s48) = .0003;
p(s49) = .0008;  p(s50) = .0020;  p(s51) = .0008;  p(s52) = .0003;  p(s53) = .0008;  p(s54) = .0003;
p(s55) = .0007;  p(s56) = .0017;  p(s57) = .0007;  p(s58) = .0017;  p(s59) = .0003;  p(s60) = .0017;
p(s61) = .0008;  p(s62) = .0003;  p(s63) = .0001;  p(s64) = .0008;  p(s65) = .0008;  p(s66) = .0003;
p(s67) = .0003;  p(s68) = .0001;  p(s69) = .0008;  p(s70) = .0017;  p(s71) = .0003;  p(s72) = .0017;
p(s73) = .0007;  p(s74) = .0008;  p(s75) = .0003;  p(s76) = .0001;  p(s77) = .0017;  p(s78) = .0003.
The values of p0(s) are given as follows:
p0(s1) = .0223;  p0(s2) = .0548;  p0(s3) = .0223;  p0(s4) = .009;  p0(s5) = .0223;  p0(s6) = .009;
p0(s7) = .0188;  p0(s8) = .0462;  p0(s9) = .0188;  p0(s10) = .0462;  p0(s11) = .009;  p0(s12) = .0462;
p0(s13) = .0223;  p0(s14) = .0188;  p0(s15) = .009;  p0(s16) = .0036;  p0(s17) = .0223;  p0(s18) = .0223;
p0(s19) = .009;  p0(s20) = .009;  p0(s21) = .0036;  p0(s22) = .0223;  p0(s23) = .0462;  p0(s24) = .009;
p0(s25) = .009;  p0(s26) = .0223;  p0(s27) = .009;  p0(s28) = .0036;  p0(s29) = .009;  p0(s30) = .0036;
p0(s31) = .0076;  p0(s32) = .0188;  p0(s33) = .0076;  p0(s34) = .0188;  p0(s35) = .0036;  p0(s36) = .0188;
p0(s37) = .009;  p0(s38) = .0076;  p0(s39) = .0036;  p0(s40) = .0014;  p0(s41) = .009;  p0(s42) = .009;
p0(s43) = .0036;  p0(s44) = .0036;  p0(s45) = .0014;  p0(s46) = .009;  p0(s47) = .0188;  p0(s48) = .0036;
p0(s49) = .009;  p0(s50) = .0223;  p0(s51) = .009;  p0(s52) = .0036;  p0(s53) = .009;  p0(s54) = .0036;
p0(s55) = .0076;  p0(s56) = .0187;  p0(s57) = .0077;  p0(s58) = .0188;  p0(s59) = .0036;  p0(s60) = .0187;
p0(s61) = .009;  p0(s62) = .0036;  p0(s63) = .0014;  p0(s64) = .0089;  p0(s65) = .009;  p0(s66) = .0036;
p0(s67) = .003;  p0(s68) = .0014;  p0(s69) = .009;  p0(s70) = .0188;  p0(s71) = .0036;  p0(s72) = .0187;
p0(s73) = .0076;  p0(s74) = .009;  p0(s75) = .0036;  p0(s76) = .0014;  p0(s77) = .0188;  p0(s78) = .0036.
The proposed model for this example can be defined as follows.
Min z = 44.91*p1(s)^2 + 18.24*p2(s)^2 + 44.91*p3(s)^2 + 111.21*p4(s)^2 + 44.91*p5(s)^2 + 111.21*p6(s)^2
+ 53.22*p7(s)^2 + 21.62*p8(s)^2 + 53.22*p9(s)^2 + 21.62*p10(s)^2 + 111.21*p11(s)^2 + 21.62*p12(s)^2
+ 44.91*p13(s)^2 + 53.22*p14(s)^2 + 111.21*p15(s)^2 + 276.79*p16(s)^2 + 44.91*p17(s)^2 + 44.91*p18(s)^2
+ 111.21*p19(s)^2 + 111.21*p20(s)^2 + 276.79*p21(s)^2 + 44.91*p22(s)^2 + 21.62*p23(s)^2 + 111.21*p24(s)^2
+ 111.21*p25(s)^2 + 44.91*p26(s)^2 + 111.21*p27(s)^2 + 276.79*p28(s)^2 + 111.21*p29(s)^2 + 276.79*p30(s)^2
+ 131.8*p31(s)^2 + 53.22*p32(s)^2 + 131.8*p33(s)^2 + 53.22*p34(s)^2 + 276.79*p35(s)^2 + 53.22*p36(s)^2
+ 111.21*p37(s)^2 + 131.8*p38(s)^2 + 276.79*p39(s)^2 + 691.98*p40(s)^2 + 111.21*p41(s)^2 + 111.21*p42(s)^2
+ 276.79*p43(s)^2 + 276.79*p44(s)^2 + 691.98*p45(s)^2 + 111.21*p46(s)^2 + 53.22*p47(s)^2 + 276.79*p48(s)^2
+ 111.21*p49(s)^2 + 44.91*p50(s)^2 + 111.21*p51(s)^2 + 276.79*p52(s)^2 + 111.21*p53(s)^2 + 276.79*p54(s)^2
+ 131.8*p55(s)^2 + 53.22*p56(s)^2 + 131.8*p57(s)^2 + 53.22*p58(s)^2 + 276.79*p59(s)^2 + 53.22*p60(s)^2
+ 111.21*p61(s)^2 + 276.79*p62(s)^2 + 691.98*p63(s)^2 + 111.21*p64(s)^2 + 111.21*p65(s)^2 + 276.79*p66(s)^2
+ 276.79*p67(s)^2 + 691.98*p68(s)^2 + 111.21*p69(s)^2 + 53.22*p70(s)^2 + 276.79*p71(s)^2 + 53.22*p72(s)^2
+ 131.8*p73(s)^2 + 111.21*p74(s)^2 + 276.79*p75(s)^2 + 691.98*p76(s)^2 + 53.22*p77(s)^2 + 276.79*p78(s)^2 - 1
Subject to the constraints
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p1
1(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(
s)+p21(s)+p22(s)+p23(s)+p24(s)+p25(s)+p26(s)+p27(s)+p28(s)+p29(s)
+p30(s)+p31(s)+p32(s)+p33(s)+p34(s)+p35(s)+p36(s)+p37(s)+p38(s)+
p39(s)+p40(s)+p41(s)+p42(s)+p43(s)+p44(s)+p45(s)+p46(s)+p47(s)+p4
8(s)+p49(s)+p50(s)+p51(s)+p52(s)+p53(s)+p54(s)+p55(s)+p56(s)+p57(
s)+p58(s)+p59(s)+p60(s)+p61(s)+p62(s)+p63(s)+p64(s)+p65(s)+p66(s)
+p67(s)+p68(s)+p69(s)+p70(s)+p71(s)+p72(s)+p73(s)+p74(s)+p75(s)+
p76(s)+p77(s)+p78(s)=1
2. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p10(s)+p1
1(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p17(s)+p18(s)+p19(s)+p20(
s)+p21(s)+p22(s)+p23(s)+p24(s)=.4
3. p25(s)+p26(s)+p27(s)+p28(s)+p29(s)+p30(s)+p31(s)+p32(s)+p33(s)+p3
4(s)+p35(s)+p36(s)+p37(s)+p38(s)+p39(s)+p40(s)+p41(s)+p42(s)+p43(
s)+p44 (s)+p45(s)+p46(s)+p47(s)+p48(s)=.2
4. p49(s)+p50(s)+p51(s)+p52(s)+p53(s)+p54(s)+p55(s)+p56(s)+p57(s)+p5
8(s)+p59(s)+p60(s)+p61(s)+p62(s)+p63(s)+p64(s)+p65(s)+p66(s)+p67(
s)+p68 (s)+p69(s)+p70(s)+p71(s)=.2
5. p72(s)+p73(s)+p74(s)+p75(s)+p76(s)+p77(s)+p78(s)=.2
6. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)+p6(s)+p17(s)+p19(s)+p22(s)+p25(s)
+p26(s)+p27(s)+p28(s)+p29(s)+p30(s)+p41(s)+p43(s)+p46(s)+p49(s)+
p50(s)+ p51(s)+p52(s)+p53(s)+p54(s)+p64(s)+p66(s)+p69(s)=.4
7. p7(s)+p10(s)+p11(s)+p12(s)+p13(s)+p14(s)+p15(s)+p16(s)+p31(s)+p34
(s)+p35(s)+p36(s)+p37(s)+p38(s)+p39(s)+p40(s)+p55(s)+p58(s)+p59(s
)+p60 (s)+p61(s)+p62(s)+p63(s)=.2
8. p7(s)+p8(s)+p9(s)+p10(s)+p12(s)+p14(s)+p23(s)+p31(s)+p32(s)+p33(s
)+p34(s)+p36(s)+p38(s)+p47(s)+p55(s)+p56(s)+p57(s)+p58(s)+p60(s)
+p70(s)+p72(s)+p73(s)+p77(s)=.6
9. p1(s)+p4(s)+p17(s)+p18(s)+p19(s)+p20(s)+p21(s)+p25(s)+p28(s)+p41(
s)+p42(s)+p43(s)+p44(s)+p45(s)+p49(s)+p52(s)+p64(s)+p65(s)+p66(s)
+p67(s)+p68(s)+p74(s)+p75(s)+p76(s)=.2
10. p1(s)+p2(s)+p3(s)+p13(s)+p15(s)+p18(s)+p20(s)+p25(s)+p26(s)+p27(s
)+p37(s)+p39(s)+p42(s)+p44(s)+p49(s)+p50(s)+p51(s)+p61(s)+p62(s)
+p65(s)+p67(s)+p74(s)+p75(s)=.4
11. p5(s)+p8(s)+p10(s)+p11(s)+p22(s)+p23(s)+p24(s)+p29(s)+p32(s)+p34(
s)+p35(s)+p46(s)+p47(s)+p48(s)+p53(s)+p56(s)+p58(s)+p59(s)+p69(s)
+p70(s)+p71(s)+p72(s)+p77(s)+p78(s)=.4
12. p4(s)+p5(s)+p6(s)+p7(s)+p8(s)+p9(s)+p11(s)+p16(s)+p21(s)+p24(s)
+p28(s)+p29(s)+p30(s)+p31(s)+p32(s)+p33(s)+p35(s)+p40(s)+p45(s)+
p48(s)+p52(s)+p53(s)+p54(s)+p55(s)+p56(s)+p57(s)+p59(s)+p63(s)+p6
8(s)+p71(s)+p72(s)+p73(s)+p76(s)+p78(s)=.2
13. p2(s)+p12(s)+p13(s)+p17(s)+p18(s)+p26(s)+p36(s)+p37(s)+p41(s)+p42
(s)+p50(s)+p60(s)+p61(s)+p64(s)+p65(s)+p74(s)=.4
14. p3(s)+p6(s)+p9(s)+p14(s)+p15(s)+p16(s)+p19(s)+p20(s)+p21(s)+p22(s
)+p23(s)+p24(s)+p27(s)+p30(s)+p33(s)+p38(s)+p39(s)+p40(s)+p43(s)
+p44(s)+p45(s)+p46(s)+p47(s)+p48(s)+p51(s)+p54(s)+p57(s)+p62(s)+
p63(s)+p66(s)+p67(s)+p68(s)+p69(s)+p70(s)+p71(s)+p73(s)+p75(s)+p7
6(s)+p77(s)+
p78(s)=.2
15. pi(s) ≥ 0 for i = 1, 2, …, 78.
After solving the above model, we get the following results:
p(1) = 0.106451;  p(2) = 0.100122;  p(8) = 0.046253;  p(12) = 0.101397;
p(23) = 0.045779;  p(25) = 0.046825;  p(26) = 0.050308;  p(32) = 0.02685;
p(36) = 0.049532;  p(47) = 0.026474;  p(49) = 0.04672;  p(50) = 0.049571;
p(56) = 0.026453;  p(60) = 0.049064;  p(70) = 0.028191;  p(72) = 0.100437;
p(77) = 0.099552.
The value of p(s) for all other preferred sample combinations comes out to be 0, with the value of D(p0, p1) as 2.1929.
Example 7: We consider a 3×3 hypothetical population with N = 9 and n = 3. The expected sample cell counts (npi) for this population are given in Table 15.
Table 15
Expected sample cell counts (npi) for the 3×3 population

0.2  0.2  0.6 | 1.0
0.3  0.4  0.3 | 1.0
0.5  0.4  0.1 | 1.0
∑:  1.0  1.0  1.0 | 3.0
According to the above population, the set of all possible samples is given as follows.
(1) x . .   (2) x . .   (3) . x .   (4) . x .   (5) . . x   (6) . . x
    . x .       . . x       x . .       . . x       x . .       . x .
    . . x       . x .       . . x       x . .       . x .       x . .
The set of non-preferred samples consists of those samples in which the three units 1, 5 and 9 all appear together; thus the non-preferred sample is sample number (1), as the short enumeration sketch below confirms.
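Since every sample selects exactly one unit from each row and each column, the six possible samples are just the 3×3 permutation matrices, so the preferred set can be enumerated programmatically. A minimal sketch (assuming the row-by-row unit numbering 1–9 used above; an illustration, not code from the thesis):

```python
from itertools import permutations

# Units 1..9 are numbered row by row; choosing column c in row r selects
# unit 3*r + c + 1. Each permutation of the columns gives one sample.
samples = [tuple(3 * r + c + 1 for r, c in enumerate(perm))
           for perm in permutations(range(3))]

# The non-preferred sample is the one in which units 1, 5 and 9 appear together.
preferred = [s for s in samples if s != (1, 5, 9)]
print(len(samples), len(preferred))   # 6 possible samples, 5 preferred
```

Values of p(s) for the preferred sample combinations are given as follows: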
p(s1) = .07729;  p(s2) = .270527;  p(s3) = .05797;  p(s4) = .167469;  p(s5) = .025764.
Values of p0(s) are given as follows:
p0(s1) = .129037;  p0(s2) = .451631;  p0(s3) = .096778;  p0(s4) = .27958;  p0(s5) = .043012.
The objective function and the constraints for this population are given as follows.
Min z = 3.58*p1(s)^2+7.75*p2(s)^2+2.21*p3(s)^2+23.25*p4(s)^2+10.33*p5(s)^2-1
Subject to the constraints
1. p1(s)+p2(s)+p3(s)+p4(s)+p5(s)=1
2. p1(s)=.2
3. p2(s)+p3(s)=.3
4. p4(s)+p5(s)=.5
5. p2(s)+p4(s)=.2
6. p5(s)=.4
7. p1(s)+p3(s)=.4
8. p3(s)+p5(s)=.6
9. p1(s)+p4(s)=.3
10. p2(s)=.1
11. pi(s) ≥ 0 for i = 1, 2, …, 5.
After solving the above model, we get the following results:
p(1) = 0.2;  p(2) = 0.1;  p(3) = 0.2;  p(4) = 0.1;  p(5) = 0.4,
with the value of D(p0, p1) as 1.1944.
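As an illustration, the small model of Example 7 can be verified numerically. Because the equality constraints determine the solution uniquely here, a least-squares solve of the constraint system suffices; a general instance would call a quadratic-programming solver. The following sketch uses numpy (an illustration, not the software used in the thesis):

```python
import numpy as np

# Objective coefficients of Min z = sum c_i * p_i(s)^2 - 1.
c = np.array([3.58, 7.75, 2.21, 23.25, 10.33])

# The ten equality constraints of the model, written as A p = b.
A = np.array([
    [1, 1, 1, 1, 1],   # 1. total probability
    [1, 0, 0, 0, 0],   # 2. p1 = .2
    [0, 1, 1, 0, 0],   # 3. p2 + p3 = .3
    [0, 0, 0, 1, 1],   # 4. p4 + p5 = .5
    [0, 1, 0, 1, 0],   # 5. p2 + p4 = .2
    [0, 0, 0, 0, 1],   # 6. p5 = .4
    [1, 0, 1, 0, 0],   # 7. p1 + p3 = .4
    [0, 0, 1, 0, 1],   # 8. p3 + p5 = .6
    [1, 0, 0, 1, 0],   # 9. p1 + p4 = .3
    [0, 1, 0, 0, 0],   # 10. p2 = .1
])
b = np.array([1.0, 0.2, 0.3, 0.5, 0.2, 0.4, 0.4, 0.6, 0.3, 0.1])

p, *_ = np.linalg.lstsq(A, b, rcond=None)   # the unique feasible point
print(np.round(p, 4))                        # [0.2 0.1 0.2 0.1 0.4]
print(round(float(c @ p**2 - 1), 4))         # 1.1944 = D(p0, p1)
```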
CHAPTER IV
ON STATISTICAL DISCLOSURE CONTROL USING
RANDOM ROUNDING AND CELL PERTURBATION
TECHNIQUES
4.1 INTRODUCTION
Statistical offices collect information about society. The most common method of providing data to the public is through statistical tables; each entry in a table is called a cell. In some situations, the statistical offices are required not to disclose in any way the information provided by an individual respondent. The release of statistical data inevitably reveals some information about individual data subjects, and disclosure occurs when confidential information is revealed. Statistical offices therefore need to protect the confidentiality of the data they collect. Not all the data collected and published by statistical offices are confidential; only the confidential data have to be protected. The cells in a table containing confidential data are termed “sensitive cells” and all other cells are termed “non-sensitive cells”. For sensitive cells, we assume the existence of individuals who may analyze the published pattern to disclose the confidential information. These individuals are referred to as “attackers” (or “intruders” or “snoopers”). If there exists more than one attacker for a cell, the problem is referred to as the “multi-attacker” problem; the problem with only one attacker per cell is referred to as the “single-attacker” problem. Attackers can also be categorized as “external attackers” and “internal attackers”. An external attacker knows the linear system My = b and the information that the cell values are non-negative. An internal attacker knows the linear system My = b and also tighter (lower and upper) bounds on the cell values. Before publishing any information, statistical offices face two problems. The first problem is identifying the sensitive cells in a table. Identification of sensitive cells is carried out through several rules, such as the threshold rule, the linear sensitivity rule, the p-percent rule and the p-q percent rule. This problem has been discussed in detail by Cox (1980, 1981), Willenborg and Waal (2001) and Merola (2003a). The second problem is protecting the confidential information contained in the sensitive cells while minimizing the loss of information. This problem is generally termed “disclosure control”. In this chapter, we concern ourselves with the problem of disclosure control with a single internal attacker. The confidential information can be protected by the application of statistical disclosure limitation methods, which ensure that the risk of disclosing confidential information is very low while minimizing the loss of information. Several disclosure control techniques have been used in the literature to achieve the required protection of confidential information. Two widely used techniques of disclosure control are “controlled rounding” and “cell suppression”.
Rounding techniques involve the replacement of the original data by multiples of a given rounding base. The controlled rounding problem is the problem of optimally rounding real-valued entries in a tabular array to adjacent integer values in a manner that preserves the tabular structure of the array. Rounding methods are used for many purposes, such as improving the readability of data values, controlling statistical disclosure in tables, solving the problem of iterative proportional fitting (or raking) in two-way tables, and controlled selection. Statistical disclosure control is one of the areas in which rounding methods are widely used. Fellegi (1975) proposed a technique for random rounding which unbiasedly rounds the cell values and also maintains the additivity of the rounded table. The drawback of the random rounding procedure proposed by Fellegi (1975) is that it is applicable to one-dimensional tables only. Cox and Ernst (1982) used transportation theory in linear programming to obtain an optimal controlled rounding of a two-way tabular array. Using the general theory of transportation problems, they demonstrated that solutions always exist to controlled rounding problems. They also showed that their technique guarantees optimal solutions to the zero-restricted controlled rounding problem, i.e., the controlled rounding in which the absolute difference between the original values and the rounded values is always less than the rounding base, subject to the restriction that integer values are rounded to themselves. Causey, Cox and Ernst (1985) summarized the idea of Cox and Ernst (1982) and used transportation theory to solve the controlled rounding problem. They discussed several statistical applications in which controlled rounding can be used and applied the concept of controlled rounding to solve the controlled selection problem. They also showed that the zero-restricted controlled rounding problem in three-dimensional tables is not always feasible. Cox (1987) presented a constructive algorithm for achieving unbiased controlled rounding which is simple to implement by hand. He also discussed the controlled rounding problem in three dimensions and provided a counterexample to the existence of unbiased controlled rounding in three dimensions. Tiwari and Nigam (1988) improved the method of Cox (1987) so that it terminates in fewer steps.
Another method widely used by different researchers for protecting sensitive cells in a table is “cell suppression”, in which sensitive cells are not published, i.e., they are suppressed. These suppressed sensitive cells are called primary suppressions. To make sure that the primary suppressions cannot be derived by subtraction from published marginal totals, additional cells are selected for suppression; these are known as complementary (or secondary) suppressions. The remaining cells in the table are published with their original values. The problem of cell suppression is to find the complementary suppressions in such a way that the loss of information is minimum. This problem has been widely discussed by Cox (1980, 1995), Sande (1984), Carvalho et al. (1994) and Fischetti and Salazar (2000). In cell suppression, a large amount of information is lost since, in addition to the sensitive cells, some non-sensitive cells are also suppressed. To reduce the loss of information, Fischetti and Salazar (2003) proposed an improved methodology known as “partial cell suppression”. In partial cell suppression, instead of wholly suppressing the primary and complementary suppressed cells, intervals obtained with the help of a mathematical model are published for these cell entries. The published intervals must provide a convenient set of possible values for the corresponding entries, containing the true original values. The loss of information in partial cell suppression is smaller in comparison to complete cell suppression. In order to further reduce the amount of data loss that occurs from cell suppression, Salazar (2005) proposed an improved method termed “cell perturbation”. This method is closely related to the classical controlled rounding methods and has the advantage that it also ensures the protection of sensitive cells to a specified level while minimizing the loss of information. However, this method also has some disadvantages. Firstly, it perturbs all the cell values, resulting in a large amount of data loss. Secondly, the marginal cell values of the resultant tables are not preserved, thereby disturbing marginals which are non-sensitive and expected to be published in their original form.
In this chapter, we use the idea of random rounding and Integer
Quadratic Programming to propose an improved methodology for
disclosure control in an array that perturbs only the sensitive cells and
adjusts some non-sensitive cells to preserve the marginal values of the
array. The table obtained through the proposed procedure guarantees
the protection level requirement and also attempts to minimize the
information loss by minimizing the distance between the original and
final table.
In section 4.2, we first describe the problem of attacker and the
protection of sensitive cells and then introduce the proposed
methodology of disclosure control against the single internal attacker.
The proposed methodology appears to perform better than the
procedure suggested by Salazar (2005). In section 4.3, we discuss
some numerical examples to demonstrate the utility of the proposed
procedure.
4.2 CONTROLLED CELL PERTURBATION: THE
PROPOSED METHODOLOGY
Let A denote the tabular array

$A = \begin{pmatrix} (a_{pq})_{m \times n} & (a_{p.})_{m \times 1} \\ (a_{.q})_{1 \times n} & (a_{..})_{1 \times 1} \end{pmatrix}$

The tabular array A can be represented with the help of a vector $a = (a_i : i \in I)$, where $a_1 = a_{11}$, $a_2 = a_{12}$, $a_3 = a_{13}$, and so on, are all non-negative integers and I is the set of all elements, including the internal cells, the marginals and the grand total, consisting of $mn+m+n+1$ elements, with the structure $Ma = 0$. Here M is the $(m+n+1) \times (mn+m+n+1)$ matrix in which each of the first m rows has +1 in the positions of the n internal cells of the corresponding row and −1 in the position of that row's total $a_{p.}$; each of the next n rows has +1 in the positions of the m internal cells of the corresponding column and −1 in the position of that column's total $a_{.q}$; and the last row has +1 in the positions of all mn internal cells and −1 in the position of the grand total $a_{..}$. Thus $Ma = 0$ expresses the additivity conditions

$\sum_{q} a_{pq} - a_{p.} = 0 \;(p = 1,\dots,m), \qquad \sum_{p} a_{pq} - a_{.q} = 0 \;(q = 1,\dots,n), \qquad \sum_{p}\sum_{q} a_{pq} - a_{..} = 0.$
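For concreteness, the matrix M just described can be generated mechanically for any m×n table. A minimal sketch in Python (assuming the row-by-row cell ordering defined above; an illustration, not code from the thesis):

```python
import numpy as np

def build_M(m: int, n: int) -> np.ndarray:
    """Additivity matrix M with Ma = 0 for an m x n table whose cells are
    ordered as: internal cells of each row followed by that row's total,
    then the n column totals, then the grand total."""
    cells = m * n + m + n + 1
    M = np.zeros((m + n + 1, cells), dtype=int)
    internal = lambda p, q: p * (n + 1) + q     # internal cell (p, q)
    row_tot = lambda p: p * (n + 1) + n         # row-total cell of row p
    col_tot = lambda q: m * (n + 1) + q         # column-total cell of column q
    grand = cells - 1                           # grand-total cell

    for p in range(m):                          # row equations
        for q in range(n):
            M[p, internal(p, q)] = 1
        M[p, row_tot(p)] = -1
    for q in range(n):                          # column equations
        for p in range(m):
            M[m + q, internal(p, q)] = 1
        M[m + q, col_tot(q)] = -1
    for p in range(m):                          # grand-total equation
        for q in range(n):
            M[m + n, internal(p, q)] = 1
    M[m + n, grand] = -1
    return M

# e.g. build_M(4, 5) yields a 10x30 matrix of the kind tabulated
# (as its transpose) in Appendix 4.0.
```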
The vector $a = (a_i : i \in I)$ satisfies the linear system $My = b$ and also contains some sensitive cells. Let us denote the subset of sensitive cells by S. Let there be r sensitive cells, each having one internal attacker, denoted by $k_s$ (s = 1, …, r); the set of attackers in the different sensitive cells is denoted by K. Now suppose that, by observing the published pattern, attacker $k_s$ will compute the interval $(\underline{y}_s^{k_s}, \overline{y}_s^{k_s})$, where $\underline{y}_s^{k_s}$ is the minimum and $\overline{y}_s^{k_s}$ is the maximum value of the interval. The sensitive cell s will be protected against the attacker $k_s$ if the interval computed by the attacker $k_s$ is wide enough. To decide whether this interval is wide enough or not, we need three parameters, defined as follows:
Upper protection level: a number $UPL_s^{k_s}$ representing a desired lower bound for $\overline{y}_s^{k_s} - a_s$.

Lower protection level: a number $LPL_s^{k_s}$ representing a desired lower bound for $a_s - \underline{y}_s^{k_s}$.

Sliding protection level: a number $SPL_s^{k_s}$ representing a desired lower bound for $\overline{y}_s^{k_s} - \underline{y}_s^{k_s}$.
The values of these parameters are provided by the statistical office for each sensitive cell and for each attacker $k_s$. These values can also be defined by using a common-sense rule [see Sande (1984)]. The protection values are assumed to be unknown to the attacker. Let us assume that the attacker $k_s$ knows two bounds $lb_i^{k_s}$ and $ub_i^{k_s}$ such that $a_i \in (lb_i^{k_s}, ub_i^{k_s})$ for each cell $i \in I$. Thus the sensitive cells in the published table will be protected if

$lb_i^{k_s} \le \underline{y}_s^{k_s} \le a_i - LPL_s^{k_s} \le a_i \le a_i + UPL_s^{k_s} \le \overline{y}_s^{k_s} \le ub_i^{k_s}.$   (4.2.1)
This protection level is obtained by satisfying the protection equations, which are derived with the help of the attacker's problem. Suppose the attacker is provided with the information that some values of the table are rounded to a common rounding base b. Then the attacker's problem becomes

$\sum_i M_{ji} y_i = b_j$,   $x_i - b \le y_i \le x_i + b$,   $lb_i^{k_s} \le y_i \le ub_i^{k_s}$   for all $i \in I$,   (4.2.2)

where j indexes the equations (j = 1, …, m+n+1) and $(x_i : i \in I)$ is the published pattern. The attacker can compute the values of $\overline{y}_s^{k_s}$ and $\underline{y}_s^{k_s}$ by maximizing and minimizing $y_s^{k_s}$, respectively, subject to the constraints (4.2.2).
The published table will be protected if

Maximize $[y_s^{k_s} : (4.2.2) \text{ holds}] \ge usl_s^{k_s}$   (4.2.3)

Minimize $[y_s^{k_s} : (4.2.2) \text{ holds}] \le lsl_s^{k_s}$   (4.2.4)

Maximize $[y_s^{k_s} : (4.2.2) \text{ holds}]$ − Minimize $[y_s^{k_s} : (4.2.2) \text{ holds}] \ge SPL_s^{k_s}$   (4.2.5)

where $usl_s^{k_s} = a_s + UPL_s^{k_s}$ and $lsl_s^{k_s} = a_s - LPL_s^{k_s}$.
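Before turning to the duality argument, note that the two optima in (4.2.3)-(4.2.5) are themselves just linear programs, so the attacker's interval can also be computed directly. A minimal sketch using scipy's linprog (assuming the homogeneous system My = 0 of this section; an illustration, not the thesis software):

```python
import numpy as np
from scipy.optimize import linprog

def attacker_interval(M, x, lb, ub, b, s):
    """Smallest and largest values of y_s consistent with (4.2.2):
    M y = 0, x_i - b <= y_i <= x_i + b and lb_i <= y_i <= ub_i."""
    lo = np.maximum(lb, x - b)          # intersect the two bound systems
    hi = np.minimum(ub, x + b)
    cost = np.zeros(len(x)); cost[s] = 1.0
    bounds = list(zip(lo, hi))
    rhs = np.zeros(M.shape[0])
    y_min = linprog(cost, A_eq=M, b_eq=rhs, bounds=bounds).fun
    y_max = -linprog(-cost, A_eq=M, b_eq=rhs, bounds=bounds).fun
    return y_min, y_max

# Cell s (true value a_s) is protected when
#   y_max >= a_s + UPL_s,  y_min <= a_s - LPL_s,  y_max - y_min >= SPL_s.
```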
In order to solve the constraints (4.2.3)-(4.2.5), we convert them into linear form using duality theory in linear programming. Let us consider the dual variables $\alpha_i^1, \beta_i^1, \alpha_i^2, \beta_i^2$ and $\gamma_j$ associated with the inequalities $y_i \le ub_i^{k_s}$, $-y_i \le -lb_i^{k_s}$, $y_i \le x_i + b$, $-y_i \le b - x_i$ and $\sum_i M_{ji} y_i = b_j$, respectively. Thus the attacker's problem Maximize $[y_s^{k_s} : (4.2.2) \text{ holds}]$ is equivalent to
Minimize $\sum_j \gamma_j b_j + \sum_i \left[\alpha_i^1 ub_i^{k_s} + \alpha_i^2 (x_i + b) - \beta_i^1 lb_i^{k_s} - \beta_i^2 (x_i - b)\right]$   (4.2.6)
Subject to the constraints

$\alpha_s^1 + \alpha_s^2 - \beta_s^1 - \beta_s^2 + \sum_j M_{js} \gamma_j = 1$   for all $s \in S$

$\alpha_i^1 + \alpha_i^2 - \beta_i^1 - \beta_i^2 + \sum_j M_{ji} \gamma_j = 0$   for all non-sensitive cells   (4.2.7)

$\alpha_i^1 \ge 0$, $\alpha_i^2 \ge 0$, $\beta_i^1 \ge 0$, $\beta_i^2 \ge 0$, and $\gamma_j$ unrestricted in sign.
Now (4.2.3) can be written in simplified form as

Maximize $[y_s^{k_s} : (4.2.2) \text{ holds}] \ge usl_s^{k_s}$

$\Rightarrow$ Minimize (4.2.6) $\ge usl_s^{k_s}$ for all $\alpha_i^1, \alpha_i^2, \beta_i^1, \beta_i^2, \gamma_j$ satisfying (4.2.7)

$\Rightarrow \sum_j \gamma_j b_j + \sum_i \left[\alpha_i^1 ub_i^{k_s} + \alpha_i^2 (x_i + b) - \beta_i^1 lb_i^{k_s} - \beta_i^2 (x_i - b)\right] \ge usl_s^{k_s} = a_s + UPL_s^{k_s}$

$\Rightarrow \sum_i \left[\alpha_i^1 UB_i^{k_s} + \alpha_i^2 (x_i + b - a_i) + \beta_i^1 LB_i^{k_s} - \beta_i^2 (x_i - b - a_i)\right] \ge UPL_s^{k_s}$   (4.2.8)

where $UB_i^{k_s} = ub_i^{k_s} - a_i$ and $LB_i^{k_s} = a_i - lb_i^{k_s}$, for all $\alpha_i^1, \alpha_i^2, \beta_i^1, \beta_i^2, \gamma_j$ satisfying (4.2.7). (The last step uses the constraints (4.2.7) together with the additivity of the original table to replace $\sum_j \gamma_j b_j + \sum_i (\alpha_i^1 + \alpha_i^2 - \beta_i^1 - \beta_i^2) a_i$ by $a_s$.)
Similarly, (4.2.4) can be written in simplified form as

$\sum_i \left[\alpha_i'^1 UB_i^{k_s} + \alpha_i'^2 (x_i + b - a_i) + \beta_i'^1 LB_i^{k_s} - \beta_i'^2 (x_i - b - a_i)\right] \ge LPL_s^{k_s}$   (4.2.9)

for all $\alpha_i'^1, \alpha_i'^2, \beta_i'^1, \beta_i'^2$ and $\gamma_j'$ satisfying the following constraints:

$\alpha_s'^1 + \alpha_s'^2 - \beta_s'^1 - \beta_s'^2 + \sum_j M_{js} \gamma_j' = -1$   for all $s \in S$

$\alpha_i'^1 + \alpha_i'^2 - \beta_i'^1 - \beta_i'^2 + \sum_j M_{ji} \gamma_j' = 0$   for all non-sensitive cells   (4.2.10)

$\alpha_i'^1 \ge 0$, $\alpha_i'^2 \ge 0$, $\beta_i'^1 \ge 0$, $\beta_i'^2 \ge 0$, and $\gamma_j'$ unrestricted in sign,
and (4.2.5) reduces to

$\sum_i \left[(\alpha_i^1 + \alpha_i'^1) UB_i^{k_s} + (\alpha_i^2 + \alpha_i'^2)(x_i + b - a_i) + (\beta_i^1 + \beta_i'^1) LB_i^{k_s} + (\beta_i^2 + \beta_i'^2)(a_i - x_i + b)\right] \ge SPL_s^{k_s}$   (4.2.11)

for all $\alpha_i^1, \alpha_i^2, \beta_i^1, \beta_i^2, \gamma_j$ satisfying (4.2.7) and all $\alpha_i'^1, \alpha_i'^2, \beta_i'^1, \beta_i'^2, \gamma_j'$ satisfying (4.2.10).
The conditions obtained through (4.2.8), (4.2.9) and (4.2.11) ensure upper protection, lower protection and sliding protection, respectively. Solving (4.2.7) and (4.2.10), we obtain the values of the dual variables $\alpha_i^1, \alpha_i^2, \beta_i^1, \beta_i^2, \alpha_i'^1, \alpha_i'^2, \beta_i'^1$ and $\beta_i'^2$. In some situations, some or all of the values of $\alpha_i^1, \alpha_i^2, \beta_i^1, \beta_i^2$ may come out to be 0 for one or more sensitive cells. In such situations, we may not obtain the inequality for upper protection, and hence the upper protection requirement may not be satisfied; a similar situation may arise for lower protection and sliding protection. Substituting the values of $UB_i^{k_s}$, $LB_i^{k_s}$, $UPL_s^{k_s}$, $LPL_s^{k_s}$, $SPL_s^{k_s}$, the dual variables, $a_i$ and b in (4.2.8), (4.2.9) and (4.2.11), we obtain three simplified inequalities, each consisting of only the variable $x_i$ and a constant. If the values for the sensitive cells in the final table satisfy these inequalities, we say that the table is protected against the single internal attacker.
Now we round the sensitive cells unbiasedly to base b. The rounding base b should be chosen in such a way that the sum of the entries in the sensitive cells is, as far as possible, a multiple of b; if no such rounding base can be chosen, some other rounding base may be used. The advantage of choosing b so that it divides the sum of the sensitive entries is that the sum of the rounded values of the sensitive cells can remain unaltered. Moreover, we also ensure that no sensitive cell value is itself a multiple of b, since otherwise that value would be rounded to itself. From the sets of unbiasedly rounded values, we select a set which satisfies the simplified inequalities for upper, lower and sliding protection, i.e., (4.2.8), (4.2.9) and (4.2.11). If more than one set of unbiasedly rounded values satisfies the protection equations, we choose the set which has the minimum distortion between the rounded and the original values, i.e.,
$\left\{\sum_i (x_i - a_i)^2\right\}^{1/2}$   (*)

where $a_i$ and $x_i$ represent the original and rounded values, respectively. The sensitive values in the table are then replaced by
these unbiasedly rounded values. After replacing the sensitive cell
values with the rounded values, the resultant table may not be
additive. To make the table additive, some or all of the non-sensitive
cell values are then adjusted from their true value by as small an
amount as possible. This is achieved with the help of the following
model:
Minimize $z = \sum_{p=1}^{m} \sum_{q=1}^{n} \frac{x_{pq}^2}{a_{pq}} - 1$   (4.2.12)

Subject to the constraints

(i) $\sum_{q=1}^{n} x_{pq} = X_p - \sum_q S$   for all p = 1, …, m

(ii) $\sum_{p=1}^{m} x_{pq} = X_q - \sum_p S$   for all q = 1, …, n

(iii) $\sum_p \sum_q x_{pq} = X_{pq} - \sum_p \sum_q S$   (4.2.13)

(iv) $lb_{pq}^{k_s} \le x_{pq} \le ub_{pq}^{k_s}$   for all non-sensitive cells

(v) $x_{pq}$ is integer and $\ge 0$,

where the sums over S denote the (rounded) sensitive-cell values in the corresponding row, column or table.
Solving (4.2.12) and (4.2.13) by integer quadratic programming, using the Microsoft Excel Solver of the Microsoft Office 2000 package, we obtain the required adjusted non-sensitive cell values. In the above model, $X_p$ and $X_q$ denote the marginal totals of row p and column q, respectively, and $X_{pq}$ is the grand total.
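The unbiased rounding used in the first stage admits a very short implementation: a value with remainder r modulo b is rounded up with probability r/b and down otherwise, so its expectation equals the original value. A minimal sketch (an illustration, not the thesis software):

```python
import random

def unbiased_round(value: float, base: int) -> int:
    """Round value to an adjacent multiple of base, unbiasedly."""
    lower = (value // base) * base
    r = value - lower                  # remainder in [0, base)
    if random.random() < r / base:     # round up with probability r/base
        return int(lower + base)
    return int(lower)                  # otherwise round down

# Repeated draws for the sensitive cells generate candidate sets of
# rounded values; e.g. values (3, 8) with b = 5 can give (0, 10), (5, 5),
# (0, 5) or (5, 10). Each candidate set is screened against the
# protection inequalities (4.2.8), (4.2.9) and (4.2.11) before the
# adjustment model (4.2.12)-(4.2.13) restores additivity.
random.seed(1)
print([unbiased_round(v, 5) for v in (3, 8)])
```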
4.3 EXAMPLES
Example 1: Consider the following problem, taken from Fellegi (1975):

12 23 34 3 49 23 50 17 8 13

Let the cell values a4 and a9 be sensitive. We set the values of $UB_i^{k_s}$ and $LB_i^{k_s}$ as $UB_i^{k_s} = a_i$ and $LB_i^{k_s} = a_i/2$ for all the examples considered in this chapter. Let the protection levels for a4 provided by the statistical office be $UPL_4^{k_4} = 2$, $LPL_4^{k_4} = 1$, $SPL_4^{k_4} = 5$, and those for a9 be $UPL_9^{k_9} = 4$, $LPL_9^{k_9} = 2$, $SPL_9^{k_9} = 5$, with b = 5.
In order to find the protection equations, we first have to find the values of $\alpha_i^1, \alpha_i^2, \beta_i^1, \beta_i^2, \alpha_i'^1, \alpha_i'^2, \beta_i'^1$ and $\beta_i'^2$ for the two sensitive cells. For this purpose, we have to solve (4.2.7) and (4.2.10). For this example the matrix $M_{ji}$ can be defined as follows:

$M_{ji} = (1\; 1\; 1\; 1\; 1\; 1\; 1\; 1\; 1\; 1\; {-1})$
Now the equations of (4.2.7) for the sensitive cell a4 can be written as follows:

1. $\alpha_1^1 + \alpha_1^2 - \beta_1^1 - \beta_1^2 + \gamma_1 = 0$
2. $\alpha_2^1 + \alpha_2^2 - \beta_2^1 - \beta_2^2 + \gamma_2 = 0$
3. $\alpha_3^1 + \alpha_3^2 - \beta_3^1 - \beta_3^2 + \gamma_3 = 0$
4. $\alpha_4^1 + \alpha_4^2 - \beta_4^1 - \beta_4^2 + \gamma_4 = 1$
5. $\alpha_5^1 + \alpha_5^2 - \beta_5^1 - \beta_5^2 + \gamma_5 = 0$
6. $\alpha_6^1 + \alpha_6^2 - \beta_6^1 - \beta_6^2 + \gamma_6 = 0$
7. $\alpha_7^1 + \alpha_7^2 - \beta_7^1 - \beta_7^2 + \gamma_7 = 0$
8. $\alpha_8^1 + \alpha_8^2 - \beta_8^1 - \beta_8^2 + \gamma_8 = 0$
9. $\alpha_9^1 + \alpha_9^2 - \beta_9^1 - \beta_9^2 + \gamma_9 = 0$
10. $\alpha_{10}^1 + \alpha_{10}^2 - \beta_{10}^1 - \beta_{10}^2 + \gamma_{10} = 0$
11. $\alpha_{11}^1 + \alpha_{11}^2 - \beta_{11}^1 - \beta_{11}^2 - \gamma_{11} = 0$
12. $\alpha_i^1, \beta_i^1, \alpha_i^2, \beta_i^2 \ge 0$ and $\gamma_j$ unrestricted in sign, where i, j = 1, …, 11.   (4.3.1)
Solving the above equations, we get the following results for the sensitive cell a4:

$\alpha_4^1 = 0$, $\alpha_4^2 = 1$, $\beta_4^1 = 0$ and $\beta_4^2 = 0$.

Again, to find the values of $\alpha_i'^1, \alpha_i'^2, \beta_i'^1$ and $\beta_i'^2$ for the sensitive cell a4, we have to solve the equations of (4.2.10), which are given as follows:

1. $\alpha_1'^1 + \alpha_1'^2 - \beta_1'^1 - \beta_1'^2 + \gamma_1' = 0$
2. $\alpha_2'^1 + \alpha_2'^2 - \beta_2'^1 - \beta_2'^2 + \gamma_2' = 0$
3. $\alpha_3'^1 + \alpha_3'^2 - \beta_3'^1 - \beta_3'^2 + \gamma_3' = 0$
4. $\alpha_4'^1 + \alpha_4'^2 - \beta_4'^1 - \beta_4'^2 + \gamma_4' = -1$
5. $\alpha_5'^1 + \alpha_5'^2 - \beta_5'^1 - \beta_5'^2 + \gamma_5' = 0$
6. $\alpha_6'^1 + \alpha_6'^2 - \beta_6'^1 - \beta_6'^2 + \gamma_6' = 0$
7. $\alpha_7'^1 + \alpha_7'^2 - \beta_7'^1 - \beta_7'^2 + \gamma_7' = 0$
8. $\alpha_8'^1 + \alpha_8'^2 - \beta_8'^1 - \beta_8'^2 + \gamma_8' = 0$
9. $\alpha_9'^1 + \alpha_9'^2 - \beta_9'^1 - \beta_9'^2 + \gamma_9' = 0$
10. $\alpha_{10}'^1 + \alpha_{10}'^2 - \beta_{10}'^1 - \beta_{10}'^2 + \gamma_{10}' = 0$
11. $\alpha_{11}'^1 + \alpha_{11}'^2 - \beta_{11}'^1 - \beta_{11}'^2 + \gamma_{11}' = 0$
12. $\alpha_i'^1, \alpha_i'^2, \beta_i'^1, \beta_i'^2 \ge 0$ and $\gamma_j'$ unrestricted in sign, where i, j = 1, …, 11.   (4.3.2)

Solving the above equations, we get the following results for the sensitive cell a4:

$\alpha_4'^1 = 0$, $\alpha_4'^2 = 0$, $\beta_4'^1 = 0$ and $\beta_4'^2 = 1$.
The equations of (4.2.7) for the sensitive cell a9 are the same as (4.3.1), with the right-hand-side values changed to 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0. After solving (4.2.7) for the sensitive cell a9, we get $\alpha_9^1 = 0$, $\alpha_9^2 = 1$, $\beta_9^1 = 0$ and $\beta_9^2 = 0$. Similarly, the equations of (4.2.10) for the sensitive cell a9 are the same as (4.3.2), with the right-hand-side values changed to 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0. Solving these equations, we get $\alpha_9'^1 = 0$, $\alpha_9'^2 = 0$, $\beta_9'^1 = 0$ and $\beta_9'^2 = 1$.
Putting these values in (4.2.8), (4.2.9) and (4.2.11), we get protection
equation for a4 as
(i) x4 +5-3 ≥ 2 ⇒ x4 ≥ 0
(ii) -x4 +5+3 ≥ 1 ⇒ x4 ≤ 7
and for a9 as
(i) x9 +5-8 ≥ 4 ⇒ x9 ≥ 7
(ii) -x9 +5+8 ≥ 2 ⇒ x9 ≤ 11.
Now we unbiasedly round the above sensitive cell values and find that only the set (0, 10) of unbiasedly rounded values satisfies the protection equations. So we take this set and replace the original sensitive cell values by these values. After putting in these rounded values, we observe that the table is no longer additive. To make the table additive, we apply the model (4.2.12)-(4.2.13) as follows:
Minimize z = 0.083*x1^2+0.043*x2^2+0.029*x3^2+0.020*x5^2+0.043
*x6^2+0.020*x7^2+0.059*x8^2+0.077*x10^2+.00431*X11^2-1
Subject to the constraints
1. x1+x2+x3+x5+x6+x7+x8+x10 = X11 – 10
2. 6 ≤ x1 ≤24
3. 11.5 ≤ x2 ≤46
4. 17 ≤ x3 ≤68
5. 24.5 ≤ x5 ≤98
6. 11.5 ≤ x6 ≤46
7. 25 ≤ x7 ≤100
8. 8.5 ≤ x8 ≤34
9. 6.5 ≤ x10 ≤26
10. 116 ≤ X11 ≤464
11. x1, x2, x3, x5, x6, x7, x8, x10 and X11 are integer and ≥ 0
After solving the above model, we get the following results:
12 23 34 0 50 23 50 17 10 13
with z = 234.4737.
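The thesis solves this model by integer quadratic programming in Excel Solver. The continuous relaxation can be sketched with scipy as follows (an illustration assuming, as in the reported solution, that the grand total X11 stays at its original value 232; not the thesis software):

```python
import numpy as np
from scipy.optimize import minimize

# Non-sensitive cells of Example 1; the sensitive cells a4 = 3 and a9 = 8
# have already been replaced by their rounded values 0 and 10.
keys = [1, 2, 3, 5, 6, 7, 8, 10]
a0 = np.array([12.0, 23.0, 34.0, 49.0, 23.0, 50.0, 17.0, 13.0])
total = 232.0                                   # grand total X11, held fixed

obj = lambda x: np.sum(x**2 / a0) - 1           # objective (4.2.12)
cons = {'type': 'eq',                           # constraint 1 above: the cells
        'fun': lambda x: x.sum() - (total - 10)}  # absorb the rounding error
bnds = [(v / 2, 2 * v) for v in a0]             # a_i/2 <= x_i <= 2*a_i

res = minimize(obj, x0=a0, bounds=bnds, constraints=cons, method='SLSQP')
print(dict(zip(keys, np.round(res.x, 2))))
# The continuous optimum scales every cell by 222/221; the integer
# solution reported above instead moves a single cell (x5: 49 -> 50).
```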
After solving this example by Salazar's (2005) procedure, we get the following results:
10 25 35 0 50 25 50 15 10 10, with z = -9.
The deviation of the final table obtained by the proposed procedure from the original table, using (*), is 3.74, while the deviation of the final table obtained by the procedure of Salazar is 6.63. Thus we see that the deviations in the proposed procedure are smaller.
The proposed procedure rounds the sensitive cells in such a way that the confidential information contained in them is protected against the single internal attacker, and the marginals are not disturbed. To make the table additive, only one non-sensitive cell (a5) is disturbed, and that by only 2.0408%, while all other non-sensitive cell values are published in their original form.
Example 2: Consider the following example, taken from Cox (1995):

20 10 20 10 20 |  80
10 10 20  5 15 |  60
40 10 10 20 10 |  90
 5  5 15 10  5 |  40
75 35 65 45 50 | 270
Let the values a1, a9, a16 and a22 be sensitive, and let the protection levels provided by the statistical office be $UPL_i^{k_i} = 7$, $LPL_i^{k_i} = 5$, $SPL_i^{k_i} = 14$ for i = 1, 9 and 16, and $UPL_{22}^{k_{22}} = 5$, $LPL_{22}^{k_{22}} = 2$, $SPL_{22}^{k_{22}} = 14$ for a22.
Now we solve (4.2.7) and (4.2.10) to find the values of the dual variables $\alpha_i^1, \alpha_i^2, \beta_i^1, \beta_i^2, \alpha_i'^1, \alpha_i'^2, \beta_i'^1$ and $\beta_i'^2$ for all the sensitive cells. After solving (4.2.7) and (4.2.10), we put these values in (4.2.8), (4.2.9) and (4.2.11) and get the following protection equation:

(i) $x_1 \le 47$ for the sensitive cell a1, satisfying only the lower and sliding protection requirements.

We cannot obtain an equation to satisfy the upper protection requirement for the cell a1, and since the values of the dual variables for all the other sensitive cells come out to be 0, we could not obtain any protection equation for them. It may be noted that even if no lower or upper protection equation can be formed for a particular sensitive cell, the cell may still be protected; thus, in the auditing phase, we have to check whether a sensitive cell for which no protection equation (or only one, upper or lower) could be obtained is in fact protected. Now we unbiasedly round the sensitive cell values taking b = 14 and get the following sets of rounded values, which are protected and nearest to the set of original sensitive cell values:
(i) (28, 14, 14, 14)
(ii) (14, 28, 14, 14)
(iii) (14, 14, 28, 14)
After replacing the original sensitive cell values by the above sets of rounded values and applying the model (4.2.12)-(4.2.13), we could not obtain a solution for set (iii). The value of the objective function, which minimizes the distance between the original and final tables, comes out to be 213.9713 for set (i) and 209.7067 for set (ii). Hence we select set (ii) of rounded values and get the following results.
Table 1

14 12 18 13 23 |  80
 9  8 28  4 11 |  60
47 10  8 14 11 |  90
 5  5 11 14  5 |  40
75 35 65 45 50 | 270
with z = 209.7067. Since for this problem we could not obtain any protection equation for the sensitive cells a9, a16 and a22, and for the sensitive cell a1 the upper protection equation could not be obtained, we have to check whether these sensitive cells are protected. In the auditing phase, we observe that all the sensitive cells are protected. After solving this problem by the procedure of Salazar (2005), we get the following results:
14 14 28 14 14 |  84
14 14 14  0 14 |  56
42  0 14 14 14 |  84
 0  0 14 14 14 |  42
70 28 70 42 56 | 266
with z = -68.
The deviation of the final table obtained by the proposed procedure from the original table, using (*), is 16.43, while the deviation of the final table obtained by the procedure of Salazar is 28.53. Thus the deviations in the proposed procedure are smaller for this problem also.
Thus, although we could not obtain any protection equation for the sensitive cells a9, a16 and a22, these cells are protected in the final table. To make the table additive, only 12 non-sensitive cells are disturbed, while in the procedure of Salazar (2005) all the non-sensitive cells are disturbed and the marginals are also not preserved.
Example 3: Consider the following two-way table:

200  40  50 200 120 |  610
 20  70  60 100 120 |  370
 40  90 250 100  30 |  510
100 150  30  80 150 |  510
360 350 390 480 420 | 2000
The cell values a4, a10, a15, a19, a20 and a23 are sensitive. Let the protection levels provided by the statistical office for these sensitive cells be:

For cell a4:            UPL = 20, LPL = 10, SPL = 15
For cells a10 and a19:  UPL = 10, LPL = 5,  SPL = 15
For cell a15:           UPL = 25, LPL = 20, SPL = 15
For cells a20 and a23:  UPL = 15, LPL = 7,  SPL = 15.
Now we solve (4.2.7) and (4.2.10) to find the values of the dual variables $\alpha_i^1, \alpha_i^2, \beta_i^1, \beta_i^2, \alpha_i'^1, \alpha_i'^2, \beta_i'^1$ and $\beta_i'^2$ for all the sensitive cells. After solving (4.2.7) and (4.2.10), we put these values in (4.2.8), (4.2.9) and (4.2.11) and get the following protection equations:

(i) $x_4 \le 209$ for the sensitive cell a4, satisfying the lower and sliding protection requirements, and
(ii) $x_{23} \le 154$ for the sensitive cell a23, satisfying the lower and sliding protection requirements.

We cannot obtain equations to satisfy the upper protection requirement for the cells a4 and a23, and since the values of the dual variables for all the other sensitive cells come out to be 0, we could not obtain any protection equation for those cells. Now we unbiasedly round the sensitive cell values taking b = 19 and get the following sets of rounded values, both of which are equidistant from the set of original sensitive cell values:

(i) (190, 114, 247, 95, 152, 152)
(ii) (190, 95, 247, 114, 152, 152)

After replacing the original sensitive cell values by the above sets of rounded values and applying the model (4.2.12)-(4.2.13), we observe that set (i) yields a final table nearer to the original sensitive cell values than set (ii). Thus we select set (i) and get the following results.
Table 2

204  41  52 190 123 |  610
 19  65  58 114 114 |  370
 42  92 247  98  31 |  510
 95 152  33  78 152 |  510
360 350 390 480 420 | 2000
and z = 1056.654.
Since in this problem we could not obtain protection equations for some of the sensitive cells, in the auditing phase we have to check whether these cells are protected. We observe that the sensitive cells a4 and a15 do not satisfy the upper protection requirement, while all other cells are protected. After solving this problem by the procedure of Salazar (2005), we get the following results:
209  38  57 190 114 |  608
 19  57  57 114 114 |  361
 38  95 247  95  38 |  513
 95 152  38  76 152 |  513
361 342 399 475 418 | 1995
with z = -86.
The deviation of the final table obtained by the proposed procedure from the original table, using (*), is 21.45, while the deviation of the final table obtained by the procedure of Salazar is 34.99. Thus in this problem also the proposed procedure results in a smaller loss of information as compared to the procedure of Salazar (2005).
Example 4: Consider the following two-way table, taken from Fischetti and Salazar (2003):

20  50 10 |  80
 8  19 22 |  49
17  32 12 |  61
45 101 44 | 190
The cell value a7 is sensitive. Let the protection levels provided by the statistical office for a7 be $UPL_7^{k_7} = 7$, $LPL_7^{k_7} = 5$ and $SPL_7^{k_7} = 5$.
Now we solve (4.2.7) and (4.2.10) to find the values of the dual variables $\alpha_i^1, \alpha_i^2, \beta_i^1, \beta_i^2, \alpha_i'^1, \alpha_i'^2, \beta_i'^1$ and $\beta_i'^2$ for the sensitive cell a7. After solving (4.2.7) and (4.2.10), all the values of these dual variables come out to be 0, so we cannot form any protection equation for the sensitive cell a7. After applying the rounding procedure taking b = 5, we get the rounded value 20. Now we put this value in place of the original sensitive cell value and apply the model (4.2.12)-(4.2.13), obtaining the following results.
Table 3

20  49 11 |  80
 9  20 20 |  49
16  32 13 |  61
45 101 44 | 190
with z = 172.3245. Since in this problem also we could not obtain a protection equation for the sensitive cell, in the auditing phase we have to check whether the sensitive cell is protected. We observe that the sensitive cell a7 does not satisfy the upper, lower and sliding protection requirements, and hence the cell a7 is not protected. We also solved this problem by the procedure of Salazar (2005) and get the following results:
20  50 15 |  85
10  20 20 |  50
20  30 10 |  60
50 100 45 | 195

and z = -9.
The deviation of the final table obtained by the proposed procedure from the original table, using (*), is 3.16, while the deviation of the final table obtained by the procedure of Salazar is 11.4. Thus we see that again the deviations in the proposed procedure are smaller.
APPENDIX 4.0
Example 2: Consider the following example:
20 10 20 10 20 |  80
10 10 20  5 15 |  60
40 10 10 20 10 |  90
 5  5 15 10  5 |  40
75 35 65 45 50 | 270
The matrix Mji for this example is the 10×30 additivity matrix described in section 4.2, with the cells numbered x1, …, x30 (each row of five internal cells followed by that row's total, then the five column totals x25–x29 and the grand total X30): each of the four row equations has +1 in the five internal cells of the row and −1 in the row total; each of the five column equations has +1 in the four internal cells of the column and −1 in the column total; and the grand-total equation has +1 in all twenty internal cells and −1 in X30.
Let the cell values a1, a9, a16 and a22 be sensitive. The values of $UPL_i^{k_i}$, $LPL_i^{k_i}$ and $SPL_i^{k_i}$ have already been defined.
Now the equations of (4.2.7) for the sensitive cell a1 can be written as

$\alpha_i^1 + \alpha_i^2 - \beta_i^1 - \beta_i^2 + \gamma_i = r_i$,   i = 1, …, 30,   (4.3.3)

where $r_1 = 1$ and $r_i = 0$ for all other i, the $\gamma$-term entering with a negative sign for i = 11 and i = 22; $\alpha_i^1, \beta_i^1, \alpha_i^2, \beta_i^2 \ge 0$ and $\gamma_j$ is unrestricted in sign, where i, j = 1, …, 30. Solving these equations, we get the following results for the cell a1:

$\alpha_1^1 = 0$, $\beta_1^1 = 0$, $\alpha_1^2 = 0$ and $\beta_1^2 = 0$.
Now, to find the values of $\alpha_i'^1, \alpha_i'^2, \beta_i'^1$ and $\beta_i'^2$ for the sensitive cell a1, we have to solve the equations of (4.2.10), which take the analogous form

$\alpha_i'^1 + \alpha_i'^2 - \beta_i'^1 - \beta_i'^2 + \gamma_i' = r_i'$,   i = 1, …, 30,   (4.3.4)

where $r_1' = -1$ and $r_i' = 0$ for all other i; $\alpha_i'^1, \alpha_i'^2, \beta_i'^1, \beta_i'^2 \ge 0$ and $\gamma_j'$ is unrestricted in sign, where i, j = 1, …, 30. Solving these equations, we get the following results for the sensitive cell a1:

$\alpha_1'^1 = 0$, $\alpha_1'^2 = 1$, $\beta_1'^1 = 0$ and $\beta_1'^2 = 2$.
The equations of (4.2.7) and (4.2.10) for the sensitive cells a9, a16 and a22 are the same as (4.3.3) and (4.3.4), with the single non-zero right-hand-side entry (+1 in (4.3.3) and −1 in (4.3.4)) moved to position 9, 16 or 22, respectively. After solving (4.2.7) and (4.2.10) for the cells a9, a16 and a22, we find that all the corresponding dual variables $\alpha_i^1, \beta_i^1, \alpha_i^2, \beta_i^2, \alpha_i'^1, \alpha_i'^2, \beta_i'^1, \beta_i'^2$ (i = 9, 16, 22) come out to be 0.
Putting these values in (4.2.8), (4.2.9) and (4.2.11), we get the protection equations for the sensitive cells a1, a9, a16 and a22, which have already been stated. We have also already found the set of unbiasedly rounded and protected cell values, so here we give only the objective function and the constraints, which are as follows.
Minimize z = 0.1*x2^2+0.05*x3^2+0.1*x4^2+0.05*x5^2+0.0125*X6^2+0.1*x7^2+0.1*x8^2+0.2*x10^2+0.066667*x11^2+0.016667*X12^2+0.025*x13^2+0.1*x14^2+0.1*x15^2+0.1*x17^2+0.011111*X18^2+0.2*x19^2+0.2*x20^2+0.066667*x21^2+0.2*x23^2+0.025*X24^2+0.013333*X25^2+0.028571*X26^2+0.015385*X27^2+0.022222*X28^2+0.02*X29^2+0.003704*X30^2-1
Subject to the constraints
1. x2+x3+x4+x5=X6-14
2. x7+x8+x10+x11=X12-28
3. x13+x14+x15+x17=X18-14
4. x19+x20+x21+x23=X24-14
5. x7+x13+x19=X25-14
6. x2+x8+x14+x20=X26
7. x3+x15+x21=X27-28
8. x4+x10=X28-28
9. x5+x11+x17+x23=X29
10. x2+x3+x4+x5+x7+x8+x10+x11+x13+x14+x15+x17+x19+x20+x21+x23=
X30-70
11. 5 ≤ x2 ≤20
12. 10 ≤ x3 ≤40
13. 5 ≤ x4 ≤20
14. 10 ≤ x5 ≤40
15. 40 ≤ X6 ≤160
16. 5 ≤ x7 ≤20
17. 5 ≤ x8 ≤20
18. 2.5 ≤ x10 ≤10
19. 7.5 ≤ x11 ≤30
20. 30 ≤ X12 ≤120
21. 20 ≤ x13 ≤80
22. 5 ≤ x14 ≤20
23. 5 ≤ x15 ≤20
24. 5 ≤ x17 ≤20
25. 45 ≤ X18 ≤180
26. 2.5 ≤ x19 ≤10
27. 2.5 ≤ x20≤10
28. 7.5 ≤ x21 ≤30
29. 2.5 ≤ x23 ≤10
30. 20 ≤ X24 ≤80
31. 37.5 ≤ X25 ≤150
32. 17.5 ≤ X26 ≤70
33. 32.5 ≤ X27 ≤130
34. 22.5 ≤ X28 ≤90
35. 25 ≤ X29 ≤100
36. 135 ≤ X30 ≤540
37. x2, x3, x4, x5, X6, x7, x8, x10, x11, X12, x13, x14, x15, x17, X18, x19, x20, x21, x23,
X24, X25, X26, X27, X28, X29, and X30 are integer and ≥ 0.
After solving the above model, we get the desired results displayed in Table 1.
Example 3: Consider the following population:

200  40  50 200 120 |  610
 20  70  60 100 120 |  370
 40  90 250 100  30 |  510
100 150  30  80 150 |  510
360 350 390 480 420 | 2000
The cell values a4, a10, a15, a19, a20 and a23 are sensitive, and the protection levels for these sensitive cells have already been defined. The matrix Mji for this population has the same 10×30 additivity structure as in Example 2 above: each row equation has +1 in the five internal cells of the row and −1 in the row total, each column equation has +1 in the four internal cells of the column and −1 in the column total, and the grand-total equation has +1 in all twenty internal cells and −1 in the grand total.
Now the equations of (4.2.7) for the sensitive cell a4 can be written as

$\alpha_i^1 + \alpha_i^2 - \beta_i^1 - \beta_i^2 + \gamma_i = r_i$,   i = 1, …, 30,   (4.3.5)

where $r_4 = 1$ and $r_i = 0$ for all other i, the $\gamma$-term entering with a negative sign for i = 11 and i = 22; $\alpha_i^1, \beta_i^1, \alpha_i^2, \beta_i^2 \ge 0$ and $\gamma_j$ is unrestricted in sign, where i, j = 1, …, 30. Solving these equations, we get the following results for the cell a4:

$\alpha_4^1 = 0$, $\beta_4^1 = 0$, $\alpha_4^2 = 0$ and $\beta_4^2 = 0$.
Now, to find the values of $\alpha_i'^1, \alpha_i'^2, \beta_i'^1$ and $\beta_i'^2$ for the sensitive cell a4, we have to solve the equations of (4.2.10), which take the form

$\alpha_i'^1 + \alpha_i'^2 - \beta_i'^1 - \beta_i'^2 + \gamma_i' = r_i'$,   i = 1, …, 30,   (4.3.6)

where $r_4' = -1$ and $r_i' = 0$ for all other i; $\alpha_i'^1, \alpha_i'^2, \beta_i'^1, \beta_i'^2 \ge 0$ and $\gamma_j'$ is unrestricted in sign, where i, j = 1, …, 30. Solving these equations, we get the following results for the sensitive cell a4:

$\alpha_4'^1 = 0$, $\alpha_4'^2 = 0$, $\beta_4'^1 = 0$ and $\beta_4'^2 = 1$.
The equations of (4.2.7) and (4.2.10) for the sensitive cells a10, a15, a19, a20 and a23 are the same as (4.3.5) and (4.3.6), with the single non-zero right-hand-side entry (+1 in (4.3.5) and −1 in (4.3.6)) moved to position 10, 15, 19, 20 or 23, respectively.
After solving (4.2.7) and (4.2.10) for the cells a10, a15, a19, a20 and a23, we get the following results: all of the dual variables $\alpha_i^1, \beta_i^1, \alpha_i^2, \beta_i^2$ and $\alpha_i'^1, \alpha_i'^2, \beta_i'^1, \beta_i'^2$ come out to be 0 for i = 10, 15, 19 and 20, while for a23 we get $\alpha_{23}^1 = \beta_{23}^1 = \alpha_{23}^2 = \beta_{23}^2 = 0$, $\alpha_{23}'^1 = \alpha_{23}'^2 = \beta_{23}'^1 = 0$ and $\beta_{23}'^2 = 1$.
Putting these values in (4.2.8), (4.2.9) and (4.2.11), we get the protection equations for the sensitive cells a4, a10, a15, a19, a20 and a23, which have already been stated; the unbiasedly rounded set of sensitive values has also already been given. After replacing the original sensitive cell values with the unbiasedly rounded values and then applying the proposed model, we get the following objective function and constraints.

Minimize z = 0.005*x1^2+0.025*x2^2+0.02*x3^2+0.008333*x5^2+0.001639*X6^2+0.05*x7^2+0.014286*x8^2+0.016667*x9^2+0.008333*x11^2+0.002703*X12^2+0.025*x13^2+0.011111*x14^2+0.01*x16^2+0.033333*x17^2+0.001961*X18^2+0.033333*x21^2+0.0125*x22^2+0.001961*X24^2+0.002778*X25^2+0.002857*X26^2+0.002564*X27^2+0.002083*X28^2+0.002381*X29^2+0.0005*X30^2-1
Subject to the constraints
1. x1+x2+x3+x5=X6-190
2. x7+x8+x9+x11=X12-114
3. x13+x14+x16+x17=X18-247
4. x21+x22=X24-399
5. x1+x7+x13=X25-95
6. x2+x8+x14=X26-152
7. x3+x9+x21=X27-247
8. x16+x22=X28-304
9. x5+x11+x17=X29-152
10. x1+x2+x3+x5+x7+x8+x9+x11+x13+x14+x16+x17+x21+x22=X30-950
11. 100 ≤ x1 ≤400
12. 20 ≤ x2 ≤80
13. 25 ≤ x3 ≤100
14. 60 ≤ x5 ≤240
15. 305 ≤ X6 ≤1220
16. 10 ≤ x7 ≤40
17. 35 ≤ x8 ≤140
18. 30 ≤ x9 ≤120
19. 60 ≤ x11 ≤240
20. 185 ≤ X12 ≤740
21. 20 ≤ x13 ≤80
22. 45 ≤ x14 ≤120
23. 50 ≤ x16 ≤200
24. 15 ≤ x17 ≤60
25. 255 ≤ X18 ≤1020
26. 15 ≤ x21 ≤60
27. 40 ≤ x22 ≤160
28. 255 ≤ X24 ≤1020
29. 180 ≤ X25 ≤720
30. 175 ≤ X26 ≤700
31. 195 ≤ X27 ≤780
32. 240 ≤ X28 ≤960
33. 210 ≤ X29 ≤840
34. 1000 ≤ X30 ≤4000
35. x1, x2, x3, x5, X6, x7, x8, x9, x11, X12, x13, x14, x16, x17, X18, x21, x22,
X24, X25, X26, X27,X28,X29 and X30 are integer and >=0.
After solving the above model, we get the desired results, displayed in Table 2.
Example 4: Consider the following 3×3 population (the last column and the last row contain the marginal totals).

20     50     10     80
8      19     22     49
17     32     12     61
45     101    44     190
The cell value a7 is sensitive, and the values of $UPL_{i_k}$, $LPL_{i_k}$ and $SPL_{i_k}$ are already defined for a7. The matrix $M_{ji}$ for this example is given as follows.
Transpose of $M_{ji}$ =

     x1  x2  x3  x4  x5  x6  x7  x8  x9  x10 x11 x12 x13 x14 x15
     1   1   1   -1  0   0   0   0   0   0   0   0   0   0   0
     0   0   0   0   1   1   1   -1  0   0   0   0   0   0   0
     0   0   0   0   0   0   0   0   1   1   1   -1  0   0   0
     1   0   0   0   1   0   0   0   1   0   0   0   -1  0   0
     0   1   0   0   0   1   0   0   0   1   0   0   0   -1  0
     0   0   1   0   0   0   1   0   0   0   1   0   0   0   -1
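The structure of this matrix is mechanical: one row per marginal equation, with a 1 on each internal cell entering the margin and a −1 on the corresponding marginal cell. As a minimal sketch (in Python with numpy; the function name and the variable ordering — each row's cells followed by that row's total, then the column totals — are ours), such a matrix can be generated for any R×C table:

import numpy as np

def margin_constraint_matrix(R=3, C=3):
    # One equation per row margin and per column margin of an R x C table.
    # Variables: for each row, its C cells then its row total; then C column totals.
    n_vars = R * (C + 1) + C
    M = np.zeros((R + C, n_vars), dtype=int)
    for r in range(R):                       # row-sum equations: cells - row total = 0
        base = r * (C + 1)
        M[r, base:base + C] = 1
        M[r, base + C] = -1
    for c in range(C):                       # column-sum equations
        for r in range(R):
            M[R + c, r * (C + 1) + c] = 1
        M[R + c, R * (C + 1) + c] = -1
    return M

print(margin_constraint_matrix())            # reproduces the six rows shown above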
Now the equations of (4.2.7) for the sensitive cell a7 can be
written as follows.
$\alpha_i^1 + \alpha_i^2 - \beta_i^1 - \beta_i^2 + \gamma_i = 0$ for i = 1, …, 16, i ≠ 7,
$\alpha_7^1 + \alpha_7^2 - \beta_7^1 - \beta_7^2 + \gamma_7 = 1$,     (4.3.7)
where $\alpha_i^1, \beta_i^1, \alpha_i^2, \beta_i^2 \geq 0$ and $\gamma_j$ is unrestricted in sign, for i, j = 1, …, 16.
Solving the above equations, we get the following results for the cell a7:
$\alpha_7^1 = 0$, $\beta_7^1 = 0$, $\alpha_7^2 = 0$ and $\beta_7^2 = 0$.
Now, to find the values of $\alpha_i'^1$, $\alpha_i'^2$, $\beta_i'^1$ and $\beta_i'^2$ for the sensitive cell a7, we have to solve the equations of (4.2.10). These equations are given as follows.
$\alpha_i'^1 + \alpha_i'^2 - \beta_i'^1 - \beta_i'^2 + \gamma_i' = 0$ for i = 1, …, 16, i ≠ 7,
$\alpha_7'^1 + \alpha_7'^2 - \beta_7'^1 - \beta_7'^2 + \gamma_7' = -1$,     (4.3.8)
where $\alpha_i'^1, \alpha_i'^2, \beta_i'^1, \beta_i'^2 \geq 0$ and $\gamma_j'$ is unrestricted in sign, for i, j = 1, …, 16.
Solving the above equations, we get the following results for the sensitive cell a7:
$\alpha_7'^1 = 0$, $\alpha_7'^2 = 0$, $\beta_7'^1 = 0$ and $\beta_7'^2 = 0$.
Putting these values in (4.2.8), (4.2.9) and (4.2.11), we could not
get any protection equation for the sensitive cell a7. After replacing
the original sensitive cell values with the unbiasedly rounded values
and then applying the proposed model, we get the objective function
and the constraints as follows.
Minimize z = 0.05*x1^2 + 0.02*x2^2 + 0.1*x3^2 + 0.0125*X4^2 + 0.125*x5^2 + 0.052632*x6^2 + 0.020408*X8^2 + 0.058824*x9^2 + 0.03125*x10^2 + 0.083333*x11^2 + 0.016393*X12^2 + 0.022222*X13^2 + 0.009901*X14^2 + 0.022727*X15^2 + 0.005263*X16^2 − 1
Subject to the constraints
1. x1+x2+x3=X4
2. x5+x6=X8−20
3. x9+x10+x11=X12
4. x1+x5+x9=X13
5. x2+x6+x10=X14
6. x3+x11=X15−20
7. x1+x2+x3+x5+x6+x9+x10+x11=X16−20
8. 10 ≤ x1 ≤ 40
9. 25 ≤ x2 ≤ 100
10. 5 ≤ x3 ≤ 20
11. 40 ≤ X4 ≤ 160
12. 4 ≤ x5 ≤ 16
13. 9.5 ≤ x6 ≤ 38
14. 24.5 ≤ X8 ≤ 98
15. 8.5 ≤ x9 ≤ 34
16. 16 ≤ x10 ≤ 64
17. 6 ≤ x11 ≤ 24
18. 30.5 ≤ X12 ≤ 122
19. 22.5 ≤ X13 ≤ 90
20. 50.5 ≤ X14 ≤ 202
21. 22 ≤ X15 ≤ 88
22. 95 ≤ X16 ≤ 380
23. x1, x2, x3, X4, x5, x6, X8, x9, x10, x11, X12, X13, X14, X15 and X16 are integer and ≥ 0.
After solving the above model, we get the desired results, displayed in Table 3.
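A model of this form can be handed directly to a numerical optimizer. Below is a minimal sketch (in Python with scipy, both assumptions of ours) for the model just stated; the integrality requirement of constraint 23 is relaxed, so a mixed-integer solver or a rounding step would be needed to recover the integer solution reported in Table 3.

import numpy as np
from scipy.optimize import minimize

# Variable order: x1, x2, x3, X4, x5, x6, X8, x9, x10, x11, X12, X13, X14, X15, X16
c = np.array([0.05, 0.02, 0.1, 0.0125, 0.125, 0.052632, 0.020408, 0.058824,
              0.03125, 0.083333, 0.016393, 0.022222, 0.009901, 0.022727, 0.005263])

def objective(x):
    return np.sum(c * x**2) - 1

# Equality constraints 1-7, written as A @ x = b
A = np.zeros((7, 15)); b = np.zeros(7)
A[0, [0, 1, 2]] = 1;  A[0, 3] = -1                      # x1+x2+x3 = X4
A[1, [4, 5]] = 1;     A[1, 6] = -1;  b[1] = -20         # x5+x6 = X8-20
A[2, [7, 8, 9]] = 1;  A[2, 10] = -1                     # x9+x10+x11 = X12
A[3, [0, 4, 7]] = 1;  A[3, 11] = -1                     # x1+x5+x9 = X13
A[4, [1, 5, 8]] = 1;  A[4, 12] = -1                     # x2+x6+x10 = X14
A[5, [2, 9]] = 1;     A[5, 13] = -1; b[5] = -20         # x3+x11 = X15-20
A[6, [0, 1, 2, 4, 5, 7, 8, 9]] = 1; A[6, 14] = -1; b[6] = -20
cons = {'type': 'eq', 'fun': lambda x: A @ x - b}

# Bounds from constraints 8-22
bounds = [(10, 40), (25, 100), (5, 20), (40, 160), (4, 16), (9.5, 38),
          (24.5, 98), (8.5, 34), (16, 64), (6, 24), (30.5, 122),
          (22.5, 90), (50.5, 202), (22, 88), (95, 380)]
x0 = np.array([lo for lo, hi in bounds], dtype=float)   # start at the lower bounds

res = minimize(objective, x0, method='SLSQP', bounds=bounds, constraints=[cons])
print(res.x)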
CHAPTER V
OPTIMAL CONTROLLED SELECTION PROCEDURE
FOR SAMPLE CO-ORDINATION PROBLEM USING
LINEAR PROGRAMMING
5.1 INTRODUCTION
It is often required to sample the same population in two or more surveys, either to cover a variety of topics or to obtain current estimates of a characteristic of the population. In some applications the samples are selected at the same time point for two or more surveys of the same population. For example, one sample may be designed for households and persons and another for literacy, both from the same population. Such surveys can be conducted simultaneously with different measures of size and possibly with different stratifications. On the other hand, if improved data become available some time after a survey has been conducted, it is desirable to improve the stratification and the measures of size. It is possible that both the stratification and the selection probabilities (i.e., the measures of size) of the sampling units differ between the two surveys. A redesign is then undertaken in which the old units remain the same but the stratification and the selection probabilities change, because improved data have been obtained. In the redesign of a survey for the same population, the two samples must be selected sequentially, since the designs are for different time points. The new sample must be selected independently of the old sample, but it may be considered desirable to retain as many old units as possible in the new sample, to reduce the expenses associated with hiring new interviewers, training new data collectors, etc. Moreover, in almost all surveys the cost of sampling is roughly proportional to the total number of distinct units sampled. Thus, selecting the same unit twice instead of two distinct units reduces the cost of the survey, and minimizing the number of distinct units chosen across the different surveys minimizes this cost. Therefore, when the budget of a survey is limited, it is usually desirable to select units that can serve as a sample for both surveys (in simultaneous as well as sequential selection). In other words, to reduce the cost one has to conduct the surveys in such a way as to minimize the number of distinct units in the union of the samples. This is known as the problem of maximization of overlap between the sampling units.
There also exist situations in which it is desirable to avoid, or minimize the likelihood of, selecting the same unit for more than one survey. This problem is called the minimization of overlap of sampling units. For example, if we take two consecutive samples from a population after a gap of, say, three months to check the level of immunization among the children of this population, we may want the children chosen in the second sample to be different from those selected in the first sample. We then resort to the technique of minimization of overlap of sampling units.

Therefore, in many situations it is desirable to maximize or minimize the expected overlap between two or more surveys drawn at different or the same time points. The problem of overlap of sampling units is also referred to as the sample co-ordination problem. Maximizing the overlap of sampling units is referred to as positive co-ordination, and minimizing the overlap of sampling units as negative co-ordination.
The problem of co-ordination of sampling units has been a topic of interest for more than fifty years, and different methods have been proposed by various authors to solve the sample co-ordination problem. The first approach was given by Keyfitz (1951), who proposed an optimum procedure for one-unit-per-stratum designs when the initial and new designs have identical stratification, with only a change in the selection probabilities. Fellegi (1963, 1966), Gray and Platek (1963) and Kish (1963) also proposed methods for the sample co-ordination problem, but these methods are in general restricted either to two successive samples or to small sample sizes. To deal with larger sample sizes, Kish and Scott (1971) proposed a method for the sample co-ordination problem. Brewer et al. (1972) introduced the concept of the permanent random number (PRN) for solving the sample co-ordination problem. The linear programming approach to sample co-ordination was first discussed by Causey et al. (1985), who proposed an optimum linear programming procedure for maximizing the expected number of sampling units common to the two designs when the two sets of sample units are chosen sequentially. Ernst and Ikeda (1995) also presented a linear programming procedure for overlap maximization under very general conditions. Ernst (1996) developed a procedure for the sample co-ordination problem with one-unit-per-stratum designs where the two designs may have different stratifications. Ernst (1998) proposed a procedure for the sample co-ordination problem with no restriction on the number of sample units per stratum, but with identical stratification. Both of the procedures of Ernst (1996, 1998) use the controlled selection algorithm of Causey, Cox and Ernst (1985) and can be used for simultaneous as well as sequential sample surveys. Ernst and Paben (2002) proposed a new methodology for the sample co-ordination problem, based on the procedures of Ernst (1996, 1998), which places no restriction on the number of sample units selected per stratum and does not require the two designs to have identical stratification. Recently, Matei and Tillé (2006) proposed a methodology for the sample co-ordination problem for two sequential sample surveys. They proposed an algorithm, based on iterative proportional fitting (IPF), to compute the probability distribution of a bi-design. Their method can be applied to any type of sampling design for which it is possible to compute the probability distribution of both samples.
In this chapter, using the linear programming approach, we propose an improved method for the sample co-ordination problem which maximizes (or minimizes) the overlap of sampling units between two designs. The proposed procedure is motivated by Ernst (1998): its basic idea is adapted from that paper, but the way of solving the controlled selection problem is different.

In Section 5.2, we describe the proposed methodology for the positive and negative sample co-ordination problems. In Section 5.3, some numerical examples are considered to demonstrate the utility of the proposed procedure.
5.2 THE OPTIMAL CONTROLLED PROCEDURE
Following the notation of Ernst (1998), we consider two sampling designs D1 and D2, with identical population and stratification, consisting of N units, with S denoting one of the strata. We have to select a given number of sample units under each of the two designs. The selection probability of a unit in S is, in general, not the same for the two designs. In order to reduce the cost of the survey, let us first suppose that we want to maximize the overlap of sampling units for the two sampling designs; that is, the problem to be solved is that of maximizing the overlap of sampling units in D1 and D2. To do so, we select the sample units subject to the following conditions, originally derived by Ernst (1998):
(i) A predetermined number of units, n_s, is selected from S for the D_s sample, s = 1, 2; that is, the sample size for each stratum and design combination is fixed. (5.2.1)
(ii) The i-th unit in S is selected for the D_s sample with its assigned probability, denoted $\pi_{is}$. (5.2.2)
(iii) The expected value of the number of sample units common to the two designs is maximized. (5.2.3)
(iv) The number of sample units common to any D1 and D2 samples is always within one of the maximum expected value. (5.2.4)
As described in Ernst (1998), the problem of maximizing the overlap of sampling units for the two designs can be converted into the "controlled selection" problem W = $(w_{ij})$, where $(w_{ij})$ denotes the internal elements of W. Here W is an (N+1)×5 array with N internal rows and 4 internal columns, where N is the number of units in the stratum universe. The solution of this controlled selection problem W then maximizes the overlap of sampling units. The problem W can be solved by constructing a sequence of integer-valued tabular arrays M1 = $(m_{ij1})$, M2 = $(m_{ij2})$, …, Mt = $(m_{ijt})$, with the same numbers of rows and columns as W, and associated probabilities p1, …, pt, which satisfy certain conditions. Finally, a random array M = $(m_{ij})$ is chosen from among these t arrays using the indicated probabilities, and this array determines the sample allocation. The internal elements of W are computed as follows. Let
$w_{i3} = \min(\pi_{i1}, \pi_{i2}), \quad i = 1, \ldots, N$     (5.2.5)
$w_{is} = \pi_{is} - w_{i3}, \quad s = 1, 2$     (5.2.6)
$w_{i4} = 1 - \sum_{j=1}^{3} w_{ij}$     (5.2.7)
This array W can be considered as a controlled selection problem. In the i-th internal row of the array W, the first element denotes the probability that the i-th unit is only in D1, the second element the probability that it is only in D2, the third element the probability that it is in both D1 and D2, and the fourth element the probability that it is in neither D1 nor D2. We next describe the conditions which must be satisfied by the sequence of integer-valued arrays M1, …, Mt and associated probabilities p1, …, pt that determine the sample allocation. In each internal row of these arrays, one of the four internal columns takes the value 1 and the other three take the value 0. The value 1 in the first column indicates that the unit is only in the D1 sample; the value 1 in the second column indicates that the unit is only in the D2 sample; the value 1 in the third column indicates that the unit is in both samples; and the value 1 in the fourth column indicates that the unit is in neither of the two samples.
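As a small illustration of (5.2.5)-(5.2.7), the internal elements of W can be computed directly from the two sets of inclusion probabilities. A minimal sketch in Python (the function and argument names are ours):

def build_W(pi1, pi2):
    # (5.2.5)-(5.2.7) for overlap maximization; columns are
    # (only D1, only D2, both D1 and D2, neither), one internal row per unit.
    W = []
    for p1, p2 in zip(pi1, pi2):
        w3 = min(p1, p2)              # (5.2.5)
        w1, w2 = p1 - w3, p2 - w3     # (5.2.6)
        w4 = 1 - (w1 + w2 + w3)       # (5.2.7)
        W.append([w1, w2, w3, w4])
    W.append([sum(col) for col in zip(*W)])   # totals row
    return W

# With the inclusion probabilities of Example 1 of Section 5.3 this reproduces
# the four internal columns of the array W displayed there (each row sums to 1).
for row in build_W([.6, .4, .8, .6, .6], [.8, .4, .2, .4, .2]):
    print([round(v, 4) for v in row])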
Ernst (1998) has derived a set of conditions which, if met by the random array M, are sufficient to satisfy (5.2.1)-(5.2.4).

(5.2.2) will be satisfied if
$p(m_{is} = 1) + p(m_{i3} = 1) = w_{is} + w_{i3} = \pi_{is}, \quad i = 1, \ldots, N, \; s = 1, 2$.     (5.2.8)

(5.2.3) will be satisfied if
$p(m_{i3} = 1) = w_{i3}, \quad i = 1, \ldots, N$.     (5.2.9)

Therefore, if it can be established that
$E(m_{ij}) = \sum_{k=1}^{t} p_k m_{ijk} = w_{ij}, \quad i = 1, \ldots, N, \; j = 1, \ldots, 5,$     (5.2.10)
then (5.2.2) and (5.2.3) will hold, since (5.2.10) implies (5.2.8) and (5.2.9).

To establish (5.2.1), it only needs to be shown that
$m_{(N+1)sk} + m_{(N+1)3k} = n_s, \quad s = 1, 2, \; k = 1, \ldots, t$.     (5.2.11)

Finally, to establish (5.2.4), it is sufficient to show that
$|m_{ijk} - w_{ij}| < 1, \quad i = 1, \ldots, N+1, \; j = 1, \ldots, 5, \; k = 1, \ldots, t,$     (5.2.12)
since, in particular,
$|m_{(N+1)3k} - w_{(N+1)3}| < 1, \quad k = 1, \ldots, t,$
where $w_{(N+1)3}$ is the maximum expected number of units common to the two samples and $m_{(N+1)3k}$ is the number of units common to the k-th possible sample.
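Conditions (5.2.10) and (5.2.12) are easy to verify mechanically for any candidate solution. A minimal sketch (in Python with numpy; the function and argument names are ours):

import numpy as np

def satisfies_conditions(arrays, probs, W, tol=1e-9):
    # arrays: the t integer arrays M_k, each (N+1) x 5 including the totals;
    # probs: the associated probabilities p_k; W: the target array.
    arrays, W = np.asarray(arrays, float), np.asarray(W, float)
    expected = np.tensordot(probs, arrays, axes=1)        # E(m_ij) = sum_k p_k m_ijk
    if not np.allclose(expected, W, atol=tol):            # condition (5.2.10)
        return False
    return all(np.all(np.abs(Mk - W) < 1 + tol) for Mk in arrays)  # condition (5.2.12)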
Our problem now becomes that of finding a solution of the controlled selection problem W which satisfies the conditions (5.2.10)-(5.2.12); such a solution then maximizes the overlap of sampling units between the designs D1 and D2.

The first step of the proposed procedure is to find all possible combinations of units according to the probabilities of the array W; this set is denoted by A. In order to satisfy condition (5.2.11), we retain only those arrays for which
$m_{(N+1)sk} + m_{(N+1)3k} = n_s, \quad s = 1, 2, \; k = 1, \ldots, t$.
That is, out of all possible arrays, we exclude those which do not satisfy condition (5.2.11), and we denote the resulting set of arrays by A1. We then propose the following model to solve the controlled selection problem W while satisfying conditions (5.2.10) and (5.2.12). The model uses the linear programming approach to maximize the probability of those sample combinations which consist of the maximum number of overlapped sampling units; this set is denoted by $\bar{A}$. The non-negativity condition of the H-T estimator is also included in the model for the purpose of variance estimation. The proposed model is as follows:
Maximize $z = \sum_{k \in \bar{A}} p_k$     (5.2.13)

Subject to the constraints
i. $\sum_{k=1}^{t} p_k = 1$
ii. $\sum_{k=1}^{t} p_k m_{ijk} = w_{ij}, \quad i = 1, \ldots, N, \; j = 1, \ldots, 4$     (5.2.14)
iii. $\lfloor w_{ij} \rfloor \leq m_{ijk} \leq \lceil w_{ij} \rceil$
iv. $\pi^{s}_{i_1 i_2} \leq \pi^{s}_{i_1} \pi^{s}_{i_2}$ for s = 1, 2
v. $p_k \geq 0$ for k = 1, …, t
vi. $\pi^{s}_{i_1 i_2} \geq 0$

where k refers to a particular sample combination and $\lfloor w_{ij} \rfloor$ and $\lceil w_{ij} \rceil$ denote the lower and upper rounded values of $w_{ij}$ respectively, when the rounding base is 1. Conditions (i) and (v) are necessary for any sampling design, and conditions (ii) and (iii) are required to satisfy (5.2.10) and (5.2.12) respectively. Condition (ii) also ensures that the resultant design is an IPPS design. Condition (iv) is desirable as it provides the sufficient condition for non-negativity of the Yates-Grundy estimator of the variance, and condition (vi) is desirable because it ensures unbiased variance estimation for the H-T estimator. Condition (iv) is very stringent, and in some situations a feasible solution of the model may not exist with this condition; in such situations it can be dropped. Even after dropping condition (iv), a non-negative variance estimate of the H-T estimator can be obtained whenever the sample size is greater than two, because condition (iv) is necessary for non-negative variance estimation only for samples of size at most two. Again, if in some situation condition (vi) is also not satisfied, it may likewise be dropped and an alternative method of variance estimation, other than the Horvitz-Thompson variance estimator, can be used.
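Once the set A1 of candidate arrays and the set $\bar{A}$ are known, the model reduces to a standard linear program in p1, …, pt. A minimal sketch of conditions (i), (ii) and (v) (in Python with scipy; the function and argument names are ours, condition (iii) being built into the construction of A1, while (iv) and (vi) would enter as further linear constraints where required):

import numpy as np
from scipy.optimize import linprog

def solve_overlap_lp(arrays, W_int, best):
    # arrays: the t candidate arrays of A1, each (N+1) x 4 (the four columns of W);
    # W_int: the corresponding (N+1) x 4 part of W;
    # best: the indices of the combinations belonging to A-bar.
    t = len(arrays)
    c = np.zeros(t)
    c[list(best)] = -1.0                             # maximize sum of p_k over A-bar
    A_eq = np.vstack([np.asarray(arrays, float).reshape(t, -1).T,  # condition (ii)
                      np.ones(t)])                                  # condition (i)
    b_eq = np.append(np.asarray(W_int, float).ravel(), 1.0)
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * t)  # condition (v)
    return res.x if res.success else None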
Moreover, in many situations it is desirable to avoid the selection of the same unit for two or more surveys; that is, we have to minimize the overlap of sampling units. The proposed procedure can easily be modified for this purpose. In order to minimize the overlap of sampling units, we have to redefine the internal elements of W: condition (5.2.5) is replaced by
$w_{i3} = \max(\pi_{i1} + \pi_{i2} - 1, \; 0)$     (5.2.15)
while (5.2.6) and (5.2.7) remain the same as in the case of maximization of overlap. The rest of the procedure also remains as defined for the maximization case, except for the objective function of the proposed model, which for minimization of overlap becomes
Max $z = \sum_{k \in C} p_k$,
where C denotes the set of all those sample combinations which consist of the minimum number of overlapped sampling units.
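For instance, for unit 1 of Example 1 of Section 5.3, $\pi_{11} = .6$ and $\pi_{12} = .8$, so (5.2.15) gives $w_{13} = \max(.6 + .8 - 1, 0) = .4$; (5.2.6) then gives $w_{11} = .6 - .4 = .2$ and $w_{12} = .8 - .4 = .4$, and (5.2.7) gives $w_{14} = 0$. This is the first internal row of the array W shown in Appendix 5.0.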
The proposed procedure can be used when the two surveys are conducted for the same population with identical stratification. The two surveys may be conducted sequentially or simultaneously, and there is no restriction on the number of units selected per stratum. The proposed procedure is superior to the procedure of Ernst (1998) in that it maximizes the probability of those sample combinations which consist of the maximum number of overlapped sampling units (in the case of positive co-ordination) or minimizes the probability of those sample combinations which consist of the maximum number of overlapped sampling units (in the case of negative co-ordination). The proposed procedure also supports variance estimation using the Horvitz-Thompson (1952) variance estimator, and in situations where the conditions of the H-T estimator cannot be satisfied, an alternative variance estimator can be used.

Moreover, the procedure of Ernst (1998) does not take into account all possible combinations of units and terminates after a few steps, whereas the proposed procedure presents all possible combinations of sampling units to the sampler, who therefore has a wide choice of sample combinations. In some situations the selected combination may contain too many distinct units and thus increase the cost of the survey; the sampler can then select another combination of units in order to reduce the cost. Thus, when all possible combinations of units are available, the sampler can select a sample combination according to the budget of the survey, so the proposed procedure not only maximizes or minimizes the overlap of sampling units but also helps to control the cost of the survey.

The proposed procedure can be applied with any number of sampling units per stratum, but the size of the linear programming problem increases rapidly as the number of sampling units per stratum increases. This difficulty, however, is shared by all procedures that use the linear programming approach for the sample co-ordination problem; such procedures are therefore best suited to designs with a small number of sample units per stratum.
5.3 EXAMPLES
In this section, we present some numerical examples to demonstrate the utility of the proposed procedure, and we compare the proposed plan with the procedure of Ernst (1998) to demonstrate its superiority.
Example 1: Consider the following example, taken from Ernst (1998). The inclusion probabilities for two sampling designs for 5 units are given below.

TABLE 1: Inclusion probabilities

i       1     2     3     4     5
πi1    .6    .4    .8    .6    .6
πi2    .8    .4    .2    .4    .2
We have to select a sample of size 3 for the design D1 and a
sample of size 2 for the design D2.
Case I (Maximization): The first step is to find the internal elements of W using (5.2.5)-(5.2.7), which gives

W =
0       .2      .6      .2      1
0       0       .4      .6      1
.6      0       .2      .2      1
.2      0       .4      .4      1
.4      0       .2      .4      1
1.2     .2      1.8     1.8     5
Now the problem becomes one of solving the above controlled selection problem with N = 20 and n = 5, where N denotes the number of internal cells of the array W and n the number of cells selected, one from each internal row.
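These combinations can be enumerated mechanically: each unit is assigned the value 1 in exactly one of the four columns, and an array is retained if it meets (5.2.11) on the sample sizes and keeps every internal cell within one of the corresponding entry of W (condition (iii) of the model). A minimal sketch in Python (names are ours) that recovers the 24 candidate arrays:

from itertools import product

# Internal rows of W for Example 1; D1 sample size n1 = 3, D2 sample size n2 = 2.
W = [[0, .2, .6, .2], [0, 0, .4, .6], [.6, 0, .2, .2],
     [.2, 0, .4, .4], [.4, 0, .2, .4]]
n1, n2 = 3, 2

candidates = []
for cols in product(range(4), repeat=5):          # column chosen for each unit
    M = [[1 if j == cols[i] else 0 for j in range(4)] for i in range(5)]
    tot = [sum(row[j] for row in M) for j in range(4)]
    size1, size2 = tot[0] + tot[2], tot[1] + tot[2]          # condition (5.2.11)
    in_range = all(abs(M[i][j] - W[i][j]) < 1
                   for i in range(5) for j in range(4))      # rounding bounds
    if size1 == n1 and size2 == n2 and in_range:
        candidates.append(M)
print(len(candidates))                            # 24 candidate arrays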
All possible combinations satisfying condition (5.2.11) are as follows.

[The 24 candidate arrays (1)-(24), in which each unit receives the value 1 in exactly one of the four columns, appear here in the original; their tabular layout is not recoverable from this transcription.]
Now we apply the proposed model as follows.

Maximize z = p5+p6+p7+p8+p9+p10+p13+p14+p15+p16+p17+p18+p19+p20+p21+p22+p23+p24

Subject to the constraints
1. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p13+p14+p15+p16+p17+p18+p19+p20+p21+p22+p23+p24=1
2. p1+p2+p3+p4+p11+p12=.2
3. p5+p6+p7+p13+p14+p15+p19+p20+p21=.6
4. p8+p9+p10+p16+p17+p18+p22+p23+p24=.2
5. p1+p3+p5+p8+p9+p11+p13+p16+p17+p19+p22+p23=.4
6. p2+p4+p6+p7+p10+p12+p14+p15+p18+p20+p21+p24=.6
7. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10=.6
8. p12+p14+p16+p18+p20+p22+p24=.2
9. p11+p13+p15+p17+p19+p21+p23=.2
10. p1+p2+p11+p12+p13+p14+p15+p16+p17+p18=.2
11. p4+p6+p8+p10+p21+p23+p24=.4
12. p3+p5+p7+p9+p19+p20+p22=.4
13. p3+p4+p11+p12+p19+p20+p21+p22+p23+p24=.4
14. p2+p7+p9+p10+p15+p17+p18=.2
15. p1+p5+p6+p8+p13+p14+p16=.4
16. p5+p13+p19<=.24
17. p5+p6+p7+p14+p20<=.48
18. p6+p13+p14+p15+p21<=.36
19. p7+p15+p19+p20+p21<=.36
20. p1+p3+p5+p8+p9+p16+p22<=.32
21. p1+p8+p11+p13+p16+p17+p23<=.24
22. p3+p9+p11+p17+p19+p22+p23<=.24
23. p2+p3+p4+p7+p9+p10+p12+p18+p20+p22+p24<=.48
24. p1+p2+p4+p6+p8+p10+p12+p14+p16+p18+p24<=.48
25. p2+p4+p10+p11+p12+p15+p17+p18+p21+p23+p24<=.36
26. p1+p3+p5+p11+p13+p19<=.32
27. p12+p14+p20<=.16
28. p4+p6+p21<=.32
29. p2+p7+p15<=.16
30. p16+p22<=.08
31. p8+p23<=.16
32. p9+p17<=.08
33. p24<=.08
34. p18<=.04
35. p10<=.08
36. pi ≥ 0 for i = 1, …, 24.
37. The left-hand sides of constraints 16 to 35 should also be ≥ 0.
After solving the above model, we get the following solution.
Z = .8
p1= .03017;
p2= .02386;
p3= .04727;
p4= .07873;
p5= .118;
p6= .09977;
p7= .07396;
p8= .0585;
p9= .02465;
p10= .0451;
p11= .06692;
p12= .01328; p13= .0298;
p14= .0564;
p15= .0236;
p16= .0073;
p17= .0029;
p18= .0059;
p19= .0463;
p20= .0757;
p21= .0765;
p22= .0141;
p23= .01418;
p24= .02718;
Ernst (1998) gives the following results for this problem.
p3= .2;
p6= .4;
p18= .2;
p19= .2;
The summation of the probabilities of those sample combinations which consist of the maximum number of overlapped sampling units is .8 for the procedure of Ernst (1998), the same as that achieved by the proposed procedure; the proposed procedure, however, has the advantage that variance estimation is possible with it. Also, the procedure of Ernst (1998) does not consider all possible combinations of sampling units, whereas the proposed procedure takes all of them into account. The advantage of considering all possible combinations of sampling units can be understood from the following situation.
Suppose that for the sampling design D1 the sample combinations consisting of the units 2, 3, 4 or 2, 4, 5 are desirable, and that for the sampling design D2 the sample combinations consisting of the units 2, 3 or 3, 4 or 4, 5 are desirable. In this situation, the solution provided by the procedure of Ernst (1998) contains no sample combination with these units together for the sampling designs D1 and D2, whereas the proposed procedure does provide sample combinations with these units together for the two designs.
Case II (Minimization): For the case of minimization of overlap of sampling units, the proposed procedure could not find a feasible solution with the constraints (iv) and (vi), so we dropped these constraints and obtained the following results.
Z= .6
p1= .150275;
p2= .099617;
p3= .029148;
p4= .150108;
p5= .108444;
p6= .038652;
p7= .0;
p8= .004897;
p9= .0;
p10= .01886;
p11= .023756;
p12= .0;
p13= .176244;
p14= .107237;
p15= .008826;
p16= .083937.
Solving this example by the procedure of Ernst (1998) for the case of minimization of overlap of sampling units, we get the following results.

p1= .4; p6= .2; p13= .2; p16= .2.

For the case of minimization of overlap, the summation of the probabilities of those sample combinations which consist of the minimum number of overlapped sampling units is .6 for the procedure of Ernst (1998), again the same as that achieved by the proposed procedure; but, as noted earlier, the proposed procedure has some advantages over the procedure of Ernst (1998).
Example 2: Consider the following example, taken from Tiwari, Nigam and Pant (2007). The inclusion probabilities for two designs for 6 units are given below.

TABLE 2: Inclusion probabilities

i       1      2      3      4      5      6
πi1    0.42   0.42   0.45   0.48   0.66   0.57
πi2    0.3    0.45   0.3    0.60   0.81   0.54
We have to select a sample of size 3 for both the designs.
Case I (Maximization): The internal values of W are as follows.

W =
.12     0       .3      .58     1.0
0       .03     .42     .55     1.0
.15     0       .3      .55     1.0
0       .12     .48     .4      1.0
0       .15     .66     .19     1.0
.03     0       .54     .43     1.0
.3      .3      2.7     2.7     6.0
All possible combinations satisfying condition (5.2.11) are as follows.

[The 74 candidate arrays (1)-(74), in which each unit receives the value 1 in exactly one of the four columns, appear here in the original; their tabular layout is not recoverable from this transcription.]
After applying the proposed model, we get the following results.
Z = .7
p1= .0181;
p2= .0002;
p3= .0009;
p4= .0006;
p5= .0002;
p6= .0011;
p7= .022;
p8= .0019;
p9= .0085;
p10= .0012;
p11= .0069;
p12= .0125;
p13= .0119;
p14= .0082;
p15= .0102;
p16= .0042;
p17= .0084;
p18= .0031;
p19= .0031;
p20= .0002;
p21= .0031;
p22= .0009;
p23= .0002;
p24= .0011;
p25= .0193;
p26= .0015;
p27= .0099;
p28= .0031;
p29= .0064;
p30= .0183;
p31= .0228;
p32= .016;
p33= .0121;
p34= .0054;
p35= .0142;
p36= .0177;
p37= .000004;
p38= .00001;
p39= .00006;
p40= .00001;
p41= .00001; p42= .0003;
p43= .001;
p44=.0044;
p45= .0004;
p46= .0011;
p47= .0009;
p48= .0005;
p49= .0004;
p50= .0026;
p51= .0173;
p52= .0001;
p53= .0006;
p54= .0003;
p55= .0014;
p56= .0048;
p57= .0173;
p58= .0078;
p59= .0028;
p60= .017;
p61= .0035;
p62= .0543;
p63= .0143;
p64= .0678;
p65= .0057;
p66= .0348;
p67= .0075;
p68= .091;
p69= .0228;
p70= .0887;
p71= .0657;
p72= .014;
p73= .0646;
p74= .1137;
We have also solved this example by the method of Ernst
(1998) and found the following results.
p8= .067;
p10= .042;
p17= .012;
p24= .033;
p30= .009;
p31= .108;
p48= .003;
p52= .032;
p55= .192;
p66=.022;
p74= .48;
The summation of the probabilities of those sample combinations which consist of the maximum number of overlapped sampling units is .694 for the procedure of Ernst (1998), which is less than the value achieved by the proposed procedure. Also, the proposed procedure has the advantage that variance estimation is possible with it.
The proposed procedure could not find a feasible solution with constraint (iv) for the case of minimization of overlap of sampling units, so for this example also we dropped constraint (iv) and obtained the following results.
Z= .34
p1= .0;
p2= .0;
p4= .0519;
p5= .0781;
p6= .0111;
p9= .0138; p10= .0109;
p11= .0069;
p12= .0141;
p13= .015; p14= .0087; p15= .0129; p16= .01;
p17= .011;
p18= .0203;
p19= .058; p20= .0005; p21= .025; p22= .088;
p23= .0005;
p24= .0003;
p25= .017; p26= .003; p27= .0064; p28= .022;
p29= .0024;
p30= .0212;
p31= .004; p32= .002; p33= .0;
p35= .0223;
p36= .0022;
p37= .018; p38= .0012; p39= .0133; p40= .0239;
p41= .0118;
p42= .0;
p43= .021; p44= .0086; p45= .0254; p46= .0028;
p47= .0009;
p48= .008;
p49= .007; p50= .0022; p51= .005; p52= .0269;
p53= .0104;
p54= .0039;
p55= .019; p56= .0131; p57= .0277; p58= .0104;
p59= .0085;
p60= .0013;
p61=.003; p62= .012; p63= .0036; p64= .029;
p65= .013;
p66= .0026;
p67= .0;
p71= .0267;
p72= .0021;
p7= .0163; p8= .0;
p3= .0;
p34= .0;
p68= .0086; p69= .0214; p70= .017;
p73= .008; p74= .0;
Solving this example by the method of Ernst (1998) for the case of minimization of overlap of sampling units, we found the following results.
p9= .03;
p18= .11;
p20= .19;
p22= .24;
p24= .1;
p30= .03;
p37= .01;
p40= .17;
p65= .09;
p71= .02;
p29= .01;
For this example the summation of the probabilities of those sample combinations which consist of the minimum number of overlapped sampling units is .33 for the procedure of Ernst (1998), which is less than the value achieved by the proposed procedure; this shows the superiority of the proposed procedure.
Example 3: Consider the following population, taken from Keyfitz (1951), for two sequential sample surveys. The inclusion probabilities are given below.

TABLE 3: Inclusion probabilities

i       1         2         3         4
πi1    0.14562   0.6462    0.58534   0.62284
πi2    0.16404   0.67018   0.5596    0.60618
We have to select a sample of size two for both the surveys. The internal elements of W are as follows.

W =
0         .01842    .14562    .83596    1
0         .02398    .6462     .32982    1
.02574    0         .5596     .41466    1
.01666    0         .60618    .37716    1
.0424     .0424     1.9576    1.9576    4.0
All possible combinations satisfying condition (5.2.11) are as follows.

[The 14 candidate arrays (1)-(14), in which each unit receives the value 1 in exactly one of the four columns, appear here in the original; their tabular layout is not recoverable from this transcription.]
After applying the proposed model, we get the following results.
Z = .96
p1= .2517;
p2= .2885;
p3= .0475;
p4= .2717;
p5= .0307;
p6= .0675;
p7= .0055;
p8= 0.0;
p9= 0.0;
p10= .0111;
p11= .0185;
p12=0.0;
p13= 0.0;
p14= .0073;
We have also solved this example by the method of Ernst
(1998) and found the following results.
p1= .16;
p2= .42;
p4= .23;
p5= .15;
p7= .02;
p13= .02;
For this example, the summation of the probabilities of those sample combinations which consist of the maximum number of overlapped sampling units for the procedure of Ernst (1998) is again the same as that achieved by the proposed procedure; but here too variance estimation is possible with the proposed procedure.
For this example, the proposed procedure gives the following
results for the case of minimization of overlap of sampling units.
Z = .31
p1= .0335; p2= .0583; p3= .0538; p4= .055; p5= .0547; p6= .05468;
p7=.1145; p8= .0725; p9= .1145; p10= .1582; p11= .0725; p12=.1582;
For the case of minimization of overlap of sampling units, the
method of Ernst (1998) gives the following results.
p3= .14;
p6= .16;
p7= .24;
p10=.27;
p11=.05;
p12=0.05;
p8= 0.09;
For this example the summation of the probabilities of those sample combinations which consist of the minimum number of overlapped sampling units is .3 for the procedure of Ernst (1998), which is again less than the value achieved by the proposed procedure.
Example 4: Consider the following population, taken from Goodman and Kish (1950) for D1 and from Tiwari, Nigam and Pant (2007) for D2. The inclusion probabilities for the two designs are given below.

TABLE 4: Inclusion probabilities

i       1      2      3      4      5      6
πi1    0.4    0.6    0.4    0.8    1      0.8
πi2    0.42   0.42   0.45   0.48   0.66   0.57
We have to select a sample of size 4 for the design D1 and a sample of size 3 for the design D2. The internal elements of W are given as follows.

W =
0       .02     .4      .58     1
.18     0       .42     .4      1
0       .05     .4      .55     1
.32     0       .48     .2      1
.34     0       .66     0       1
.23     0       .57     .2      1
1.07    .07     2.93    1.93    6.0
All possible combinations satisfying condition (5.2.11) are as follows.

[The 58 candidate arrays (1)-(58), in which each unit receives the value 1 in exactly one of the four columns, appear here in the original; their tabular layout is not recoverable from this transcription.]
After applying the proposed model, we could not get a feasible solution for this example. Since we have to select samples of sizes 3 and 4 from this population, constraint (iv) is not necessary for non-negative variance estimation of the H-T estimator. We therefore dropped constraint (iv) and obtained the following solution.
Z = .93
p1= 0; p2= .0111; p3= .0154; p4= .0123; p5= .0169; p6= .102; p7= .019; p8= .0133; p9= .0794; p10= .0202; p11= .0685; p12= .0882; p13= .062; p14= .01; p15= .0139; p16= .0232; p17= .0121; p18= .0435; p19= .011; p20= .0154; p21= .0544; p22= .0485; p23= .0127; p24= .007; p25= .038; p26= .0132; p27= .0782; p28= .0415; p29= .0002; p30= .001; p31= .004; p32= .003; p33= .0004; p34= .00008; p35= .0005; p36= .001; p37= .002; p38= .0015; p39= .0021; p40= .0006; p41= .0063; p42= .001; p43= .009; p44= .001; p45= .0005; p46= .004; p47= .0012; p48= .002; p49= .001; p50= .0006; p51= .0028; p52= .0017; p53= .0021; p54= .0004; p55= .0002; p56= .0194; p57= .0005; p58= .0011;
We have also solved this example by the method of Ernst
(1998) and found the following results.
p1= .17;
p9= .22;
p10= .03;
p11= .07;
p21= .26; p28= .16;
p55= .02;
p58= .05;
p20= .02;
For this example also, the summation of the probabilities of those sample combinations which consist of the maximum number of overlapped sampling units for the procedure of Ernst (1998) is the same as that achieved by the proposed procedure; but, again, variance estimation is possible with the proposed procedure.
For the case of minimization of overlap of sampling units, the proposed procedure could not find a feasible solution with the constraints (iv) and (vi), so we dropped these constraints and obtained the following results.
Z = .67
p1= 0; p2= .0; p3= .0; p4= .0; p5= .001015; p6= .001; p7= .0184; p8= .0; p9= .0647; p10= .0184; p11= .027; p12= .0213; p13= .0434; p14= .0632; p15= .0272; p16= .0158; p17= .0215; p18= .0423; p19= .0408; p20= .0286; p21= .049; p22= .0438; p23= .0106; p24= .0107; p25= .0315; p26= .0178; p27= .0386; p28= .0329; p29= .000; p30= .001; p31= .00; p32= .00; p33= .00; p34= .0122; p35= .0047; p36= .000; p37= .00; p38= .00; p39= .00; p40= .0; p41= .0; p42= .0; p43= .0; p44= .0; p45= .0237; p46= .0; p47= .0151; p48= .0453; p49= .0169; p50= .0124; p51= .0; p52= .0157; p53= .0202; p54= .0535; p55= .0353; p56= .0067; p57= .029; p58= .038;
We have also solved this example by the method of Ernst (1998) and obtained the following results.
p9= .01;
p16= .01;
p21= .42;
p24= .2;
p25= .02;
p36= .02; p43= .17;
p51= .11;
p55= .02;
p57= .02;
For this example the summation of the probabilities of those sample combinations which consist of the minimum number of overlapped sampling units is .66 for the procedure of Ernst (1998), which is again less than the value achieved by the proposed procedure.

Thus we see that, in all the examples discussed above, the summation of the probabilities of those sample combinations which consist of the maximum number of overlapped sampling units (in the case of positive co-ordination), or of the minimum number of overlapped sampling units (in the case of negative co-ordination), obtained by the proposed procedure is always greater than or equal to the summation obtained by the procedure of Ernst (1998), with the added advantage of variance estimation.
APPENDIX 5.0
Example 1: Case II (Minimization): The internal elements of W, using (5.2.15), (5.2.6) and (5.2.7), are given as follows.

W =
0.2     0.4     0.4     0.0     1.0
0.4     0.4     0.0     0.2     1.0
0.8     0.2     0.0     0.0     1.0
0.6     0.4     0.0     0.0     1.0
0.6     0.2     0.0     0.2     1.0
2.6     1.6     0.4     0.4     5.0
The above problem now becomes one of solving the controlled selection problem with N = 20 internal cells and n = 5. All possible combinations satisfying condition (5.2.11) are as follows.
[The 16 candidate arrays (1)-(16), in which each unit receives the value 1 in exactly one of the four columns, appear here in the original; their tabular layout is not recoverable from this transcription.]

Now we apply the proposed model as follows.

Max z = p1+p2+p3+p4+p5+p6+p7+p8+p9+p10
Subject to the Constraints
1. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p13+p14+p15+p16=1
2. p3+p5+p6+p8+p9+p10=.2
3. p1+p2+p4+p7=.4
4. p11+p12+p13+p14+p15+p16=.4
5. p2+p4+p6+p7+p9+p10+p15+p16=.4
6. p1+p3+p5+p8+p14=.4
7. p11+p12+p13=.2
8. p1+p4+p5+p7+p8+p10+p12+p13+p14+p16=.8
9. p2+p3+p6+p9+p11+p15=.2
10. p1+p2+p3+p7+p8+p9+p11+p13+p14+p15=.6
11. p4+p5+p6+p10+p12+p16=.4
12. p1+p2+p3+p4+p5+p6+p11+p12 =.6
13. p7+p8+p9+p10+p13=.2
14. p14+p15+p16=.2
15. p6+p9+p10+p15+p16<=.24
16. p5+p8+p10+p12+p13+p14+p16<=.48
17. p3+p8+p9+p11+p13+p14+p15<=.36
18. p3+p5+p6+p11+p12<=.36
19. p4+p7+p10+p16<=.32
20. p2+p7+p9+p15<=.24
21. p2+p4+p6<= .24
22. p1+p7+p8+p13+p14<=.48
23. p1+p4+p5+p12<=.48
24. p1+p2+p3+p11<=.36
25. p1+p14<=.32
26. p2+p11+p15<=.16
27. p4+p12+p16<=.32
28. p7+p13<=.16
29. p3<=.08
30. p5<=.16
31. p8<=.08
32. p6<=.08
33. p9<=.04
34. p10<=.08
35. pi ≥ 0 for i = 1, …, 16.
36. The left-hand sides of constraints 15 to 34 should also be ≥ 0.
After solving the above model, we get the desired results, displayed already in Example 1.
Example 2: Case I (Maximization): The internal elements of W and all possible combinations for this problem are already given in Example 2. The objective function and the constraints are as follows.
Max z = p55+p56+p57+p58+p59+p60+p61+p62+p63+p64+p65+p66+p67+p68+
p69+p70 +p71+p72+p73+p74
Subject to the Constraints
1. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p13+p14+p15+p16+p17+
p18+p19+p20+p21+p22+p23+p24+p25+p26+p27+p28+p29+p30+p31+p32+
p33+p34+p35+p36+p37+p38+p39+p40+p41+p42+p43+p44+p45+p46+p47+
p48+p49+p50+p51+p52+p53+p54+p55+p56+p57+p58+p59+p60+p61+p62+
p63+p64+p65+p66+p67+p68+p69+p70+p71+p72+p73+p74=1
2. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p13+p14+p15+p16+p17+
p18= .12
3. p19+p20+p21+p25+p26+p27+p31+p32+p33+p37+p38+p39+p43+p44+p45+
p49+p50+p51+p55+p56+p57+p58+p59+p60+p61+p62+p63+p64=.3
4. p22+p23+p24+p28+p29+p30+p34+p35+p36+p40+p41+p42+p46+p47+p48+
p52+p53+p54+p65+p66+p67+p68+p69+p70+p71+p72+p73+p74=.58
5. p1+p2+p3+p4+p5+p6+p19+p20+p21+p22+p23+p24+p37+p38+p39+p40+
p41+p42=.03
6. p7+p8+p9+p13+p14+p15+p25+p28+p29+p31+p34+p35+p43+p46+p47+p49
+p52+p53+p55+p56+p57+p58+p65+p66+p67+p68+p69+p70=.42
7. p10+p11+p12+p16+p17+p18+p26+p27+p30+p32+p33+p36+p44+p45+p48+
p50+p51+p54+p59+p60+p61+p62+p63+p64+p71+p72+p73+p74=.55
8. p19+p20+p21+p22+p23+p24+p25+p26+p27+p28+p29+p30+p31+p32+p33+
p34+p35+p36=.15
9. p1+p2+p3+p7+p10+p11+p13+p16+p17+p37+p40+p41+p44+p46+p48+p50+
p52+p54+p55+p59+p60+p61+p65+p66+p67+p71+p72+p73=.3
10. p4+p5+p6+p8+p9+p12+p14+p15+p18+p38+p39+p42+p43+p45+p47+p49+
p51+p53+p56+p57+p58+p62+p63+p64+p68+p69+p70+p74=.55
11. p7+p8+p9+p10+p11+p12+p25+p26+p27+p28+p29+p30+p43+p44+p45+p46+
p47+p48=.12
12. p1+p4+p5+p14+p16+p18+p19+p22+p23+p32+p34+p36+p38+p40+p42+p51
+p53+p54+p56+p59+p62+p63+p65+p68+p69+p71+p72+p74=.48
13. p2+p3+p6+p13+p15+p17+p20+p21+p24+p31+p33+p35+p37+p39+p41+p49
+p50+p52+p55+p57+p58+p60+p61+p64+p66+p67+p70+p73=.4
14. p13+p14+p15+p16+p17+p18+p31+p32+p33+p34+p35+p36+p49+p50+p51+
p52+p53+p54=.15
15. p2+p4+p6+p8+p10+p12+p20+p22+p24+p26+p28+p30+p39+p41+p42+p45+
p47+p48+p51+p60+p62+p64+p66+p68+p70+p71+p73+p74=.66
16. p1+p3+p5+p7+p9+p11+p19+p21+p23+p25+p27+p29+p37+p38+p40+p43+
p44+p46+p55+p56+p58+p59+p61+p63+p65+p67+p69+p72=.19
17. p37+p38+p39+p40+p41+p42+p43+p44+p45+p46+p47+p48+p49+p50+p51+
p52+p53+p54=.03
18. p3+p5+p6+p9+p11+p12+p15+p17+p18+p21+p23+p24+p27+p29+p30+p33+
p35+p36+p58+p61+p63+p64+p67+p69+p70+p72+p73+p74=.54
19. p1+p2+p4+p7+p8+p10+p13+p14+p16+p19+p20+p22+p25+p26+p28+p31+
p32+p34+p55+p56+p57+p59+p60+p62+p65+p66+p68+p71 =.43
20. p7+p8+p9+p13+p14+p15+p25+p31+p43+p49+p55+p56+p57+p58<=.1764
21. p1+p2+p3+p7+p10+p11+p13+p16+p17+p19+p20+p21+p25+p26+p27+p31+
p32+p33+p37+p44+p50+p55+p59+p60+p61<=.189
22. p1+p4+p5+p14+p16+p18+p19+p32+p38+p51+p56+p59+p62+p63<=.2016
23. p2+p4+p6+p8+p10+p12+p20+p26+p39+p45+p57+p60+p62+p64<=.2772
24. p3+p5+p6+p9+p11+p12+p15+p17+p18+p21+p27+p33+p37+p38+p39+p43+
p44+p45+p49+p50+p51+p58+p61+p63+p64<=.2394
25. p7+p13+p25+p28+p29+p31+p34+p35+p46+p52+p55+p65+p66+p67<=.189
26. p14+p34+p53+p56+p65+p68+p69<=.2016
27. p8+p28+p47+p57+p66+p68+p70<=.2772
28. p9+p15+p29+p35+p43+p46+p47+p49+p52+p53+p58+p67+p69+p70<=.2394
29. p1+p16+p19+p22+p23+p32+p34+p36+p40+p54+p59+p65+p71+p72<=.216
30. p2+p10+p20+p22+p24+p26+p28+p30+p41+p48+p60+p66+p71 +p73 <=.297
31. p3+p11+p17+p21+p23+p24+p27+p29+p30+p33+p35+p36+p37+p40+p41+
p44+p46+p48+p50+p52+p54+p61+p67+p72+p73<=.2565
32. p4+p22+p42+p62+p68+p71+p74<=.3168
33. p5+p18+p23+p36+p38+p40+p42+p51+p53+p54+p63+p69+p72+p74<=.2736
34. p6+p12+p24+p30+p39+p41+p42+p45+p47+p48+p64+p70+p73+p74<=.3762
35. p19+p20+p21+p25+p31+p37+p38+p39+p43+p49+p55+p56+p57+p58<=.135
36. p37+p44+p50+p55+p59+p60+p61 <=.09
37. p19+p25+p26+p27+p32+p38+p43+p44+p45+p51+p56+p59+p62+p63<=.18
38. p20+p26+p31+p32+p33+p39+p45+p49+p50+p51+p57+p60+p62+p64<=.243
39. p21+p27+p33+p58+p61+p63+p64<=.162
40. p1+p2+p3+p7+p13+p37+p40+p41+p46+p52+p55+p65+p66+p67<=.135
41. p1+p4+p5+p7+p8+p9+p14+p19+p22+p23+p25+p28+p29+p34+p38+p40+
p42+p43+p46+p47+p53+p56+p65+p68+p69<=.27
42. p2+p4+p6+p8+p13+p14+p15+p20+p22+p24+p28+p31+p34+p35+p39+p41+
p42+p47+p49+p52+p53+p57+p66+p68+p70<=.3645
43. p3+p5+p6+p9+p15+p21+p23+p24+p29+p35+p58+p67+p69+p70<=.243
44. p1+p7+p10+p11+p16+p40+p44+p46+p48+p54+p59+p65+p71+p72<=.18
45. p2+p10+p13+p16+p17+p41+p48+p50+p52+p54+p60+p66+p71 +p73 <=.243
46. p3+p11+p17+p61+p67+p72+p73<=.162
47. p4+p8+p10+p12+p14+p16+p18+p22+p26+p28+p30+p32+p34+p36+p42+p45
+p47+p48+p51+p53+p54+p62+p68+p71+p74<=.486
48. p5+p9+p11+p12+p18+p23+p27+p29+p30+p36+p63+p69+p72+p74<=.324
49. p6+p12+p15+p17+p18+p24+p30+p33+p35+p36+p64+p70+p73+p74<=.4374
50. pi(s) ≥0 for i=1,…,74.
51. The equations from 20 to 49 should be ≥ 0.
After solving the above model, we get the desired results, displayed already in Example 2.
Example 2: Case II (Minimization): The internal elements of W are given as follows.

W =
0.42    0.3     0.0     0.28    1.0
0.42    0.45    0.0     0.13    1.0
0.45    0.3     0.0     0.25    1.0
0.4     0.52    0.08    0.0     1.0
0.19    0.34    0.47    0.0     1.0
0.46    0.43    0.11    0.0     1.0
2.34    2.34    0.66    0.66    6.0
The above problem now becomes one of solving the controlled selection problem with N = 24 internal cells and n = 6. All possible combinations satisfying condition (5.2.11) are as follows.
[The 74 candidate arrays (1)-(74), in which each unit receives the value 1 in exactly one of the four columns, appear here in the original; their tabular layout is not recoverable from this transcription.]
Now we apply the proposed model as follows.
Max Z=p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p13+p14+p15+p16
+p17+p18+p19+p20
Subject to the Constraints
1. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p13+p14+p15+p16+p17+
p18+p19+p20+p21+p22+p23+p24+p25+p26+p27+p28+p29+p30+p31+p32+
p33+p34+p35+p36+p37+p38+p39+p40+p41+p42+p43+p44+p45+p46+p47+
p48+p49+p50+p51+p52+p53+p54+p55+p56+p57+p58+p59+p60+p61+p62+
p63+p64+p65+p66+p67+p68+p69+p70+p71+p72+p73+p74=1
2. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p21+p22+p23+p24+p25+p26+p27+
p28+p29+p30+p31+p32+p33+p34+p35+p36+p37+p38=.42
3. p11+p12+p13+p14+p15+p16+p17+p18+p19+p20+p42+p43+p46+p47+p50+
p51+p54+p55+p60+p61+p62+p63+p66+p67+p69+p70+p72+p74=.3
4. p39+p40+p41+p44+p45+p48+p49+p52+p53+p56+p57+p58+p59+p64+p65+
p68+p71+p73=.28
5. p1+p2+p3+p4+p11+p12+p13+p14+p15+p16+p21+p22+p23+p39+p40+p41+
p42+p43+p44+p45+p46+p47+p48+p49+p50+p51+p52+p53=.42
6. p5+p6+p7+p8+p9+p10+p17+p18+p19+p20+p27+p28+p31+p32+p35+p36+
p56+p57+p58+p59+p64+p65+p66+p68+p69+p71+p72+p73=.45
7. p24+p25+p26+p29+p30+p33+p34+p37+p38+p54+p55+p60+p61+p62+p63+
p67+p70+p74=.13
8. p1+p5+p6+p7+p11+p12+p13+p17+p18+p19+p24+p25+p26+p39+p40+p41+
p54+p55+p56+p57+p58+p59+p60+p61+p62+p63+p64+p65=.45
9. p2+p3+p4+p8+p9+p10+p14+p15+p16+p20+p29+p30+p33+p34+p37+p38+
p44+p45+p48+p49+p52+p53+p67+p68+p70+p71+p73+p74=.3
10. p21+p22+p23+p27+p28+p31+p32+p35+p36+p42+p43+p46+p47+p50+p51+
p66+p69+p72= .25
11. p2+p5+p8+p9+p11+p14+p15+p17+p18+p20+p27+p28+p29+p30+p42+p43+
p44+p45+p54+p55+p56+p57+p66+p67+p68+p69+p70+p71=.4
12. p1+p3+p4+p6+p7+p10+p12+p13+p16+p19+p21+p22+p24+p25+p31+p33+
p35+p37+p39+p40+p46+p48+p50+p52+p58+p60+p62+p64=.52
13. p23+p26+p32+p34+p36+p38+p41+p47+p49+p51+p53+p59+p61+p63+p65+
p72+p73+p74=.08
14. p3+p6+p8+p10+p12+p14+p16+p17+p19+p20+p31+p32+p33+p34+p46+p47+
p48+p49+p58+p59+p60+p61+p66+p67+p68+p72+p73+p74=.19
15. p1+p2+p4+p5+p7+p9+p11+p13+p15+p18+p21+p23+p24+p26+p27+p29+
p36+p38+p39+p41+p42+p44+p51+p53+p54+p56+p63+p65=.34
16. p22+p25+p28+p30+p35+p37+p40+p43+p45+p50+p52+p55+p57+p62+p64+
p69+p70+p71 =.47
17. p4+p7+p9+p10+p13+p15+p16+p18+p19+p20+p35+p36+p37+p38+p50+p51+
p52+p53+p62+p63+p64+p65+p69+p70+p71+p72+p73+p74=.46
18. p1+p2+p3+p5+p6+p8+p11+p12+p14+p17+p22+p23+p25+p26+p28+p30+
p32+p34+p40+p41+p43+p45+p47+p49+p55+p57+p59+p61 =.43
19. p21+p24+p27+p29+p31+p33+p39+p42+p44+p46+p48+p54+p56+p58+p60+
p66+p67+p68=.11
20. p1+p2+p3+p4+p21+p22+p23<=.1764
21. p1+p5+p6+p7+p24+p25+p26<=.189
22. p2+p5+p8+p9+p23+p26+p27+p28+p29+p30+p32+p34+p36+p38<=.2016
23. p3+p6+p8+p10+p22+p25+p28+p30+p31+p32+p33+p34+p35+p37<=.2772
24. p4+p7+p9+p10+p21+p24+p27+p29+p31+p33+p35+p36+p37+p38<=.2394
25. p1+p11+p12+p13+p39+p40+p41<=.189
26. p2+p11+p14+p15+p23+p41+p42+p43+p44+p45+p47+p49+p51+p53<=.2016
27. p3+p12+p14+p16+p22+p40+p43+p45+p46+p47+p48+p49+p50+p52<=.2772
28. p4+p13+p15+p16+p21+p39+p42+p44+p46+p48+p50+p51+p52+p53<=.2394
29. p5+p11+p17+p18+p26+p41+p54+p55+p56+p57+p59+p61+p63+p65<=.216
30. p6+p12+p17+p19+p25+p40+p55+p57+p58+p59+p60+p61+p62+p64<=.297
31. p7+p13+p18+p19+p24+p39+p54+p56+p58+p60+p62+p63+p64+p65<=.2565
32. p8+p14+p17+p20+p28+p30+p32+p34+p43+p45+p47+p49+p55+p57+p59+
p61+p66+p67+p68+p69+p70+p71+p72+p73+p74<=.3168
33. p9+p15+p18+p20+p27+p29+p36+p38+p42+p44+p51+p53+p54+p56+p63+
p65+p66+p67+p68+p69+p70+p71+p72+p73+p74<=.2736
34. p10+p16+p19+p20+p31+p33+p35+p37+p46+p48+p50+p52+p58+p60+p62+
p64+p66+p67+p68+p69+p70+p71+p72+p73+p74<=.3762
35. p17+p18+p19+p20+p66+p69+p72<=.135
36. p14+p15+p16+p20+p67+p70+p74<=.09
37. p12+p13+p16+p19+p46+p47+p50+p51+p60+p61+p62+p63+p72+p74<=.18
38. p11+p13+p15+p18+p42+p43+p50+p51+p54+p55+p62+p63+p69+p70<=.243
39. p11+p12+p14+p17+p42+p43+p46+p47+p54+p55+p60+p61+p66+p67<=.162
40. p8+p9+p10+p20+p68+p71+p73<=.135
41. p6+p7+p10+p19+p31+p32+p35+p36+p58+p59+p64+p65+p72+p73<=.27
42. p5+p7+p9+p18+p27+p28+p35+p36+p56+p57+p64+p65+p69+p71<=.3645
43. p5+p6+p8+p17+p27+p28+p31+p32+p56+p57+p58+p59+p66+p68<=.243
44. p3+p4+p10+p16+p33+p34+p37+p38+p48+p49+p52+p53+p73+p74<=.18
45. p2+p4+p9+p15+p29+p30+p37+p38+p44+p45+p52+p53+p70+p71<=.243
46. p2+p3+p8+p14+p29+p30+p33+p34+p44+p45+p48+p49+p67+p68<=.162
47. p1+p4+p7+p13+p21+p22+p23+p24+p25+p26+p35+p36+p37+p38+p39+p40
+p41+p50+p51+p52+p53+p62+p63+p64+p65<=.486
48. p1+p3+p6+p12+p21+p22+p23+p24+p25+p26+p31+p32+p33+p34+p39+p40
+p41+p46+p47+p48+p49+p58+p59+p60+p61 <=.324
49. p1+p2+p5+p11+p21+p22+p23+p24+p25+p26+p27+p28+p29+p30+
p39+p40+p41+p42+p43+p44+p45+p54+p55+p56+p57<=.4374
50. pi(s) ≥ 0 for i = 1,…,74.
51. The equations from 20 to 49 should be ≥ 0.
After solving the above model, we get the desired results already displayed in example 2.
Example 3: Case I (Maximization):
The internal elements of W and all possible combinations for
this problem are already defined in example 3. The objective function
and the constraints are given as follows.
Max z = p1+p2+p3+p4+p5+p6
Subject to the equations:
1. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p13+p14=1
2. p9+p10+p13+p14=.01842
3. p3+p5+p6+p8+p12=.14562
4. p1+p2+p4+p7+p11=.83596
5. p7+p8+p11+p12=.02398
6. p2+p4+p6+p10+p14=.6462
7. p1+p3+p5+p9+p13=.32982
8. p11+p12+p13+p14=.02574
9. p1+p4+p5+p7+p9=.5596
10. p2+p3+p6+p8+p10=.41466
11. p7+p8+p9+p10=.01666
12. p1+p2+p3+p11+p13=.60618
13. p4+p5+p6+p12+p14=.37716
14. p6<=.0941
15. p5+p12<=.085237
16. p3+p8<=.090698
17. p4+p14<=.378247
18. p2+p10<=.402479
19. p1+p7+p9+p11+p13<=.364573
20. p6+p8+p10+p12+p14<=.109936
21. p5+p9<=.091797
22. p3+p13<=.099438
23. p4+p7<=.375033
24. p2+p11<=.40625
25. p1<=.339218
26. pi(s) ≥ 0 for i = 1,…,14.
27. The equations from 14 to 25 should be ≥ 0.
After solving the above model, we get the desired results already displayed in example 3.
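All the "Max Z" models of this chapter share the same structure: maximize the total probability assigned to the preferred samples, subject to linear equality constraints and upper bounds on sums of sample probabilities. As an aside, such a model can be handed to any off-the-shelf LP solver. The following is a minimal sketch, assuming NumPy and SciPy are available (the function names and code organisation are ours, not part of the thesis), encoding the Case I model just listed; the other models of this chapter are solved the same way by changing the index sets and right-hand sides.

```python
# Sketch: solving the Example 3, Case I model with an off-the-shelf LP solver.
# Each constraint is written as (set of p-indices, right-hand side).
import numpy as np
from scipy.optimize import linprog

n_vars = 14          # p1, ..., p14
n_preferred = 6      # objective: maximize p1 + ... + p6

# Equality constraints 1-13 of the model above.
eq = [
    (range(1, 15), 1.0),
    ([9, 10, 13, 14], .01842),
    ([3, 5, 6, 8, 12], .14562),
    ([1, 2, 4, 7, 11], .83596),
    ([7, 8, 11, 12], .02398),
    ([2, 4, 6, 10, 14], .6462),
    ([1, 3, 5, 9, 13], .32982),
    ([11, 12, 13, 14], .02574),
    ([1, 4, 5, 7, 9], .5596),
    ([2, 3, 6, 8, 10], .41466),
    ([7, 8, 9, 10], .01666),
    ([1, 2, 3, 11, 13], .60618),
    ([4, 5, 6, 12, 14], .37716),
]
# Upper-bound constraints 14-25.  Constraint 27 (their left-hand sides
# staying >= 0) holds automatically, since every p_i >= 0.
ub = [
    ([6], .0941), ([5, 12], .085237), ([3, 8], .090698),
    ([4, 14], .378247), ([2, 10], .402479),
    ([1, 7, 9, 11, 13], .364573), ([6, 8, 10, 12, 14], .109936),
    ([5, 9], .091797), ([3, 13], .099438), ([4, 7], .375033),
    ([2, 11], .40625), ([1], .339218),
]

def rows(constraints):
    A = np.zeros((len(constraints), n_vars))
    b = np.zeros(len(constraints))
    for r, (idx, rhs) in enumerate(constraints):
        for i in idx:
            A[r, i - 1] = 1.0   # p_i sits in column i-1
        b[r] = rhs
    return A, b

A_eq, b_eq = rows(eq)
A_ub, b_ub = rows(ub)
c = np.zeros(n_vars)
c[:n_preferred] = -1.0          # linprog minimizes, so negate to maximize

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * n_vars, method="highs")
print(res.x, -res.fun)          # sample probabilities and Max Z
```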
Example 3: Case II (Minimization): The internal elements of W are
given as follows.
W =
0.14562   0.16404   0.0       0.69034   1.0
0.32982   0.3538    0.31638   0.0       1.0
0.4404    0.41466   0.14494   0.0       1.0
0.39382   0.37716   0.22902   0.0       1.0
1.30966   1.30966   0.69034   0.69034   4.0
Now the above problem reduces to solving a controlled selection problem with N = 16 and n = 4. All possible combinations satisfying condition (5.2.11) are as follows.
[Combinations (1) to (12), displayed in the original as two-dimensional sample arrays.]
Now we apply the proposed model as follows.
Max z = p1+p2+p3+p4+p5+p6
Subject to the equations
1. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12=1
2. p1+p2+p3=.14562
3. p4+p5+p6=.16404
4. p7+p8+p9+p10+p11+p12=.69034
5. p1+p4+p5+p7+p8=.32982
6. p2+p3+p6+p9+p11=.3538
7. p10+p12=.31638
8. p2+p4+p6+p9+p10= .4404
9. p1+p3+p5+p7+p12=.41466
10. p8+p11=.14494
11. p3+p5+p6+p11+p12=.39382
12. p1+p2+p4+p8+p10=.37716
13. p7+p9=.22902
14. p1<=.0941
15. p2<=.085237
16. p3<=.090698
17. p4+p8+p10<=.378247
18. p5+p7+p12<=.402479
19. p6+p9+p11 <=.364573
20. p6<=.109936
21. p5<=.091797
22. p4<=.099438
23. p3+p11+p12<=.375033
24. p2+p9+p10<=.40625
25. p1+p7+p8<=.339218
26. pi(s) ≥ 0 for i = 1,…,12.
27. The equations from 14 to 25 should be ≥ 0.
After solving the above model, we get the desired results already displayed in example 3.
Example 4: Case I (Maximization):
The internal elements of W and all possible combinations for
this problem are already defined in example 4. The objective function
and the constraints are given as follows.
Max z =p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p13+p14+p15+p16+
p17+p18+p19+p20+p21+p22+p23+p24+p25+p26+p27+p28
Subject to the Constraints
1. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p13+p14+p15+p16+p17+
p18+p19+p20+p21+p22+p23+p24+p25+p26+p27+p28+p29+p30+p31+p32+
p33+p34+p35+p36+p37+p38+p39+p40+p41+p42+p43+p44+p45+p46+p47+
p48+p49+p50+p51+p52+p53+p54+p55+p56+p57+p58=1
2. p29+p30+p33+p34+p35+p39+p40+p43+p44+p45+p49+p50+p53+p54+p55=
.02
3. p1+p2+p3+p7+p8+p9+p13+p14+p15+p16+p17+p18+p23+p24+p25+p31+
p36+p37+p41+p46+p47+p51+p56+p57=.4
4. p4+p5+p6+p10+p11+p12+p19+p20+p21+p22+p26+p27+p28+p32+p38+p42
+p48+p52+p58=.58
5. p1+p2+p3+p4+p5+p6+p29+p30+p31+p32+p33+p34+p35+p36+p37+p38+
p39+p40+p41+p42=.18
6. p7+p10+p11+p13+p14+p15+p19+p20+p21+p23+p26+p27+p43+p44+p46+
p48+p49+p52+p53+p54+p56+p58=.42
7. p8+p9+p12+p16+p17+p18+p22+p24+p25+p28+p45+p47+p50+p51+p55+
p57=.4
8. p31+p32+p36+p37+p38+p42+p43+p44+p47+p48+p51+p52+p56+p57+p58=
.05
9. p4+p5+p8+p10+p12+p13+p16+p17+p19+p20+p22+p24+p26+p28+p29+p33
+p34+p39+p43+p45+p50+p53+p55=.4
10. p2+p3+p6+p7+p9+p11+p14+p15+p18+p21+p23+p25+p27+p30+p35+p40+
p44+p49+p54=.55
11. p7+p8+p9+p10+p11+p12+p29+p30+p31+p32+p43+p44+p45+p46+p47+p48
+p49+p50+p51+p52=.32
12. p2+p4+p6+p14+p16+p18+p19+p21+p22+p25+p27+p28+p33+p35+p36+p38
+p40+p42+p54+p55+p57+p58=.48
13. p1+p3+p5+p13+p15+p17+p20+p23+p24+p26+p34+p37+p39+p41+p53+p56
=.2
14. p13+p14+p15+p16+p17+p18+p19+p20+p21+p22+p33+p34+p35+p36+p37+
p38+p43+p44+p45+p46+p47+p48+p53+p54+p55+p56+p57+p58=.34
15. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p23+p24+p25+p26+p27+
p28+p29+p30+p31+p32+p39+p40+p41+p42+p49+p50+p51+p52=.66
16. p23+p24+p25+p26+p27+p28+p39+p40+p41+p42+p49+p50+p51+p52+p53+
p54+p55+p56+p57+p58=.23
17. p3+p5+p6+p9+p11+p12+p15+p17+p18+p20+p21+p22+p30+p32+p34+p35+
p37+p38+p44+p45+p47+p48=.57
18. p1+p2+p4+p7+p8+p10+p13+p14+p16+p19+p29+p31+p33+p36+p43+p46
=.2
19. p1+p2+p3+p7+p13+p14+p15+p23+p31+p36+p37+p41+p46+p56<=.24
20. p1+p8+p13+p16+p17+p24<=.16
21. p2+p14+p16+p18+p25+p36+p57<=.32
22. p1+p2+p3+p7+p8+p9+p13+p14+p15+p16+p17+p18+p23+p24+p25+p31+
p36+p37+p41+p46+p47+p51+p56+p57<=.4
23. p3+p9+p15+p17+p18+p23+p24+p25+p37+p41+p47+p51+p56+p57<=.32
24. p1+p4+p5+p10+p13+p19+p20+p26+p29+p33+p34+p39+p43+p53<=.24
25. p2+p4+p6+p7+p10+p11+p14+p19+p21+p27+p29+p30+p31+p32+p33+p35+
p36+p38+p40+p42+p43+p44+p46+p48+p49+p52+p54+p58<=.48
26. p1+p2+p3+p4+p5+p6+p7+p10+p11+p13+p14+p15+p19+p20+p21+p23+p26
+p27+p29+p30+p31+p32+p33+p34+p35+p36+p37+p38+p39+p40+p41+p42
+p43+p44+p46+p48+p49+p52+p53+p54+p56+p58<=.6
27. p3+p5+p6+p11+p15+p20+p21+p23+p26+p27+p30+p32+p34+p35+p37+p38
+p39+p40+p41+p42+p44+p48+p49+p52+p53+p54+p56+p58<=.48
28. p4+p8+p10+p12+p16+p19+p22+p28+p29+p33+p43+p45+p50+p55<=.32
29. p1+p4+p5+p8+p10+p12+p13+p16+p17+p19+p20+p22+p24+p26+p28+p29+
p33+p34+p39+p43+p45+p50+p53+p55<=.4
30. p5+p12+p17+p20+p22+p24+p26+p28+p34+p39+p45+p50+p53+p55<=.32
31. p2+p4+p6+p7+p8+p9+p10+p11+p12+p14+p16+p18+p19+p21+p22+p25+
p27+p28+p29+p30+p31+p32+p33+p35+p36+p38+p40+p42+p43+p44+p45+
p46+p47+p48+p49+p50+p51+p52+p54+p55+p57+p58<=.8
32. p6+p9+p11+p12+p18+p21+p22+p25+p27+p28+p30+p32+p35+p38+p40+
p42+p44+p45+p47+p48+p49+p50+p51+p52+p54+p55+p57+p58<=.64
33. p3+p5+p6+p9+p11+p12+p15+p17+p18+p20+p21+p22+p23+p24+p25+p26+
p27+p28+p30+p32+p34+p35+p37+p38+p39+p40+p41+p42+p44+p45+p47+
p48+p49+p50+p51+p52+p53+p54+p55+p56+p57+p58<=.8
34. p7+p13+p14+p15+p23+p43+p44+p46+p49+p53+p54+p56<=.1764
35. p1+p8+p13+p16+p17+p24+p29+p31+p33+p34+p36+p37+p39+p41+p43+
p45+p46+p47+p50+p51+p53+p55+p56+p57<=.189
36. p2+p14+p16+p18+p25+p33+p35+p36+p40+p54+p55+p57<=.2016
37. p1+p2+p3+p7+p8+p9+p23+p24+p25+p29+p30+p31+p39+p40+p41+p49+
p50+p51 <=.2772
38. p3+p9+p15+p17+p18+p30+p34+p35+p37+p44+p45+p47<=.2394
39. p10+p13+p19+p20+p26+p43+p46+p48+p52+p53+p56+p58<=.189
40. p14+p19+p21+p27+p54+p58<=.2016
41. p7+p10+p11+p23+p26+p27+p49+p52<=.2772
42. p11+p15+p20+p21+p44+p48<=.2394
43. p4+p16+p19+p22+p28+p33+p36+p38+p42+p55+p57+p58<=.216
44. p1+p4+p5+p8+p10+p12+p24+p26+p28+p29+p31+p32+p39+p41+p42+p50+
p51 +p52<=.297
45. p5+p12+p17+p20+p22+p32+p34+p37+p38+p45+p47+p48<=.2565
46. p2+p4+p6+p25+p27+p28+p40+p42<=.3168
47. p6+p18+p21+p22+p35+p38<=.2736
48. p3+p5+p6+p9+p11+p12+p30+p32<=.3762
49. pi(s) ≥ 0 for i = 1,…,58.
50. The equations from 19 to 48 should be ≥ 0.
After solving the above model, we get the desired results already displayed in example 4.
Example 4: Case II (Minimization): The internal elements of W are
given as follows.
W =
0.4    0.42   0.0    0.18   1.0
0.58   0.4    0.02   0.0    1.0
0.4    0.45   0.0    0.15   1.0
0.52   0.2    0.28   0.0    1.0
0.34   0.0    0.66   0.0    1.0
0.43   0.2    0.37   0.0    1.0
2.67   1.67   1.33   0.33   6.0
Now the above problem reduces to solving a controlled selection problem with N = 24 and n = 6. All possible combinations satisfying condition (5.2.11) are as follows.
[Combinations (1) to (58), displayed in the original as two-dimensional sample arrays.]
Now we apply the proposed model as follows.
Max z= p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p13+p14+p15+
p16+p17+p18+p19+p20+p21+p22+p23+p24+p25+p26+p27+p28
Subject to the constraints
1. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p13+p14+p15+p16+p17+p18+
p19+p20+p21+p22+p23+p24+p25+p26+p27+p28+p29+p30+p31+p32+p33+p34+
p35+p36+p37+p38+p39+p40+p41+p42+p43+p44+p45+p46+p47+p48+p49+p50+
p51+p52+p53+p54+p55+p56+p57+p58=1
2. p1+p2+p3+p7+p8+p9+p13+p14+p15+p16+p17+p18+p23+p24+p25+p31+p36+
p37+p41+p46+p47+p51+p56+p57=.4
3. p4+p5+p6+p10+p11+p12+p19+p20+p21+p22+p26+p27+p28+p32+p38+p42+p48+
p52+p58=.42
4. p29+p30+p33+p34+p35+p39+p40+p43+p44+p45+p49+p50+p53+p54+p55=.18
5. p7+p10+p11+p13+p14+p15+p19+p20+p21+p23+p26+p27+p43+p44+p46+p48+
p49+p52+p53+p54+p56+p58=.58
6. p8+p9+p12+p16+p17+p18+p22+p24+p25+p28+p45+p47+p50+p51+p55+p57=.4
7. p1+p2+p3+p4+p5+p6+p29+p30+p31+p32+p33+p34+p35+p36+p37+p38+p39+
p40+p41+p42=.02
8. p1+p4+p5+p8+p10+p12+p13+p16+p17+p19+p20+p22+p24+p26+p28+p29+
p33+p34+p39+p43+p45+p50+p53+p55=.4
9. p2+p3+p6+p7+p9+p11+p14+p15+p18+p21+p23+p25+p27+p30+p35+p40+p44+
p49+p54=.45
10. p31+p32+p36+p37+p38+p42+p46+p47+p48+p51+p52+p56+p57+p58=.15
11. p2+p4+p6+p14+p16+p18+p19+p21+p22+p25+p27+p28+p33+p35+p36+p38
+p40+p42+p54+p55+p57+p58=.52
12. p1+p3+p5+p13+p15+p17+p20+p23+p24+p26+p34+p37+p39+p41+p43+p44+p53+
p56=.2
13. p7+p8+p9+p10+p11+p12+p29+p30+p31+p32+p43+p44+p45+p46+p47+p48+p49+
p50+p51+p52=.28
14. p1+p2+p3+p4+p5+p6+p7+p8+p9+p10+p11+p12+p23+p24+p25+p26+p27+p28+
p29+p30+p31+p32+p39+p40+p41+p42+p49+p50+p51+p52=.34
15. p13+p14+p15+p16+p17+p18+p19+p20+p21+p22+p33+p34+p35+p36+p37+p38+
p43+p44+p45+p46+p47+p48+p53+p54+p55+p56+p57+p58=.66
16. p3+p5+p6+p9+p11+p12+p15+p17+p18+p20+p21+p22+p30+p32+p34+p35+p37+
p38+p44+p45+p47+p48=.43
17. p1+p2+p4+p7+p8+p10+p13+p14+p16+p19+p29+p31+p33+p36+p43+p46=.2
18. p23+p24+p25+p26+p27+p28+p39+p40+p41+p42+p49+p50+p51+p52+p53+p54+
p55+p56+p57+p58=.37
19. p1+p2+p3+p7+p13+p14+p15+p23+p31+p36+p37+p41+p46+p56<=.24
20. p1+p8+p13+p16+p17+p24<=.16
21. p2+p7+p8+p9+p14+p16+p18+p25+p31+p36+p46+p47+p51+p57<=.32
22. p1+p2+p3+p7+p8+p9+p13+p14+p15+p16+p17+p18+p23+p24+p25+p31+p36+
p37+p41+p46+p47+p51+p56+p57<=.4
23. p3+p9+p15+p17+p18+p23+p24+p25+p37+p41+p47+p51+p56+p57<=.32
24. p1+p4+p5+p10+p13+p19+p20+p26+p29+p33+p34+p39+p43+p53<=.24
25. p2+p4+p6+p7+p10+p11+p14+p19+p21+p27+p29+p30+p31+p32+p33+p35+
p36+p38+p40+p42+p43+p44+p46+p48+p49+p52+p54+p58<=.48
26. p1+p2+p3+p4+p5+p6+p7+p10+p11+p13+p14+p15+p19+p20+p21+p23+p26
+p27+p29+p30+p31+p32+p33+p34+p35+p36+p37+p38+p39+p40+p41+p42
+p43+p44+p46+p48+p49+p52+p53+p54+p56+p58<=.6
27. p3+p5+p6+p11+p15+p20+p21+p23+p26+p27+p30+p32+p34+p35+p37+p38+p39+
p40+p41+p42+p44+p48+p49+p52+p53+p54+p56+p58<=.48
28. p4+p8+p10+p12+p16+p19+p22+p28+p29+p33+p43+p45+p50+p55<=.32
29. p1+p4+p5+p8+p10+p12+p13+p16+p17+p19+p20+p22+p24+p26+p28+p29+p33+
p34+p39+p43+p45+p50+p53+p55<=.4
30. p5+p12+p17+p20+p22+p24+p26+p28+p34+p39+p45+p50+p53+p55<=.32
31. p2+p4+p6+p7+p8+p9+p10+p11+p12+p14+p16+p18+p19+p21+p22+p25+p27+
p28+p29+p30+p31+p32+p33+p35+p36+p38+p40+p42+p43+p44+p45+p46+p47+p48+p49+p50+p51+p52+p54+p55+p57+p58<=.8
32. p6+p9+p11+p12+p18+p21+p22+p25+p27+p28+p30+p32+p35+p38+p40+p42+p44
+p45+p47+p48+p49+p50+p51+p52+p54+p55+p57+p58<=.64
33. p3+p5+p6+p9+p11+p12+p15+p17+p18+p20+p21+p22+p23+p24+p25+p26+p27+
p28+p30+p32+p34+p35+p37+p38+p39+p40+p41+p42+p44+p45+p47+p48+p49+
p50+p51+p52+p53+p54+p55+p56+p57+p58<=.8
34. p4+p5+p6+p12+p22+p28+p32+p38+p42<=.1764
35. p6+p11+p21+p27<=.189
36. p5+p10+p11+p12+p20+p26+p32+p48+p52<=.2016
37. p19+p20+p21+p22+p38+p48+p58<=.2772
38. p4+p10+p19+p26+p27+p28+p42+p52+p58<=.2394
39. p2+p3+p6+p9+p18+p25+p30+p35+p40<=.189
40. p1+p3+p5+p8+p9+p12+p17+p24+p29+p30+p31+p32+p34+p37+p39+p41+p45+
p47+p50+p51<=.2016
41. p16+p17+p18+p22+p33+p34+p35+p36+p37+p38+p45+p47+p55+p57<=.2772
42. p1+p2+p4+p8+p16+p24+p25+p28+p29+p31+p33+p36+p39+p40+p41+p42+p50+
p51+p55+p57<=.2394
43. p3+p7+p9+p11+p15+p23+p30+p44+p49<=.216
44. p14+p15+p18+p21+p35+p44+p54<=.297
45. p2+p7+p14+p23+p25+p27+p40+p49+p54<=.2565
46. p13+p15+p17+p20+p34+p37+p43+p44+p45+p46+p47+p48+p53+p56<=.3168
47. p1+p7+p8+p10+p13+p23+p24+p26+p29+p31+p39+p41+p43+p46+p49+p50+p51+
p52+p53+p56<=.2736
48. p13+p14+p16+p19+p33+p36+p43+p46+p53+p54+p55+p56+p57+p58<=.3762
49. pi(s) ≥ 0 for i = 1,…,58.
50. The equations from 19 to 48 should be ≥ 0.
After solving the above model, we get the desired results already displayed in example 4.
CHAPTER VI
THE APPLICATION OF FUZZY LOGIC TO THE
SAMPLING SCHEME
6.1 INTRODUCTION
In many situations, while conducting a survey, it is not
possible to enumerate all the units in the population, as it is very time
consuming and also increases the cost of the survey. Thus, in most cases, the sampler prefers to take a part of the population, instead of the whole population, to determine the characteristics of the population. This part of the population, which represents the characteristics of the whole population, is known as a "sample". Many sampling procedures exist in the literature for drawing a sample from the population, e.g., simple random sampling (SRS), stratified sampling, systematic sampling, probability proportional to size (PPS) sampling, etc.
The simplest method of drawing a sample from the population is SRS, in which each and every unit of the population has an equal chance of being included in the sample, i.e. the sampler can draw the sampling units one by one by assigning an equal probability of selection to each of the available units in the population. In SRS there is no restriction on the selection of the sampling units, but the drawback of SRS is that there is no guarantee that all the segments of the population will be represented in the sample. One way to overcome this drawback of SRS is to use stratified sampling. In stratified sampling, the whole population is divided into several groups (called strata), each of which is more homogeneous than the entire population, and a random sample of pre-determined size is then drawn from each of the groups. Stratified sampling can effectively be used in situations where the population is heterogeneous. Systematic sampling is another way to draw a sample from the population, in which only the first unit is selected at random, the rest being automatically selected according to some predetermined pattern involving regular spacing of units. The main drawback of systematic sampling is that it does not permit unbiased estimation of the variance. Many other sampling procedures exist in the literature; each has its own characteristics, and these procedures can be used in different situations according to their suitability.
One drawback of the above-mentioned sampling procedures is that none of them takes into account the size of the population units while selecting the units from the population. If the population units vary considerably in size, then it may not be appropriate to select them with equal probability, since this does not take into account the possible importance of the larger units in the population. One way to overcome this problem is to assign unequal probabilities of selection to the different units in the population. Thus, when the population units vary considerably in size and the variate under study is highly correlated with the size of the unit, probabilities of selection may be assigned in proportion to the size of the population unit. For example, villages with a larger geographical area are likely to have a larger area under food crops. Thus, in estimating production or food supply, it may well be desirable to adopt a scheme of selection in which villages are selected with probabilities proportional to their geographical areas.
Thus, a sampling scheme in which the units are selected with
probabilities proportional to some measure of their size is known as
sampling with probability proportional to size (PPS). PPS sampling
can be done with or without replacement. If $x_i$ is an integer proportional to the size of the $i$th unit in a population of $N$ units ($i = 1,\ldots,N$), then the initial selection probabilities under PPS can be defined as follows:
$$p_i = \frac{x_i}{x}, \qquad \text{where } x = \sum_{i=1}^{N} x_i. \qquad (6.1.1)$$
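As a quick numerical illustration of (6.1.1), here is a minimal sketch, assuming NumPy is available; the sizes used are the orchard sizes that appear later in table 5.

```python
# Sketch of (6.1.1): initial PPS selection probabilities from unit sizes.
import numpy as np

x = np.array([50, 30, 25, 40, 26, 44, 20, 35])  # sizes x_i (table 5)
p = x / x.sum()                                  # p_i = x_i / x
print(np.round(p, 3))  # [0.185 0.111 0.093 0.148 0.096 0.163 0.074 0.13]
```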
A considerable literature exists on estimation under PPS sampling. An estimator commonly used to estimate the population mean or total under PPS is the well-known Horvitz-Thompson estimator, defined in expression (9) of subsection 2.2.3 of chapter 2.
Sen (1953) and Yates and Grundy (1953) defined the expression for the variance of $\hat{Y}_{HT}$, given in expression (10) of subsection 2.2.3 of chapter 2. The evaluation of $V(\hat{Y}_{HT})$ using expression (10) requires the values of $\pi_{ij}$. Various techniques have been proposed by different authors for calculating $\pi_{ij}$. Ashok and Sukhatme (1976a) provide a good approximation for $\pi_{ij}$, correct to $O(N^{-4})$, for Sampford's procedure, given as follows:
$$\pi_{ij} = n(n-1)p_i p_j\Big[1 + (p_i + p_j) - \sum p_t^2 + \Big\{2(p_i^2 + p_j^2) - 2\sum p_t^3 - (n-2)p_i p_j + (n-3)(p_i + p_j)\sum p_t^2 - (n-3)\Big(\sum p_t^2\Big)^2\Big\}\Big] \qquad (6.1.2)$$
and the variance of $\hat{Y}_{HT}$ correct to $O(N^{-2})$ is given as follows:
$$V(\hat{Y}_{HT})_{SAMP} = \frac{1}{nN^2}\Big[\sum_{i \in S} p_i A_i^2 - (n-1)\sum_{i \in S} p_i^2 A_i^2\Big] - \frac{n-1}{nN^2}\Big[2\sum_{i \in S} p_i^2 A_i^3 - \sum_{i \in S} p_i^2 A_i^2 \sum_{i \in S} p_i A_i^2 + (n-2)\Big(\sum_{i \in S} p_i A_i^2\Big)^2\Big] \qquad (6.1.3)$$
where $A_i = \dfrac{Y_i}{p_i} - Y$ and $Y = \sum_{i=1}^{N} Y_i$. $\qquad$ (6.1.4)
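A direct transcription of approximation (6.1.2), as reconstructed above, into code; a minimal sketch assuming NumPy, with a function name of our own choosing.

```python
# Sketch of the O(N^-4) approximation (6.1.2) to the joint inclusion
# probabilities under Sampford's procedure.
import numpy as np

def pi_ij_sampford(p, n):
    """p: initial selection probabilities (summing to 1); n: sample size."""
    s2 = np.sum(p ** 2)           # sum of p_t^2
    s3 = np.sum(p ** 3)           # sum of p_t^3
    N = len(p)
    pij = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            pi, pj = p[i], p[j]
            pij[i, j] = n * (n - 1) * pi * pj * (
                1 + (pi + pj) - s2
                + 2 * (pi ** 2 + pj ** 2) - 2 * s3 - (n - 2) * pi * pj
                + (n - 3) * (pi + pj) * s2 - (n - 3) * s2 ** 2
            )
    return pij
```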
Goodman and Kish (1950) also defined an expression for the variance of $\hat{Y}_{HT}$ correct to $O(N^{-2})$, given in expression (12) of subsection (2.2.3) of chapter 2. Recently, Brewer and Donadio (2003) derived a $\pi_{ij}$-free formula for the high entropy variance of the HT estimator. Their expression for the variance of the HT estimator is given in expression (13) of subsection (2.2.3) of chapter 2.
In PPS sampling, we take into consideration only the size of the population units, but in some situations auxiliary information related to the population units may also be available. This information can also be utilized while assigning the initial selection probabilities to the population units, to increase the efficiency of the survey.
In this chapter, we have made an attempt to utilize all the available auxiliary information related to the population units, in addition to the size of the population unit, in assigning the initial selection probabilities to the population units. For this purpose, we use the fuzzy logic approach. Using the fuzzy approach, we can utilize all the auxiliary information related to the population units to obtain a more efficient sampling design.
In section 6.2, we present the concept of the fuzzy logic approach. In section 6.3, we describe the proposed procedure and give some examples to show its superiority over the PPS sampling procedure.
6.2 FUZZY LOGIC APPROACH
As the name suggests, fuzzy logic is a logic that deals with values which are approximate rather than exact. Classical logic relies on something being either true or false. A true element is usually assigned a value of 1 and a false element a value of 0. Thus, something either completely belongs to a set or is completely excluded from it. Fuzzy logic broadens this definition of classical logic. The basis of the logic is fuzzy sets. Unlike classical sets, where membership is full or none, an object is allowed to belong partly to a set. The membership of an object in a particular set is described by a real value between 0 and 1. Thus, for instance, an element can have a membership value of 0.5, which describes 50% membership in a given set. Such logic allows a much easier formulation of many problems that cannot be easily handled using the classical approach. The importance of fuzzy logic derives from the fact that most modes of human reasoning, and especially common sense reasoning, are approximate in nature. For example, consider a set of tall people in classical logic, and suppose that a person with height greater than or equal to 6 feet is considered tall. Then a person having height 6 feet 1 inch will be included in the set of tall people, while a person having height 5 feet 11 inches will not. Such a representation of reality leaves much to be desired. On the other hand, using fuzzy logic, the person 6 feet 1 inch tall can still have full membership of the set of tall people, while the person 5 feet 11 inches tall can have 90% membership of the set. The 5 feet 11 inches person can thus have what may be described as a "quite tall" representation in the model.
Fuzzy Set Theory was formalized by Professor Lotfi Zadeh at the University of California in 1965. Zadeh described the essential characteristics of fuzzy logic as follows.
• In fuzzy logic, exact reasoning is viewed as a limiting case of approximate reasoning.
• In fuzzy logic, everything is a matter of degree.
• Any logical system can be fuzzified.
• In fuzzy logic, knowledge is interpreted as a collection of elastic or, equivalently, fuzzy constraints on a collection of variables.
• Inference is viewed as a process of propagation of elastic constraints.
Now, we give a brief introduction to the fuzzy inference system (FIS). A fuzzy inference system provides the facility to incorporate all the auxiliary information in drawing conclusions. The MATLAB Fuzzy Logic Toolbox provides an opportunity to look at all the components of a fuzzy inference system. The first step in working with a fuzzy inference system is to define a base line model for all the input variables and also for the final or output variable. Input variables contain some auxiliary information, and the output variable gives the final result by extracting all the useful information from the input variables. After defining the base line model, the next step is to define the fuzzy rules, which play an important role in assigning the final grade of membership to the elements. In the following subsections, we give a brief introduction to the base line model and the fuzzy rules.
6.2.1 Base line model:
To draw any inferences from the fuzzy inference system, the first step is to define the base line model. The base line model consists of some input variables and the output variable. An example of a base line model is given in figure 1. In the base line model, we have to define the fuzzy sets for all the input variables and also for the final or output variable. Input variables are those variables from which one has to draw the inferences, i.e., the input variables contain some auxiliary information, and the output variable is the variable which defines the final grade of membership for all the elements of the set. To define the fuzzy sets for any input variable, we first have to choose the range and the membership function for that input variable. The range of an input variable can be defined by taking the minimum and the maximum value of that variable. Many membership functions exist in the fuzzy inference system, and the membership function for an input variable can be chosen according to the properties of that variable; the available membership functions can be inspected graphically in the fuzzy inference system. After choosing the membership function for an input variable, we define the different fuzzy sets for it, using the selected membership function. An example of the model (consisting of the fuzzy sets and range for an input variable) is given in figure 2. The membership functions for different input variables can be different: the membership function depends upon the characteristics of the input variable, and hence it may vary from one input variable to another. After defining the range and the fuzzy sets for all the input variables, we define the range and the fuzzy sets for the output variable. Figure 5 shows the model for the output variable.
6.2.2 Fuzzy Rules:
In real life, human beings constantly make decisions, and these decisions are based on rules. For example, if the weather is fine and today is a holiday, then we may decide to go out; or, if the forecast says that the weather will be bad today but fine tomorrow, then we decide not to go today and postpone the trip till tomorrow. Similarly, in order to design a FIS, we have to define the fuzzy rules. A fuzzy inference system consists of if-then rules that specify a relationship between the input and output fuzzy sets. In order to draw conclusions from the input variables, we have to define the fuzzy rules. These rules are based on common sense. The fuzzy rules are formulated as a series of if-then statements, combined with AND/OR operators. These rules are very useful for arriving at the final decision.
6.3 THE PROPOSED PROCEDURE
The fuzzy inference process can be described completely in the following five steps:
Step 1: Choose the base line model in the fuzzy inference system.
Step 2: Take the inputs and define the range and membership function for each input variable.
Step 3: Define the membership function and range for the output variable, to obtain the final grading.
Step 4: Define the fuzzy if-then rules for the inference process.
Step 5: Enter the values of the input variables to get the required result.
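For readers who want to see the five steps end to end, the following is a minimal Mamdani-style sketch in Python of the kind of inference used in Example 1 below: gaussian membership functions, min for AND, max for aggregation and centroid defuzzification (the usual toolbox defaults). All membership-function parameters here are illustrative assumptions, not values from the thesis.

```python
# Minimal Mamdani-style fuzzy inference sketch (parameters are assumptions).
import numpy as np

def gaussmf(x, sigma, c):
    # Gaussian membership, as in MATLAB's gaussmf(x, [sigma c]).
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

# Steps 2-3: ranges and (sigma, c) parameters, assumed for illustration.
interviewer = {"excellent": (150, 100), "good": (150, 550), "poor": (150, 1000)}
area        = {"excellent": (1000, 2000), "good": (1000, 5000), "poor": (1000, 8000)}
literacy    = {"good": (15, 50), "poor": (15, 90)}   # a low rate is "good" here

z = np.linspace(0.0, 1.0, 501)                       # output (grade) axis
grade = {"poor": (0.15, 0.1), "good": (0.15, 0.5), "excellent": (0.15, 0.9)}

# Step 4: the three rules of Example 1 (antecedent labels -> consequent).
rules = [(("excellent", "good", "excellent"), "excellent"),
         (("good", "good", "good"), "good"),
         (("poor", "poor", "poor"), "poor")]

# Step 5: fire the rules for one district and defuzzify by centroid.
def final_grade(n_int, lit_rate, dist_area):
    agg = np.zeros_like(z)
    for (li, ll, la), lout in rules:
        w = min(gaussmf(n_int, *interviewer[li]),
                gaussmf(lit_rate, *literacy[ll]),
                gaussmf(dist_area, *area[la]))          # AND = min
        agg = np.maximum(agg, np.minimum(w, gaussmf(z, *grade[lout])))
    return np.trapz(agg * z, z) / np.trapz(agg, z)      # centroid

print(final_grade(231, 75.9, 7169))   # Pithoragarh's triplet from table 1
```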
Now, we describe the proposed procedure through the
following examples.
Example 1: The government of Uttarakhand wants to run a scheme for literacy. Before applying the scheme to all the districts of the state, the government wants to apply it to a few districts to get an idea of its success. The districts should be selected on the basis of low literacy rate, small population and small area. The problem to be solved here is to find the initial selection probability of each district by considering the above mentioned criteria. For this sample survey, we have N=13 and n=3.
In this situation, assigning the initial selection probabilities according to PPS would not be justified, as in the PPS sampling procedure the probabilities are assigned only on the criterion of size and all other factors are ignored. Here the fuzzy logic approach works very well, as it has the capability to express the above mentioned factors in mathematical terms, which can then be utilized to assign the initial selection probabilities to all the districts. In order to incorporate all the above three factors, we have to use the fuzzy inference system. The procedure for using the fuzzy inference system is as follows.
292
Consider the following data related to the 13 districts of Uttarakhand.
Table 1
District       Population   Previous literacy rate   Area (sq. km)   Number of interviewers
Pithoragarh    462289       75.9%                    7169            231
Almora         630567       73.6%                    3689            315
Nainital       762909       78.4%                    3422            381
Bageshwar      249462       71.3%                    1696            124
Champawat      224542       70.4%                    2004            112
U. S. Nagar    1235614      64.9%                    3055            617
Uttarkashi     295013       65.7%                    8016            147
Chamoli        370359       75.4%                    7520            185
Rudraprayag    227439       73.6%                    2439            113
Tehri          604747       66.7%                    3796            302
Dehradun       1282143      79%                      3088            641
Pauri          697078       77.5%                    5230            348
Hardwar        1447187      63.7%                    2360            723
First, we define the following base line model for the above data.
Figure 1
The above base line model consists of three input variables and one output variable. The first input variable is the number of interviewers, the second is the literacy rate and the third is the area of the district. For the first input variable, i.e. the number of interviewers, we have defined 3 fuzzy sets, namely, excellent, good and poor. A smaller number of interviewers will reduce the cost of the survey; thus a small number of interviewers is taken under the category of excellent. Similarly, the fuzzy sets good and poor are defined according to the increment in the number of interviewers. We have taken the gaussmf membership function for making these three fuzzy sets. The gaussmf membership function depends on two parameters σ and c and is given as follows:
$$f(x;\sigma,c) = e^{-\frac{(x-c)^2}{2\sigma^2}}$$
One interviewer per 2000 population has been taken as the
criterion for the required number of interviewers for a given
population. Number of interviewers for all the districts are given in
the 5th column of table 1. The range for number of interviewers is
taken as [100, 1000]. Thus the model for the first input variable is as
follows.
Figure 2
For the second input variable, i.e. the literacy rate, we have defined 2 fuzzy sets, namely, good and poor. Since the scheme is aimed at illiterate persons, a district with a smaller number of literate persons will be preferred; thus we have taken a low literacy rate under the category of good and a high literacy rate under the category of poor. We have taken the pimf membership function for making these two fuzzy sets. The pimf is a Π-shaped membership function. The syntax for the pimf membership function can be described as y = pimf(x, [a b c d]). This membership function is evaluated at the points determined by the vector x. The parameters a and d locate the "feet" of the curve, while b and c locate its "shoulders". The range for literacy rate is taken as [0, 100], and the model for the literacy rate is as follows.
Figure 3
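For readers without MATLAB, pimf can be reproduced as an S-shaped spline rise over [a, b], a plateau at 1 over [b, c] and a Z-shaped fall over [c, d]; the sketch below follows MATLAB's documented definition (our Python re-implementation, with illustrative parameters).

```python
# Sketch of the Pi-shaped membership function y = pimf(x, [a b c d]).
import numpy as np

def smf(x, a, b):
    """Rising S-curve: 0 at a, 1 at b, quadratic splines in between."""
    x = np.asarray(x, dtype=float)
    y = np.where(x <= a, 0.0, np.where(x >= b, 1.0, 0.0))
    mid = (a + b) / 2.0
    lo = (x > a) & (x <= mid)
    hi = (x > mid) & (x < b)
    y = np.where(lo, 2 * ((x - a) / (b - a)) ** 2, y)
    y = np.where(hi, 1 - 2 * ((x - b) / (b - a)) ** 2, y)
    return y

def pimf(x, a, b, c, d):
    # Product of a rising S-curve on [a, b] and a falling one on [c, d].
    return smf(x, a, b) * (1.0 - smf(x, c, d))

# e.g. literacy rate on [0, 100] with assumed feet/shoulders:
rate = np.linspace(0, 100, 5)
print(pimf(rate, 0, 20, 60, 90))
```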
For the third input variable, i.e. the area of the district, we have again defined 3 fuzzy sets, namely, excellent, good and poor. A smaller district area will reduce the distances travelled and hence the time taken by the survey; thus a small district area is taken under the category of excellent. Similarly, the fuzzy sets good and poor are defined according to the increment in the area of the district. We have taken the gaussmf membership function for making these three fuzzy sets. The range for the area of the district is taken as [2000, 8000]. Thus the model for the third input variable is as follows.
Figure 4
After defining the membership functions for all the three input variables, we define the membership function for the output variable, i.e. for the final grade of membership. For the output variable, we have defined 3 fuzzy sets, namely, excellent, good and poor. The fuzzy set excellent consists of the districts with a high grade of membership, and the fuzzy sets good and poor consist of the districts with grades of membership in decreasing order. For the output variable, we have taken the membership function gaussmf. The range for the output variable is [0, 1], and the model for the final grade of membership is as follows.
Figure 5
After defining the base line model, we define the fuzzy rules for the above problem as follows.
1. If (Interviewer is excellent) AND (literacy is good) AND (area is excellent) then (grade is excellent).
2. If (Interviewer is good) AND (literacy is good) AND (area is good) then (grade is good).
3. If (Interviewer is poor) AND (literacy is poor) AND (area is poor) then (grade is poor).
The following figure shows the fuzzy inference process.
Figure 6: Fuzzy inference process
Now, everything necessary for the data input has been done. In order to find the final grade of membership for any district, we have to put the triplet into the fuzzy inference system, i.e. the values of the number of interviewers, the literacy rate and the area from table 1. After putting in the values of these three factors for all the districts, we obtain the grades of membership for all the districts, given in table 2.
Table 2
District       Grade of      Initial selection probability   Initial selection probability
               membership    (Proposed procedure)            (PPS)
Pithoragarh    0.5           0.065                           0.05
Almora         0.5265        0.069                           0.07
Nainital       0.5455        0.071                           0.09
Bageshwar      0.8568        0.112                           0.03
Champawat      0.8589        0.112                           0.04
U. S. Nagar    0.5176        0.067                           0.15
Uttarkashi     0.5           0.065                           0.03
Chamoli        0.5           0.065                           0.04
Rudraprayag    0.804         0.105                           0.03
Tehri          0.5181        0.068                           0.07
Dehradun       0.5328        0.069                           0.15
Pauri          0.4993        0.065                           0.08
Hardwar        0.5106        0.067                           0.17
After getting the final grades of membership for all the districts, the next step is to obtain the initial selection probabilities for all the districts. This can be done as follows. Let $X_i$ ($i = 1,\ldots,N$) represent the final grade of membership of the $i$th district. Then the initial selection probability of the $i$th district can be obtained as
$$p_i = \frac{X_i}{X}, \qquad \text{where } X = \sum_{i=1}^{N} X_i. \qquad (6.3.1)$$
Using (6.3.1), we get the initial selection probabilities for all
the 13 districts, which are displayed in the 3rd column of table 2.
To compare the proposed procedure with the PPS sampling procedure, we have also solved the above problem by PPS sampling, taking the size proportional to the population of the districts, and obtained the initial selection probabilities of all the districts, which are given in the 4th column of table 2.
Now, to demonstrate the utility of the proposed procedure in terms of the precision of the estimate, the variance for the proposed procedure is compared with the variance for the PPS sampling procedure using expressions (10), (12) and (13) of subsection (2.2.3) of chapter 2 and expression (6.1.3) of this chapter.
In order to compute the variance $V(\hat{Y}_{HT})_{YG}$ using expression (10), we have to calculate the values of $\pi_{ij}$. Thus, using (6.1.2), we get the values of $\pi_{ij}$ for the proposed procedure, demonstrated in table 3.
Table 3
(i,j)   π_ij    (i,j)   π_ij    (i,j)   π_ij    (i,j)   π_ij    (i,j)   π_ij
1,2     .027    2,7     .027    3,13    .031    5,12    .06     8,10    .027
1,3     .029    2,8     .027    4,5     .109    5,13    .062    8,11    .027
1,4     .060    2,9     .057    4,6     .062    6,7     .026    8,12    .025
1,5     .060    2,10    .029    4,7     .06     6,8     .026    8,13    .026
1,6     .026    2,11    .030    4,8     .06     6,9     .056    9,10    .056
1,7     .025    2,12    .027    4,9     .101    6,10    .028    9,11    .057
1,8     .025    2,13    .029    4,10    .062    6,11    .029    9,12    .054
1,9     .054    3,4     .065    4,11    .063    6,12    .026    9,13    .056
1,10    .027    3,5     .065    4,12    .06     6,13    .027    10,11   .029
1,11    .027    3,6     .030    4,13    .062    7,8     .025    10,12   .027
1,12    .025    3,7     .029    5,6     .062    7,9     .054    10,13   .028
1,13    .026    3,8     .029    5,7     .06     7,10    .027    11,12   .027
2,3     .031    3,9     .059    5,8     .06     7,11    .027    11,13   .029
2,4     .063    3,10    .031    5,9     .101    7,12    .025    12,13   .026
2,5     .063    3,11    .031    5,10    .062    7,13    .026
2,6     .029    3,12    .029    5,11    .063    8,9     .054
Having obtained the values of $\pi_{ij}$, we have calculated the variance $V(\hat{Y}_{HT})_{YG}$ for the proposed procedure, the value of which is demonstrated in table 4. Similarly, we have calculated the values of $\pi_{ij}$ for the PPS sampling procedure and then calculated the variance $V(\hat{Y}_{HT})_{YG}$ for the PPS sampling procedure, which is also demonstrated in table 4.
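The Yates-Grundy form referred to above is the standard pairwise double sum; the following is a minimal sketch of evaluating it from a table of joint inclusion probabilities such as table 3, assuming NumPy. For an IPPS design the first-order inclusion probabilities are π_i = n·p_i; any scaling appropriate to the estimator (e.g. 1/N² when a mean is estimated) would be applied afterwards.

```python
# Sketch of the Yates-Grundy variance of the Horvitz-Thompson estimator:
#   V = sum_{i<j} (pi_i*pi_j - pi_ij) * (y_i/pi_i - y_j/pi_j)^2
import numpy as np

def v_yg(y, p, pij, n):
    pi = n * np.asarray(p)            # first-order inclusion probabilities
    y = np.asarray(y, dtype=float)
    N = len(y)
    v = 0.0
    for i in range(N):
        for j in range(i + 1, N):
            v += (pi[i] * pi[j] - pij[i, j]) * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return v
```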
We have also calculated the variance of $\hat{Y}_{HT}$ using expressions (12) and (13) of subsection (2.2.3) of chapter 2 and expression (6.1.3) of this chapter, for both procedures, i.e. for the proposed procedure and for the PPS sampling procedure. The values of the variance of $\hat{Y}_{HT}$ obtained through all the above expressions are demonstrated in table 4.
Table 4
                      V(Ŷ_HT)_YG   V(Ŷ_HT)_GK   V(Ŷ_HT)_SAMP   V(Ŷ_HT)_BD
Proposed procedure    205.9167     207.746      207.586        203.507
PPS                   530.4458     460.978      458.735        446.011
From table 4, we observe that the variance under the proposed procedure is much smaller than that under the PPS sampling procedure in all the cases. This shows that the proposed procedure has less variability than the PPS sampling procedure, and hence the proposed procedure can be considered more efficient.
Example 2: Consider the following data, taken from Singh and Chaudhary (1986).
Table 5
S. No.   No. of Trees (xi)   Yield   pi (xi/x)
1        50                  60      .185
2        30                  35      .111
3        25                  30      .093
4        40                  44      .148
5        26                  30      .096
6        44                  50      .163
7        20                  22      .074
8        35                  40      .130
The number of trees and the yield for 8 orchards are given, and we have to assign initial selection probabilities to these 8 orchards. The initial selection probabilities under the PPS sampling procedure are given in the 4th column of table 5. From table 5, we observe that only the criterion of the number of trees has been used in assigning the initial selection probabilities to the orchards, while the criterion of yield has been ignored. We can also utilize the auxiliary information on yield, in addition to the number of trees, in assigning the initial selection probabilities. In order to utilize the yield as well, we use the fuzzy logic approach. For this sample survey, we have N=8 and suppose n=3.
The fuzzy inference process for this example can be described as
follows.
We have taken the following baseline model for the above
data.
Figure 7
The above base line model consists of two input variables and
one output variable. The first input variable is the number of trees and
the second input variable is the yield in the orchards.
For the first input variable, i.e. the number of trees, we have defined 3 fuzzy sets, namely, low, medium and high. A smaller number of trees will give a lower yield; thus a small number of trees is taken under the category of low. Similarly, the fuzzy sets medium and high are defined according to the increment in the number of trees. We have taken the gaussmf membership function for making these three fuzzy sets. The range for the number of trees is taken as [20, 50]. Thus the model for the first input variable is as follows.
Figure 8
For the second input variable, i.e. for the yield, we have
defined 3 fuzzy sets, namely, poor, good and excellent. We have taken
the gaussmf membership function for making these three fuzzy sets.
The range for the yield is taken as [20, 60] and the model for the
second input variable is as follows.
Figure 9
After defining the membership functions for the input variables, we define the membership function for the output variable, i.e. for the final grade of membership. For the output variable, we have defined 3 fuzzy sets, namely, low, medium and high. The fuzzy set high consists of the orchards with a high grade of membership, and the fuzzy sets medium and low consist of the orchards with grades of membership in decreasing order. For the output variable, we have taken the gaussmf function. The range for the output variable is [0, 1], and the model for the final grade of membership is as follows.
Figure 10
After defining the base line model, we define the fuzzy rules for the above problem as follows.
1. If (tree is low) AND (yield is good) then (grade is medium).
2. If (tree is low) AND (yield is excellent) then (grade is high).
3. If (tree is medium) AND (yield is poor) then (grade is low).
4. If (tree is medium) AND (yield is excellent) then (grade is high).
5. If (tree is high) AND (yield is poor) then (grade is low).
6. If (tree is high) AND (yield is good) then (grade is medium).
The following figure shows the fuzzy inference process.
Figure 11: Fuzzy inference process
Now, the process of data input has been completed. In order to find the final grade of membership for any orchard, we have to put in the values of the number of trees and the yield for that orchard. After putting in these values for all the orchards, we get the final grades of membership for all the orchards, given in table 6.
Table 6
Orchard   Grade of membership   Initial selection probability (Proposed procedure)
1         0.7435                0.19
2         0.4612                0.12
3         0.3933                0.10
4         0.5173                0.13
5         0.3571                0.09
6         0.6401                0.16
7         0.2922                0.07
8         0.4916                0.13
Having obtained the final grades of membership for all the orchards, we have calculated the initial selection probabilities for all the orchards using (6.3.1), given in the 3rd column of table 6.
The values of $\pi_{ij}$ for the proposed procedure, computed using (6.1.2), are given in table 7. Having obtained the values of $\pi_{ij}$, we have calculated the variance $V(\hat{Y}_{HT})_{YG}$ for the proposed procedure, the value of which is demonstrated in table 8. Similarly, we have calculated the values of $\pi_{ij}$ for the PPS sampling procedure and then calculated the variance $V(\hat{Y}_{HT})_{YG}$ for the PPS sampling procedure, which is also demonstrated in table 8.
Table 7
(i,j)   π_ij     (i,j)   π_ij     (i,j)   π_ij     (i,j)   π_ij
1,2     .1997    2,3     .0757    3,5     .0523    4,8     .1393
1,3     .1657    2,4     .1248    3,6     .137     5,6     .1218
1,4     .2177    2,5     .0734    3,7     .0328    5,7     .0244
1,5     .1496    2,6     .1691    3,8     .0976    5,8     .0848
1,6     .2754    2,7     .051     4,5     .0848    6,7     .0933
1,7     .1193    2,8     .1248    4,6     .186     6,8     .1860
1,8     .2177    3,4     .0976    4,7     .0608    7,8     .0608
We have also calculated the variance of $\hat{Y}_{HT}$ using expressions (12) and (13) of subsection (2.2.3) of chapter 2 and expression (6.1.3) of this chapter, for both procedures, i.e. for the proposed procedure and for the PPS sampling procedure. The values of the variance of $\hat{Y}_{HT}$ obtained through all the above expressions are demonstrated in table 8.
Table 8
                      V(Ŷ_HT)_YG   V(Ŷ_HT)_GK   V(Ŷ_HT)_SAMP   V(Ŷ_HT)_BD
Proposed procedure    .03571       .01247       .01246         .01249
PPS                   .09416       .02607       .02607         .02589
From table 8, we observe that for this example also the variance under the proposed procedure is much smaller than that under the PPS sampling procedure in all the cases. From both of the above examples, we observe that the proposed procedure has less variability than the PPS sampling procedure, and thus we can say that the proposed procedure is more efficient.
CHAPTER VII
SUMMARY
‘Controlled selection’ or ‘controlled sampling’, as the name suggests, is a method of selecting samples from a finite population by imposing certain restrictions or controls while selecting the samples. The technique of controlled selection is used in sampling to minimize, as far as possible, the probability of selecting the non-preferred samples, while conforming strictly to the requirements of probability sampling. Although the concept of controlled selection has been used by statisticians for a long time, it has received considerable attention in recent years due to its practical importance.
The term ‘controlled selection’ or ‘controlled sampling’ is rather uncommon in the field of sample surveys; however, the need for this special technique in sampling was long felt. Even a generation ago, the conflicting needs of controls and randomization were widely thought to be irreconcilable, as can be seen in the debate between purposive and random methods in the Bulletin of the International Statistical Institute (1926).
Conceptually, the imposition of controls in selecting a
sample may be viewed as an extension of the technique of purposive
sampling, although it involves more judgment than purposive
sampling. In fact, any departure from simple random sampling may be
regarded as a control, which enhances the probability of preferred
combinations by eliminating or reducing non-preferred (undesirable)
combinations.
The technique of controlled selection was originally formulated by Goodman and Kish (1950). They applied the technique to the specific problem of selecting twenty-one primary sampling units to represent the North-central states and found that, by the use of this technique, the between-first-stage-unit components of the variance were reduced by 11% to 32% relative to the corresponding components under stratified random sampling.
The concept of ‘Controlled Selection’ is applicable in many
fields, such as rounding techniques, disclosure control, overlap of
sampling units etc.
In chapter I of the thesis, we have given a brief introduction to the historical background of controlled selection, the definition of controlled selection and some applications of controlled selection to statistical problems. A brief review of the literature on controlled selection, and of the fields in which controlled selection is applicable, is also given in this chapter. In the last section of the chapter, the problem of estimation of the variance is also discussed.
In chapter II of the thesis, we have used the concept of ‘nearest proportional to size sampling designs’, originated by Gabler (1987), to obtain an optimal controlled sampling design which ensures that the probability of selecting the non-preferred samples is exactly zero. Variance estimation for the proposed optimal controlled sampling design using the Yates-Grundy form of the Horvitz-Thompson estimator is discussed. The variance of the proposed procedure is compared with that of existing optimal controlled and uncontrolled high entropy selection procedures. The utility of the proposed procedure is demonstrated with the help of examples.
In chapter III of the thesis, using quadratic programming and utilizing the concept of the ‘nearest proportional to size sampling design’, we have proposed a method for two dimensional optimal controlled selection, which ensures zero probability for non-preferred samples. An estimator for estimating the variance in controlled selection is also proposed. The utility of the proposed procedure is demonstrated with the help of examples.
In chapter IV of the thesis, using the technique of random rounding, we have introduced a new methodology for protecting the confidential information of tabular data with minimum loss of information. The tables obtained through the proposed method consist of unbiasedly rounded values, are additive and have a specified level of confidentiality protection. Some numerical examples are also discussed to demonstrate the superiority of the proposed procedure over the existing procedures.
In chapter V of the thesis, we have proposed a new methodology which not only selects the sample in a controlled way but also maximizes or minimizes the overlap of sampling units for two sample surveys. The two surveys can be conducted simultaneously or sequentially. The proposed method uses the linear programming approach, maximizing the probability of those sample combinations which consist of the maximum number of overlapping sample units, or minimizing that probability, as required. The proposed procedure also has an advantage in variance estimation: it satisfies the non-negativity condition of the Horvitz-Thompson (H-T) variance estimator, and in those situations where the non-negativity condition cannot be satisfied, an alternative method of estimation can be used.
In chapter VI of the thesis, we have used the fuzzy logic approach to obtain a more efficient sampling design. The proposed procedure utilizes all the available auxiliary information while assigning the initial selection probabilities to the population units. The superiority of the proposed procedure over the PPS sampling procedure is also demonstrated through some numerical examples.
REFERENCES:
Albert, P. (1978).
The algebra of fuzzy logic.
Fuzzy sets and
systems, 1(3), 203-230.
Ashok, C. and Sukhatme, B. V. (1976a). On Sampford’s procedure of
unequal probability sampling without replacement. Journal of
American Statistical Association, 71, 912-918.
Avadhani, M.S. and Sukhatme, B.V. (1973). Controlled sampling
with equal
probabilities and without replacement. International
Statistical Review, 41, 175-182.
Azmi, Z. A. (1993). New Fuzzy approaches by using Statistical and
Mathematical methodologies in operations research. Journal of fuzzy
Mathematics, 1(1), 69-87.
Bellman, R. E. and Zadeh, L. A. (1970). Decision making in a fuzzy
environment. Management science, 17(4), 141-164.
Biswal, M. P. (1992). Fuzzy Programming technique to solve multiobjective geometric programming problems. Fuzzy sets and systems,
51(1), 67-72.
Bit, A. K., Biswal, M. P. and Alam, S. S. (1992). Fuzzy programming
approach to multicriteria decision making transportation problem.
Fuzzy sets and systems, 50(2), 135-141.
318
Brewer, K. R. W. and Donadio, M. E. (2003). The high- entropy
variance of the Horvitz-Thompson Estimator. Survey Methodology,
29, 189-196.
Brewer, K. R. W., Early, L. J. and Joyce, S. F. (1972). Selecting
several samples from a single population. Australian journal of
statistics, 14, 231-239.
Bryant, E.C.(1961). Sampling methods. Seminar paper, Iowa State
University.
Bryant, E.C., Hartley H.O. and Jessen, R.J.(1960). Design and
estimation in two-way stratification. Journal of American Statistical
Association, 55, 105-124.
Carvalho, F. D., Dellaert, N. P. and Osório, M. S. (1994). Statistical
Disclosure in Two-Dimensional Tables: General Tables. Journal of
American Statistical Association, 89, 1547-1557.
Cassel, C.M.
and Särndal, C.E. (1972). A model for studying
robustness of estimators and in formativeness of labels in sampling
with varying probabilities. Journal of Royal Statistical Society, Series
B, 34, 279-289.
Causey, B.D., Cox, L.H. and Ernst, L.R. (1985). Application of
transportation theory to statistical problems. Journal of American
Statistical Association, 80, 903-909.
319
Chakrabarti, M.C. (1963). On the use of incidence matrices of designs
in sampling from finite populations. Journal of Indian Statistical
Association, 1, 78-85.
Cox, L. H. (1980). Suppression Methodology and Statistical
Disclosure Control. Journal of American Statistical Association, 75,
377-385.
Cox, L. H. (1981). Linear Sensitivity Measures in Statistical
Disclosure Control. Journal of Statistical Planning and Inference, 5,
153-164.
Cox, L. H. (1987).
A Constructive Procedure for Unbiased
Controlled Rounding. Journal of American Statistical Association, 82,
420-424.
Cox, L. H. (1995). Network Models for Complementary Cell
Suppression. Journal of the American Statistical Association, 90,
1453-1462.
Cox, L. H. and Ernst, L. R. (1982). Controlled Rounding. INFOR 20,
423-432.
Doherty, P. D., Driankov and Hellendoorn, H. (1993). Fuzzy if-then
unless rules and their implementation. International Journal of
uncertainty, Fuzziness and knowledge based systems, 1(2), 167-182.
320
Dockery, J. T. and Murray, E. (1987). A fuzzy approach in aggregating military assessments. International Journal of Approximate Reasoning, 1(3), 251-271.
Ernst, L. R. (1996). Maximizing the overlap of sample units for two designs with simultaneous selection. Journal of Official Statistics, 12, 33-45.
Ernst, L. R. (1998). Maximizing and minimizing overlap when selecting a large number of units per stratum simultaneously for two designs. Journal of Official Statistics, 14, 297-314.
Ernst, L. R. and Ikeda, M. (1995). A reduced-size transportation algorithm for maximizing the overlap between surveys. Survey Methodology, 21, 147-157.
Ernst, L. R. and Paben, S. P. (2002). Maximizing and minimizing overlap when selecting any number of units per stratum simultaneously for two designs with different stratifications. Journal of Official Statistics, 18, 185-202.
Fellegi, I. (1963). Sampling with varying probabilities without
replacement, rotating and non-rotating samples. Journal of American
Statistical Association, 58, 183-201.
Fellegi, I. (1966). Changing the probabilities of selection when two units are selected with PPS without replacement. In Proceedings of the Social Statistics Section, American Statistical Association, Washington, 434-442.
Fellegi, I. P. (1975). Controlled Random Rounding. Survey
Methodology, 1, 123-135.
Fischetti, M. and Salazar, J. J. (2000). Models and Algorithms for
Optimizing Cell Suppression in Tabular Data with Linear Constraints.
Journal of American Statistical Association, 95, 916-928.
Fischetti, M. and Salazar, J. J. (2003). Partial Cell Suppression: A new
Methodology for Statistical Disclosure Control. Statistics and
Computing, 13, 13-21.
Foody, W. and Hedayat, A. (1977). On theory and applications of BIB
designs and repeated blocks. Annals of Statistics, 5, 932-945.
Frankel, L. R. and Stock, J. S. (1942). On the sample survey of unemployment. Journal of American Statistical Association, 10, 288-293.
Frühwirth-Schnatter, S. (1992). On statistical inference for fuzzy data with applications to descriptive statistics. Fuzzy Sets and Systems, 50(2), 143-165.
Frühwirth-Schnatter, S. (1993). On fuzzy Bayesian inference. Fuzzy Sets and Systems, 60(1), 41-58.
Gabler, S. (1987). The nearest proportional to size sampling design.
Communications in Statistics-Theory & Methods, 16(4), 1117-1131.
Goodman, R. and Kish, L. (1950). Controlled selection - a technique in probability sampling. Journal of American Statistical Association, 45, 350-372.
Gray, G. and Platek, R. (1968). Several methods of redesigning area samples utilizing probabilities proportional to size when the sizes change significantly. Journal of American Statistical Association, 63, 1280-1297.
Gupta, V. K., Nigam, A. K. and Kumar, P. (1982). On a family of sampling schemes with inclusion probability proportional to size. Biometrika, 69, 191-196.
Gupta, V. K., Srivastava, A. K. and Reddy, K. S. (1989). On the use of connected block designs in inclusion probability proportional to size sampling. Technical report, Indian Agricultural Research Institute, New Delhi.
Hedayat, A. and Lin, B.Y. (1980). Controlled probability proportional
to size sampling designs. Technical Report, University of Illinois at
Chicago.
Hedayat, A., Lin, B.Y. and Stufken, J. (1989). The construction of
IPPS sampling designs through a method of emptying boxes. Annals
of Statistics, 17, 1886-1905.
Hess, I., Riedel, D. C. and Fitzpatrick, T. P. (1961). Probability sampling of hospitals and patients. Ann Arbor, Mich.: Bureau of Hospital Administration.
Hess, I. and Srikantan, K. S. (1966). Some aspects of probability sampling technique of controlled selection. Health Services Research, Summer 1966, 8-52.
Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from finite universes. Journal of American Statistical Association, 47, 663-685.
Jessen, R. J. (1969). Some methods of probability non-replacement sampling. Journal of American Statistical Association, 64, 175-193.
Jessen, R.J. (1970). Probability sampling with marginal constraints.
Journal of American Statistical Association, 65, 776-796.
Jessen, R.J. (1973). Some properties of probability lattice sampling.
Journal of American Statistical Association, 68, 26-28.
Jessen, R. J. (1975). Square and cubic lattice sampling. Biometrics, 31, 449-471.
Jessen, R.J. (1978). Statistical Survey Techniques. Wiley, New York.
Keyfitz, N. (1951). Sampling with probabilities proportional to size: adjustment for changes in probabilities. Journal of American Statistical Association, 46, 105-109.
Kish, L. (1963). Changing strata and selection probabilities. In Proceedings of the Social Statistics Section, American Statistical Association, Washington, 124-131.
Kish, L. and Scott, A. (1971). Retaining units after changing strata and probabilities. Journal of American Statistical Association, 66, 461-470.
Kuhn, H. W. and Tucker, A. W. (1951). Non-linear programming. Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 481-492.
Lu, W. and Sitter, R. R. (2002). Multi-way stratification by linear programming made practical. Survey Methodology, 28(2), 199-207.
Mahalanobis, P. C. (1939). A sample survey of the acreage under jute in Bengal. Sankhya, 4, 511-531.
Mahalanobis, P.C. (1946). Recent experiments in Statistical sampling
in the Indian Statistical Institute. Journal of the Royal Statistical
Society, 109, 325-378.
Matei, A. and Tillé, Y. (2006). Maximal and minimal sample co-ordination. Sankhya: The Indian Journal of Statistics, 67, 590-612.
Merola, G. M. (2003a). Generalized Risk Measures for Tabular Data.
Proceedings of the 54th Session of the International Statistical
Institute.
Midzuno, H. (1952). On the sampling system with probability proportional to sums of sizes. Annals of the Institute of Statistical Mathematics, 3, 99-107.
Moore, R. P., Chromy, J. R. and Rogers, W. T. (1974). The National Assessment approach to sampling. National Assessment of Educational Progress, Denver.
Mukhopadhyay, P. and Vijayan, K. (1996). On controlled sampling
designs. Journal of Statistical Planning & Inference, 52, 375-378.
Murthy, M. N. (1957). Ordered and unordered estimators in sampling without replacement. Sankhya, 18, 379-390.
Nargundkar, M. S. and Saveland, W. (1972). Random rounding to prevent statistical disclosures. In Proceedings of the Social Statistics Section, American Statistical Association, 382-385.
Nigam, A. K., Kumar, P. and Gupta, V. K. (1984). Some methods of inclusion probability proportional to size sampling. Journal of Royal Statistical Society, Series B, 46, 564-571.
Patterson, H. D. (1954). The errors of lattice sampling. Journal of Royal Statistical Society, Series B, 16, 140-149.
Rao, J.N.K. and Nigam, A.K. (1990). Optimal controlled sampling
designs. Biometrika, 77, 807-814.
Rao, J.N.K. and Nigam, A.K. (1992). ‘Optimal’ controlled sampling:
A unified approach. International Statistical Review, 60, 89-98.
Salazar, J. J. (2005). Controlled Rounding and Cell Perturbation:
Statistical Disclosure Limitation Methods for Tabular Data.
Mathematical Programming, Ser. B 105, 583-603.
Sampford, M. R. (1967). On sampling without replacement with unequal probabilities of selection. Biometrika, 54, 499-513.
Sande, G. (1984). Automated Cell Suppression to Preserve
Confidentiality of Business Statistics. Statistical Journal of the United
Nations ECE, 2, 33-41.
Sen, A.R. (1953). On the estimation of variance in sampling with
varying probabilities. Journal of Indian Society of Agricultural
Statistics, 5, 119-127.
Singh, D. (1954). On efficiency of sampling with varying
probabilities without replacement. Journal of Indian Society of
Agricultural Statistics, 6, 48-57.
Singh, D. and Chaudhary, S. S. (1986). Theory and Analysis of
Sample Survey Designs. Wiley Eastern Ltd.
Sitter, R.R. and Skinner, C. J. (1994). Multi-way stratification by
linear programming. Survey Methodology, 20, 65-73.
Srivastava, J. and Saleh, F. (1985). Need of t-designs in sampling
theory. Utilitas Mathematica, 28, 5-17.
Takeuchi, K., Yanai, H. and Mukherjee, B. N. (1983). The Foundations of Multivariate Analysis. 1st edition, New Delhi: Wiley Eastern Ltd.
Tiwari, N. and Nigam, A. K. (1993). A note on constructive procedure for unbiased controlled rounding. Statistics and Probability Letters, 18, 415-420.
Tiwari, N. and Nigam, A.K. (1998). On two-dimensional optimal
controlled selection. Journal of Statistical Planning & Inference, 69,
89-100.
Tiwari, N., Nigam, A. K. and Pant, I. (2007). On an optimal controlled nearest proportional to size sampling scheme. Survey Methodology, 33, 87-94.
Wolter, K. M. (1985). Introduction to Variance Estimation. Springer-Verlag, New York.
Waterton, J.J. (1983). An exercise in controlled selection. Applied
Statistics, 32, 150-164.
Wilkerson, M. (1960). The revised city sample for the consumer price index. Monthly Labor Review, 1078-1083.
Willenborg, L. C. R. J. and de Waal, T. (2001). Elements of Statistical
Disclosure Control. Lecture Notes in Statistics, 155, Springer.
Wynn, H.P. (1977). Convex sets of finite population plans. Annals of
Statistics, 5, 414-418.
Yates, F. (1960). Sampling Methods for Censuses and Surveys, 3rd edition. London: Charles Griffin and Company.
Yates, F. and Grundy, P. M. (1953). Selection without replacement from within strata with probability proportional to size. Journal of Royal Statistical Society, Series B, 15, 253-261.
Zadeh, L. A. (1965b). Fuzzy sets. Information and Control, 8(3), 338-353.