Stratified sampling technique

Stratified sampling is a well-established technique in statistical sampling (Cochran, 1966). Variance reduction by the stratified sampling technique is achieved by taking more samples in important subregions. Consider a problem in which the expectation of a function $g(X)$ is sought, where $X$ is a random variable with a PDF $f_x(x)$, $x \in E$. Referring to Fig. 6.13, the domain $E$ for the random variable $X$ is divided into $M$ disjoint subregions $E_m$, $m = 1, 2, \ldots, M$. That is,

$$E = \bigcup_{m=1}^{M} E_m \qquad \emptyset = E_m \cap E_{m'} \quad \text{for } m \neq m'$$


Let $p_m$ be the probability that random variable $X$ will fall within the subregion $E_m$, that is, $\int_{E_m} f_x(x)\,dx = p_m$. Therefore, it is true that $\sum_m p_m = 1$.

The expectation of $g(X)$ can be computed as

$$G = \int_{E} g(x)\, f_x(x)\, dx = \sum_{m=1}^{M} \int_{E_m} g(x)\, f_x(x)\, dx = \sum_{m=1}^{M} G_m \tag{6.86}$$

where $G_m = \int_{E_m} g(x)\, f_x(x)\, dx$.

Note that the integral for $G_m$ can be written as

$$G_m = p_m \int_{E_m} g(x)\, \frac{f_x(x)}{p_m}\, dx \tag{6.87}$$

and it can be estimated by the Monte Carlo method as

$$\hat{G}_m = \frac{p_m}{n_m} \sum_{i=1}^{n_m} g(X_{mi}) \tag{6.88}$$

where $n_m$ is the number of sample points in the $m$th subregion, and $\sum_m n_m = n$, the total number of random variates to be generated. Therefore, the estimator for $G$ in Eq. (6.86) can be obtained as

$$\hat{G} = \sum_{m=1}^{M} \hat{G}_m = \sum_{m=1}^{M} \frac{p_m}{n_m} \sum_{i=1}^{n_m} g(X_{mi}) \tag{6.89}$$

After the number of subregions $M$ and the total number of samples $n$ are determined, an interesting issue for stratified sampling is how to allocate the total $n$ sample points among the $M$ subregions such that the variance associated with $\hat{G}$ by Eq. (6.89) is minimized. A theorem shows that the optimal $n_m^*$ that minimizes $\mathrm{Var}(\hat{G})$ in Eq. (6.89) is (Rubinstein, 1981)

$$n_m^* = n\, \frac{p_m \sigma_m}{\sum_{m'=1}^{M} p_{m'} \sigma_{m'}} \tag{6.90}$$

where $\sigma_m$ is the standard deviation of $g(X)$ within the $m$th subregion, which underlies the estimator $\hat{G}_m$ in Eq. (6.88).

In general, information about $\sigma_m$ is not available in advance. It is suggested that a pilot simulation study be made to obtain a rough estimate of $\sigma_m$, which then serves as the basis for the follow-up simulation that pursues the variance-reduction objective.
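The pilot-then-allocate idea can be sketched in Python as follows. This is a minimal illustration assuming NumPy is available; the function names `neyman_allocation` and `stratified_estimate`, the test problem ($X \sim U(0,1)$, $g(x) = x^2$, four equal strata), the pilot size of 50, and the floor of two points per stratum are all assumptions made for the example, not part of the text.

```python
import numpy as np

def neyman_allocation(p, sigma, n):
    """Eq. (6.90): n_m* = n * p_m*sigma_m / sum(p_m'*sigma_m'), rounded down."""
    w = np.asarray(p, dtype=float) * np.asarray(sigma, dtype=float)
    n_m = np.floor(n * w / w.sum()).astype(int)
    # Assumption: keep at least 2 points per stratum; with rounding, the
    # total may therefore differ slightly from n.
    return np.maximum(n_m, 2)

def stratified_estimate(g, sample_stratum, p, n_m, rng):
    """Eq. (6.89): sum over strata of (p_m / n_m) * sum_i g(X_mi)."""
    return sum(pm * g(sample_stratum(m, nm, rng)).mean()
               for m, (pm, nm) in enumerate(zip(p, n_m)))

# Illustrative problem (hypothetical): X ~ U(0, 1), g(x) = x**2, so G = 1/3.
M = 4
p = np.full(M, 1.0 / M)                      # equal-probability strata
g = lambda x: x ** 2
# Draw `size` points of X restricted to stratum m (zero-based index).
sample_stratum = lambda m, size, rng: rng.uniform(m / M, (m + 1) / M, size)

rng = np.random.default_rng(seed=1)
# Pilot run: 50 points per stratum give rough estimates of sigma_m.
sigma_pilot = [g(sample_stratum(m, 50, rng)).std(ddof=1) for m in range(M)]
n_m = neyman_allocation(p, sigma_pilot, n=2000)
G_hat = stratified_estimate(g, sample_stratum, p, n_m, rng)
print("allocation:", n_m, "estimate:", G_hat)
```

In this toy problem $g$ varies most in the last stratum, so the allocation places most of the sample points there, which is the behavior Eq. (6.90) is meant to produce.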

A simple plan for sample allocation is $n_m = n p_m$ after the subregions are specified. It can be shown that with this sampling plan, the variance associated with $\hat{G}$ by Eq. (6.89) does not exceed that from the simple random-sample technique. One efficient stratified sampling technique is systematic sampling (McGrath, 1970), in which $p_m = 1/M$ and $n_m = n/M$. The algorithm of systematic sampling can be described as follows:

1. Divide interval [0, 1] into M equal subintervals.

2. Within each subinterval, generate $n/M$ uniform random numbers $u_{mi} \sim U[(m-1)/M,\ m/M]$, $m = 1, 2, \ldots, M$; $i = 1, 2, \ldots, n/M$.

3. Compute $X_{mi} = F^{-1}(u_{mi})$.

4. Calculate $\hat{G}$ according to Eq. (6.89).
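The four steps above can be sketched in Python as follows. This is a minimal illustration assuming NumPy; the function name `systematic_sample_mean` and the test case (an exponential variable with unit rate, whose mean is 1) are assumptions chosen for the example, and the sketch assumes that $M$ divides $n$.

```python
import numpy as np

def systematic_sample_mean(g, inv_cdf, n, M, rng):
    """Estimate E[g(X)] with M equal-probability strata and n/M points each."""
    n_m = n // M                          # points per subinterval (assumes M divides n)
    m = np.arange(M).reshape(-1, 1)       # zero-based subinterval index, one row each
    # Steps 1-2: row m holds n_m uniform numbers on [m/M, (m+1)/M).
    u = (m + rng.random((M, n_m))) / M
    # Step 3: transform through the inverse CDF, X_mi = F^{-1}(u_mi).
    x = inv_cdf(u)
    # Step 4: Eq. (6.89) with p_m = 1/M reduces to the mean of the stratum means.
    return g(x).mean(axis=1).mean()

# Hypothetical test case: X ~ exponential with unit rate, so E[X] = 1
# and F^{-1}(u) = -ln(1 - u).
rng = np.random.default_rng(seed=0)
estimate = systematic_sample_mean(lambda x: x, lambda u: -np.log1p(-u),
                                  n=1000, M=10, rng=rng)
print(estimate)
```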

Example 6.13 Referring to Example 6.7, apply the systematic sampling technique to evaluate the pump failure probability in the time interval [0, 200 h].

Solution Again, let us adopt the uniform distribution U(0, 200) and carry out the computation by the sample-mean Monte Carlo method. In the systematic sampling, the interval [0, 200] is divided into 10 equal-probability subintervals, each having a probability content of 0.1. Since h(t) = 1/200, 0 < t < 200, the end points of each subinterval can be obtained easily as

$$t_0 = 0,\ t_1 = 20,\ t_2 = 40,\ \ldots,\ t_9 = 180,\ t_{10} = 200$$

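Furthermore, let us generate $n_m = 200$ random variates from each subinterval so that $\sum_m n_m = 2000$. This can be achieved by letting

$$U_{mi} \sim U\!\left(\frac{20(m-1)}{200},\ \frac{20m}{200}\right) \qquad \text{for } i = 1, 2, \ldots, 200;\ m = 1, 2, \ldots, 10$$

so that $t_{mi} = 200\,U_{mi}$ falls within the $m$th subinterval.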

The algorithm for estimating the pump failure probability is the following:

1. Initialize subinterval index m = 0.

2. Let $m = m + 1$. Generate $n_m = 200$ standard uniform random variates $\{u_{m1}, u_{m2}, \ldots, u_{m,200}\}$, and transform them into the random variates from the corresponding subinterval by $t_{mi} = 20(m-1) + 20u_{mi}$, for $i = 1, 2, \ldots, 200$.

3. Compute $\hat{p}_{f,m}$ as

$$\hat{p}_{f,m} = \frac{1}{200} \sum_{i=1}^{200} 200\, f_t(t_{mi})$$

and the associated variance as

$$\mathrm{Var}(\hat{p}_{f,m}) = \frac{p_m^2 s_m^2}{n_m} = \frac{0.1^2 s_m^2}{200}$$

in which $s_m$ is the standard deviation of $200 f_t(t_{mi})$ for the $m$th subinterval.

4. If $m < 10$, go to step 2; otherwise, compute the pump failure probability as

$$\hat{p}_f = \frac{1}{10} \sum_{m=1}^{10} \hat{p}_{f,m}$$

and the associated standard error as

$$s_{\hat{p}_f} = \frac{1}{10} \left[ \sum_{m=1}^{10} \mathrm{Var}(\hat{p}_{f,m}) \right]^{1/2}$$

The results from the numerical simulation are shown below:

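| $m$ | $\hat{p}_{f,m}$ | $s_m$ |
|-----|-----------------|-------|
| 1 | 0.15873 | 0.00071102 |
| 2 | 0.15626 | 0.00069358 |
| 3 | 0.15374 | 0.00069298 |
| 4 | 0.15121 | 0.00072408 |
| 5 | 0.14887 | 0.00065434 |
| 6 | 0.14659 | 0.00066053 |
| 7 | 0.14423 | 0.00064361 |
| 8 | 0.14194 | 0.00064993 |
| 9 | 0.13968 | 0.00066746 |
| 10 | 0.13742 | 0.00067482 |
| All | 0.14787 | $0.15154 \times 10^{-5}$ (standard error) |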

The value of $\hat{p}_f$ is extremely close to the exact solution of 0.147856.
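The example's algorithm can be sketched in Python as follows, assuming NumPy. Example 6.7 is not reproduced in this section, so the failure density here is an assumption: an exponential density with rate 0.0008 per hour, chosen only because it reproduces the exact solution quoted above, $1 - e^{-0.16} \approx 0.147856$. The names `LAM`, `f_t`, and `pump_failure_probability` are hypothetical, and the numbers produced depend on the random seed, so they will not match the tabulated values digit for digit.

```python
import numpy as np

LAM = 0.0008                 # ASSUMED failure rate (1/h), inferred from 0.147856
T, M, N_M = 200.0, 10, 200   # interval length (h), subintervals, points per subinterval

def f_t(t):
    """Assumed exponential failure density."""
    return LAM * np.exp(-LAM * t)

def pump_failure_probability(rng):
    width = T / M                             # 20 h per subinterval
    p_fm = np.empty(M)
    s_m = np.empty(M)
    for m in range(1, M + 1):
        u = rng.random(N_M)                   # standard uniform variates u_mi
        t = width * (m - 1) + width * u       # t_mi = 20(m - 1) + 20 u_mi
        y = T * f_t(t)                        # 200 f_t(t_mi)
        p_fm[m - 1] = y.mean()                # step 3: subinterval estimate
        s_m[m - 1] = y.std(ddof=1)            # sample standard deviation s_m
    return p_fm.mean(), p_fm, s_m             # step 4: average over subintervals

rng = np.random.default_rng(seed=2015)
p_f, p_fm, s_m = pump_failure_probability(rng)
print("p_f estimate:", p_f)
print("exact value under the assumed density:", 1.0 - np.exp(-LAM * T))
```

Under this assumed density, the subinterval estimates decrease with $m$ and the $s_m$ values fall in the same range as those tabulated, because the density declines only slightly across each 20-h subinterval.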

Updated: 22 November 2015, 6:25 a.m.