7/23/2019 ACTL2003 Summary Notes
http://slidepdf.com/reader/full/actl2003-summary-notes

ACTL2003 Stochastic Modelling
By Zhi Ying Feng
Contents

Part 1: Stochastic Processes
  Stochastic Processes
    Increment Properties
  Markov Processes
    Transition Probability
    Chapman-Kolmogorov Equations
    Classification of States
    Irreducible Markov Chain
    Recurrent and Transient States
    Class Properties
    Limiting Probabilities
    Mean Time in Transient States
    Branching Processes
    Probability Generating Functions
    Time Reversible Markov Chains
    Exponential, Poisson and Gamma Distributions
  Counting Process
    Poisson Process
    Inter-arrival and Waiting Time
    Order Statistics
    Conditional Distribution of Arrival Time
    Thinning of Poisson Process
    Non-hom*ogeneous Poisson Process
    Compound Poisson Process
  Continuous-time Markov Chains
    Time Spent in a State
    Transition Rates and Probabilities
    Chapman-Kolmogorov Equations
    Limiting Probabilities
    Embedded Markov Chain
    Time Reversibility
  Birth and Death Process
    Transition Rates and Embedded Markov Chain
    Examples of Birth and Death Processes
    Expected Time in States
    Kolmogorov Equations
    Limiting Probabilities
    Application: Pure Birth Process
    Application: Simple Sickness Model
    Occupancy Probabilities and Time
    First Holding Time
  Non-hom*ogeneous Markov Jump Processes
    Residual Holding Time
    Current Holding Time

Part 2: Time Series
  Introduction to Time Series
    Classical Decomposition Model
    Moving Average Linear Filters
    Differencing
    Stationarity
    Sample Statistics
    Noise
    Linear Processes
  Time Series Models
    Moving Average Process
    Autoregressive Process
    Autoregressive Moving Average Models
    Causality
    Invertibility
    Calculation of ACF
    Partial Autocorrelation Function
  Model Building
    Model Selection
    Parameter Estimation
    Model Diagnosis
  Non-Stationarity
    Stochastic Trends
    ARIMA Model
    SARIMA Model
    Dickey-Fuller Test
    Overdifferencing
    Cointegrated Time Series
  Time Series Forecasting
    Time Series and Markov Property
    k-step Ahead Predictor
    Best Linear Predictor

Part 3: Brownian Motion
    Definitions
    Properties of Brownian Motion
    Brownian Motion and Symmetric Random Walk
    Brownian Motion with Drift
    Geometric Brownian Motion
    Gaussian Processes
    Differential Form of Brownian Motion
    Stochastic Differential Equations
    Stochastic Integration

Part 4: Simulation
  Continuous Random Variables Simulation
    Pseudo-Random Numbers
    Inverse Transform Method
    Discrete Random Variable Simulation
    Acceptance-Rejection Method
    Simulation Using Distributional Relationships
  Monte Carlo Simulation
    Expectation and Variance
    Antithetic Variables
    Control Variates
    Importance Sampling
    Number of Simulations
Part 1: Stochastic Processes
Stochastic Processes
A stochastic process is any collection of random variables, denoted as:

$$\{X(t),\ t \in T\}$$

- $T$ is the index set of the process, usually the time parameter
- $X(t)$ is the state of the process at time $t$
- $S$ is the state space, i.e. the set of values that $X(t)$ can take on

Stochastic processes can be classified according to the nature of the:

Index Set $T$
- Discrete-time process: index set is finite or countably infinite
- Continuous-time process: index set is continuous

State Space $S$
- Discrete state space: $X(t)$ is a discrete random variable
- Continuous state space: $X(t)$ is a continuous random variable

A sample path or a realisation of a stochastic process is a particular assignment of possible values of $X(t)$ for all $t \in T$.
Increment Properties
For a stochastic process, an increment is the random variable, for any $t_1 < t_2$:

$$X(t_2) - X(t_1)$$

A stochastic process has independent increments if $X(t_1) - X(t_0),\ X(t_2) - X(t_1),\ \ldots,\ X(t_n) - X(t_{n-1})$ are independent for all $t_0 < t_1 < t_2 < \cdots < t_n$. Equivalently, the r.v. $X(t+s) - X(t)$ and $X(t)$ are independent for all $s, t$, i.e. future increases are independent of the past or present.

A stochastic process has stationary increments if $X(t_2 + s) - X(t_1 + s)$ and $X(t_2) - X(t_1)$, i.e. increments of the same length, have the same probability distribution, for all $t_1, t_2$ and $s > 0$.
Markov Processes
A Markov process is a stochastic process that has the Markov property, where given the present state, the future state is independent of the past states:

$$\Pr[X(t_{n+1}) = x_{n+1} \mid X(t_1) = x_1, \ldots, X(t_n) = x_n] = \Pr[X(t_{n+1}) = x_{n+1} \mid X(t_n) = x_n]$$

A Markov chain is a Markov process on a discrete index set $T = \{0, 1, 2, \ldots\}$, denoted by

$$\{X_n,\ n = 0, 1, 2, \ldots\}$$

i.e. $X_n = k$ means the process is in state $k$ at time $n$, with a finite or countable state space:

$$S = \{0, 1, 2, \ldots, n\} \quad \text{or} \quad S = \{0, 1, 2, \ldots\}$$

The Markov property for a Markov chain is given by:

$$\Pr[X_{n+1} = x_{n+1} \mid X_0 = x_0, \ldots, X_n = x_n] = \Pr[X_{n+1} = x_{n+1} \mid X_n = x_n]$$
7/23/2019 ACTL2003 Summary Notes
http://slidepdf.com/reader/full/actl2003-summary-notes 5/62
5
Zhi Ying Feng
Transition Probability
The one-step transition probabilities of the Markov chain are the conditional probabilities of moving to state $j$ in one step, given that the process is in state $i$ at present:

$$P_{ij}^{n,n+1} = \Pr[X_{n+1} = j \mid X_n = i]$$

If the one-step transition probabilities do NOT depend on time, i.e. are the same for all $n$ to $n+1$, then the Markov chain is hom*ogeneous with stationary transition probabilities:

$$P_{ij}^{n,n+1} = P_{ij} = \Pr[X_{n+1} = j \mid X_n = i]$$

The transition probability matrix is the matrix consisting of all transition probabilities:

$$\mathbf{P} = [P_{ij}] = \begin{pmatrix} P_{00} & P_{01} & P_{02} & \cdots \\ P_{10} & P_{11} & P_{12} & \cdots \\ P_{20} & P_{21} & P_{22} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$

This matrix satisfies the properties:

$$P_{ij} \ge 0 \ \text{for all } i, j = 0, 1, 2, \ldots \quad \text{and} \quad \sum_j P_{ij} = 1 \ \text{for all } i = 0, 1, 2, \ldots$$

Note: in the non-hom*ogeneous case, the time must be specified.

The n-step transition probability is the probability that a process in state $i$ will be in state $j$ in $n$ steps:

$$P_{ij}^{(n)} = \Pr[X_{n+m} = j \mid X_m = i]$$
Chapman-Kolmogorov Equations
The Chapman-Kolmogorov equations compute the (n+m)-step transition probabilities:

$$P_{ij}^{(n+m)} = \sum_k P_{ik}^{(n)} P_{kj}^{(m)} \quad \text{for all } n, m \ge 0$$

This is the sum, over all intermediate states $k$, of the probability of reaching state $k$ after $n$ steps, then reaching state $j$ after $m$ further steps. Or, in matrix multiplication:

$$\mathbf{P}^{(n+m)} = \underbrace{\mathbf{P} \cdot \mathbf{P} \cdots \mathbf{P}}_{n+m \text{ times}} = \mathbf{P}^{n+m}$$

where $\mathbf{P}^{(n)} = [P_{ij}^{(n)}]$ is the matrix consisting of the n-step transition probabilities.

Consider the matrix multiplication $A \cdot B = AB$:
- To get the $i$th row of $AB$, multiply the $i$th row of $A$ by $B$
- To get the $j$th column of $AB$, multiply $A$ by the $j$th column of $B$

Kolmogorov Forward Equations: start in state $i$, transition into state $k$ after $n$ steps, then a one-step transition to state $j$:

$$P_{ij}^{(n+1)} = \sum_k P_{ik}^{(n)} P_{kj} \quad \text{for all } n \ge 0$$

Kolmogorov Backward Equations: start in state $i$, a one-step transition into state $k$, then an n-step transition into state $j$:

$$P_{ij}^{(n+1)} = \sum_k P_{ik} P_{kj}^{(n)} \quad \text{for all } n \ge 0$$
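As a concrete check of the Chapman-Kolmogorov equations, the sketch below computes n-step transition matrices as matrix powers; the 3-state transition matrix is a hypothetical example, not taken from the notes:

```python
import numpy as np

# Hypothetical 3-state transition matrix (each row sums to 1)
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.4, 0.4],
])

# n-step transition matrix: P^(n) is the nth matrix power of P
P4 = np.linalg.matrix_power(P, 4)

# Chapman-Kolmogorov: P^(4) = P^(1) P^(3) = P^(2) P^(2)
assert np.allclose(P4, P @ np.linalg.matrix_power(P, 3))
assert np.allclose(P4, np.linalg.matrix_power(P, 2) @ np.linalg.matrix_power(P, 2))

# Rows of any n-step transition matrix still sum to 1
print(P4.sum(axis=1))
```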
Classification of States
Absorbing state

A state $i$ is said to be an absorbing state if $P_{ii} = 1$, or equivalently $P_{ij} = 0$ for all $j \ne i$. An absorbing state is a state in which, once the process arrives in it, the process stays forever.

Accessible

State $j$ is accessible from state $i$ if $P_{ij}^{(n)} > 0$ for some $n \ge 0$. State $j$ is accessible from $i$ if there is a positive probability that the process will be in state $j$ at some future time, given that the process is currently in state $i$. This is written as:

$$i \to j$$

Note that:

$$\text{if } i \to j \text{ and } j \to k, \text{ then } i \to k$$

An absorbing state is the ONLY state accessible from itself.

Communicate

States $i$ and $j$ communicate if:

$$i \to j \text{ and } j \to i, \quad \text{i.e. } i \leftrightarrow j$$

That is, there is a positive probability that if the process is currently in state $i$, then at a future point in time the process will return to state $i$ in between visiting state $j$. Note that:

- $i \leftrightarrow i$ for all $i$
- if $i \leftrightarrow j$, then $j \leftrightarrow i$
- if $i \leftrightarrow j$ and $j \leftrightarrow k$, then $i \leftrightarrow k$

The class of states that communicate with state $i$ is the set:

$$C(i) = \{j \in S : i \leftrightarrow j\}$$

That is, there is a positive probability that if the process is in state $i$, then at a future point in time the process will return to state $i$ in between visiting at least one state in class $C$.
Irreducible Markov Chain
A Markov chain is irreducible if there is only ONE class, i.e. all states communicate with each other. Properties:

- All states in a finite, irreducible Markov chain are recurrent
- The probability of returning to the current state in the long run is positive
- The probability of being in any state in the long run is also positive

Recurrent and Transient States

Let $f_i$ be the probability that the process, starting in state $i$, will return to state $i$ sometime in the future:

$$f_i = \Pr[X_n = i \ \text{for some } n \ge 1 \mid X_0 = i]$$

A state is:

- Recurrent if $f_i = 1$
- Transient if $f_i < 1$

If state $i$ is recurrent then, starting in state $i$, the process will return to state $i$ at some point in the future, and the process will enter state $i$ infinitely often.
If state $i$ is transient then, starting in state $i$, whether the process ever returns to state $i$ is a Bernoulli random variable:

$$X \sim \mathrm{Ber}(p = 1 - f_i), \quad X = \begin{cases} 0 & \text{if the process returns to state } i \\ 1 & \text{if the process does not return to state } i \end{cases}$$

Then the probability that the process will be in state $i$ for exactly $n$ time periods can be modelled using the geometric distribution:

$$f_i^{\,n-1}(1 - f_i)$$

The total number of periods that the process is in state $i$ is given by:

$$\sum_n I_n, \quad \text{where } I_n = \begin{cases} 1 & \text{if } X_n = i \\ 0 & \text{if } X_n \ne i \end{cases}$$

with the expected number of time periods spent in state $i$, given that the process starts in state $i$:

$$E\Big[\sum_n I_n \,\Big|\, X_0 = i\Big] = \sum_n E[I_n \mid X_0 = i] = \sum_n \Pr[X_n = i \mid X_0 = i] = \sum_n P_{ii}^{(n)} = \frac{1}{1 - f_i}$$

An alternate definition for recurrent/transient states is that a state $i$ is:

$$\text{recurrent if } \sum_{n=1}^{\infty} P_{ii}^{(n)} = \infty; \quad \text{transient if } \sum_{n=1}^{\infty} P_{ii}^{(n)} < \infty$$

That is, a transient state will only be visited a finite number of times.

Note:

- In a finite-state Markov chain, AT LEAST one state must be recurrent and NOT ALL states can be transient. Otherwise, after a finite number of steps, no states would be visited!
- An absorbing state is recurrent, since it revisits itself infinitely many times
- If state $i$ communicates with state $j$ and state $i$ is:
  - Recurrent, then state $j$ is also recurrent
  - Transient, then state $j$ is also transient

A class of states is:

- Recurrent if all states in the class are recurrent
- Transient if all states in the class are transient
- Closed if all of the states in the class can only lead to states WITHIN the class. Hence, states in a closed class are recurrent
- Open if a state in the class can lead to states OUTSIDE the class. Hence, states in an open class are transient
Class Properties
Period

State $i$ has period $d(i)$ if $d(i)$ is the greatest common divisor of all $n \ge 1$ for which $P_{ii}^{(n)} > 0$, i.e. it is the g.c.d. of the number of steps of all possible paths back to state $i$, if the process starts in state $i$. Note that:

- If $P_{ii}^{(n)} = 0$ for all $n$, then $d(i) = 0$
- If $i \leftrightarrow j$, then $d(i) = d(j)$
- A state with period 1 is called aperiodic

Positive Recurrent

A recurrent state $i$ is positive recurrent if the expected time of return to itself is finite. In a finite-state Markov chain, ALL recurrent states are positive recurrent.

Ergodic

A state $i$ is ergodic if it is positive recurrent and aperiodic.
Limiting Probabilities
Limiting probabilities are the long-run probabilities that a process is in a certain state. For an irreducible (only one class, all states communicate) and ergodic Markov chain, the limiting probability exists and is independent of $i$, i.e. of where the process starts from:

$$\pi_j = \lim_{n \to \infty} P_{ij}^{(n)} > 0$$

where the $\pi_j$ are the unique non-negative solution to the set of equations:

$$\pi_j = \sum_i \pi_i P_{ij} \quad \text{and} \quad \sum_j \pi_j = 1$$

Or, in matrix form:

$$\boldsymbol{\pi} = \boldsymbol{\pi}\mathbf{P}, \quad \text{where } \boldsymbol{\pi} = (\pi_1, \pi_2, \ldots)$$

This can be interpreted as:

- The probability of the process being in state $j$ at time $t$ is the same as at $t+1$, as $t \to \infty$
- $\pi_j$ is the long-run proportion of time that the Markov chain is in state $j$
- Note that the limiting probabilities may not exist at all, or may exist only for even/odd transitions
- If state $j$ is transient, then $\lim_{n \to \infty} P_{ij}^{(n)} = 0$ for every state $i$

If the distribution of the initial states is chosen to be the limiting distribution, then the probability of being in state $j$ initially is the same as the probability of being in state $j$ at time $n$:

$$\text{if } \Pr[X_0 = j] = \pi_j, \text{ then } \Pr[X_n = j] = \pi_j$$

The mean time between visits to state $j$ is the expected number of transitions $m_{jj}$ until a Markov chain which starts in state $j$ returns to state $j$, given by the mean of a geometric distribution:

$$m_{jj} = \frac{1}{\pi_j}$$

That is, the proportion of time in state $j$ equals the inverse of the mean time between visits to state $j$.
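The system $\boldsymbol{\pi} = \boldsymbol{\pi}\mathbf{P}$, $\sum_j \pi_j = 1$ can be solved numerically. A minimal sketch, assuming a hypothetical irreducible, ergodic 3-state chain:

```python
import numpy as np

# Hypothetical irreducible, ergodic 3-state chain
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.4, 0.4],
])
n = P.shape[0]

# Solve pi = pi P together with sum(pi) = 1: stack the balance
# equations (P^T - I) pi = 0 with the normalisation row of ones.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Limiting behaviour: every row of a high power of P converges to pi
assert np.allclose(pi, np.linalg.matrix_power(P, 200)[0])

# Mean time between visits to each state: m_jj = 1 / pi_j
m = 1.0 / pi
print(pi, m)
```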
Mean Time in Transient States
For a finite-state Markov chain, let its transient states be denoted by $T = \{1, 2, \ldots, t\}$. Let $\mathbf{P}_T$ denote the transition matrix that ONLY contains the transient states. Note that its rows will sum to less than 1.

$$\mathbf{P}_T = \begin{pmatrix} P_{11} & P_{12} & \cdots & P_{1t} \\ P_{21} & P_{22} & \cdots & P_{2t} \\ \vdots & \vdots & & \vdots \\ P_{t1} & P_{t2} & \cdots & P_{tt} \end{pmatrix}$$

The mean time spent in transient states $s_{ij}$ is the expected number of periods that the Markov chain is in state $j$, given that the process starts in state $i$:

$$\mathbf{S} = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1t} \\ s_{21} & s_{22} & \cdots & s_{2t} \\ \vdots & \vdots & & \vdots \\ s_{t1} & s_{t2} & \cdots & s_{tt} \end{pmatrix}$$

Conditioning on the initial transition, we have, for $i, j \in T$:

$$s_{ij} = \begin{cases} \displaystyle\sum_{k=1}^{t} P_{ik}\, s_{kj} & \text{if } i \ne j \\[2ex] 1 + \displaystyle\sum_{k=1}^{t} P_{ik}\, s_{kj} & \text{if } i = j \end{cases}$$

i.e. the mean time spent in state $j$, given that the process starts in state $i$, is the transition probability into state $k$ starting from state $i$, times the time spent in state $j$ starting from state $k$, summed over all possible $k$ (plus the initial period when $i = j$).

Then we have:

$$\mathbf{S} = \mathbf{I} + \mathbf{P}_T \mathbf{S} \quad \Rightarrow \quad \mathbf{S} = (\mathbf{I} - \mathbf{P}_T)^{-1}$$

Note that, for a $2 \times 2$ matrix:

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad A^{-1} = \frac{1}{\det A}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$
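The identity $\mathbf{S} = (\mathbf{I} - \mathbf{P}_T)^{-1}$ can be verified numerically; the matrix $\mathbf{P}_T$ below is a hypothetical example with two transient states and leakage into an absorbing state:

```python
import numpy as np

# Hypothetical chain with transient states {1, 2} and an absorbing state.
# P_T holds only the one-step probabilities among the transient states;
# its rows sum to less than 1 because of leakage into the absorbing state.
P_T = np.array([
    [0.4, 0.3],
    [0.2, 0.5],
])

# S = (I - P_T)^(-1): s_ij = expected number of periods spent in transient
# state j before leaving the transient set, starting from transient state i.
S = np.linalg.inv(np.eye(2) - P_T)
print(S)

# Sanity check against the defining recursion S = I + P_T S
assert np.allclose(S, np.eye(2) + P_T @ S)
```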
Branching Processes

A branching process is a Markov chain that describes the size of a population where each member in each generation produces a random number of offspring in the next generation.

- Let $p_j,\ j \ge 0$ be the probability that each individual in each generation produces $j$ offspring
- Assume that the number of offspring of each individual is independent of the numbers produced by others
- Let $X_0$ be the number of individuals initially present, i.e. the zeroth generation
- Individuals produced from the $n$th generation belong to the $(n+1)$th generation
- Let $X_n$ be the size of the $n$th generation.
Then $\{X_n,\ n = 0, 1, \ldots\}$ is a branching process with:

- $P_{00} = 1$, and state 0 is recurrent
- If $p_0 > 0$, all states other than 0 are transient, i.e. the population will either die out or converge to infinity, since transient states are visited a finite number of times

After $n+1$ generations, the size of the population is a random sum:

$$X_{n+1} = \sum_{i=1}^{X_n} Y_i$$

where the $Y_i$ are i.i.d. r.v. that represent the number of offspring of the $i$th individual:

$$\Pr[Y_i = k] = p_k \ \text{for } k = 0, 1, \ldots, \quad \sum_k p_k = 1$$

The mean and variance of the number of offspring of an individual are:

$$\mu = E[Y_i] = \sum_k k\, p_k, \qquad \sigma^2 = \mathrm{var}(Y_i) = \sum_k (k - \mu)^2 p_k$$

The mean and variance of the population size of the $n$th generation are:

$$E[X_n] = \mu^n, \qquad \mathrm{var}(X_n) = \begin{cases} \sigma^2 \mu^{n-1}\, \dfrac{\mu^n - 1}{\mu - 1} & \text{if } \mu \ne 1 \\[1.5ex] n\sigma^2 & \text{if } \mu = 1 \end{cases}$$

Under the assumption that $X_0 = 1$, the probability of extinction, i.e. that everyone eventually dies, is given by:

$$\pi_0 = \sum_{k=0}^{\infty} \pi_0^k\, p_k$$

Since the p.g.f. of $Y$ is $G_Y(s) = \sum_k p_k s^k$, we have

$$\pi_0 = G_Y(\pi_0)$$

- If $\mu > 1$, then $\pi_0 < 1$
- If $\mu \le 1$, then $\pi_0 = 1$, i.e. extinction is certain
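The fixed-point equation $\pi_0 = G_Y(\pi_0)$ can be solved by iterating the p.g.f. from 0, which converges to the smallest non-negative root. A sketch with a hypothetical offspring distribution:

```python
# Hypothetical offspring distribution: p_0 = 0.25, p_1 = 0.25, p_2 = 0.5,
# so mu = 0.25 + 2 * 0.5 = 1.25 > 1 and extinction is not certain.
p = [0.25, 0.25, 0.5]

def G(s, p):
    """Probability generating function G_Y(s) = sum_k p_k s^k."""
    return sum(pk * s**k for k, pk in enumerate(p))

mu = sum(k * pk for k, pk in enumerate(p))

# Iterate pi <- G_Y(pi) starting from 0; the limit is the extinction
# probability (the smallest non-negative root of pi = G_Y(pi)).
pi = 0.0
for _ in range(200):
    pi = G(pi, p)

print(mu, pi)  # here pi = 0.5, the smaller root of 0.5 s^2 - 0.75 s + 0.25 = 0
```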
Probability Generating Functions
Let $X$ be an integer-valued random variable with $\Pr[X = i] = p_i$ for $i = 0, 1, 2, \ldots$. The p.g.f. is:

$$G_X(t) = E[t^X] = \sum_x t^x \Pr[X = x]$$

Properties:

- Relationship with the probability mass function of $X$:

$$p_x = \frac{1}{x!} \left. \frac{d^x}{dt^x} G_X(t) \right|_{t=0}$$

- Relationship with the moment generating function of $X$:

$$m_X(t) = E[e^{Xt}] = G_X(e^t), \qquad G_X(t) = m_X(\log t)$$
Time Reversible Markov Chains
Consider a stationary ergodic Markov chain with transition probabilities $P_{ij}$ and stationary probabilities $\pi_i$. If we start at time $n$ and work backwards, the reversed sequence of states $X_n, X_{n-1}, \ldots$ is also a Markov chain, because independence is a symmetric relationship.

The transition probabilities of this reversed process are:

$$Q_{ij} = \Pr[X_m = j \mid X_{m+1} = i] = \frac{\Pr[X_m = j,\ X_{m+1} = i]}{\Pr[X_{m+1} = i]} = \frac{\Pr[X_m = j]\,\Pr[X_{m+1} = i \mid X_m = j]}{\Pr[X_{m+1} = i]} = \frac{\pi_j P_{ji}}{\pi_i}$$

A time reversible Markov chain is one where:

$$Q_{ij} = P_{ij} \ \text{for all } i, j$$

For a time reversible Markov chain:

$$\pi_i P_{ij} = \pi_j P_{ji} \ \text{for all } i, j$$

Thus, the rate at which the process goes from state $i$ to state $j$ (LHS) is the same as the rate at which it goes from state $j$ to state $i$.
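The detailed-balance condition can be checked numerically. The sketch below uses a hypothetical random walk on $\{0, 1, 2\}$; chains that only step between neighbouring states are a standard example of time reversibility:

```python
import numpy as np

# Hypothetical random walk on {0, 1, 2}
P = np.array([
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
])
pi = np.array([0.25, 0.5, 0.25])   # stationary distribution of this chain

assert np.allclose(pi @ P, pi)

# Detailed balance: pi_i P_ij == pi_j P_ji for all i, j
flows = pi[:, None] * P            # matrix of probability flows pi_i P_ij
assert np.allclose(flows, flows.T)

# Reversed-chain probabilities Q_ij = pi_j P_ji / pi_i equal P_ij
Q = (pi[None, :] * P.T) / pi[:, None]
assert np.allclose(Q, P)
```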
Exponential, Poisson and Gamma Distributions
A r.v. $X$ has an exponential distribution if its probability density function is of the form:

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

The cumulative distribution function and the survival function are of the form:

$$F(x) = \begin{cases} 1 - e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \qquad S(x) = 1 - F(x) = \begin{cases} e^{-\lambda x} & \text{if } x \ge 0 \\ 1 & \text{if } x < 0 \end{cases}$$

The moment generating function is given by:

$$M_X(t) = E[e^{tX}] = \frac{\lambda}{\lambda - t} \quad \text{for } t < \lambda$$

The mean and variance of the exponential distribution are:

$$E[X] = \frac{1}{\lambda}, \qquad \mathrm{var}(X) = \frac{1}{\lambda^2}$$

The hazard (or failure) rate is given by:

$$\mu(x) = \frac{f(x)}{S(x)} = \frac{-\frac{d}{dx}S(x)}{1 - F(x)} = \frac{\lambda e^{-\lambda x}}{e^{-\lambda x}} = \lambda$$
A key property of the exponential distribution is that it is memoryless. $X$ is said to be memoryless if, for all $s, t > 0$:

$$\Pr[X > s + t \mid X > t] = \Pr[X > s], \quad \text{or} \quad \Pr[X > s + t] = \Pr[X > s]\,\Pr[X > t]$$

Consider independent $X_i$ for $i = 1, 2, \ldots, n$, with parameters $\lambda_i$. Then:

- If all the parameters are equal, $\lambda_i = \lambda$, then $Y_n = X_1 + X_2 + \cdots + X_n$ has a gamma($n$, $\lambda$) distribution with pdf:

$$f_{Y_n}(t) = \lambda e^{-\lambda t}\, \frac{(\lambda t)^{n-1}}{(n-1)!}$$

- $X = \min(X_1, X_2, \ldots, X_n)$ also has an exponential distribution, with parameter and mean:

$$\lambda = \sum_{i=1}^{n} \lambda_i \quad \text{and} \quad E[X] = \frac{1}{\sum_{i=1}^{n} \lambda_i}$$

- The probability that $X_i$ is the smallest is given by:

$$\frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j}$$

- If the parameters are not all equal, i.e. $\lambda_i \ne \lambda_j$, then the sum of $n$ independent exponential r.v. has a hyperexponential distribution with pdf:

$$f_{X_1 + \cdots + X_n}(x) = \sum_{i=1}^{n} C_{i,n}\, \lambda_i e^{-\lambda_i x}, \quad \text{where } C_{i,n} = \prod_{j \ne i} \frac{\lambda_j}{\lambda_j - \lambda_i}$$
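The properties of the minimum of independent exponentials can be illustrated by simulation; the three rates below are hypothetical, and note that NumPy's exponential sampler is parameterised by the scale $1/\lambda$:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([0.5, 1.0, 1.5])   # hypothetical rates
n = 200_000

# Simulate three independent exponentials per row and take the minimum
X = rng.exponential(1.0 / lam, size=(n, 3))   # scale = 1 / rate
M = X.min(axis=1)

# min(X_1, X_2, X_3) is exponential with rate sum(lam) = 3.0
print(M.mean())            # close to 1/3

# Pr[X_i is the smallest] = lam_i / sum(lam)
freq = np.bincount(X.argmin(axis=1), minlength=3) / n
print(freq)                # close to [1/6, 1/3, 1/2]
```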
Counting Process
A counting process $\{N(t),\ t \ge 0\}$ represents the number of events that occur up to time $t$, i.e. it is a discrete state space, continuous-time stochastic process. A counting process has the properties:

- $N(t) \ge 0$
- $N(t)$ is integer-valued
- $N(s) \le N(t)$ for $s < t$, i.e. it must be non-decreasing
- For $s < t$, $N(t) - N(s)$ is the number of events that occurred in the interval $(s, t]$

A counting process has independent increments if the number of events that occur in the interval $(s, t]$, $N(t) - N(s)$, is independent of the number of events that occur up to time $s$.

A counting process has stationary increments if $N(t_2 + s) - N(t_1 + s)$ has the same distribution as $N(t_2) - N(t_1)$, i.e. the number of events that occur in any interval depends only on the length of the time period.
Poisson Process
A Poisson process, denoted by $\{N(t),\ t \ge 0\}$, is a counting process that counts the number of events which occur at a rate $\lambda$ from time 0. It has the properties:

- $N(0) = 0$
- Independent and stationary increments
- The number of events in any interval of length $t$ has a Poisson distribution with mean $\lambda t$:

$$\Pr[N(s + t) - N(s) = n] = \Pr[N(t) = n] \ \text{(stationary increments)} = e^{-\lambda t}\, \frac{(\lambda t)^n}{n!}$$

Inter-arrival and Waiting Time

The inter-arrival or holding times $\{T_n,\ n = 1, 2, \ldots\}$ are the times between the $(n-1)$th and $n$th events. For a Poisson process, the inter-arrival times are i.i.d. exponential r.v. with:

$$f_{T_n}(t) = \lambda e^{-\lambda t}, \quad \Pr[T_n > t] = e^{-\lambda t} \quad \text{and} \quad E[T_n] = \frac{1}{\lambda}$$

The waiting time $S_n$ is the time until the $n$th event, given by:

$$S_n = T_1 + T_2 + \cdots + T_n = \sum_{i=1}^{n} T_i$$

The sum of exponential random variables, each with the SAME parameter $\lambda$, has a gamma($n$, $\lambda$) distribution, therefore:

$$f_{S_n}(t) = \lambda e^{-\lambda t}\, \frac{(\lambda t)^{n-1}}{(n-1)!}$$

Note: if $S_n$ is the waiting time for the aggregate of $m$ independent claim processes, each with rate $\lambda$, then it has a gamma($n$, $m\lambda$) distribution. E.g. for a motor insurer, each insured motorist makes a claim at a rate of 0.2 per year, and there are a total of 200 insured. Then $S_{100}$, the waiting time until the 100th claim, has the distribution:

$$S_{100} \sim \text{Gamma}(100,\ 0.2 \times 200)$$
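The claim that $S_n$ is Gamma($n$, $\lambda$) can be checked by summing simulated inter-arrival times; the rate and event count below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n = 2.0, 5          # hypothetical rate and event count
reps = 100_000

# Inter-arrival times of a Poisson process are i.i.d. Exp(lam), so the
# waiting time S_n is their sum, which is Gamma(n, lam).
T = rng.exponential(1.0 / lam, size=(reps, n))
S_n = T.sum(axis=1)

# Gamma(n, lam) has mean n/lam and variance n/lam^2
print(S_n.mean(), S_n.var())   # close to 2.5 and 1.25

# Cross-check against NumPy's gamma sampler (shape n, scale 1/lam)
G = rng.gamma(shape=n, scale=1.0 / lam, size=reps)
print(G.mean())
```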
Order Statistics
Let $Y_{(i)}$ be the $i$th smallest value among $Y_1, Y_2, \ldots, Y_n$; then $(Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)})$ is called the order statistics. If the $Y_i$ are i.i.d. continuous r.v. with pdf $f(y)$, then the joint pdf of the order statistics $(Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)})$ is:

$$f(y_1, y_2, \ldots, y_n) = n! \prod_{i=1}^{n} f(y_i), \quad \text{where } y_1 < y_2 < \cdots < y_n$$

If the $Y_i$ are uniformly distributed over $(0, t)$, then the joint pdf of the order statistics is:

$$f(y_1, y_2, \ldots, y_n) = \frac{n!}{t^n}$$
Conditional Distribution of Arrival Time
Given that $N(t) = n$, i.e. up until time $t$, $n$ events have occurred, the $n$ arrival times $S_1, \ldots, S_n$ have the same distribution as the order statistics corresponding to $n$ independent r.v. uniformly distributed over $(0, t)$.

Note that the following two events are equivalent:

$$\{S_1 = s_1, S_2 = s_2, \ldots, S_n = s_n,\ N(t) = n\} = \{T_1 = s_1, T_2 = s_2 - s_1, \ldots, T_n = s_n - s_{n-1},\ T_{n+1} > t - s_n\}$$

i.e. the inter-arrival time between the $(n-1)$th and $n$th events is equal to the waiting time for $n$ events minus the waiting time for $n-1$ events. Hence:

$$f(s_1, s_2, \ldots, s_n \mid N(t) = n) = \frac{f(s_1, s_2, \ldots, s_n,\ n)}{\Pr[N(t) = n]} = \frac{\lambda e^{-\lambda s_1} \cdot \lambda e^{-\lambda(s_2 - s_1)} \cdots \lambda e^{-\lambda(s_n - s_{n-1})} \cdot e^{-\lambda(t - s_n)}}{e^{-\lambda t}\, \dfrac{(\lambda t)^n}{n!}} = \frac{n!}{t^n}$$

The conditional marginal distribution is:

$$\begin{aligned}
\Pr[N(s) = m \mid N(t) = n] &= \frac{\Pr[N(s) = m,\ N(t) = n]}{\Pr[N(t) = n]} \\
&= \frac{\Pr[N(s) = m]\,\Pr[N(t - s) = n - m]}{\Pr[N(t) = n]} \\
&= \frac{\dfrac{(\lambda s)^m e^{-\lambda s}}{m!} \cdot \dfrac{(\lambda (t - s))^{n-m} e^{-\lambda(t-s)}}{(n-m)!}}{\dfrac{(\lambda t)^n e^{-\lambda t}}{n!}} \\
&= \binom{n}{m} \left(\frac{s}{t}\right)^m \left(1 - \frac{s}{t}\right)^{n-m}
\end{aligned}$$

i.e. given $N(t) = n$, the number of events by time $s < t$ is Binomial($n$, $s/t$).
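The binomial form of the conditional marginal can be checked by simulating the arrival times as uniform order statistics; the horizon, sub-interval and event count below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
t, s, n = 10.0, 4.0, 6      # hypothetical horizon, sub-interval, event count
reps = 100_000

# Given N(t) = n, the arrival times are distributed as the order statistics
# of n i.i.d. Uniform(0, t) variables.
arrivals = np.sort(rng.uniform(0.0, t, size=(reps, n)), axis=1)

# Hence N(s) | N(t) = n is Binomial(n, s/t)
counts = (arrivals < s).sum(axis=1)
print(counts.mean())        # close to n * s / t = 2.4
```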
Thinning of Poisson Process

Consider a Poisson process $\{N(t), t \ge 0\}$ with rate $\lambda$, and suppose that each time an event occurs, it is either:

- a type I event with probability $p$
- a type II event with probability $1 - p$

independently of all other events. Let $N_1(t)$ and $N_2(t)$ denote the number of type I and type II events occurring in $[0, t]$, so that:

$$N(t) = N_1(t) + N_2(t)$$

Then:

- $\{N_1(t), t \ge 0\}$ is a Poisson process with rate $\lambda p$
- $\{N_2(t), t \ge 0\}$ is a Poisson process with rate $\lambda(1 - p)$
- $N_1(t)$ and $N_2(t)$ are independent, and $\{N(t), t \ge 0\}$ is a Poisson process with rate equal to the sum of the two rates, i.e. $\lambda$
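A minimal simulation sketch of thinning (hypothetical parameter values): each event of a Poisson($\lambda$) count is independently marked type I with probability $p$, and the two resulting counts should have means $p\lambda t$ and $(1-p)\lambda t$ and be uncorrelated.

```python
import numpy as np

# Thinning: given N(T), the type I count is Binomial(N(T), p); the two
# thinned counts are independent Poisson with rates p*lam and (1-p)*lam.
rng = np.random.default_rng(0)
lam, p, T, reps = 3.0, 0.4, 1.0, 100_000

totals = rng.poisson(lam * T, size=reps)   # N(T)
type1 = rng.binomial(totals, p)            # N1(T): each event kept w.p. p
type2 = totals - type1                     # N2(T)

print(type1.mean(), type2.mean())          # ≈ p*lam*T and (1-p)*lam*T
print(np.cov(type1, type2)[0, 1])          # ≈ 0: the two counts are independent
```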
Non-Homogeneous Poisson Process

The counting process $\{N(t), t \ge 0\}$ is a non-homogeneous Poisson process with intensity function $\lambda(t)$, i.e. non-constant rate, if:

- $N(0) = 0$
- It has independent increments
- It has unit jumps, i.e.

$$\Pr(N(t+h) - N(t) = 1) = \lambda(t)h + o(h)$$
$$\Pr(N(t+h) - N(t) \ge 2) = o(h)$$

$\lambda(t)$ itself can be deterministic or a process itself. Note that if $\lambda(t) = \lambda$ then it is a homogeneous Poisson process.

The mean value function of a non-homogeneous Poisson process is:

$$m(t) = \int_0^t \lambda(y)\,dy$$

Then:

- $N(t)$ is a Poisson r.v. with mean $m(t)$
- $N(s+t) - N(s)$ is a Poisson r.v. with mean $m(s+t) - m(s)$
- $\{N(m^{-1}(t)), t \ge 0\}$ is a homogeneous Poisson process with intensity 1
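The property $N(t) \sim \text{Poisson}(m(t))$ can be checked by simulation. A sketch under an assumed intensity $\lambda(t) = 2t$ on $[0, 1]$ (so $m(1) = 1$), simulated by thinning a homogeneous process with rate $\lambda_{\max} = 2$:

```python
import numpy as np

# Thinning-based simulation of a non-homogeneous Poisson process:
# simulate Poisson(lam_max) events, keep each with prob lambda(t)/lam_max.
# The resulting count should be Poisson with mean m(1) = 1 (mean = variance).
rng = np.random.default_rng(3)
reps, lam_max, T = 20_000, 2.0, 1.0

counts = np.empty(reps, dtype=int)
for r in range(reps):
    n = rng.poisson(lam_max * T)
    times = rng.uniform(0.0, T, n)
    counts[r] = int((rng.uniform(0.0, 1.0, n) < (2 * times) / lam_max).sum())

print(counts.mean(), counts.var())   # both ≈ m(1) = 1
```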
Compound Poisson Process

A stochastic process $\{X(t), t \ge 0\}$ is a compound Poisson process if

$$X(t) = \sum_{i=1}^{N(t)} Y_i$$

Where:

- $\{N(t), t \ge 0\}$ is a Poisson process
- $\{Y_i, i = 1, 2, ...\}$ are i.i.d. random variables

This is useful for insurance companies, where the uncertainty in the total claim size $X(t)$ is due to both the number of claims $N(t)$, which follows a Poisson process, and the claim sizes $Y_i$, which can have any distribution as long as they are i.i.d. Note that the claim size $Y$ is independent of the number of claims $N(t)$.

The mean and variance of a compound Poisson process are given by:

$$E[X(t)] = \lambda t\,E[Y_i] \qquad \text{var}[X(t)] = \lambda t\,E[Y_i^2]$$
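These two moment formulas are easy to verify by Monte Carlo. A sketch with hypothetical parameters (claim rate $\lambda = 2$, horizon $t = 5$, exponential claim sizes with mean 2, so $E[Y] = 2$ and $E[Y^2] = 8$):

```python
import numpy as np

# Monte Carlo check of E[X(t)] = lam*t*E[Y] and var[X(t)] = lam*t*E[Y^2]
# for exponential claim sizes (hypothetical parameter choices).
rng = np.random.default_rng(1)
lam, t, reps = 2.0, 5.0, 50_000
mean_y = 2.0                # E[Y] for an exponential with mean 2
ey2 = 2 * mean_y**2         # E[Y^2] = 2*mu^2 = 8 for the exponential

counts = rng.poisson(lam * t, size=reps)
x = np.array([rng.exponential(mean_y, size=n).sum() for n in counts])

print(x.mean())   # ≈ lam*t*E[Y]   = 20
print(x.var())    # ≈ lam*t*E[Y^2] = 80
```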
Continuous-Time Markov Chains

A continuous-time Markov chain, or a Markov jump process, is a Markov process in continuous time with a discrete state space. The Markov jump process $\{X(t), t \ge 0\}$ has a continuous version of the Markov property: for all $s, t \ge 0$ and nonnegative integers $i, j, x(u),\ 0 \le u < t$:

$$\Pr(X(t+s) = j \mid X(u) = x(u)\ \forall\, 0 \le u < t,\ X(t) = i) = \Pr(X(t+s) = j \mid X(t) = i)$$

i.e. the process at time $t+s$ is only conditional on the state at time $t$ and independent of its past states before time $t$, i.e. of $X(u) = x(u)$ for $0 \le u < t$. This means that the time path up to the state at time $t$ does not matter.

The Markov jump process is a homogeneous process, i.e. all transition probabilities are stationary, if:

$$\Pr(X(t+s) = j \mid X(t) = i) = \Pr(X(s) = j \mid X(0) = i) = P_{ij}(s), \quad \text{i.e. independent of } t$$

Hence, the transition probability over a time period only depends on the duration of the time period.

The Poisson process is a continuous-time Markov chain:

$$\begin{aligned}
\Pr(X(t+s) = j \mid X(s) = i) &= \Pr(j - i \text{ jumps in } (s, s+t] \mid i \text{ jumps in } (0, s]) \\
&= \Pr(j - i \text{ jumps in } (s, s+t]) \quad \text{(independent increments)} \\
&= \Pr(j - i \text{ jumps in } (0, t]) \quad \text{(stationary increments)} \\
&= \frac{(\lambda t)^{j-i} e^{-\lambda t}}{(j-i)!}
\end{aligned}$$
Time Spent in a State

The time spent in a state $i$ before the next jump, denoted $T_i$, for a continuous-time Markov chain has the memoryless property:

$$\begin{aligned}
\Pr(T_i > s + t \mid T_i > s) &= \Pr(X(v) = i,\ s \le v \le s+t \mid X(u) = i,\ 0 \le u \le s) \\
&= \Pr(X(v) = i,\ s \le v \le s+t \mid X(s) = i) \quad \text{(Markov property)} \\
&= \Pr(X(v) = i,\ 0 \le v \le t \mid X(0) = i) \quad \text{(stationary increments)} \\
&= \Pr(T_i > t)
\end{aligned}$$

The only continuous distribution with the memoryless property is the exponential distribution. Hence, the time spent in a state is exponentially distributed with:

$$T_i \sim \exp(v_i) \qquad \Pr(T_i \le t) = 1 - e^{-v_i t}$$

With the expected time spent in state $i$ before transitioning into another state being:

$$E[T_i] = \frac{1}{v_i}$$

Where $v_i$ is the transition rate when in state $i$, i.e. the transition rate out of state $i$ to another state. It is the sum of all the rates of jumping from state $i$ to another state $j$, denoted $q_{ij}$:

$$v_i = \sum_{j \ne i} q_{ij}$$
Transition Rates and Probabilities

The transition probability of the continuous-time Markov chain is defined as:

$$P_{ij}(t, t+s) = \Pr(X(t+s) = j \mid X(t) = i)$$

For a homogeneous or stationary process we have stationary transition probabilities:

$$P_{ij}(t, t+s) = \Pr(X(t+s) = j \mid X(t) = i) = \Pr(X(s) = j \mid X(0) = i) = P_{ij}(s)$$

With the initial conditions:

$$P_{ij}(s, s) = P_{ij}(0) = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases}$$
Transition Rates of Homogeneous Processes

The instantaneous transition rate $q_{ij}$ from state $i$ to state $j$ of a continuous-time Markov chain is defined as the rate of change of the transition probability over a small period of time:

$$\frac{\partial}{\partial t} P_{ij}(t)\Big|_{t=0} = \lim_{h \to 0} \frac{P_{ij}(h) - P_{ij}(0)}{h} = \begin{cases} \displaystyle\lim_{h \to 0} \frac{P_{ij}(h)}{h} = q_{ij} & \text{if } i \ne j \\[6pt] \displaystyle\lim_{h \to 0} \frac{P_{ii}(h) - 1}{h} = q_{ii} = -v_i & \text{if } i = j \end{cases}$$

Where $v_i = -q_{ii}$ denotes the transition rate out of state $i$ when the process is in state $i$. Note that:

$$\sum_j P_{ij} = 1 \quad \text{(sum of all transition probabilities is 1)}$$
$$\sum_j q_{ij} = 0 \quad \text{(sum of all transition rates is 0)}$$

Therefore, we have that:

$$q_{ii} + \sum_{j \ne i} q_{ij} = 0 \implies -q_{ii} = \sum_{j \ne i} q_{ij} = v_i$$

Equivalently, the transition probability over a small time $h$ can be defined as:

$$P_{ij}(h) = \begin{cases} q_{ij} h + o(h) & \text{if } i \ne j \\ 1 - v_i h + o(h) & \text{if } i = j \end{cases}$$
Transition Rates of Non-Homogeneous Processes

In the non-homogeneous case, the transition rates are also a derivative of the transition probabilities:

$$\frac{\partial}{\partial t} P_{ij}(s, t)\Big|_{t=s} = \lim_{h \to 0} \frac{P_{ij}(s, s+h) - P_{ij}(s, s)}{h} = \begin{cases} \displaystyle\lim_{h \to 0} \frac{P_{ij}(s, s+h)}{h} = q_{ij}(s) & \text{if } i \ne j \\[6pt] \displaystyle\lim_{h \to 0} \frac{P_{ii}(s, s+h) - 1}{h} = q_{ii}(s) = -v_i(s) & \text{if } i = j \end{cases}$$

Equivalently, the transition probability over a small time $h$ can be defined as:

$$P_{ij}(s, s+h) = \begin{cases} q_{ij}(s)\,h + o(h) & \text{if } i \ne j \\ 1 - v_i(s)\,h + o(h) & \text{if } i = j \end{cases}$$
Now, ignore when the transitions occur and how long is spent in each state. Consider only the series of states that the process transitions into. Let:

- $v_i$ be the transition rate OUT of state $i$ when the process is in state $i$
- $P_{ij}$ be the probability that the transition is into state $j$, conditional on the fact that a transition has OCCURRED and the process is currently in state $i$ (see embedded Markov chain)

Then $q_{ij}$, the transition rate INTO state $j$ when the process is in state $i$, is given by:

$$q_{ij} = v_i P_{ij}$$

Using $\Pr(A \cap B) = \Pr(A)\Pr(B \mid A)$, where $A$ is the event of moving out of state $i$ and $B$ is the event of moving to state $j$. Then we also have:

$$\sum_{j \ne i} q_{ij} = v_i \sum_{j \ne i} P_{ij} = v_i$$

Therefore,

$$P_{ij} = \frac{q_{ij}}{v_i} = \frac{q_{ij}}{\sum_{j \ne i} q_{ij}}$$
Chapman-Kolmogorov Equations

For a homogeneous continuous-time Markov chain the Chapman-Kolmogorov equation is:

$$P_{ij}(t+s) = \sum_k P_{ik}(t) P_{kj}(s) \quad \text{for } s, t \ge 0 \text{ and all states } i, j$$

With the initial conditions:

$$P_{ij}(0) = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases}$$

Kolmogorov's backward equation is:

$$\frac{\partial}{\partial t} P_{ij}(t) = \sum_k q_{ik} P_{kj}(t) = \sum_{k \ne i} q_{ik} P_{kj}(t) - v_i P_{ij}(t)$$

Kolmogorov's forward equation is:

$$\frac{\partial}{\partial t} P_{ij}(t) = \sum_k P_{ik}(t)\,q_{kj} = \sum_{k \ne j} P_{ik}(t)\,q_{kj} - v_j P_{ij}(t)$$

Define the matrices:

$$\mathbf{P}(t) = [P_{ij}(t)] \qquad \mathbf{Q} = [q_{ij}] \qquad \mathbf{P}'(t) = \left[\tfrac{\partial}{\partial t} P_{ij}(t)\right]$$

In matrix form we have:

$$\begin{aligned}
\mathbf{P}(t+s) &= \mathbf{P}(t)\mathbf{P}(s) && \text{Chapman-Kolmogorov equations} \\
\mathbf{P}'(t) &= \mathbf{Q}\,\mathbf{P}(t) && \text{Kolmogorov's backward equations} \\
\mathbf{P}'(t) &= \mathbf{P}(t)\,\mathbf{Q} && \text{Kolmogorov's forward equations} \\
\mathbf{P}(0) &= \mathbf{I} && \text{Initial condition}
\end{aligned}$$
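Both matrix equations with $\mathbf{P}(0) = \mathbf{I}$ are solved by the matrix exponential $\mathbf{P}(t) = e^{\mathbf{Q}t}$. A minimal numeric sketch with a hypothetical two-state generator (not an example from the notes), computing the exponential by its Taylor series and checking the Chapman-Kolmogorov property:

```python
import numpy as np

def expm_series(A, terms=60):
    """Matrix exponential via its Taylor series (fine for small ||A||)."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

# Hypothetical two-state generator: off-diagonal rates, rows sum to zero.
Q = np.array([[-0.5, 0.5],
              [ 0.3, -0.3]])

P1, P2, P3 = (expm_series(Q * t) for t in (1.0, 2.0, 3.0))

print(np.round(P3, 6))
print(np.round(P1 @ P2, 6))   # Chapman-Kolmogorov: P(3) = P(1) P(2)
```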
Limiting Probabilities

The limiting probability $P_j$ of a continuous-time Markov chain is the long-run probability, or the long-run proportion of time, that the process will be in state $j$, independent of the initial state:

$$P_j = \lim_{t \to \infty} P_{ij}(t)$$

To solve for the limiting probabilities, if they exist, we solve the equations:

$$v_j P_j = \sum_{k \ne j} q_{kj} P_k \quad \text{(equivalently } \mathbf{P}\mathbf{Q} = \mathbf{0}\text{)} \qquad \text{and} \qquad \sum_k P_k = 1$$

This equation implies that the rate at which the process leaves state $j$ is equal to the rate at which the process enters state $j$:

- $v_j P_j$ is the rate at which the process leaves state $j$, since $P_j$ is the long-run proportion of time the process is in state $j$ and $v_j$ is the rate of transition out of state $j$ when the process is in state $j$
- $\sum_{k \ne j} q_{kj} P_k$ is the rate at which the process enters state $j$, since $P_k$ is the long-run proportion of time the process is in state $k$ and $q_{kj}$ is the rate of transition from state $k$ to state $j$

For the limiting probabilities $\lim_{t \to \infty} P_{ij}(t)$ to exist, the conditions are:

- All states communicate, so that starting in state $i$ there is a positive probability of being in state $j$
- The Markov chain is positive recurrent, so that starting in any state, the mean time to return to that state is finite
- When the limiting probabilities exist, the Markov chain is ergodic; note that aperiodicity is unnecessary, as periodicity does not apply to continuous-time Markov chains

If the initial state is chosen according to the limiting distribution, then the probability of being in state $j$ at time $t$ is the same for all $t$ (homogeneous):

$$\Pr(X(0) = j) = P_j \implies \Pr(X(t) = j) = P_j$$
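Solving the balance equations is a small linear-algebra exercise. A sketch with a hypothetical three-state generator: since the balance equations $\mathbf{P}\mathbf{Q} = \mathbf{0}$ are linearly dependent, one of them is replaced by the normalisation $\sum_k P_k = 1$:

```python
import numpy as np

# Balance equations P Q = 0 with sum(P) = 1 for a hypothetical generator.
Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -2.0,  1.0],
              [ 2.0,  2.0, -4.0]])

A = Q.T.copy()
A[-1, :] = 1.0                  # replace one balance equation by sum(P) = 1
b = np.array([0.0, 0.0, 1.0])
P = np.linalg.solve(A, b)

print(np.round(P, 4))           # limiting distribution: [0.3 0.5 0.2]
print(np.round(P @ Q, 8))       # ≈ 0: rate out of each state = rate in
```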
Embedded Markov Chain

For a continuous-time Markov chain that is ergodic with limiting probabilities $P_i$, the sequence of states visited (ignoring the time spent in each state) is a discrete-time Markov chain. This discrete-time Markov chain is known as the embedded Markov chain.

The embedded Markov chain has transition probabilities:

$$\begin{aligned}
P_{ij} &= \Pr(\text{transition to state } j \mid \text{transition out of state } i) \\
&= \frac{\Pr(\text{transition from state } i \text{ to state } j)}{\Pr(\text{transition out of state } i)} \\
&= \frac{q_{ij}}{v_i} = \frac{q_{ij}}{\sum_{j \ne i} q_{ij}}
\end{aligned}$$

Note: in general $P_{ij} \ne P_{ij}(t)$! Do not get confused!
Assuming that the embedded Markov chain is ergodic, its limiting probabilities $\pi_i$, i.e. the long-run proportions of transitions into state $i$, are the unique solutions of the set of equations:

$$\pi_i = \sum_j \pi_j P_{ji} \quad \text{and} \quad \sum_i \pi_i = 1$$

The proportion of time the continuous-time process spends in state $i$, i.e. the limiting probabilities of the original continuous-time Markov process, can also be found using:

$$P_i = \frac{\pi_i / v_i}{\sum_j \pi_j / v_j}$$

Note that $\pi_i$, the limiting probability of the embedded Markov chain, is the proportion of transitions into state $i$; it is multiplied by $1/v_i$, the mean time spent in state $i$ during a visit.
Time Reversibility

Going backwards, given the process is in state $i$ at time $t$, the probability that the process has been in state $i$ for an amount of time greater than $s$ is $e^{-v_i s}$:

$$\begin{aligned}
\Pr(\text{process in state } i \text{ throughout } [t-s, t] \mid X(t) = i) &= \frac{\Pr(\text{process in state } i \text{ throughout } [t-s, t])}{\Pr(X(t) = i)} \\
&= \frac{\Pr(X(t-s) = i)\Pr(T_i > s)}{\Pr(X(t) = i)} \\
&= e^{-v_i s}
\end{aligned}$$

Since for large $t$, we have the limiting probability $\Pr(X(t-s) = i) = \Pr(X(t) = i) = P_i$. Therefore, if we go back in time, the amount of time the process spends in state $i$ is also exponential with rate $v_i$. Thus the reversed continuous-time Markov chain has the same transition intensities as the forward-time process.

The sequence of states visited by the reversed process, i.e. its embedded chain, is a discrete-time Markov chain with transition probabilities:

$$Q_{ij} = \frac{\pi_j P_{ji}}{\pi_i}$$

Thus, the continuous-time Markov chain will be time reversible, i.e. have the same probability structure as the original process, if the embedded chain is time reversible, i.e.

$$\pi_i P_{ij} = \pi_j P_{ji}$$

Using the proportion of time the continuous-time chain is in state $i$, $P_i = (\pi_i / v_i) / \sum_j (\pi_j / v_j)$, i.e. $\pi_i \propto P_i v_i$, the condition becomes:

$$P_i v_i P_{ij} = P_j v_j P_{ji}$$

Note that $q_{ij} = v_i P_{ij}$, so we have an equivalent condition for time reversibility:

$$P_i q_{ij} = P_j q_{ji}$$

i.e. the rate at which the process goes directly from $i$ to $j$ is the same as the rate to go directly from $j$ to $i$.
Birth and Death Process

A birth and death process is a continuous-time Markov chain with states $0, 1, 2, ...$ for which transitions from state $n$ may only go into state $n+1$ (a birth) or state $n-1$ (a death). Suppose that the number of people in a population/system is $n$:

- New arrivals enter the population/system at a birth/arrival rate $\lambda_n$
  - The time until the next arrival is exponentially distributed with mean $1/\lambda_n$
- People leave the population/system at a death/departure rate $\mu_n$
  - The time until the next departure is exponentially distributed with mean $1/\mu_n$

Transition Rates and Embedded Markov Chain

Let $v_i$ be the transition rate out of state $i$ when the process is in state $i$. Initially, with a population of 0, there can only be a birth (the population can't be negative), so $\mu_0 = 0$ and:

$$v_0 = \lambda_0 \qquad v_i = \lambda_i + \mu_i \text{ for } i \ge 1$$

For the corresponding embedded Markov chain, let $P_{ij}$ denote the transition probabilities between states. If the Markov chain is in state 0 and a transition occurs, it must transition into state 1, therefore:

$$P_{01} = 1$$

To derive the transition probabilities of the embedded Markov chain, consider a population of $i$. It will jump to $i+1$ if a birth occurs before a death, and jump to $i-1$ if a death occurs before a birth:

- The time to a birth $T_b$ is exponential with rate $\lambda_i$
- The time to a death $T_d$ is exponential with rate $\mu_i$

Therefore, the probability that a birth occurs before a death is given by (minimum of independent exponentials):

$$\Pr(T_b < T_d) = \frac{\lambda_i}{\lambda_i + \mu_i}$$

Then the transition probabilities are, for $i \ge 1$:

$$P_{i,i+1} = \frac{\lambda_i}{\lambda_i + \mu_i} \qquad P_{i,i-1} = \frac{\mu_i}{\lambda_i + \mu_i} \qquad P_{i,k} = 0 \text{ for } k \ne i \pm 1$$

Also, since the time to the next jump, regardless of whether it is a birth or death, is exponential with rate $v_i$, and the jump out of state $i$ is either a birth or a death, we have as before:

$$v_i = \lambda_i + \mu_i$$
Examples of Birth and Death Processes

Birth and death rates independent of n

A supermarket has one server and customers join a queue. Customers arrive at rate $\lambda$ with exponential inter-arrival times (a Poisson process). The server serves at a rate $\mu$, with service times also exponentially distributed. Let $X(t)$ be the number in the queue at time $t$; then:

$$\lambda_n = \lambda \text{ for } n \ge 0 \qquad \mu_n = \begin{cases} \mu & \text{for } n \ge 1 \\ 0 & \text{for } n = 0 \end{cases}$$

$$P_{i,i+1} = \begin{cases} 1 & \text{for } i = 0 \\ \dfrac{\lambda}{\lambda + \mu} & \text{for } i = 1, 2, ... \end{cases} \qquad P_{i,i-1} = \begin{cases} 0 & \text{for } i = 0 \\ \dfrac{\mu}{\lambda + \mu} & \text{for } i = 1, 2, ... \end{cases}$$
Birth and death rates dependent on n

If there are now $s$ checkout operators serving the queue, with customers joining a single queue and going to the first available server, then:

$$\lambda_n = \lambda \text{ for } n \ge 0 \qquad \mu_n = \begin{cases} n\mu & \text{for } 1 \le n \le s \\ s\mu & \text{for } n > s \end{cases}$$

$$P_{i,i+1} = \frac{\lambda}{\lambda + \min(i, s)\mu} \qquad P_{i,i-1} = \frac{\min(i, s)\mu}{\lambda + \min(i, s)\mu} \qquad \text{for } i = 0, 1, 2, ...$$
Poisson Process

A Poisson process is a special case of a birth and death process, with no deaths:

$$\lambda_n = \lambda \text{ for } n \ge 0 \qquad \mu_n = 0 \text{ for } n \ge 0$$

$$P_{i,i+1} = 1 \qquad P_{i,i-1} = 0 \qquad \text{for } i = 0, 1, 2, ...$$
Population Growth

Each individual in a population gives birth at an exponential rate $\lambda$, plus there is immigration at an exponential rate $\theta$. Each individual has an exponential death rate $\mu$:

$$\lambda_n = n\lambda + \theta \text{ for } n \ge 0 \qquad \mu_n = n\mu \text{ for } n \ge 0$$

$$P_{n,n+1} = \frac{n\lambda + \theta}{n\lambda + \theta + n\mu} \qquad P_{n,n-1} = \frac{n\mu}{n\lambda + \theta + n\mu} \qquad \text{for } n = 0, 1, 2, ...$$
Expected Time in States

Let $T_i$ be the time for the process, starting from state $i$, to enter state $i+1$, i.e. the time until a birth takes the process above $i$. Define the indicator $I$ as:

$$I = \begin{cases} 1 & \text{if the first transition is } i \to i+1 \\ 0 & \text{if the first transition is } i \to i-1 \end{cases}$$

Then the expected value of $T_i$ can be calculated by conditioning on the first transition:

$$\begin{aligned}
E[T_i] &= \Pr(I = 1)\,E[T_i \mid I = 1] + \Pr(I = 0)\,E[T_i \mid I = 0] \\
&= \frac{\lambda_i}{\lambda_i + \mu_i} \cdot \frac{1}{\lambda_i + \mu_i} + \frac{\mu_i}{\lambda_i + \mu_i}\left(\frac{1}{\lambda_i + \mu_i} + E[T_{i-1}] + E[T_i]\right) \\
&= \frac{1}{\lambda_i + \mu_i} + \frac{\mu_i}{\lambda_i + \mu_i}\left(E[T_{i-1}] + E[T_i]\right)
\end{aligned}$$

Solving for $E[T_i]$ gives the recursion:

$$E[T_i] = \frac{1}{\lambda_i} + \frac{\mu_i}{\lambda_i} E[T_{i-1}]$$

since the time to the first transition, regardless of whether it is a birth or death, is exponential with rate $\lambda_i + \mu_i$, and the time taken to reach $i+1$ if there is first a death is the time from $i-1$ to $i$, then from $i$ to $i+1$. Note:

$$E[T_0] = \frac{1}{\lambda_0}$$

In the case where the rates are homogeneous, i.e. $\lambda_i = \lambda,\ \mu_i = \mu$ for all $i$:

$$E[T_i] = \frac{1}{\lambda}\left(1 + \frac{\mu}{\lambda} + \cdots + \left(\frac{\mu}{\lambda}\right)^i\right)$$

Then the expected time to go from state $i$ to a higher state $j$ is:

$$E[T_{i \to j}] = E[T_i] + E[T_{i+1}] + \cdots + E[T_{j-1}]$$
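The recursion and the closed form can be checked against each other directly. A sketch with hypothetical homogeneous rates $\lambda = 2$, $\mu = 1$:

```python
from math import isclose

# Recursion E[T_i] = 1/lam + (mu/lam) * E[T_{i-1}], E[T_0] = 1/lam,
# against the closed form (1/lam) * sum_{k=0}^{i} (mu/lam)**k.
lam, mu, top = 2.0, 1.0, 10

ET = [1.0 / lam]
for i in range(1, top + 1):
    ET.append(1.0 / lam + (mu / lam) * ET[i - 1])

closed = [(1.0 / lam) * sum((mu / lam) ** k for k in range(i + 1))
          for i in range(top + 1)]

print([round(x, 6) for x in ET[:4]])      # 0.5, 0.75, 0.875, ...
print(sum(ET[2:5]))                        # expected time from state 2 to 5
```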
Kolmogorov Equations

We can use the transition rates of a birth and death process to find the Kolmogorov equations, and then solve these differential equations to find the transition probabilities.

$$q_{0,1} = \lambda_0 \qquad q_{0,0} = -v_0 = -\lambda_0 \qquad q_{0,i} = 0 \text{ for all } i > 1$$

$$q_{i,i+1} = \lambda_i \qquad q_{i,i-1} = \mu_i \qquad q_{i,i} = -v_i = -(\lambda_i + \mu_i) \qquad \text{otherwise zero}$$

Kolmogorov's backward equations:

$$\frac{\partial}{\partial t} P_{0,j}(t) = \lambda_0 \left[P_{1,j}(t) - P_{0,j}(t)\right]$$

$$\frac{\partial}{\partial t} P_{i,j}(t) = \lambda_i P_{i+1,j}(t) + \mu_i P_{i-1,j}(t) - (\lambda_i + \mu_i) P_{i,j}(t)$$

Kolmogorov's forward equations:

$$\frac{\partial}{\partial t} P_{i,0}(t) = \mu_1 P_{i,1}(t) - \lambda_0 P_{i,0}(t)$$

$$\frac{\partial}{\partial t} P_{i,j}(t) = \lambda_{j-1} P_{i,j-1}(t) + \mu_{j+1} P_{i,j+1}(t) - (\lambda_j + \mu_j) P_{i,j}(t)$$
Limiting Probabilities

The limiting probabilities are determined by the balance equations:

$$v_j P_j = \sum_{k \ne j} q_{kj} P_k \text{ for all } j \qquad \text{and} \qquad \sum_k P_k = 1$$

This states that the rate at which the process leaves state $j$ (LHS) is equal to the rate at which the process enters state $j$ (RHS). For a birth and death process, this condition breaks down iteratively:

$$(\lambda_n + \mu_n) P_n = \lambda_{n-1} P_{n-1} + \mu_{n+1} P_{n+1}$$

Then in general:

$$P_1 = \frac{\lambda_0}{\mu_1} P_0, \quad P_2 = \frac{\lambda_1 \lambda_0}{\mu_2 \mu_1} P_0, \quad ..., \quad P_n = \frac{\lambda_{n-1} \lambda_{n-2} \cdots \lambda_0}{\mu_n \mu_{n-1} \cdots \mu_1} P_0$$

Substituting this into the other balance equation $\sum_{n \ge 0} P_n = 1$:

$$P_0\left(1 + \sum_{n=1}^{\infty} \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}\right) = 1 \implies P_0 = \frac{1}{1 + \displaystyle\sum_{n=1}^{\infty} \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}}$$

Substituting back into the expression for $P_n$ gives:

$$P_n = \frac{\dfrac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}}{1 + \displaystyle\sum_{n=1}^{\infty} \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}}$$

This also gives the condition for the long-run probabilities to exist, i.e.:

$$\sum_{n=1}^{\infty} \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n} < \infty$$
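For the single-server queue ($\lambda_n = \lambda$, $\mu_n = \mu$) the product in the balance equations collapses to $(\lambda/\mu)^n$, giving the geometric limiting distribution $P_n = (1-\rho)\rho^n$ with $\rho = \lambda/\mu < 1$. A sketch with hypothetical rates, truncating the state space:

```python
from math import isclose

# Geometric limiting distribution of the single-server queue:
# P_n = rho**n * P_0 with P_0 fixed by normalisation (truncated at N).
lam, mu, N = 1.0, 2.0, 200
rho = lam / mu

weights = [rho**n for n in range(N + 1)]   # lam_0...lam_{n-1} / (mu_1...mu_n)
p0 = 1.0 / sum(weights)
P = [p0 * w for w in weights]

print(round(P[0], 6))                                  # ≈ 1 - rho = 0.5
print(round(sum(n * p for n, p in enumerate(P)), 6))   # mean ≈ rho/(1-rho) = 1
```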
Application: Pure Birth Process

The Poisson process is an example of a pure birth process, i.e.

$$\lambda_n = \lambda \text{ for } n \ge 0 \qquad \mu_n = 0 \text{ for } n \ge 0$$

In this case, rearranging the Kolmogorov backward equations:

$$\frac{\partial}{\partial t} P_{i,j}(t) = \lambda P_{i+1,j}(t) - \lambda P_{i,j}(t) \implies \frac{\partial}{\partial t} P_{i,j}(t) + \lambda P_{i,j}(t) = \lambda P_{i+1,j}(t)$$

Multiply both sides by $e^{\lambda t}$:

$$e^{\lambda t}\left[\frac{\partial}{\partial t} P_{i,j}(t) + \lambda P_{i,j}(t)\right] = \lambda e^{\lambda t} P_{i+1,j}(t)$$

By the product rule, this is equivalent to:

$$\frac{\partial}{\partial t}\left[e^{\lambda t} P_{i,j}(t)\right] = \lambda e^{\lambda t} P_{i+1,j}(t)$$

Integrate both sides w.r.t. $t$ from 0 to $s$, using $P_{i,j}(0) = 0$ for $i \ne j$:

$$e^{\lambda s} P_{i,j}(s) = \lambda \int_0^s e^{\lambda t} P_{i+1,j}(t)\,dt \implies P_{i,j}(s) = \lambda \int_0^s e^{-\lambda(s-t)} P_{i+1,j}(t)\,dt$$

- The LHS is the probability of moving from state $i$ to $j$ over the time period 0 to $s$
- The RHS is the probability of the process staying in state $i$ for a time $s - t$, i.e. $e^{-\lambda(s-t)}$, then jumping to state $i+1$ over the time interval $(s-t, s-t+dt)$, i.e. $\lambda\,dt$, then the probability that the process starts in $i+1$ and finishes in $j$ over the time interval $s-t$ to $s$, i.e. $P_{i+1,j}(t)$, integrated over all possible times from 0 to $s$, which is equivalent to $t$ varying from 0 to $s$
Application: Simple Sickness Model

In a simple sickness model, an individual can be in state 0 (healthy) or state 1 (sick).

- An individual remains healthy for an exponential time with mean $1/\lambda$ before becoming sick
- An individual remains sick for an exponential time with mean $1/\mu$ before becoming healthy

In this birth and death process, we have:

$$\lambda_0 = \lambda \quad \text{and} \quad \mu_1 = \mu$$

with all other $\lambda_i, \mu_i$ being zero. Using Kolmogorov's backward/forward equations, the transition probabilities are:

$$P_{0,0}(s) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu} e^{-(\lambda + \mu)s} \qquad P_{0,1}(s) = \frac{\lambda}{\lambda + \mu}\left(1 - e^{-(\lambda + \mu)s}\right)$$

$$P_{1,0}(s) = \frac{\mu}{\lambda + \mu}\left(1 - e^{-(\lambda + \mu)s}\right) \qquad P_{1,1}(s) = \frac{\lambda}{\lambda + \mu} + \frac{\mu}{\lambda + \mu} e^{-(\lambda + \mu)s}$$
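These closed forms can be checked numerically by integrating the forward equation $\mathbf{P}'(t) = \mathbf{P}(t)\mathbf{Q}$ with a simple Euler scheme. A sketch with hypothetical rates $\lambda = 0.5$, $\mu = 1.5$:

```python
import numpy as np

# Euler integration of P'(t) = P(t) Q for the two-state sickness model,
# compared against the closed-form transition probabilities.
lam, mu, s = 0.5, 1.5, 2.0
Q = np.array([[-lam,  lam],
              [  mu,  -mu]])

P = np.eye(2)
h = 1e-4
for _ in range(int(s / h)):
    P = P + h * (P @ Q)

total = lam + mu
P00 = mu / total + lam / total * np.exp(-total * s)
P01 = lam / total * (1 - np.exp(-total * s))

print(round(P[0, 0], 4), round(P00, 4))   # numeric vs closed form
print(round(P[0, 1], 4), round(P01, 4))
```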
Occupancy Probabilities and Time

In a simple sickness model, the occupancy probabilities are defined as:

- $\bar{P}_{0,0}(s)$ is the probability that a healthy individual stays healthy for a period of $s$
- $\bar{P}_{1,1}(s)$ is the probability that a sick individual remains sick for the same period

By the Markov property, the time spent in state 0 or state 1 is exponential with the memoryless property:

$$\bar{P}_{0,0}(s) = \Pr(T_0 > s) = 1 - \Pr(T_0 \le s) = 1 - (1 - e^{-\lambda s}) = e^{-\lambda s}$$

$$\bar{P}_{1,1}(s) = \Pr(T_1 > s) = 1 - \Pr(T_1 \le s) = 1 - (1 - e^{-\mu s}) = e^{-\mu s}$$

The occupancy time $O(t)$ is the total time that the process spends in each state during the interval $(0, t)$. If we define the indicator function:

$$I(s) = \begin{cases} 1 & \text{if } X(s) = 1 \\ 0 & \text{if } X(s) = 0 \end{cases}$$

Then the occupation time for being sick is:

$$O(t) = \int_0^t I(s)\,ds$$

The expected occupation time being sick, given that the initial state is healthy, is given by:

$$\begin{aligned}
E[O(t) \mid X(0) = 0] &= E\left[\int_0^t I(s)\,ds \,\Big|\, X(0) = 0\right] \\
&= \int_0^t E[I(s) \mid X(0) = 0]\,ds \\
&= \int_0^t \Pr(X(s) = 1 \mid X(0) = 0)\,ds \\
&= \int_0^t P_{0,1}(s)\,ds = \int_0^t \frac{\lambda}{\lambda + \mu}\left(1 - e^{-(\lambda + \mu)s}\right) ds \\
&= \frac{\lambda}{\lambda + \mu}\left(t - \frac{1 - e^{-(\lambda + \mu)t}}{\lambda + \mu}\right)
\end{aligned}$$
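The final integration step can be confirmed with a simple Riemann sum of $P_{0,1}(s)$, again with hypothetical rates:

```python
from math import exp

# Numeric check of E[O(t) | X(0)=0] = lam/(lam+mu) * (t - (1 - e^{-(lam+mu)t})/(lam+mu))
# by a left Riemann sum of P_{0,1}(s) over (0, t).
lam, mu, t = 0.5, 1.5, 3.0
total = lam + mu

h = 1e-4
numeric = h * sum(lam / total * (1 - exp(-total * (h * k)))
                  for k in range(int(t / h)))
closed = lam / total * (t - (1 - exp(-total * t)) / total)

print(round(numeric, 5), round(closed, 5))
```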
First Holding Time

The first holding time $T_0$ is the first time the process jumps out of the initial state:

$$T_0 = \inf\{t : X_t \ne X_0\}$$

For a homogeneous Markov jump process, this is exponentially distributed with rate $\lambda_i$, the rate of jumping out of state $i$, previously denoted $v_i$:

$$\lambda_i = -q_{ii} = \sum_{j \ne i} q_{ij}$$

Thus, the first holding time has the c.d.f. and p.d.f.:

$$\Pr(T_0 \le t \mid X_0 = i) = 1 - e^{-\lambda_i t} \qquad f_{T_0}(t \mid X_0 = i) = \lambda_i e^{-\lambda_i t}$$

The probability of the state to which the process jumps is:

$$\Pr(X_{T_0} = j \mid X_0 = i) = \frac{q_{ij}}{\lambda_i} = P_{ij}$$

where $X_{T_0}$ is independent of $T_0$.
Non-Homogeneous Markov Jump Processes

Let the transition rates be denoted $\sigma_{ij}(s)$, equivalent to the $q_{ij}(s)$ previously:

$$\sigma_{ij}(s) = \frac{\partial}{\partial t} P_{ij}(s, t)\Big|_{t=s}$$

So that the non-homogeneous transition probabilities over a small time period $h$ are:

$$P_{ij}(s, s+h) = \begin{cases} h\,\sigma_{ij}(s) + o(h) & \text{if } i \ne j \\ 1 + h\,\sigma_{ii}(s) + o(h) & \text{if } i = j \end{cases} \qquad \sigma_{ii}(s) = -\sum_{j \ne i} \sigma_{ij}(s)$$

So that:

$$\frac{P_{ij}(s, s+h) - P_{ij}(s, s)}{h} = \sigma_{ij}(s) + \frac{o(h)}{h}$$

Then by letting $h \to 0$ we get:

$$\sigma_{ij}(s) = \lim_{h \to 0} \frac{P_{ij}(s, s+h) - P_{ij}(s, s)}{h} = \frac{\partial}{\partial t} P_{ij}(s, t)\Big|_{t=s}$$

The Chapman-Kolmogorov equations are:

$$P_{ij}(s, t) = \sum_k P_{ik}(s, u) P_{kj}(u, t), \quad s \le u \le t$$

or in matrix form $\mathbf{P}(s, t) = \mathbf{P}(s, u)\,\mathbf{P}(u, t)$ with $\mathbf{P}(s, s) = \mathbf{I}$.
The Kolmogorov forward equations are derived by differentiating the Chapman-Kolmogorov equation w.r.t. $t$, then setting $u = t$:

$$\frac{\partial}{\partial t} P_{ij}(s, t) = \sum_k P_{ik}(s, u)\,\frac{\partial}{\partial t} P_{kj}(u, t)\Big|_{u=t} = \sum_{k \ne j} P_{ik}(s, t)\,\sigma_{kj}(t) - v_j(t)\,P_{ij}(s, t)$$

$$\frac{\partial}{\partial t}\mathbf{P}(s, t) = \mathbf{P}(s, t)\,\mathbf{Q}(t)$$

The Kolmogorov backward equations are derived by differentiating the Chapman-Kolmogorov equation w.r.t. $s$, then setting $u = s$:

$$\frac{\partial}{\partial s} P_{ij}(s, t) = \sum_k \frac{\partial}{\partial s} P_{ik}(s, u)\Big|_{u=s}\,P_{kj}(s, t) = -\sum_{k \ne i} \sigma_{ik}(s)\,P_{kj}(s, t) + v_i(s)\,P_{ij}(s, t)$$

$$\frac{\partial}{\partial s}\mathbf{P}(s, t) = -\mathbf{Q}(s)\,\mathbf{P}(s, t)$$

Where $\mathbf{Q}(t)$ is the matrix containing $\sigma_{ij}(t)$ for all $i, j$.
Residual Holding Time

The residual holding time $R_s$ is the time between $s$ and the NEXT jump:

$$\{R_s > w,\ X_s = i\} = \{X_u = i,\ s \le u \le s + w\}$$

i.e. the process remains in the same state $i$ between times $s$ and $s + w$. Note that this is the non-homogeneous case of the first holding time. We can show that:

$$\Pr(R_s > w \mid X_s = i) = \exp\left(-\int_s^{s+w} v_i(u)\,du\right)$$

The density of $R_s \mid X_s = i$ is given by:

$$\frac{\partial}{\partial w}\Pr(R_s \le w \mid X_s = i) = \frac{\partial}{\partial w}\left[1 - \Pr(R_s > w \mid X_s = i)\right] = v_i(s + w)\exp\left(-\int_s^{s+w} v_i(u)\,du\right)$$

Define $X_{s+R_s}$ as the state the process jumps to at the next jump. The density for this is:

$$\Pr(X_{s+R_s} = j \mid X_s = i,\ R_s = w) = \frac{\sigma_{ij}(s + w)}{v_i(s + w)}$$

Then we have the transition probability as:

$$P_{ij}(s, t) = \Pr(X_t = j \mid X_s = i) = \int_0^{t-s} \exp\left(-\int_s^{s+w} v_i(u)\,du\right) v_i(s + w) \sum_{I \ne i} \frac{\sigma_{iI}(s + w)}{v_i(s + w)}\,P_{Ij}(s + w, t)\,dw$$

This is the conditional probability that the process is in state $j$ at time $t$, given that it started in state $i$ at time $s$. The transition can be done by:

1. Staying in state $i$ from time $s$ for a duration of $w$, i.e. $\exp\left(-\int_s^{s+w} v_i(u)\,du\right)$
2. Jumping out of state $i$, i.e. $v_i(s + w)$
3. Given that a jump has occurred, jumping to an intermediate state $I$, i.e. $\sigma_{iI}(s + w)/v_i(s + w)$
4. Then jumping from state $I$ to $j$ over $s + w$ to $t$ and staying in state $j$ until time $t$, i.e. $P_{Ij}(s + w, t)$
5. Integrated over all possible $w$ and summed over all possible intermediate states

This is the integral form of the Kolmogorov backward equation, i.e. we consider the jump (q) first.
Current Holding Time

For non-homogeneous processes the current holding time $C_t$ is the time between the last jump and $t$:

$$\{C_t > w,\ X_t = j\} = \{X_u = j,\ t - w \le u \le t\}$$

It can be shown that:

$$\Pr(C_t > w \mid X_t = j) = \exp\left(-\int_{t-w}^{t} v_j(u)\,du\right)$$

The density of $C_t \mid X_t = j$ is given by:

$$\frac{\partial}{\partial w}\left[1 - \Pr(C_t > w \mid X_t = j)\right] = v_j(t - w)\exp\left(-\int_{t-w}^{t} v_j(u)\,du\right)$$

Then we have the transition probability as:

$$P_{ij}(s, t) = \int_0^{t-s} \sum_{k \ne j} P_{ik}(s, t - w)\,\sigma_{kj}(t - w)\,\exp\left(-\int_{t-w}^{t} v_j(u)\,du\right) dw$$

This is the integral form of the Kolmogorov forward equation, i.e. we consider the jump (q) last.
Part 2: Time Series

Introduction to Time Series

A time series is a sequence of observations that are recorded at regular time intervals, usually discrete and evenly spaced, e.g. daily or monthly. Let $\{x_t\}$ denote the observed data:

$$x_1, x_2, ..., x_t$$

A time series model for $\{x_t\}$ is a family of distributions to which the joint distribution of $\{X_t\}$ is assumed to belong. Let $\{X_t\}$ denote the time series, where each $X_t$ is a r.v.:

$$..., X_{t-1}, X_t, X_{t+1}, ...$$

Therefore, $x_{t-1}, x_t, x_{t+1}$ are realisations of the r.v. $X_{t-1}, X_t, X_{t+1}$.

Classical Decomposition Model

The classical decomposition model decomposes the original data $X_t$ into 3 components:

$$X_t = T_t + S_t + N_t$$

$$E[N_t] = 0 \qquad S_{t+d} = S_t \qquad \sum_{j=1}^{d} S_j = 0$$

Where:

- $T_t$ is a deterministic trend component that is slowly changing and perfectly predictable
- $S_t$ is a deterministic seasonal component with a known period of $d$, also perfectly predictable. Note that the seasonal component over a complete cycle sums to zero; $d$ would be 4 if the data is quarterly
- $N_t$ is a random component with expected value 0, as all information should be captured in the trend and seasonal components. $N_t$ may be correlated and hence partially predictable
Moving Average Linear Filters

A moving average linear filter has the form:

$$\hat{T}_t = \sum_{j=-q}^{q} a_j X_{t+j} = \frac{1}{2q+1}\sum_{j=-q}^{q} X_{t+j}$$

Eliminating the Seasonality

A moving average linear filter can also be passed through $X_t$ to eliminate the seasonal component:

- If the period $d$ is odd, i.e. $d = 2q + 1$, use the filter:

$$\frac{1}{d} X_{t-q} + \frac{1}{d} X_{t-q+1} + \cdots + \frac{1}{d} X_{t+q}$$

- If the period $d$ is even, i.e. $d = 2q$, use the filter:

$$\frac{1}{d}\left(\frac{1}{2} X_{t-q} + X_{t-q+1} + \cdots + X_{t+q-1} + \frac{1}{2} X_{t+q}\right)$$

Estimating the Trend

A moving average linear filter can be applied to estimate the trend. Consider $X_t = T_t + N_t$ where $T_t = a + bt$; applying a $(2q+1)$-point moving average filter to $X_t$ gives:

$$\hat{T}_t = \frac{1}{2q+1}\sum_{j=-q}^{q}\left(T_{t+j} + N_{t+j}\right) = a + bt + \frac{1}{2q+1}\sum_{j=-q}^{q} N_{t+j} \approx T_t$$
For the classic decomposition model:

$$\hat{T}_t = \sum_{j=-q}^{q} a_j X_{t+j} = \frac{1}{2q+1}\sum_{j=-q}^{q}\left(T_{t+j} + S_{t+j} + N_{t+j}\right)$$

This model works well, i.e. $\hat{T}_t \approx T_t$, if:

- The trend is approximately linear
- The sum of the $N_t$ is close to zero
- The sum of the $S_t$ is zero

The estimated noise, or the residuals, is:

$$\hat{N}_t = x_t - \hat{T}_t - \hat{S}_t$$
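A minimal sketch of the trend filter in action (hypothetical trend coefficients and noise level): the $(2q+1)$-point moving average passes a linear trend through essentially unchanged, leaving only the averaged noise.

```python
import numpy as np

# Apply a (2q+1)-point moving average to a linear trend plus white noise
# and compare the filtered series to the true trend.
rng = np.random.default_rng(5)
a, b, q, n = 3.0, 0.5, 6, 500

t = np.arange(n)
x = a + b * t + rng.normal(0.0, 1.0, n)

kernel = np.ones(2 * q + 1) / (2 * q + 1)
trend_hat = np.convolve(x, kernel, mode="valid")   # defined for t = q .. n-q-1
trend_true = a + b * t[q:n - q]

print(round(np.abs(trend_hat - trend_true).mean(), 3))  # small residual error
```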
Differencing

The backshift operator, or lag operator, $B$ is defined by:

$$B X_t = X_{t-1} \qquad B^j X_t = X_{t-j}$$

The difference operator $\nabla$ is defined by:

$$\nabla X_t = (1 - B) X_t = X_t - X_{t-1}$$

The powers of $\nabla$ are defined by:

$$\nabla^j X_t = (1 - B)^j X_t \qquad \nabla^1 X_t = \nabla X_t$$

The difference operator with lag $d$ is defined by:

$$\nabla_d X_t = (1 - B^d) X_t = X_t - X_{t-d}$$
Eliminating the Trend

Trend can be eliminated by using differencing. E.g. consider a linear trend:

$$X_t = T_t + N_t, \quad T_t = c_0 + c_1 t$$

Apply differencing to the power of one:

$$\nabla X_t = \nabla T_t + \nabla N_t = c_0 + c_1 t - \left(c_0 + c_1 (t - 1)\right) + \nabla N_t = c_1 + \nabla N_t$$

i.e. the trend becomes a constant $c_1$, which is stationary. This constant can be estimated by the sample average of $\nabla X_t$. In general, a trend that is a polynomial of degree $k$ can be reduced to a constant by differencing $k$ times:

$$T_t = c_0 + c_1 t + \cdots + c_k t^k \implies \nabla^k X_t = \nabla^k T_t + \nabla^k N_t$$

which is a random process with mean $k!\,c_k$.

Differencing to Eliminate Seasonality

Seasonality can also be eliminated by differencing: difference once only, but with a lag $d$ equal to the period of the seasonality:

$$\nabla_d S_t = S_t - S_{t-d} = 0$$
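Both facts are easy to see numerically. A sketch with a hypothetical cubic trend ($c_3 = 2$, so the third difference should be $3!\,c_3 = 12$ everywhere) and a period-3 seasonal component (which a lag-3 difference removes entirely):

```python
# Differencing a degree-k polynomial k times leaves the constant k! * c_k;
# differencing once at lag d removes a period-d seasonal component.
def diff(series, lag=1):
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

c3 = 2.0
x = [c3 * t**3 - 4 * t**2 + t + 7 for t in range(20)]
d3 = diff(diff(diff(x)))
print(d3[:5])                     # constant 12.0 = 3! * c3

s = [(1, -2, 1)[t % 3] for t in range(21)]   # seasonal, sums to 0 over a cycle
print(diff(s, lag=3)[:5])                    # all zeros
```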
Stationarity

The theory of stationary random processes is important in modelling time series, as stationarity allows parameters to be estimated efficiently, since we can treat all samples as coming from the same distribution. A non-stationary random process must be transformed into a stationary one before analysis and modelling, i.e. by removing trend and seasonality, or by applying transformations.

A random process $\{X_t\}$ is said to be:

- Integrated of order 0, i.e. $I(0)$, if $X_t$ itself is a stationary time series process
- Integrated of order 1, i.e. $I(1)$, if $X_t$ is not stationary, but the increments obtained through differencing, $Y_t = \nabla X_t = X_t - X_{t-1}$, form a stationary process
- Integrated of order $n$, i.e. $I(n)$, if $\nabla^{n-1} X_t$ is still not stationary, but the $n$th differenced series $Y_t = \nabla^n X_t$ is a stationary time series process

However, the deseasonalised, detrended residuals, i.e. the random component, may still contain information, i.e. they may be correlated over time. Assuming stationarity in the residuals, we can try to fit a probability model to the residuals for forecasting purposes. To do so, we will need to look at some sample statistics of the residuals.
Sample Statistics

Let $\{X_t\}$ be a stochastic process such that $\text{var}(X_t) < \infty$ for all $t$.

The autocovariance function (ACVF) is defined by:

$$\gamma(\tau) = Cov(X_t, X_{t+\tau}) \qquad \gamma(0) = Cov(X_t, X_t) = \text{var}(X_t)$$

The autocorrelation function (ACF) is defined by:

$$\rho(\tau) = Corr(X_t, X_{t+\tau}) = \frac{Cov(X_t, X_{t+\tau})}{\sqrt{Cov(X_t, X_t)\,Cov(X_{t+\tau}, X_{t+\tau})}} = \frac{\gamma(\tau)}{\gamma(0)}$$

The covariance function is defined to be:

$$Cov(X, Y) = E\left[(X - E[X])(Y - E[Y])\right] = E[XY] - E[X]E[Y]$$

Some properties of the covariance function are:

$$Cov(X, a) = 0$$
$$Cov(aX, bY) = ab\,Cov(X, Y)$$
$$Cov(aX + bY, cU + dV) = ac\,Cov(X, U) + ad\,Cov(X, V) + bc\,Cov(Y, U) + bd\,Cov(Y, V)$$

A process $\{X_t\}$ is said to be weakly stationary if:

$$E[X_t] = \mu \qquad Cov(X_t, X_{t+\tau}) = \gamma(\tau)$$

i.e. the mean is constant and the covariance of the process only depends on the time difference $\tau$.
Noise

I.I.D. Noise

$\{X_t\}$ is i.i.d. noise if $X_t$ and $X_{t+h}$ are independently and identically distributed with mean zero, i.e. no covariance:

$$X_t \sim \text{IID}(0, \sigma^2)$$

Assuming $E[X_t^2] < \infty$, as for weakly stationary series we need bounded first and second moments, then:

$$\gamma_X(h) = \begin{cases} \sigma^2 & \text{if } h = 0 \\ 0 & \text{if } h \ne 0 \end{cases}$$

White Noise

$\{X_t\}$ is white noise with zero mean if:

$$X_t \sim \text{WN}(0, \sigma^2)$$

Where:

$$\gamma_X(h) = \begin{cases} \sigma^2 & \text{if } h = 0 \\ 0 & \text{if } h \ne 0 \end{cases}$$

- IID noise is white noise, but white noise is not necessarily IID noise
- White noise is weakly stationary
- Usually, we assume that the error terms have a normal distribution for the purpose of parameter estimation, etc.:

$$X_t \sim N(0, \sigma^2)$$
Linear Processes

$\{X_t\}$ is a linear process if it can be represented as:

$$X_t = \sum_{j=-\infty}^{\infty} \psi_j Z_{t-j} = \sum_{j=-\infty}^{\infty} \psi_j B^j Z_t = \psi(B) Z_t$$

Where $\{\psi_j\}$ is a sequence of constants with $\sum_j |\psi_j| < \infty$ and $Z_t \sim \text{WN}(0, \sigma^2)$.

Linear processes are stationary because they are a linear combination of stationary white noise terms. For stationary processes, the regularity condition $\sum_j |\psi_j| < \infty$ holds, i.e. $\sum_j \psi_j$ is absolutely convergent. This ensures that the infinite sum can be manipulated the same way as a finite sum, i.e. two absolutely convergent series can be added or multiplied together.

In general, if $\{Y_t\}$ is stationary and $\sum_j |\psi_j| < \infty$ holds, then

$$X_t = \psi(B) Y_t$$

is stationary. Essentially, all stationary processes can be represented by a linear process.
Time Series Models

Moving Average Process

$\{X_t\}$ is a first-order moving average process MA(1) if there is a process $Z_t \sim \text{WN}(0, \sigma^2)$ and a constant $\theta$ such that:

$$X_t = Z_t + \theta Z_{t-1}$$

i.e. the next term depends on the current noise plus a proportion of the noise of the previous period.

The mean is:

$$E[X_t] = E[Z_t] + \theta E[Z_{t-1}] = 0$$

The autocovariance (ACVF) is:

$$\gamma(h) = Cov(X_t, X_{t+h}) = \begin{cases} \text{var}(Z_t + \theta Z_{t-1}) = (1 + \theta^2)\sigma^2 & \text{if } h = 0 \\ Cov(Z_t + \theta Z_{t-1},\ Z_{t+1} + \theta Z_t) = \theta\sigma^2 & \text{if } |h| = 1 \\ Cov(Z_t + \theta Z_{t-1},\ Z_{t+h} + \theta Z_{t+h-1}) = 0 & \text{if } |h| > 1 \end{cases}$$

The autocorrelation (ACF) is:

$$\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \begin{cases} 1 & \text{if } h = 0 \\ \dfrac{\theta}{1 + \theta^2} & \text{if } |h| = 1 \\ 0 & \text{if } |h| > 1 \end{cases}$$

The conditional mean is stochastic and depends on the history of the process:

$$E[X_{t+1} \mid X_t, X_{t-1}, ...] = E[Z_{t+1} + \theta Z_t \mid X_t, X_{t-1}, ...] = \theta Z_t$$

The conditional variance is constant, i.e. independent of $X_t$:

$$\text{var}[X_{t+1} \mid X_t, X_{t-1}, ...] = \text{var}[Z_{t+1} + \theta Z_t \mid X_t, X_{t-1}, ...] = \text{var}[Z_{t+1}] = \sigma^2$$

Therefore, the conditional and unconditional mean and variance are different.

$\{X_t\}$ is a moving average process of order $q$, MA($q$), if the process depends on its previous $q$ realisations of noise:

$$X_t = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}$$

Note:

- Moving average processes are stationary processes, as $X_t$ is a linear combination of stationary white noise terms
- The ACF of a MA($q$) process has non-zero values up until lag $q$, and near-zero values for all lags greater than $q$
- The conditional mean/variance is used for forecasts, whereas the unconditional mean/variance gives the long-run results
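The MA(1) ACF formula can be verified on simulated data. A sketch with a hypothetical $\theta = 0.6$, so $\rho(1) = 0.6/1.36 \approx 0.441$ and $\rho(h) \approx 0$ beyond lag 1:

```python
import numpy as np

# Sample ACF of a simulated MA(1) vs the theoretical rho(1) = theta/(1+theta^2).
rng = np.random.default_rng(7)
theta, n = 0.6, 400_000

z = rng.normal(0.0, 1.0, n + 1)
x = z[1:] + theta * z[:-1]

def sample_acf(series, h):
    series = series - series.mean()
    return (series[:-h] * series[h:]).mean() / (series * series).mean()

print(round(sample_acf(x, 1), 3))   # ≈ theta / (1 + theta**2)
print(round(sample_acf(x, 2), 3))   # ≈ 0
```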
Autoregressive Process

$\{X_t\}$ is a first-order autoregressive process AR(1) if $\{X_t\}$ is stationary and there is a process $Z_t \sim \text{WN}(0, \sigma^2)$, uncorrelated with all $X_s$ for $s < t$, and a constant $\phi$ such that:

$$X_t = \phi X_{t-1} + Z_t$$

i.e. the next term depends on the previous value of the process and slowly converges to its mean. Note that an AR(1) can also be written, by repeated substitution, as:

$$X_t = \phi^h X_{t-h} + \phi^{h-1} Z_{t-h+1} + \cdots + \phi Z_{t-1} + Z_t = \phi^h X_{t-h} + \sum_{j=0}^{h-1} \phi^j Z_{t-j}$$

The mean is:

$$E[X_t] = \phi E[X_{t-1}] + E[Z_t] = 0$$

The autocovariance (ACVF) is:

$$\gamma_X(h) = Cov(X_t, X_{t+h}) = \frac{\sigma^2 \phi^{|h|}}{1 - \phi^2}$$

The autocorrelation (ACF) is:

$$\rho_X(h) = \frac{\gamma_X(h)}{\gamma_X(0)} = \phi^{|h|}$$

The conditional mean is stochastic and depends on $X_t$:

$$E[X_{t+1} \mid X_t, X_{t-1}, ...] = \phi X_t + E[Z_{t+1} \mid X_t] = \phi X_t$$

The conditional variance is constant, i.e. independent of $X_t$:

$$\text{var}[X_{t+1} \mid X_t, X_{t-1}, ...] = \text{var}[Z_{t+1} \mid X_t] = \sigma^2$$

Therefore, the conditional and unconditional mean and variance are different.

$\{X_t\}$ is an autoregressive process of order $p$, AR($p$), if:

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + Z_t$$

Note:

- For an AR(1) process to be stationary, $|\phi| < 1$, so that it can be expressed as a linear process. When $\phi = 1$, the process is known as a random walk
- For a higher-order AR($p$) process, there will also be conditions on $\phi_1, \phi_2, ..., \phi_p$ for stationarity
- An AR(1) process is equivalent to an MA(∞) process, and an invertible MA(1) process is equivalent to an AR(∞) process:

$$X_t = \phi^h X_{t-h} + \sum_{j=0}^{h-1} \phi^j Z_{t-j} \to \sum_{j=0}^{\infty} \phi^j Z_{t-j} \quad \text{as } h \to \infty$$

- An AR($p$) process has, in absolute value, a generally decaying non-zero ACF for all lags. The smaller $|\phi|$, the faster the ACF decays. If $\phi$ is negative, then the ACF will have alternating signs
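The geometric decay $\rho(h) = \phi^{|h|}$ shows up clearly in simulation. A sketch with a hypothetical $\phi = 0.7$, discarding an initial burn-in so the series is effectively stationary:

```python
import numpy as np

# Sample ACF of a simulated AR(1): rho(h) should be close to phi**h.
rng = np.random.default_rng(11)
phi, n, burn = 0.7, 400_000, 1_000

z = rng.normal(0.0, 1.0, n + burn)
x = np.empty(n + burn)
x[0] = 0.0
for t in range(1, n + burn):
    x[t] = phi * x[t - 1] + z[t]
x = x[burn:]

def sample_acf(series, h):
    series = series - series.mean()
    return (series[:-h] * series[h:]).mean() / (series * series).mean()

for h in (1, 2, 3):
    print(h, round(sample_acf(x, h), 3), round(phi**h, 3))
```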
Autoregressive Moving Average Models
$\{X_t\}$ is a first-order autoregressive moving average model ARMA(1,1) if $X_t$ is stationary and:
$$X_t = \phi X_{t-1} + Z_t + \theta Z_{t-1}$$
where $\phi$ and $\theta$ are constants and $Z_t \sim \mathrm{WN}(0,\sigma^2)$. Alternatively, in terms of the backshift operator $B$, the ARMA(1,1) equation can be written as:
$$\phi(B) X_t = \theta(B) Z_t$$
where
$$\phi(B) = 1 - \phi B \quad \text{and} \quad \theta(B) = 1 + \theta B$$
Note that for $X_t$ to be stationary, the condition is the same as for the AR(1) process, i.e. $|\phi| < 1$.
$\{X_t\}$ is an ARMA process of order (p,q), i.e. ARMA(p,q), if:
$$X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}$$
or equivalently $\phi(B) X_t = \theta(B) Z_t$, where:
$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \dots - \phi_p z^p$$
$$\theta(z) = 1 + \theta_1 z + \theta_2 z^2 + \dots + \theta_q z^q$$
The polynomials $\phi(z)$ and $\theta(z)$ have no common roots.
The ARMA(p,q) process $\{X_t\}$ is stationary if the equation $\phi(z) = 0$ has no roots on the unit circle; note that the roots can be complex, i.e.:
$$\phi(z) \neq 0 \quad \text{for } |z| = 1$$
$\{X_t\}$ is an ARMA(p,q) process with drift if it is of the form:
$$X_t = \mu + \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}$$
If we estimate and then remove the mean, we recover a standard ARMA(p,q) process. The drift here represents the expected change after differencing.
Causality
An ARMA(p,q) process $\{X_t\}$ is causal if there exist constants $\psi_j$ such that $\sum_{j=0}^{\infty} |\psi_j| < \infty$ and:
$$X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$$
i.e. $X_t$ is expressible in terms of current and past noise terms. Causal processes are a subset of stationary processes, i.e. to be causal it must be stationary first. Causality is important in practice, since if $X_t$ is not causal then it depends on future noise terms, which doesn't make sense.
Theorem:
The $\{X_t\}$ satisfying $\phi(B) X_t = \theta(B) Z_t$ is causal if and only if all the roots of the equation $\phi(z) = 0$ are outside the unit circle.
Invertibility
An ARMA(p,q) process $\{X_t\}$ is invertible if there exist constants $\pi_j$ such that $\sum_{j=0}^{\infty} |\pi_j| < \infty$ and:
$$Z_t = \sum_{j=0}^{\infty} \pi_j X_{t-j}$$
i.e. $Z_t$ is expressible in terms of current and past $X_t$. If $X_t$ is not invertible then $Z_t$ depends on future values of the process, which again does not make sense.
Theorem:
The $\{X_t\}$ satisfying $\phi(B) X_t = \theta(B) Z_t$ is invertible if and only if all the roots of the equation $\theta(z) = 0$ are outside the unit circle.
Note:
- All MA processes are causal, but AR processes might not be
- All AR processes are invertible, but MA processes might not be
- An equivalent condition for an AR(1) process to be causal is $|\phi| < 1$
- An equivalent condition for an MA(1) process to be invertible is $|\theta| < 1$
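The root conditions in the two theorems are easy to check numerically; a small sketch using numpy.roots, where the example coefficients are assumptions:

```python
import numpy as np

def roots_outside_unit_circle(poly_coeffs):
    """poly_coeffs = [c0, c1, ..., ck] for c0 + c1*z + ... + ck*z^k.
    True if every root of the polynomial lies strictly outside the unit circle."""
    roots = np.roots(poly_coeffs[::-1])   # np.roots expects highest degree first
    return bool(np.all(np.abs(roots) > 1.0))

# AR(1) with phi = 0.5: phi(z) = 1 - 0.5z has root z = 2 -> causal
print(roots_outside_unit_circle([1.0, -0.5]))   # True
# AR(1) with phi = 1 (random walk): phi(z) = 1 - z has root z = 1 -> not causal
print(roots_outside_unit_circle([1.0, -1.0]))   # False
# MA(1) with theta = 0.8: theta(z) = 1 + 0.8z has root z = -1.25 -> invertible
print(roots_outside_unit_circle([1.0, 0.8]))    # True
```

The same function serves both theorems: pass the φ-polynomial coefficients to test causality and the θ-polynomial coefficients to test invertibility.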
Calculation of ACF
Linear Filter Method
Consider the causal ARMA(p,q) process:
$$X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}$$
This can be written as:
$$X_t = \frac{\theta(B)}{\phi(B)} Z_t = \psi(B) Z_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$$
Note that the summation is only from 0 to infinity, as the process is causal.
The first step is to determine the $\psi_j$ by equating the coefficients in the equation $\phi(B)\psi(B) = \theta(B)$:
$$(1 - \phi_1 B - \dots - \phi_p B^p)(\psi_0 + \psi_1 B + \psi_2 B^2 + \dots) = 1 + \theta_1 B + \dots + \theta_q B^q$$
Then calculate the ACF by replacing the $X_t$ by their linear filter form. For $h > 0$:
$$\gamma(h) = \mathrm{Cov}(X_{t+h}, X_t) = \mathrm{Cov}\Big(\sum_{j=0}^{\infty} \psi_j Z_{t+h-j},\; \sum_{j=0}^{\infty} \psi_j Z_{t-j}\Big)$$
Pushing the index back so that both sums run over the same noise terms:
$$\gamma(h) = \sigma^2 \sum_{j=0}^{\infty} \psi_{j+h}\, \psi_j$$
since $\mathrm{Cov}(Z_t, Z_t) = \sigma^2$ and $\mathrm{Cov}(Z_t, Z_s) = 0$ for $s \neq t$.
This method is convenient for MA processes, since they are easily expressed as linear processes
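The coefficient-matching step gives the recursion $\psi_j = \theta_j + \sum_{i} \phi_i \psi_{j-i}$ (with $\theta_0 = 1$, $\theta_j = 0$ for $j > q$), which can be sketched as follows; the ARMA(1,1) parameters are illustrative:

```python
def psi_weights(phis, thetas, m):
    """First m psi-weights of a causal ARMA: X_t = sum_j psi_j Z_{t-j}.
    phis = [phi_1..phi_p], thetas = [theta_1..theta_q] (theta_0 = 1 implied)."""
    psi = []
    for j in range(m):
        val = 1.0 if j == 0 else (thetas[j - 1] if j - 1 < len(thetas) else 0.0)
        for i, phi in enumerate(phis, start=1):
            if j - i >= 0:
                val += phi * psi[j - i]
        psi.append(val)
    return psi

def acvf(phis, thetas, h, sigma2=1.0, m=500):
    """gamma(h) = sigma^2 * sum_j psi_j psi_{j+h}, truncated at m terms."""
    psi = psi_weights(phis, thetas, m + h)
    return sigma2 * sum(psi[j] * psi[j + h] for j in range(m))

# ARMA(1,1), phi=0.5, theta=0.4: psi_0=1, psi_1=theta+phi=0.9, then psi_j=phi*psi_{j-1}
print(psi_weights([0.5], [0.4], 4))
# Sanity check: for a pure AR(1), gamma(0) = sigma^2/(1-phi^2) = 4/3 when phi=0.5
print(acvf([0.5], [], 0))
```

The truncation at m terms is harmless for causal models since $\psi_j$ decays geometrically.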
Yule-Walker Equations
Consider the causal ARMA(p,q) process:
$$X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}$$
First, multiply both sides by $X_{t-h}$ and then take the covariance:
$$\mathrm{Cov}(X_{t-h}, X_t) = \phi_1 \mathrm{Cov}(X_{t-h}, X_{t-1}) + \dots + \phi_p \mathrm{Cov}(X_{t-h}, X_{t-p}) + C_h$$
which is equivalent to:
$$\gamma(h) = \phi_1 \gamma(h-1) + \dots + \phi_p \gamma(h-p) + C_h$$
where $C_h$ is the moving average component on the RHS:
$$C_h = \mathrm{Cov}(X_{t-h}, Z_t) + \theta_1 \mathrm{Cov}(X_{t-h}, Z_{t-1}) + \dots + \theta_q \mathrm{Cov}(X_{t-h}, Z_{t-q})$$
For $h = 0, 1, 2, \dots, p$ there are a total of $p+1$ equations which, using $\gamma(-k) = \gamma(k)$, form a linear system in the unknowns $\gamma(0), \gamma(1), \dots, \gamma(p)$:
$$\gamma(h) - \phi_1 \gamma(|h-1|) - \dots - \phi_p \gamma(|h-p|) = C_h, \quad h = 0, 1, \dots, p$$
Thus, given the $C_j$ and $\phi_j$, the ACF can be computed by solving this linear system, i.e. by taking the inverse of its coefficient matrix.
For $h > p$, we can find the ACF recursively through:
$$\gamma(h) = \phi_1 \gamma(h-1) + \dots + \phi_p \gamma(h-p) + C_h$$
To figure out $C_0, C_1, \dots, C_p$, if the $\psi_j$'s are available, then we can use:
$$X_{t-h} = \sum_{j=0}^{\infty} \psi_j Z_{t-h-j} = \sum_{j=h}^{\infty} \psi_{j-h} Z_{t-j}$$
This gives $C_h$ as:
$$C_h = \mathrm{Cov}\Big(\sum_{j=h}^{\infty} \psi_{j-h} Z_{t-j},\; Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}\Big) = \sigma^2 \sum_{j=h}^{q} \theta_j\, \psi_{j-h}$$
where $\theta_0 = 1$ and $C_h = 0$ for $h > q$. However, if the $\psi_j$'s are available, we may as well use method 1!
Therefore, this method is more convenient for AR processes
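For a pure AR process the $C_h$ vanish for $h \ge 1$ and $C_0 = \sigma^2$, so the Yule-Walker system can be solved directly; a sketch for an AR(2) with assumed parameters $\phi_1 = 0.5$, $\phi_2 = 0.3$, $\sigma^2 = 1$:

```python
import numpy as np

# Yule-Walker for an AR(2): X_t = phi1 X_{t-1} + phi2 X_{t-2} + Z_t.
# For h = 0,1,2:  gamma(h) = phi1*gamma(|h-1|) + phi2*gamma(|h-2|) + C_h,
# with C_0 = sigma^2 and C_h = 0 for h >= 1 in the pure AR case.
phi1, phi2, sigma2 = 0.5, 0.3, 1.0

# Unknowns g = (gamma(0), gamma(1), gamma(2)); move all gamma terms to the left.
A = np.array([
    [1.0,       -phi1, -phi2],  # gamma(0) - phi1*gamma(1) - phi2*gamma(2) = sigma^2
    [-phi1, 1.0 - phi2,   0.0], # gamma(1) - phi1*gamma(0) - phi2*gamma(1) = 0
    [-phi2,      -phi1,   1.0], # gamma(2) - phi1*gamma(1) - phi2*gamma(0) = 0
])
b = np.array([sigma2, 0.0, 0.0])
g = np.linalg.solve(A, b)
rho = g / g[0]                  # ACF: rho(h) = gamma(h) / gamma(0)
print(rho)
```

A known check: for an AR(2), $\rho(1) = \phi_1/(1-\phi_2)$, here $0.5/0.7 \approx 0.714$, and $\rho(2) = \phi_1\rho(1) + \phi_2$.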
Partial Autocorrelation Function
The autocorrelation function is used to determine the number of lags of an MA(q) process, while the partial autocorrelation function is needed to determine the number of lags of an AR(p) process. In finding the ACF, we set up the Yule-Walker equations and converted them into matrix form to solve for $\gamma(h)$. By omitting the first equation ($h = 0$), we can rewrite this system to solve for the parameters $\phi_j$:
$$\begin{pmatrix} \gamma(0) & \gamma(1) & \cdots & \gamma(p-1) \\ \gamma(1) & \gamma(0) & \cdots & \gamma(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma(p-1) & \gamma(p-2) & \cdots & \gamma(0) \end{pmatrix} \begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{pmatrix} = \begin{pmatrix} \gamma(1) - C_1 \\ \gamma(2) - C_2 \\ \vdots \\ \gamma(p) - C_p \end{pmatrix}$$
Or equivalently:
$$\Gamma_p \boldsymbol\phi_p = \boldsymbol\gamma_p - \mathbf C_p$$
where the matrix on the left is the covariance matrix $\Gamma_p$ and the $C_h$ are:
$$C_h = \mathrm{Cov}(X_{t-h}, Z_t) + \theta_1 \mathrm{Cov}(X_{t-h}, Z_{t-1}) + \dots + \theta_q \mathrm{Cov}(X_{t-h}, Z_{t-q})$$
Therefore, the parameters are given by:
$$\boldsymbol\phi_p = \Gamma_p^{-1}(\boldsymbol\gamma_p - \mathbf C_p)$$
Note that for anAR(p)process, Chare all zeroes since the noise terms are uncorrelated with the past:1
p p p
Define the vector 11 2 ...
T
h h h hh h h , then for anAR(p)process where 1
p p p
,
we have:
If h = p, then 1,...,T
p p p which implies that pp p
If h > p, then 1,..., ,0,...,0T
h p which implies 0hh .
- BecauseAR(p)process is a special case of anAR(h)process, where 0t for t > p
If h < p, then we do not know precisely what h is
Therefore, for anAR(p)process:
1 1 ...t t p t p t X X X Z
We have:
? if
if
0 if
hh p
h p
h p
h p
The h-th partial autocorrelation function (PACF) is defined by:
$$\alpha(h) = \begin{cases} 1 & \text{if } h = 0 \\ \phi_{hh} & \text{if } h > 0 \end{cases}$$
where $\phi_{hh}$ is the last element in the vector $\Gamma_h^{-1} \boldsymbol\gamma_h$.
E.g., for an AR(2) process, the PACF at lag 2 is given by $\phi_{22}$, where:
$$\begin{pmatrix} \phi_{21} \\ \phi_{22} \end{pmatrix} = \begin{pmatrix} \gamma(0) & \gamma(1) \\ \gamma(1) & \gamma(0) \end{pmatrix}^{-1} \begin{pmatrix} \gamma(1) \\ \gamma(2) \end{pmatrix}$$
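The lag-2 PACF computation above is a two-line linear solve; the autocovariances below are illustrative (for an AR(1), $\phi_{22}$ should come out exactly 0):

```python
import numpy as np

def pacf_lag2(gamma):
    """phi_22: last element of Gamma_2^{-1} gamma_2, given gamma = [g0, g1, g2]."""
    Gamma2 = np.array([[gamma[0], gamma[1]],
                       [gamma[1], gamma[0]]])
    gamma2 = np.array([gamma[1], gamma[2]])
    return np.linalg.solve(Gamma2, gamma2)[1]

# For an AR(1) with phi = 0.6, gamma(h) is proportional to 0.6^|h|,
# so the PACF cuts off after lag 1: phi_22 = 0.
print(pacf_lag2([1.0, 0.6, 0.36]))
# For autocovariances not consistent with an AR(1), phi_22 is non-zero.
print(pacf_lag2([1.0, 0.5, 0.5]))
```

This is exactly the AR(p)-identification logic: a significant $\phi_{hh}$ at lag h suggests the AR order is at least h.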
Model Building
For a stationary time series, model building can be classified into 3 stages: model selection, parameter estimation and model diagnosis.
Model Selection
To determine the appropriate model we need to look at the sample ACF and PACF:
The sample autocovariance function is:
$$\hat\gamma(h) = \frac{1}{n} \sum_{t=1}^{n-h} (x_{t+h} - \bar x)(x_t - \bar x)$$
The sample autocorrelation function (SACF) is:
$$\hat\rho(h) = \frac{\hat\gamma(h)}{\hat\gamma(0)}$$
Note that since the SPACF is calculated from the SACF, both measures will have estimation error.
Once we have graphed the sample ACF and sample PACF, we first check that the sample ACF quickly converges to zero, which shows that the time series is stationary.
- If the sample ACF decreases slowly but steadily from a value near 1, then the data need to be differenced before fitting the model
- If the sample ACF exhibits a periodic oscillation, then there may be some seasonality remaining
Then we compare the sample ACF and PACF to the theoretical ACF and PACF of different processes to see if there is a match.
For an AR(p) process:
- Sample ACF shows exponential decay towards near-zero values
- Sample PACF shows significant values up to lag p, then near-zero values thereafter
For an MA(q) process:
- Sample ACF shows significant values up to lag q, then near-zero values thereafter
- Sample PACF shows exponential decay towards near-zero values
If neither of these situations occurs, then consider an ARMA(p,q) process. The sample ACF and PACF of an ARMA(p,q) process are very flexible, but in general they combine the behaviour of an AR(p) and an MA(q) process. So an ARMA(p,q) model should display:
- ACF that decays towards zero after lag q, either directly or with oscillation
- PACF that decays towards zero after lag p, either directly or with oscillation
Model Selection Criteria
When deciding the number of parameters, it is a trade-off between goodness of fit and the
simplicity of the model. More parameters mean more flexibility and a better sample fit. However,
more parameters also mean that each parameter is estimated with more uncertainty, i.e. higher
standard error.
A systematic method to choose p and q is to minimise an information criterion. Information criteria have the form:
$$\mathrm{IC}(p, q) = -2 \log L + P(n, p, q)$$
where the first term, i.e. minus twice the log-likelihood, always decreases with the number of parameters. However, the second "penalty" term, a function of the number of observations and the number of parameters, always increases with the number of parameters. Thus the IC seeks to balance out bias and variance.
Three common ICs are:

Information Criterion    Penalty Term
AIC                      $2(p+q+1)$
BIC                      $(p+q+1)\log n$
AICc                     $\dfrac{2(p+q+1)\,n}{n-p-q-2}$
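The trade-off between the criteria can be seen on hypothetical fits; the log-likelihood values below are made up for illustration, chosen so that AIC and BIC disagree:

```python
import math

def aic(loglik, p, q):
    return -2 * loglik + 2 * (p + q + 1)

def bic(loglik, p, q, n):
    return -2 * loglik + (p + q + 1) * math.log(n)

def aicc(loglik, p, q, n):
    return -2 * loglik + 2 * (p + q + 1) * n / (n - p - q - 2)

# Hypothetical fits: the bigger model gains 1.5 in log-likelihood per extra
# parameter, so AIC (penalty 2 per parameter) prefers it while BIC
# (penalty log(200) ~ 5.3 per parameter) prefers the smaller model.
n = 200
fits = {(1, 0): -250.0, (2, 1): -247.0}
best_aic = min(fits, key=lambda pq: aic(fits[pq], *pq))
best_bic = min(fits, key=lambda pq: bic(fits[pq], *pq, n))
print(best_aic, best_bic)
```

BIC's heavier penalty makes it more conservative about adding parameters than AIC, consistent with the bias/variance discussion above.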
Parameter Estimation
Mean Estimation
Let $\{X_t\}$ be a weakly stationary process with mean µ. An estimator of µ is the sample mean:
$$\bar X = \frac{1}{n} \sum_{t=1}^{n} X_t$$
This estimator is unbiased:
$$E[\bar X] = \frac{1}{n} \sum_{t=1}^{n} E[X_t] = \mu$$
This estimator is also consistent:
$$\mathrm{var}(\bar X) = \frac{1}{n^2} \sum_{t=1}^{n} \sum_{s=1}^{n} \mathrm{Cov}(X_t, X_s) = \frac{1}{n^2} \sum_{h=-(n-1)}^{n-1} (n - |h|)\,\gamma(h) \le \frac{1}{n} \sum_{h=-\infty}^{\infty} |\gamma(h)|$$
Thus, the variance $\to 0$ as $n \to \infty$, i.e. the estimator is consistent, if $\sum_h |\gamma(h)| < \infty$, which is true since for a stationary time series the ACF eventually converges to zero regardless of whether it is AR or MA. For a more detailed proof, see ACTL2003 Proofs.
Parameter Estimation
If a sufficient number of sample ACF values is known, then one way to estimate the parameters of a model, i.e. θ, ϕ, σ², is by equating them to the theoretical ACF derived from the Yule-Walker equations. Use this to set up equations in terms of the parameters and solve. If there is more than one solution, choose the solution that makes the model causal and/or invertible.
Another method is to use maximum likelihood estimation. Suppose we have a set of errors that are assumed to be normal; then the $X_t$ themselves are also normal, so:
$$\mathbf X_n = (X_1, \dots, X_n)^T \sim N(\mathbf 0, \Sigma_n)$$
where:
- $\beta$ is the set of parameters of the model, e.g. $\beta = (\theta, \sigma^2)^T$ for an MA(1) model
- $\Sigma_n$ is the symmetric covariance matrix of $\mathbf X_n$, expressed in terms of the parameters:
$$\Sigma_n = \begin{pmatrix} \gamma(0) & \gamma(1) & \gamma(2) & \cdots & \gamma(n-1) \\ \gamma(1) & \gamma(0) & \gamma(1) & \cdots & \gamma(n-2) \\ \gamma(2) & \gamma(1) & \gamma(0) & \cdots & \gamma(n-3) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \gamma(n-1) & \gamma(n-2) & \gamma(n-3) & \cdots & \gamma(0) \end{pmatrix}$$
Assuming that the observations follow a multivariate normal distribution, the likelihood function is:
$$L(\beta) = \frac{1}{(2\pi)^{n/2} \sqrt{\det \Sigma_n}} \exp\Big( -\frac{1}{2}\, \mathbf X_n^T \Sigma_n^{-1} \mathbf X_n \Big)$$
The maximum likelihood estimator $\hat\beta$ is the value that maximises the likelihood function $L(\beta)$.
Under the normality assumption, we have the asymptotic distribution of the MLE:
$$\hat\beta \overset{A}{\sim} N\big(\beta, \mathrm{var}(\hat\beta)\big)$$
where the variance can be estimated by:
$$\widehat{\mathrm{var}}(\hat\beta) = \left( -\frac{\partial^2 \ln L}{\partial \beta\, \partial \beta^T} \right)^{-1}$$
This result can be used to compute confidence intervals for the parameters and for hypothesis
testing about the parameters, e.g. whether to include certain parameters.
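For small n the Gaussian likelihood above can be evaluated directly by building $\Sigma_n$; a sketch for an MA(1), where the tiny data set is a made-up illustration:

```python
import numpy as np

def ma1_loglik(x, theta, sigma2):
    """Exact Gaussian log-likelihood of an MA(1): X_t = Z_t + theta Z_{t-1}.
    Sigma_n is banded: gamma(0) = sigma2(1+theta^2), gamma(1) = sigma2*theta."""
    n = len(x)
    Sigma = np.zeros((n, n))
    np.fill_diagonal(Sigma, sigma2 * (1 + theta ** 2))
    idx = np.arange(n - 1)
    Sigma[idx, idx + 1] = Sigma[idx + 1, idx] = sigma2 * theta
    x = np.asarray(x)
    sign, logdet = np.linalg.slogdet(Sigma)
    quad = x @ np.linalg.solve(Sigma, x)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

x = [0.3, -1.1, 0.8, 0.2, -0.5]          # illustrative observations
print(ma1_loglik(x, theta=0.4, sigma2=1.0))
```

The MLE is found by maximising this function over (θ, σ²), e.g. on a grid or with a numerical optimiser; the curvature at the maximum estimates the variance of $\hat\beta$ as above.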
Model Diagnosis
The residuals of the proposed model are:
$$\hat Z_t = X_t - \hat X_t$$
where $\hat X_t$ are the fitted values computed using the estimated parameters. If the proposed model is a good approximation to the underlying time series process, then the residuals should be approximately a white noise process. There are several methods to check this:
Plot of Residuals
If the plot of $\hat Z_t$ against t shows any trend or patterns in the fluctuations, then the model is inadequate.
SACF of Residuals
If the residuals are white noise, then it can be shown that the SACF of $\hat Z_t$ has approximately the distribution:
$$\hat\rho_Z(h) \sim N\Big(0, \frac{1}{n}\Big)$$
Therefore, at the 95% confidence level, the sample ACF should be within the range:
$$\pm \frac{1.96}{\sqrt n}$$
since for a white noise process, $\rho(h) = 0$ for $h \ge 1$. If too many values of the SACF lie outside this range, then the model does not fit the process well and more parameters will be needed.
Ljung-Box Test
We can test the null hypothesis that $\hat Z_t$ is white noise using the Ljung-Box test statistic. This tests whether, jointly, all correlations at lags greater than zero are zero. Under the null hypothesis, for large n, the Ljung-Box test statistic is:
$$Q = n(n+2) \sum_{j=1}^{h} \frac{\hat\rho_Z^2(j)}{n-j} \sim \chi^2_{h-p-q}$$
where n is the number of time series observations. In practice, h is chosen to be between 15 and 30, and n should be large, i.e. $n \ge 100$.
Using the Ljung-Box test, we reject the null hypothesis, i.e. conclude the residuals are not white noise, if:
$$Q > \chi^2_{h-p-q,\, 1-\alpha}$$
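The statistic is straightforward to compute from the residual SACF; a minimal sketch (in practice Q would be compared against a $\chi^2_{h-p-q}$ critical value, e.g. via scipy's chi2.ppf):

```python
import random

def ljung_box(residuals, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{j=1}^h rho_hat(j)^2 / (n-j)."""
    n = len(residuals)
    m = sum(residuals) / n
    g0 = sum((r - m) ** 2 for r in residuals) / n
    q = 0.0
    for j in range(1, h + 1):
        rho = sum((residuals[t + j] - m) * (residuals[t] - m)
                  for t in range(n - j)) / n / g0
        q += rho ** 2 / (n - j)
    return n * (n + 2) * q

# For genuinely white-noise residuals Q should be small relative to the
# chi-squared critical value; simulated N(0,1) noise is used as illustration.
rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(300)]
print(round(ljung_box(z, 15), 2))
```

Note Q is non-decreasing in h, since each extra lag adds a non-negative term.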
Non-Stationarity
In practice, a non-stationary time series may exhibit a non-stationary level of mean, variance or both. Transformations can be used to remove non-stationarity, e.g. taking the logarithm of an exponential trend can remove a non-stationary mean, or smooth out the variance.
Stochastic Trends
Apart from a deterministic trend or seasonality, a stochastic trend also causes non-stationarity. A stochastic trend is when the noise terms have a permanent effect on the process. Consider the random walk rewritten iteratively:
$$X_t = X_{t-1} + Z_t = X_0 + \sum_{j=1}^{t} Z_j, \quad Z_t \sim \mathrm{WN}(0, \sigma^2)$$
In this case, the effect of any $Z_t$ on $X_{t+h}$ is the SAME for all $h \ge 0$, since $\psi_j = 1$ for all j. This is not true for stationary processes like AR(1) or ARMA(1,1), where $|\phi| < 1$, since depending on h, $Z_t$ will have a different level of impact as the coefficient $\psi_j$ changes, e.g.:
$$X_t = \sum_{j=0}^{\infty} \phi^j Z_{t-j}$$
Since the noise terms have a lasting impact, the correlation between $X_t$ and $X_{t-h}$ is relatively high, so a distinctive feature of a random walk is a very slowly decaying positive ACF. Note that differencing the random walk once yields a stationary series!
$$Y_t = (1-B)X_t = X_t - X_{t-1} = Z_t$$
ARIMA Model
The process $\{Y_t\}$ is an autoregressive integrated moving average model ARIMA(p,d,q) with order of integration d if:
$$\phi(B)(1-B)^d Y_t = \theta(B) Z_t$$
That is, $Y_t$ becomes a stationary ARMA(p,q) process after differencing d times, i.e.:
$$W_t = (1-B)^d Y_t \sim \mathrm{ARMA}(p, q), \quad \phi(B) W_t = \theta(B) Z_t$$
E.g. consider the process defined by:
$$X_t = 0.6 X_{t-1} + 0.3 X_{t-2} + 0.1 X_{t-3} + Z_t + 0.25 Z_{t-1}$$
This process can be re-written as:
$$(1 - 0.6B - 0.3B^2 - 0.1B^3) X_t = (1 + 0.25B) Z_t$$
$$(1 - B)(1 + 0.4B + 0.1B^2) X_t = (1 + 0.25B) Z_t$$
Therefore, this is an ARIMA(2,1,1) process.
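The factorisation step can be verified numerically with polynomial multiplication; a sketch using numpy:

```python
import numpy as np

# phi-polynomial of the example, with ascending powers of B:
# 1 - 0.6B - 0.3B^2 - 0.1B^3
full = np.array([1.0, -0.6, -0.3, -0.1])

# Claimed factorisation: (1 - B)(1 + 0.4B + 0.1B^2).
# np.polymul works with descending powers, so reverse, multiply, reverse back.
prod = np.polymul([1.0, -1.0][::-1], [1.0, 0.4, 0.1][::-1])[::-1]
print(prod)
print(np.allclose(prod, full))
```

A quicker check for a single unit root: the coefficients of $\phi(z)$ sum to zero, i.e. $\phi(1) = 1 - 0.6 - 0.3 - 0.1 = 0$.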
SARIMA Model
Suppose that $\{X_t\}$ exhibits a stochastic seasonal trend, i.e. $X_t$ depends not only on $X_{t-1}, X_{t-2}, \dots$ but also on $X_{t-s}, X_{t-2s}, \dots$.
To model this, we can use a SARIMA$(p,d,q)\times(P,D,Q)_s$ process, given by:
$$\underbrace{\phi(B)}_{AR(p)}\, \underbrace{\Phi(B^s)}_{AR(P)}\, \underbrace{(1-B)^d}_{I(d)}\, \underbrace{(1-B^s)^D}_{I(D)}\, X_t = \underbrace{\theta(B)}_{MA(q)}\, \underbrace{\Theta(B^s)}_{MA(Q)}\, Z_t$$
where the AR(P), MA(Q) and I(D) factors are polynomials in $B^s$.
E.g. consider a SARIMA$(1,0,0)\times(0,1,1)_{12}$ process given by:
$$\underbrace{(1 - \phi B)}_{AR(1)}\, \underbrace{(1 - B^{12})}_{I(1)}\, X_t = \underbrace{(1 + \Theta B^{12})}_{MA(1)}\, Z_t$$
This can be written as:
$$(1 - \phi B - B^{12} + \phi B^{13}) X_t = Z_t + \Theta Z_{t-12}$$
$$X_t = \phi X_{t-1} + X_{t-12} - \phi X_{t-13} + Z_t + \Theta Z_{t-12}$$
Note that $X_t$ depends on $X_{t-1}$, $X_{t-12}$, $X_{t-13}$ as well as $Z_{t-12}$.
SARIMA models can always be rewritten as an ARIMA model, with some constraints on the new parameters. However, converting to ARIMA always requires more parameters than the SARIMA model, which leads to better in-sample fit but worse out-of-sample fit, i.e. worse predictions. Thus, when forecasting it is better to use the SARIMA model than to convert it to an ARIMA model.
Dickey-Fuller Test
The Dickey-Fuller test is a unit root test, i.e. it tests whether there is a unit root in the time series.
Note that if the polynomial z has a unit root, then the time series is not stationary and requires
differencing.
Consider a causal time series process $\{X_t\}$ with:
$$X_t = \alpha + \beta t + \phi X_{t-1} + Z_t, \quad Z_t \sim \mathrm{WN}(0, \sigma^2)$$
To test for a stochastic trend, we test the hypothesis:
$$H_0: \phi = 1 \quad \text{against} \quad H_1: \phi < 1$$
Note that if $\phi = 1$ then there is a unit root, which leads to a stochastic trend, so $X_t$ is not stationary.
We can write the above model as:
$$X_t - X_{t-1} = \alpha + \beta t + (\phi - 1) X_{t-1} + Z_t$$
$$\nabla X_t = \alpha + \beta t + \phi^* X_{t-1} + Z_t, \quad \text{where } \phi^* = \phi - 1$$
Then the equivalent null hypothesis is:
$$H_0: \phi^* = 0$$
This is known as the Dickey-Fuller test, and the test statistic is:
$$\hat\tau = \frac{\hat\phi^*}{\mathrm{se}(\hat\phi^*)}$$
Once the parameters $\phi^*, \alpha, \beta$ have been estimated, reject the null hypothesis $\phi^* = 0$ if $\hat\tau$ is LESS than the critical value, which will be negative since $\phi^*$ is negative under the alternative. Note that rejection of the null hypothesis implies that the time series is stationary, and accepting the null hypothesis implies that differencing is required.
The distribution of this test statistic is a non-standard distribution depending on α and β, with asymptotic percentiles:
Probability to the left      0.01     0.05     0.10
Standard Normal             -2.33    -1.65    -1.28
DF with α = 0, β = 0        -2.58    -1.95    -1.62
DF with β = 0               -3.43    -2.86    -2.57
DF (unconstrained)          -3.96    -3.41    -3.12
Note that the DF distributions are much more spread out than the standard normal. When choosing whether or not to use the DF test with α = 0, β = 0, there is a trade-off:
- If α or β is set to 0 when in fact the true values are nonzero, the test becomes inconsistent and the asymptotic critical values are no longer valid. Decisions based on the test are likely to be wrong, i.e. it might confuse deterministic and stochastic trends
- However, allowing α or β to be non-zero reduces the power of the test, i.e. it is harder to detect a false null hypothesis
How to choose α and β usually depends on what type of series we have. E.g. if a linear trend exists then we expect the difference to only have a constant, so α ≠ 0, β = 0.
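The DF regression of $\nabla X_t$ on $(1, t, X_{t-1})$ can be sketched with ordinary least squares; the simulated series and seed below are assumptions for illustration:

```python
import numpy as np

def df_stat(x):
    """Dickey-Fuller tau: OLS of diff(X_t) on (1, t, X_{t-1}); tau = phi*_hat / se."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    t = np.arange(1, len(x))
    A = np.column_stack([np.ones(len(t)), t.astype(float), x[:-1]])
    beta, *_ = np.linalg.lstsq(A, dx, rcond=None)
    resid = dx - A @ beta
    s2 = resid @ resid / (len(dx) - A.shape[1])
    cov = s2 * np.linalg.inv(A.T @ A)
    return beta[2] / np.sqrt(cov[2, 2])

rng = np.random.default_rng(0)
z = rng.standard_normal(500)
random_walk = np.cumsum(z)            # phi = 1: tau should NOT be very negative
stationary = np.empty(500)            # AR(1) with phi = 0.5: tau strongly negative
stationary[0] = z[0]
for i in range(1, 500):
    stationary[i] = 0.5 * stationary[i - 1] + z[i]
print(round(df_stat(random_walk), 2), round(df_stat(stationary), 2))
```

The stationary series should produce a $\hat\tau$ far below the -3.96 critical value (unconstrained row above), while the random walk typically does not.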
Overdifferencing
Let $U_t$ be an ARMA(p,q) process:
$$\phi(B) U_t = \theta(B) Z_t$$
Define $V_t = (1-B)^d U_t$, i.e. d differences. Then we have:
$$\phi(B) V_t = \phi(B)(1-B)^d U_t = (1-B)^d \theta(B) Z_t$$
Therefore, $V_t$ becomes an ARMA(p, q+d) process, since $(1-B)^d \theta(B)$ is a polynomial of order q+d. However, this MA polynomial has a unit root, so the process $V_t$ is not invertible. Therefore, we should avoid overdifferencing as it will give us a non-invertible process, even though it is still stationary.
Cointegrated Time Series
Many time series in finance and economics are non-stationary (random walks), e.g. CPI and GDP, but at the same time do not move too far apart from each other. Cointegration is used to model non-stationary series that move together.
For a bivariate process $\mathbf X_t = (X_{t,1}, X_{t,2})^T$, we define:
$$\nabla^d \mathbf X_t = \begin{pmatrix} \nabla^d X_{t,1} \\ \nabla^d X_{t,2} \end{pmatrix}$$
A bivariate process $\mathbf X_t$ is integrated of order d, i.e. I(d), if $\nabla^d \mathbf X_t$ is stationary but $\nabla^{d-1} \mathbf X_t$ is not.
An I(d) bivariate process is cointegrated if there is a cointegrating vector $\boldsymbol\alpha = (\alpha_1, \alpha_2)^T$ such that $\boldsymbol\alpha^T \mathbf X_t$ becomes stationary, i.e.:
$$\alpha_1 X_{t,1} + \alpha_2 X_{t,2} \sim I(0)$$
If $X_{t,1}$ and $X_{t,2}$ are cointegrated, then:
- $X_{t,1}$ and $X_{t,2}$ are both I(1)
- $e_t = \boldsymbol\alpha^T \mathbf X_t$ is I(0)
- Cointegration implies that there is a common stochastic trend between $X_{t,1}$ and $X_{t,2}$
In many financial applications, the cointegrating vector is of the form:
$$\boldsymbol\alpha = (1, -a)^T$$
That is, $X_{t,1}$ and $X_{t,2}$ are random walks themselves but the difference $X_{t,1} - aX_{t,2}$ is stationary. The $a$ term can be estimated by the regression:
$$X_{t,1} = a X_{t,2} + \varepsilon_t$$
Then we expect that in the long run the two processes converge so that:
$$X_{t,1} \approx a X_{t,2}$$
Time Series Forecasting
Time Series and Markov Property
Recall that a process $\{X_t, t = 0, 1, \dots\}$ has the Markov property if all future states of the process depend on its present state alone and not on any of its past states:
$$\Pr(X_t \in A \mid X_{s_1} = x_1, \dots, X_{s_n} = x_n, X_s = x) = \Pr(X_t \in A \mid X_s = x)$$
for all times $s_1 < s_2 < \dots < s_n < s < t$, all states $x_1, x_2, \dots, x_n, x \in S$, and all subsets A of S.
AR Processes
An AR(1) process has the Markov property, since the conditional distribution of $X_{n+1}$ given all previous $X_t$ depends only on $X_n$. However, an AR(2) process does not have the Markov property, since the conditional distribution of $X_{n+1}$ given all previous $X_t$ depends on both $X_n$ and $X_{n-1}$. Thus in general, AR(p) processes do not have the Markov property for p greater than 1.
However, for an AR(2) process, if we define a vector-valued process $\mathbf Y$ by $\mathbf Y_t = (X_t, X_{t-1})^T$, then $\mathbf Y$ has the Markov property, since the conditional distribution of $\mathbf Y_{n+1}$ given all previous $\mathbf Y_t$ depends only on $\mathbf Y_n$. In general, for an AR(p) process we can define a vector-valued process with p elements that will have the Markov property.
MA Processes
An MA(q) process can never have the Markov property, even in vector form, since the distribution of $X_{n+1}$ depends on the value of $Z_n$, and in theory no knowledge of the value of $X_n$ or any finite collection $(X_n, \dots, X_{n-q})^T$ will ever be enough to deduce the value of $Z_n$. However, in practice we can estimate $Z_n$ very accurately, so Markov simulation techniques still apply.
ARIMA Processes
For an ARIMA(p,d,q) process, if q is zero, i.e. there is no moving average component, then the process behaves similarly to an AR(p) in terms of the Markov property, i.e. it might not be Markov but an appropriate vector form can be Markov.
If d is also zero and p is greater than 1, then it is essentially an AR(p) process. If both p and d are equal to 1, then the model can be written as:
$$(1 - \phi B)(1 - B) X_t = Z_t$$
$$\big(1 - (1 + \phi)B + \phi B^2\big) X_t = Z_t$$
$$X_t = (1 + \phi) X_{t-1} - \phi X_{t-2} + Z_t$$
This is clearly not Markov. However, it can still be written as a vector-valued process that has the Markov property. In general, the vector process needs p+d components to be Markov.
If q is not equal to zero, i.e. there is a moving average part, then it will never be Markov, for the same reason that MA(q) is never Markov.
k-step Ahead Predictor
Assume that we have the following information:
(1) All observations of $X_t$ up until time n: $x_1, \dots, x_n$
(2) An ARMA model has been fitted to the data
(3) All parameters of the model $(\phi, \theta, \sigma^2)$ have been estimated
(4) The process $Z_t$ is known up until time n
The k-step ahead forecast $\hat X_{n+k|n}$ is one method to forecast/predict $X_{n+k}$ by using the given observations up until time n:
$$\hat X_{n+k|n} = E[X_{n+k} \mid X_1, \dots, X_n]$$
This is obtained by:
- Replacing the random variables $X_1, \dots, X_n$ by their observed values $x_1, \dots, x_n$
- Replacing the random variables $X_{n+1}, \dots, X_{n+k-1}$ by their forecast values $\hat X_{n+j|n}$
- Replacing the random variables $Z_{n+1}, \dots, Z_{n+k}$ by their expectations, i.e. 0
- If $Z_1, \dots, Z_n$ are unknown, replacing them by the residuals $\hat Z_1, \dots, \hat Z_n$, where $\hat Z_i = E[Z_i \mid X_1, \dots, X_n]$
To forecast $X_{n+1}$ we have:
$$\hat X_{n+1|n} = E[X_{n+1} \mid X_1, \dots, X_n] = \phi_1 x_n + \dots + \phi_p x_{n+1-p} + \theta_1 \hat Z_n + \dots + \theta_q \hat Z_{n+1-q}$$
using $E[Z_{n+1} \mid X_1, \dots, X_n] = 0$.
To forecast $X_{n+2}$ we have:
$$\hat X_{n+2|n} = E[X_{n+2} \mid X_1, \dots, X_n] = \phi_1 \hat X_{n+1|n} + \phi_2 x_n + \dots + \phi_p x_{n+2-p} + \theta_2 \hat Z_n + \dots + \theta_q \hat Z_{n+2-q}$$
since $E[Z_{n+2} \mid X_1, \dots, X_n] = E[Z_{n+1} \mid X_1, \dots, X_n] = 0$.
In practice we do not observe $Z_t$, so if there are MA terms in the model, then there are more values of $Z_t$ than $X_t$ and there is no way of determining all of them from the data. Consider the MA(1) process:
$$X_t = Z_t + 0.5 Z_{t-1} \quad \Rightarrow \quad Z_t = X_t - 0.5 Z_{t-1}$$
Then by repeated substitution we have:
$$Z_n = \sum_{j=0}^{n-1} (-0.5)^j X_{n-j} + (-0.5)^n Z_0$$
To determine $Z_n$ we need $Z_0$ first; one simple way is to assume that $Z_0 = 0$. If the process is invertible, then this assumption will have a negligible effect on $\hat X_{n+k|n}$ if n is large, since $(0.5)^n$ will be small.
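The recipe above (residuals built back from the assumption $Z_0 = 0$, future noise replaced by 0) can be sketched for an ARMA(1,1); the data and parameters are illustrative:

```python
def arma11_forecasts(x, phi, theta, k):
    """k-step ahead forecasts for X_t = phi X_{t-1} + Z_t + theta Z_{t-1}.
    Residuals are reconstructed recursively with the assumption Z_0 = 0."""
    z = 0.0
    for t in range(1, len(x)):
        z = x[t] - phi * x[t - 1] - theta * z   # Z_t = X_t - phi X_{t-1} - theta Z_{t-1}
    forecasts = []
    prev = x[-1]
    for step in range(1, k + 1):
        # Future Z's are replaced by their expectation 0; only the 1-step
        # forecast still carries the last reconstructed residual.
        f = phi * prev + (theta * z if step == 1 else 0.0)
        forecasts.append(f)
        prev = f
    return forecasts

x = [0.2, -0.4, 0.9, 0.6, -0.1]
print(arma11_forecasts(x, phi=0.5, theta=0.3, k=3))
```

From step 2 onwards the forecasts simply decay geometrically: $\hat X_{n+k|n} = \phi\, \hat X_{n+k-1|n}$.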
Best Linear Predictor
Under assumptions (1), (2) & (3), the best linear predictor $P_n X_{n+h}$ of $X_{n+h}$ has the form:
$$P_n X_{n+h} = a_0 + a_1 X_n + a_2 X_{n-1} + \dots + a_n X_1$$
We need the values of $a_0, \dots, a_n$ that minimise the mean squared error:
$$\mathrm{MSE} = E\big[(X_{n+h} - P_n X_{n+h})^2\big]$$
The general solution is found by setting the n+1 first-order conditions to zero:
$$\frac{\partial\, \mathrm{MSE}}{\partial a_0} = 0 \;\Rightarrow\; E[X_{n+h} - P_n X_{n+h}] = 0$$
$$\frac{\partial\, \mathrm{MSE}}{\partial a_j} = 0 \;\Rightarrow\; E\big[(X_{n+h} - P_n X_{n+h})\, X_{n+1-j}\big] = 0, \quad j = 1, \dots, n$$
Note that due to the very first condition, the expected prediction error is:
$$E[X_{n+h} - P_n X_{n+h}] = 0$$
i.e. the prediction is unbiased.
The first equation can be simplified to:
$$a_0 = \mu(1 - a_1 - a_2 - \dots - a_n)$$
While the others can be simplified using a trick: since the expected prediction error is zero,
$$E\big[(X_{n+h} - P_n X_{n+h})\, X_{n+1-j}\big] = \mathrm{Cov}(X_{n+h} - P_n X_{n+h},\, X_{n+1-j}) = 0$$
which gives:
$$\gamma(h + j - 1) = a_1 \gamma(j - 1) + a_2 \gamma(j - 2) + \dots + a_n \gamma(j - n), \quad j = 1, \dots, n$$
Applying this trick to every subsequent MSE-minimising condition, we end up with a system of n equations (excluding the first one) that is very similar to the process of finding the PACF coefficients. This system can be represented in matrix form:
$$\Gamma_n \mathbf a_n = \boldsymbol\gamma_n(h)$$
where:
$$\Gamma_n = \begin{pmatrix} \gamma(0) & \gamma(1) & \cdots & \gamma(n-1) \\ \gamma(1) & \gamma(0) & \cdots & \gamma(n-2) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma(n-1) & \gamma(n-2) & \cdots & \gamma(0) \end{pmatrix}, \quad \mathbf a_n = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}, \quad \boldsymbol\gamma_n(h) = \begin{pmatrix} \gamma(h) \\ \gamma(h+1) \\ \vdots \\ \gamma(h+n-1) \end{pmatrix}$$
Therefore, the solution is:
$$\mathbf a_n = \Gamma_n^{-1} \boldsymbol\gamma_n(h)$$
Once this is known, $a_0$ can be found by rewriting the first equation in matrix form:
$$a_0 = \mu\big(1 - \mathbf 1^T \mathbf a_n\big)$$
where $\mathbf 1$ represents a column vector of 1's.
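The solution $\mathbf a_n = \Gamma_n^{-1}\boldsymbol\gamma_n(h)$ can be checked on a zero-mean AR(1), where the best linear predictor is known to be $\phi^h X_n$; the values below are illustrative:

```python
import numpy as np

def blp_coeffs(gamma, h, n=5):
    """a_n = Gamma_n^{-1} gamma_n(h); gamma is a function giving gamma(k)."""
    Gamma = np.array([[gamma(abs(i - j)) for j in range(n)] for i in range(n)])
    rhs = np.array([gamma(h + i) for i in range(n)])
    return np.linalg.solve(Gamma, rhs)

phi, sigma2 = 0.6, 1.0
gamma = lambda k: sigma2 * phi ** k / (1 - phi ** 2)   # AR(1) ACVF
a = blp_coeffs(gamma, h=2)
print(np.round(a, 6))   # expect first coefficient phi^2 = 0.36, the rest ~0
```

Only $X_n$ (the most recent value) receives weight, confirming that for an AR(1) the best linear predictor uses no older history.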
Part 3: Brownian Motion
Definitions
A stochastic process $\{X_t, t \ge 0\}$ is a Brownian motion process if:
(1) $X_0 = 0$
(2) $\{X_t, t \ge 0\}$ has stationary and independent increments
(3) For all $t > 0$, $X_t \sim N(0, \sigma^2 t)$
The conditions (2) and (3) together are equivalent to:
- $X_t - X_s \sim N(0, \sigma^2 (t - s))$ for $0 \le s < t$
- $\{X_t, t \ge 0\}$ has stationary increments and $\mathrm{cov}(X_s, X_t - X_s) = 0$ for $0 \le s < t$, since if two normal r.v. have 0 covariance then they are independent
A standard Brownian motion $\{B_t, t \ge 0\}$ is when the volatility is one, i.e. $\sigma = 1$. Any Brownian motion $\{X_t, t \ge 0\}$ can be standardised by setting:
$$B_t = \frac{X_t}{\sigma}$$
Note that Brownian motion is continuous for all values of t.
A Brownian motion process exhibits some strange behaviour:
A Brownian motion is continuous w.r.t. time teverywhere, but differentiable nowhere
Brownian motion will eventually hit any and every real value no matter how large, or how
negative. Likewise, no matter how large, it will always (with probability 1) come back down
to zero at some future time
Once a Brownian motion hits a certain value, it immediately hits it again infinitely often,
and will continue to return after arbitrarily large times
Brownian motion is fractal, i.e. it looks the same regardless of what scale you examine it
Properties of Brownian Motion
Consider a Brownian motion $\{X_t, t \ge 0\}$ with volatility $\sigma$:
- For any s with $0 < s < t$, $X_s$ and $X_t$ are NOT independent. However, by independence of increments, $X_s$ and $X_t - X_s$ are independent.
- This can be shown by finding the covariance between any $X_s$ and $X_t$:
$$\mathrm{cov}(X_s, X_t) = \mathrm{cov}(X_s, X_s + X_t - X_s) = \mathrm{cov}(X_s, X_s) + \mathrm{cov}(X_s, X_t - X_s) = \sigma^2 s$$
- For any s and t, $X_s$ and $X_t - X_s$ are both normally distributed, i.e.:
$$X_s \sim N(0, \sigma^2 s), \quad X_t - X_s \sim N(0, \sigma^2 (t - s))$$
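The covariance property $\mathrm{cov}(X_s, X_t) = \sigma^2 s$ can be checked by Monte Carlo; a sketch with assumed values $\sigma = 2$, $s = 1$, $t = 3$ (a sampling error of a few percent is expected):

```python
import random

def bm_paths(sigma, times, n_paths, seed=7):
    """Simulate Brownian motion at the given (sorted) times via independent
    N(0, sigma^2 * dt) increments."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        x, prev, path = 0.0, 0.0, []
        for t in times:
            x += rng.gauss(0.0, sigma * (t - prev) ** 0.5)
            prev = t
            path.append(x)
        paths.append(path)
    return paths

sigma, s, t = 2.0, 1.0, 3.0
paths = bm_paths(sigma, [s, t], 100000)
xs = [p[0] for p in paths]
xt = [p[1] for p in paths]
mx = sum(xs) / len(xs)
mt = sum(xt) / len(xt)
cov = sum((a - mx) * (b - mt) for a, b in zip(xs, xt)) / len(xs)
print(round(cov, 2))   # theory: cov(X_s, X_t) = sigma^2 * s = 4.0
```

Note the covariance depends only on the earlier time s, exactly as the increment decomposition predicts.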
Brownian Motion and Symmetric Random Walk
Define a symmetric random walk using the random variables:
$$Y_i = \begin{cases} +1 & \text{w.p. } 0.5 \\ -1 & \text{w.p. } 0.5 \end{cases}$$
with mean $E[Y_i] = 0$ and $\mathrm{var}(Y_i) = 1$.
Now divide the interval $[0, t]$ into equal intervals of length $\Delta t$, so that there is a total of $n = t/\Delta t$ intervals. Suppose that at each $\Delta t$, the process can either go up or down by size $\Delta x$. Now let $X(t)$ be the position at time t; then:
$$X(t) = \Delta x\, (Y_1 + Y_2 + \dots + Y_{t/\Delta t})$$
Then, if we let $\Delta x = \sigma \sqrt{\Delta t}$ and let $\Delta t \to 0$, the limiting process of $X(t)$ is a Brownian motion. This is because:
- $\{X_t, t \ge 0\}$ has independent increments, since the $Y_i$ are independent (changes in the value of the random walk in non-overlapping time intervals are independent)
- $\{X_t, t \ge 0\}$ has stationary increments, since the distribution of the change in position of the random walk over any time interval depends only on the length of the interval
- $\{X_t, t \ge 0\}$ has an approximate normal distribution with mean 0 and variance $\sigma^2 t$ as $\Delta t$ converges to 0, by the Central Limit Theorem applied to i.i.d. random variables
Brownian Motion with Drift
A stochastic process $\{X_t, t \ge 0\}$ is a Brownian motion process with drift coefficient µ and variance parameter $\sigma^2$ if it satisfies:
- $X_0 = 0$
- $\{X_t, t \ge 0\}$ has stationary and independent increments
- For all $t > 0$, $X_t \sim N(\mu t, \sigma^2 t)$
A Brownian motion with drift can be converted to a standard Brownian motion by defining:
$$B_t = \frac{X_t - \mu t}{\sigma}$$
Similarly, a standard Brownian motion can be converted to a Brownian motion with drift by defining:
$$X_t = \mu t + \sigma B_t$$
Geometric Brownian Motion
Let $\{X_t, t \ge 0\}$ be a Brownian motion process with drift µ and volatility $\sigma^2$. Then the process $\{Y_t, t \ge 0\}$ defined by:
$$Y_t = \exp(X_t)$$
is called a geometric Brownian motion. This is useful when you don't want negative values.
Now consider the behaviour of $dt\, dB_t$:
$$E[\Delta t\, \Delta B_t] = \Delta t\; E[B_{t+\Delta t} - B_t] = 0$$
$$\mathrm{var}(\Delta t\, \Delta B_t) = (\Delta t)^2\, \mathrm{var}(B_{t+\Delta t} - B_t) = (\Delta t)^3 = o(\Delta t)$$
Then in differential form:
$$E[dt\, dB_t] = 0, \quad \mathrm{var}(dt\, dB_t) = o(dt)$$
Taking $o(dt)$ as zero, this is again reduced to a constant:
$$dt\, dB_t = 0$$
Finally, consider $(dt)^2$:
$$(dt)^2 = o(dt) = 0$$
The following table summarises the results:

×        dt    dB_t
dt       0     0
dB_t     0     dt
Stochastic Differential Equations
Assume that a stochastic process $X_t$ satisfies a stochastic differential equation (SDE) of the form:
$$dX_t = \mu(X_t, t)\, dt + \sigma(X_t, t)\, dB_t$$
Consider another stochastic process defined as:
$$Y_t = F(X_t, t)$$
Then the stochastic differential equation satisfied by $Y_t$ is given by Ito's formula:
$$dY_t = \frac{\partial F(x,t)}{\partial x}\bigg|_{x=X_t} dX_t + \frac{\partial F(x,t)}{\partial t}\bigg|_{x=X_t} dt + \frac{1}{2}\frac{\partial^2 F(x,t)}{\partial x^2}\bigg|_{x=X_t} \sigma^2(X_t, t)\, dt$$
Proof:
The differential form of the second-order Taylor series expansion for a function of two variables is:
$$df(x, y) = f_x\, dx + f_y\, dy + \frac{1}{2} f_{xx}\, (dx)^2 + \frac{1}{2} f_{yy}\, (dy)^2 + f_{xy}\, dx\, dy$$
Applying this to $Y_t$ gives:
$$dY_t = \frac{\partial F}{\partial x}\, dX_t + \frac{\partial F}{\partial t}\, dt + \frac{1}{2}\frac{\partial^2 F}{\partial x^2}\, (dX_t)^2 + \frac{1}{2}\frac{\partial^2 F}{\partial t^2}\, (dt)^2 + \frac{\partial^2 F}{\partial x\, \partial t}\, dX_t\, dt$$
with all derivatives evaluated at $x = X_t$. However, we know from the previous section that $(dt)^2 = dt\, dB_t = 0$, so:
$$dX_t\, dt = \mu(X_t, t)\,(dt)^2 + \sigma(X_t, t)\, dB_t\, dt = 0$$
And:
$$(dX_t)^2 = \mu^2(X_t, t)\,(dt)^2 + \sigma^2(X_t, t)\,(dB_t)^2 + 2\mu(X_t, t)\,\sigma(X_t, t)\, dB_t\, dt = \sigma^2(X_t, t)\, dt$$
Therefore, the expansion reduces to Ito's formula.
E.g. consider a Brownian motion $\{X_t, t \ge 0\}$ with 0 drift and variance $\sigma^2$; find the SDE for $tX_t^2$.
We have the SDE for $X_t$:
$$dX_t = \sigma\, dB_t$$
Let $Y_t = tX_t^2 = F(X_t, t)$ where $F(x, t) = tx^2$, so the derivatives are:
$$\frac{\partial F(x,t)}{\partial x} = 2tx, \quad \frac{\partial F(x,t)}{\partial t} = x^2, \quad \frac{\partial^2 F(x,t)}{\partial x^2} = 2t$$
Then using Ito's formula, the SDE for $tX_t^2$ is given by:
$$d(tX_t^2) = 2tX_t\, dX_t + X_t^2\, dt + \frac{1}{2}(2t)\,\sigma^2\, dt = 2\sigma t X_t\, dB_t + (X_t^2 + \sigma^2 t)\, dt$$
Stochastic Integration
Consider a Brownian motion $\{X_t, t \ge 0\}$ with zero drift and variance $\sigma^2$. Let f be a function with a continuous derivative on [a,b]. The stochastic process defined by the stochastic/Ito integral is:
$$Y = \int_a^b f(t)\, dX_t = \lim_{\max|t_{k+1} - t_k| \to 0}\; \sum_{k=0}^{n-1} f(t_k)\big(X_{t_{k+1}} - X_{t_k}\big)$$
where $a = t_0 < t_1 < \dots < t_n = b$ is a partition of [a,b].
This stochastic process satisfies the properties:
- The mean of the Ito integral is zero, i.e.:
$$E\Big[\int_a^b f(t)\, dX_t\Big] = \lim_{\max|t_{k+1}-t_k| \to 0} \sum_{k=0}^{n-1} f(t_k)\, E\big[X_{t_{k+1}} - X_{t_k}\big] = 0$$
  - Note that "functions of t" in this case includes stochastic integrands, e.g. $\int_0^t B_s\, dB_s$
- Thus, the variance and the expectation of the square are equal:
$$\mathrm{var}\Big(\int_a^b f(t)\, dX_t\Big) = E\Big[\Big(\int_a^b f(t)\, dX_t\Big)^2\Big] = \lim_{\max|t_{k+1}-t_k| \to 0} \sum_{k=0}^{n-1} f^2(t_k)\, \sigma^2 (t_{k+1} - t_k) = \sigma^2 \int_a^b f^2(t)\, dt$$
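The zero-mean and variance properties can be checked by simulating the approximating sums directly; here $f(t) = t$ on [0,1] with $\sigma = 1$, so the theoretical variance is $\int_0^1 t^2\, dt = 1/3$ (a left-endpoint discretisation, so a small bias is expected):

```python
import random

def ito_integral_f_eq_t(rng, n_steps=100):
    """Approximate the Ito integral of f(t) = t dB_t over [0,1] by the
    partition sum: sum_k f(t_k) * (B_{t_{k+1}} - B_{t_k})."""
    dt = 1.0 / n_steps
    sd = dt ** 0.5
    total = 0.0
    for k in range(n_steps):
        total += (k * dt) * rng.gauss(0.0, sd)
    return total

rng = random.Random(11)
samples = [ito_integral_f_eq_t(rng) for _ in range(10000)]
mean = sum(samples) / len(samples)
second_moment = sum(s * s for s in samples) / len(samples)
print(round(mean, 3), round(second_moment, 3))   # theory: 0 and ~1/3
```

Because the integrand is evaluated at the left endpoint of each subinterval (the Ito convention), the integrand is independent of the increment it multiplies, which is exactly what makes the mean zero.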
Part 4: Simulation
Continuous Random Variables Simulation
When simulating continuous random variables we have the probability/cumulative distribution of the random variable. The cumulative probabilities $F(X)$ always have a Uniform(0,1) distribution, since if $U \sim U(0,1)$ then $F^{-1}(U)$ is distributed as F. This means that the probability that $F(X)$ lies in any subinterval of [0,1] depends only on the length of that subinterval. Therefore, the first step in simulation is usually to generate random variables from a Uniform(0,1).
Pseudo-Random Numbers
The procedure to generate pseudo-random numbers, i.e. from a Uniform(0,1), is:
1. Start with a seed $X_0$ and specify positive integers a, c and m, which are usually given
2. Generate pseudo-random numbers recursively using:
$$X_{n+1} = (aX_n + c) \bmod m$$
3. $X_{n+1}/m$ will be an approximation to a Uniform(0,1) random variable
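The recursion above (a linear congruential generator) can be sketched in a few lines; the constants a, c, m below are one commonly used glibc-style choice, an assumption rather than the only valid one:

```python
def lcg(seed, a=1103515245, c=12345, m=2 ** 31, n=5):
    """Linear congruential generator: X_{n+1} = (a X_n + c) mod m.
    Returns n pseudo-random uniforms X_k / m in [0, 1)."""
    x, out = seed, []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x / m)
    return out

us = lcg(seed=42)
print(us)   # deterministic: the same seed always reproduces the same stream
```

Determinism given the seed is the point of "pseudo"-randomness: experiments become reproducible.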
Inverse Transform Method
Consider a continuous random variable with cumulative distribution function $F_X$ and a Uniform(0,1) random variable U. Define:
$$X = F_X^{-1}(U)$$
Then X will have the distribution function:
$$\Pr(X \le x) = \Pr\big(F_X^{-1}(U) \le x\big) = \Pr\big(U \le F_X(x)\big) \quad \text{since } F_X \text{ is strictly increasing}$$
$$= F_X(x) \quad \text{using } \Pr(U \le y) = y$$
Then the inverse transform procedure to generate r.v. from a cumulative distribution F is:
1. Compute $F_X^{-1}$, from the p.d.f. or c.d.f., if possible
2. Generate a Uniform(0,1) random variable U
3. Set $X = F_X^{-1}(U)$; then X will be from the distribution F
Note that for the inverse transform method to work, we must be able to calculate the inverse of the c.d.f., i.e. $F_X^{-1}$ must have an explicit expression.
E.g. to simulate an exponential random variable, first find $F_X^{-1}$:
$$F_X(x) = \int_0^x \lambda e^{-\lambda s}\,ds = 1 - e^{-\lambda x} \quad\Rightarrow\quad F_X^{-1}(y) = -\frac{1}{\lambda}\ln(1-y)$$
Suppose we generate a random variable $U$ from Uniform(0,1). Then we set:
$$x = F_X^{-1}(U) = -\frac{1}{\lambda}\ln(1-U) = -\frac{1}{\lambda}\ln U$$
where the last equality holds in distribution because $1-U$ is also Uniform(0,1). Therefore $x$ is a random variable with an exponential distribution.
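The exponential inverse transform above can be sketched in a few lines of Python (using $-\ln U$ in place of $-\ln(1-U)$, since both are valid):

```python
import math
import random

def exp_inverse_transform(lam):
    """Simulate an Exp(lam) variate via the inverse transform x = -ln(U)/lam."""
    u = random.random()
    return -math.log(u) / lam

random.seed(0)
sample = [exp_inverse_transform(2.0) for _ in range(100_000)]
# The sample mean should be close to 1/lam = 0.5.
```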
Discrete Random Variable Simulation
The inverse transform method can also be applied to simulate discrete random variables. Consider
one with probability mass function:
$$\Pr(X = x_j) = P_j \quad \text{for } j = 1, 2, \ldots, \qquad \sum_{j} P_j = 1$$
Then the procedure to simulate r.v. from this p.m.f. is:
1. Generate a Uniform(0,1) random variable U
2. Set:
$$X = \begin{cases} x_1 & \text{if } U < P_1 \\ x_2 & \text{if } P_1 \le U < P_1 + P_2 \\ \;\vdots \\ x_j & \text{if } \sum_{i=1}^{j-1} P_i \le U < \sum_{i=1}^{j} P_i \end{cases}$$
E.g. simulate random variables from a geometric distribution.
Consider the probability mass function for a geometric random variable:
$$P_j = \Pr(X = j) = p\left(1-p\right)^{j-1}, \quad j \ge 1$$
Notice that:
$$\sum_{i=1}^{j-1} P_i = 1 - \Pr(X > j-1) = 1 - \sum_{i=j}^{\infty} p\left(1-p\right)^{i-1} = 1 - \left(1-p\right)^{j-1} \quad \text{(geometric sum)}$$
Then we have:
$$\sum_{i=1}^{j-1} P_i \le U < \sum_{i=1}^{j} P_i \quad\Leftrightarrow\quad 1-\left(1-p\right)^{j-1} \le U < 1-\left(1-p\right)^{j}$$
Since $U$ is a Uniform(0,1) random variable, so is $1-U$, and this is equivalent to:
$$\left(1-p\right)^{j} < U \le \left(1-p\right)^{j-1}$$
$$j\ln\left(1-p\right) < \ln U \le \left(j-1\right)\ln\left(1-p\right)$$
$$j > \frac{\ln U}{\ln\left(1-p\right)} \ge j-1 \quad \text{since } \ln\left(1-p\right) < 0$$
Therefore to generate a random variable from a geometric distribution:
1. Generate a uniform random number $U$
2. Set $X$ as $j$, where $j$ is the first integer for which:
$$j > \frac{\ln U}{\ln\left(1-p\right)}$$
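The two steps amount to taking the first integer strictly above $\ln U / \ln(1-p)$; a minimal sketch:

```python
import math
import random

def geometric_inverse_transform(p):
    """Return the first integer j with j > ln(U)/ln(1-p), a Geometric(p) variate."""
    u = random.random()
    # The ratio is positive; floor + 1 gives the smallest integer strictly above it.
    return int(math.log(u) / math.log(1.0 - p)) + 1

random.seed(0)
sample = [geometric_inverse_transform(0.3) for _ in range(100_000)]
```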
Acceptance-Rejection Method
Suppose we have a method, e.g. inverse transform, to simulate an r.v. with density $g(y)$. Then we can use this as the basis for simulating from the continuous distribution with density $f(x)$.
The procedure to simulate random variables using the rejection method is:
1. Choose a distribution $g$ for which you know you can simulate outcomes
2. Set $c$ to be some constant such that $c \ge f(y)/g(y)$ for all $y$
3. Simulate a random variable $Y$ with density function $g(y)$
4. Simulate a Uniform(0,1) random variable $U$
5. Accept this as the random number, i.e. set $X = Y$, if:
$$U \le \frac{f(Y)}{c\,g(Y)}$$
6. Otherwise, reject and return to step 3.
Therefore, the value for $X$ is $Y_N$ where $N$ is the number of iterations until a random number is accepted. We want to be as efficient as possible, i.e. minimise the number of iterations:
- For efficiency, choose a density $g(y)$ similar to $f(y)$, e.g. exponential for the normal
- For efficiency, choose the smallest value of $c$ that satisfies the inequality, using calculus, i.e.:
$$c = \max_y \frac{f(y)}{g(y)}$$
E.g. simulate the absolute value of a standard normal random variable, $X = |Z|$.
Firstly, $X = |Z|$ has the density:
$$f(x) = \frac{2}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right), \quad x > 0$$
1. Let another random variable $Y$ be from the density $g(x) = e^{-x}$; note that this is comparable to $f(x)$
2. Choose the smallest value of $c$ such that $c \ge f(y)/g(y)$, i.e.:
$$c = \max_x \frac{f(x)}{g(x)} = \max_x \sqrt{\frac{2}{\pi}}\exp\left(x - \frac{x^2}{2}\right) = \sqrt{\frac{2e}{\pi}}$$
since $x - x^2/2$ is maximised at $x = 1$.
3. Generate $U_1$ and $U_2$ from Unif(0,1)
4. Compute $Y$ from the density $g$ using inverse transform, and using $U_1$, i.e.:
$$Y = -\log U_1$$
5. Now check if:
$$U_2 \le \frac{f(Y)}{c\,g(Y)} = \exp\left(-\frac{1}{2}\left(Y-1\right)^2\right) = \exp\left(-\frac{1}{2}\left(1+\log U_1\right)^2\right)$$
Note that this acceptance probability is always less than or equal to 1.
6. If true, then set $X = Y = -\log U_1$; if false, return to step 3 and repeat.
7. To now generate a standard normal random variable $Z$, generate $U_3$ and set:
$$Z = \begin{cases} X & \text{if } U_3 \le 0.5 \\ -X & \text{if } U_3 > 0.5 \end{cases}$$
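The rejection loop for $|Z|$ can be sketched as follows (function names are my own):

```python
import math
import random

def abs_normal_rejection():
    """Simulate |Z| by rejection from Exp(1), with c = sqrt(2e/pi)."""
    while True:
        y = -math.log(random.random())          # Y ~ Exp(1) by inverse transform
        u = random.random()
        if u <= math.exp(-0.5 * (y - 1.0) ** 2):
            return y                            # accept: Y has the half-normal density

def std_normal():
    """Attach a random sign to |Z| to recover a standard normal Z."""
    x = abs_normal_rejection()
    return x if random.random() <= 0.5 else -x

random.seed(0)
sample = [abs_normal_rejection() for _ in range(50_000)]
```

On average only $c = \sqrt{2e/\pi} \approx 1.32$ iterations are needed per accepted value.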
Simulation Using Distributional Relationships
Gamma Distribution
Recall that the waiting time to the nth event in continuous Markov chains has a Gamma distribution.
To simulate random variables from a Gamma$(n, \lambda)$ distribution, the procedure is:
1. Generate $n$ independent Uniform(0,1) random variables $U_1, U_2, \ldots, U_n$
2. Simulate exponential random variables using the inverse transform method: $F^{-1}(U_i) = -\frac{1}{\lambda}\ln U_i$
3. Since the sum of $n$ independent Exp($\lambda$) random variables has a Gamma distribution, set:
$$X = -\frac{1}{\lambda}\sum_{i=1}^{n}\ln U_i \sim \text{Gamma}(n, \lambda)$$
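The three steps above can be sketched directly (a minimal illustration, with my own function name):

```python
import math
import random

def gamma_n_lambda(n, lam):
    """Gamma(n, lam) as the sum of n independent Exp(lam) variates."""
    return -sum(math.log(random.random()) for _ in range(n)) / lam

random.seed(0)
sample = [gamma_n_lambda(5, 2.0) for _ in range(50_000)]
# The sample mean should be close to n/lam = 2.5.
```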
Chi-Squared Distribution
The sum of squares of $n$ standard normal r.v. has a chi-squared distribution with $n$ degrees of freedom:
$$\sum_{i=1}^{n} Z_i^2 \sim \chi_n^2$$
Alternatively, a chi-squared distribution with an even degree of freedom $2k$ is equivalent to a Gamma$(k, 1/2)$ distribution. If the degree of freedom is odd, i.e. $2k+1$, we can add on an extra $Z^2$ term, where $Z$ is standard normal. That is:
$$-2\sum_{i=1}^{k}\ln U_i \sim \chi_{2k}^2$$
$$Z^2 - 2\sum_{i=1}^{k}\ln U_i \sim \chi_{2k+1}^2$$
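A sketch combining both cases; the odd case draws its extra $Z$ via Box-Muller, which is covered in the Normal Distribution section below:

```python
import math
import random

def chi_squared(df):
    """Simulate a chi-squared variate with df degrees of freedom from uniforms."""
    k, odd = divmod(df, 2)
    x = -2.0 * sum(math.log(random.random()) for _ in range(k))  # ~ chi2 with 2k df
    if odd:
        # Extra Z^2 term, with Z simulated via Box-Muller.
        z = math.sqrt(-2.0 * math.log(random.random())) * math.cos(
            2.0 * math.pi * random.random())
        x += z * z
    return x

random.seed(0)
sample = [chi_squared(5) for _ in range(50_000)]
# The sample mean should be close to df = 5.
```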
Poisson Distribution
Recall that the number of events within one period, when the inter-event times follow an Exp($\lambda$) distribution, is Poi($\lambda$). To simulate random variables from a Poisson($\lambda$) distribution, the procedure is:
1. Generate independent Uniform(0,1) random variables $U_1, U_2, \ldots$ and form Gamma$(n, \lambda)$ waiting times using the above steps
2. Since the waiting time until the $n$th event has a Gamma$(n, \lambda)$ distribution, which is a sum of independent Exp($\lambda$) random variables, we set:
$$X = \max\left\{n : -\frac{1}{\lambda}\sum_{i=1}^{n}\ln U_i \le 1\right\} \sim \text{Poi}(\lambda)$$
i.e. the total number of arrivals within 1 period.
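A sketch of this arrival-counting procedure, accumulating exponential gaps until the total exceeds one time unit:

```python
import math
import random

def poisson(lam):
    """Count how many Exp(lam) inter-arrival times fit within one time unit."""
    t, n = 0.0, 0
    while True:
        t += -math.log(random.random()) / lam   # next inter-arrival gap
        if t > 1.0:
            return n
        n += 1

random.seed(0)
sample = [poisson(4.0) for _ in range(50_000)]
# The sample mean should be close to lam = 4.
```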
Normal Distribution
One method of simulating random variables from a normal distribution is the Box-Muller approach:
1. Generate two random variables $U_1$ and $U_2$ from Uniform(0,1).
2. Set:
$$X = \sqrt{-2\ln U_1}\cos\left(2\pi U_2\right)$$
$$Y = \sqrt{-2\ln U_1}\sin\left(2\pi U_2\right)$$
Then $X$ and $Y$ are a pair of independent standard normal random variables
3. A pair of uncorrelated $N(\mu, \sigma^2)$ random variables can then be derived using $\mu + \sigma X$ and $\mu + \sigma Y$
4. Note that if $X$ is normal then to generate a lognormal random variable $L$ we set $L = e^X$
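The Box-Muller transform is a two-line computation in practice; a minimal sketch:

```python
import math
import random

def box_muller():
    """Turn two independent uniforms into a pair of independent standard normals."""
    u1, u2 = random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

random.seed(0)
xs = [box_muller()[0] for _ in range(50_000)]
# Sample mean should be near 0 and sample second moment near 1.
```

For $N(\mu, \sigma^2)$ or lognormal variates, transform the output as in steps 3 and 4 above.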
Monte Carlo Simulation
Suppose that $X = (X_1, \ldots, X_n)$ is a random vector with a given joint density function $f(x_1, \ldots, x_n)$ and we want to compute:
$$\mathbb{E}\left[g(X)\right] = \int\!\cdots\!\int g\left(x_1, \ldots, x_n\right) f\left(x_1, \ldots, x_n\right)dx_1 \cdots dx_n$$
but with a function $g$ for which this is impossible to compute analytically. Then we can approximate this integral by using Monte Carlo simulation.
Let $X^{(i)}$ and $Y^{(i)}$ be the $i$th simulated sample path of $X$ and $Y$. Then the MC simulation procedure is:
1. Generate a random vector $X^{(1)} = \left(X_1^{(1)}, \ldots, X_n^{(1)}\right)$ with joint density $f(x_1, \ldots, x_n)$ and compute $Y^{(1)} = g\left(X^{(1)}\right)$
2. Generate a second random vector $X^{(2)} = \left(X_1^{(2)}, \ldots, X_n^{(2)}\right)$, independent of step 1, with joint density $f(x_1, \ldots, x_n)$ and compute $Y^{(2)} = g\left(X^{(2)}\right)$
3. Repeat this process until $r$ (a fixed number) i.i.d. random vectors are generated:
$$Y^{(i)} = g\left(X^{(i)}\right), \quad \text{for } i = 1, 2, \ldots, r$$
4. Estimate $\mathbb{E}\left[g(X)\right]$ by using the arithmetic average of the generated $Y$'s:
$$\bar{Y} = \frac{Y^{(1)} + \cdots + Y^{(r)}}{r}$$
This method works due to the strong law of large numbers:
$$\lim_{r\to\infty}\frac{Y^{(1)} + \cdots + Y^{(r)}}{r} = \mathbb{E}\left[g(X)\right]$$
Expectation and Variance
$\bar{Y}$ is an unbiased estimate of $\mathbb{E}\left[g(X)\right]$:
$$\mathbb{E}\left[\bar{Y}\right] = \frac{1}{r}\sum_{i=1}^{r}\mathbb{E}\left[Y^{(i)}\right] = \mathbb{E}\left[g\left(X_1, \ldots, X_n\right)\right]$$
since each $Y^{(i)}$ is independent and identically distributed with the distribution of $g\left(X_1, \ldots, X_n\right)$.
The variance of $\bar{Y}$ is given by:
$$\operatorname{var}\left(\bar{Y}\right) = \operatorname{var}\left(\frac{1}{r}\sum_{i=1}^{r}Y^{(i)}\right) = \frac{1}{r^2}\sum_{i=1}^{r}\operatorname{var}\left(Y^{(i)}\right) = \frac{\operatorname{var}\left(Y^{(i)}\right)}{r}$$
Note that usually we do not know $\operatorname{var}\left(Y^{(i)}\right)$, so we estimate it using the sample estimate:
$$\widehat{\operatorname{var}}\left(Y^{(i)}\right) = \frac{1}{r-1}\sum_{i=1}^{r}\left(Y^{(i)} - \bar{Y}\right)^2$$
In the next 3 sections, we will describe some techniques to REDUCE this variance.
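The estimator and its sample variance can be sketched generically; the toy check below (my own choice, not from the notes) estimates $\mathbb{E}[U^2] = 1/3$ for $U \sim$ Uniform(0,1):

```python
import random

def monte_carlo(g, sampler, r):
    """Return the MC estimate of E[g(X)] and the sample variance of the Y_i."""
    ys = [g(sampler()) for _ in range(r)]
    ybar = sum(ys) / r
    s2 = sum((y - ybar) ** 2 for y in ys) / (r - 1)   # sample estimate of var(Y_i)
    return ybar, s2

random.seed(1)
est, s2 = monte_carlo(lambda u: u * u, random.random, 100_000)  # E[U^2] = 1/3
```

The standard error of the estimate is then approximately $\sqrt{s^2/r}$.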
Antithetic Variables
Reducing the variance of the estimate using antithetic variables involves generating pairs of estimates with negative correlation, then averaging these estimates to obtain a final estimate.
Assume the total number of simulations $2r$ is even. The antithetic variates procedure is:
1. Generate a set of $n$ variates $X_1^{(1)}, \ldots, X_n^{(1)}$ and determine $Y^{(1)} = g\left(X_1^{(1)}, \ldots, X_n^{(1)}\right)$
2. Generate a set of $n$ variates $\tilde{X}_1^{(1)}, \ldots, \tilde{X}_n^{(1)}$, which are negatively correlated with $X_1^{(1)}, \ldots, X_n^{(1)}$, and determine $Y^{(2)} = g\left(\tilde{X}_1^{(1)}, \ldots, \tilde{X}_n^{(1)}\right)$
3. Repeat steps 1 and 2 $r$ times to form $Y_1^{(1)}, \ldots, Y_r^{(1)}$ and $Y_1^{(2)}, \ldots, Y_r^{(2)}$
4. Calculate the arithmetic averages of the $Y$'s:
$$\bar{Y}^{(1)} = \frac{1}{r}\sum_{i=1}^{r}Y_i^{(1)} \quad\text{and}\quad \bar{Y}^{(2)} = \frac{1}{r}\sum_{i=1}^{r}Y_i^{(2)}$$
5. Use:
$$\bar{Y}_{AV} = \frac{\bar{Y}^{(1)} + \bar{Y}^{(2)}}{2}$$
as the final estimate for $\mathbb{E}\left[g(X)\right]$.
Using this method, as long as the correlation between $\bar{Y}^{(1)}$ and $\bar{Y}^{(2)}$ is negative, the variance will be reduced. One example of this is by using:
- $X \sim$ Uniform(0,1) for the first set of $n$ variates, e.g. for $Y^{(1)}$
- $1 - X$, which is also Uniform(0,1), for the second set of $n$ variates, e.g. for $Y^{(2)}$
Then as long as $g(X)$ is a monotonic increasing/decreasing function, we will have a negative correlation, since:
$$\operatorname{cov}\left(X, 1-X\right) = -\operatorname{var}\left(X\right) = -\tfrac{1}{12} < 0$$
To show why this method reduces the variance, consider the variance of the estimator using the antithetic variable, writing $\rho$ for the correlation between $\bar{Y}^{(1)}$ and $\bar{Y}^{(2)}$:
$$\operatorname{var}\left(\bar{Y}_{AV}\right) = \operatorname{var}\left(\frac{\bar{Y}^{(1)} + \bar{Y}^{(2)}}{2}\right) = \frac{1}{4}\left[\operatorname{var}\left(\bar{Y}^{(1)}\right) + \operatorname{var}\left(\bar{Y}^{(2)}\right) + 2\operatorname{cov}\left(\bar{Y}^{(1)}, \bar{Y}^{(2)}\right)\right]$$
$$= \frac{1}{2}\operatorname{var}\left(\bar{Y}^{(1)}\right)\left(1 + \rho\right) \quad \text{since } \operatorname{var}\left(\bar{Y}^{(1)}\right) = \operatorname{var}\left(\bar{Y}^{(2)}\right)$$
$$= \frac{1+\rho}{2}\cdot\frac{\operatorname{var}\left(Y_i\right)}{r} \quad \text{using the variance of an average of } r \text{ i.i.d. runs}$$
$$< \frac{\operatorname{var}\left(Y_i\right)}{2r} \quad \text{for } \rho < 0$$
which is smaller than the variance of a plain Monte Carlo estimate based on $2r$ independent runs.
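The $U$ and $1-U$ pairing can be sketched as below; the toy integrand $g(u) = u^2$ (my own choice) is monotone, so the pairs are negatively correlated:

```python
import random

def antithetic_estimate(g, r):
    """Average g(U) and g(1-U) in pairs; variance falls when g is monotone."""
    total = 0.0
    for _ in range(r):
        u = random.random()
        total += 0.5 * (g(u) + g(1.0 - u))   # antithetic pair
    return total / r

random.seed(2)
est = antithetic_estimate(lambda u: u * u, 50_000)   # E[U^2] = 1/3
```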
Control Variates
If we wish to evaluate the expected value $\mathbb{E}\left[g(X)\right]$ and there is a function $f$ such that the expected value of $f(X)$ can be evaluated analytically with:
$$\mathbb{E}\left[f(X)\right] = \mu$$
then we can evaluate $\mathbb{E}\left[g(X)\right]$ using:
$$\mathbb{E}\left[g(X)\right] = \mathbb{E}\left[g(X) + a\left(f(X) - \mu\right)\right] = \mathbb{E}\left[g(X)\right] + a\,\mathbb{E}\left[f(X)\right] - a\mu \quad (1)$$
where $a$ is a parameter we can choose.
Therefore, instead of evaluating $\mathbb{E}\left[g(X)\right]$ directly, we evaluate (1) using a control variate:
$$Y_{cv} = \frac{1}{r}\sum_{i=1}^{r}\left[g\left(X^{(i)}\right) + a\left(f\left(X^{(i)}\right) - \mu\right)\right]$$
Then, this estimator will have the variance:
$$\operatorname{var}\left(Y_{cv}\right) = \frac{1}{r^2}\sum_{i=1}^{r}\operatorname{var}\left[g\left(X^{(i)}\right) + a f\left(X^{(i)}\right)\right] = \frac{1}{r}\left[\operatorname{var}\left(g\left(X^{(i)}\right)\right) + a^2\operatorname{var}\left(f\left(X^{(i)}\right)\right) + 2a\operatorname{Cov}\left(g\left(X^{(i)}\right), f\left(X^{(i)}\right)\right)\right]$$
This variance can be minimised by solving:
$$\frac{\partial\operatorname{var}\left(Y_{cv}\right)}{\partial a} = 0$$
with the solution being the best choice of $a$:
$$a^* = -\frac{\operatorname{Cov}\left(g\left(X^{(i)}\right), f\left(X^{(i)}\right)\right)}{\operatorname{var}\left(f\left(X^{(i)}\right)\right)}$$
Substituting this value back, we have that the minimised variance is:
$$\operatorname{var}\left(Y_{cv}\right) = \frac{\operatorname{var}\left(g\left(X^{(i)}\right)\right)}{r} - \frac{\operatorname{Cov}\left(g\left(X^{(i)}\right), f\left(X^{(i)}\right)\right)^2}{r\operatorname{var}\left(f\left(X^{(i)}\right)\right)} \le \frac{\operatorname{var}\left(g\left(X^{(i)}\right)\right)}{r} = \operatorname{var}\left(\bar{Y}\right)$$
Therefore, using a control variate has decreased the variance compared to the original estimator.
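A sketch with the optimal $a^*$ estimated from the simulated pairs; the toy example (my own, not from the notes) estimates $\mathbb{E}[e^U] = e - 1$ using the control $f(U) = U$ with known mean $1/2$:

```python
import math
import random

def control_variate_estimate(g, f, mu_f, r):
    """Estimate E[g(U)] using f(U), whose mean mu_f is known, as a control variate."""
    us = [random.random() for _ in range(r)]
    gs = [g(u) for u in us]
    fs = [f(u) for u in us]
    gbar, fbar = sum(gs) / r, sum(fs) / r
    cov = sum((a - gbar) * (b - fbar) for a, b in zip(gs, fs)) / (r - 1)
    var_f = sum((b - fbar) ** 2 for b in fs) / (r - 1)
    a_star = -cov / var_f                       # estimated optimal coefficient a*
    return gbar + a_star * (fbar - mu_f)

random.seed(3)
est = control_variate_estimate(lambda u: math.exp(u), lambda u: u, 0.5, 20_000)
```

Because $e^U$ and $U$ are highly correlated, the correction term removes most of the sampling error.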
Importance Sampling
We can write:
$$\mathbb{E}\left[g(X)\right] = \int g(x) f_X(x)\,dx$$
where $f_X$ is the density of $X$.
Now consider another probability density $h(x)$, equivalent to $f(x)$ in the sense that the zero-probability events agree under both densities, i.e. $f(x) = 0$ if and only if $h(x) = 0$. Let $Y$ be the random variable with density function $h(x)$.
Then we can write:
$$\mathbb{E}\left[g(X)\right] = \int g(x)\frac{f_X(x)}{h(x)}h(x)\,dx = \mathbb{E}\left[g(Y)\frac{f_X(Y)}{h(Y)}\right]$$
Then, we can simulate $Y_i$ from the density $h(x)$ and estimate the expectation $\mathbb{E}\left[g(X)\right]$ with:
$$\bar{Y}_h = \frac{1}{r}\sum_{i=1}^{r}g\left(Y_i\right)\frac{f_X\left(Y_i\right)}{h\left(Y_i\right)}$$
If it is possible to select a probability density $h(x)$ so that the random variable
$$g(Y)\frac{f_X(Y)}{h(Y)}$$
has a smaller variance, then the estimator will be more efficient, i.e.:
$$\operatorname{var}\left(g(Y)\frac{f_X(Y)}{h(Y)}\right) < \operatorname{var}\left(g(X)\right)$$
To do this, we need to select a density $h(x)$ such that the ratio $f(x)/h(x)$ is large where $g(x)$ is small, and vice versa.
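The weighted estimator $\bar{Y}_h$ can be sketched generically. The illustration below (my own choice of densities, not from the notes) estimates $\mathbb{E}[X] = 1$ for $X \sim$ Exp(1) while sampling from $h =$ Exp(1/2), so the weight is $f(y)/h(y) = 2e^{-y/2}$:

```python
import math
import random

def importance_sampling(g, f, h, h_sampler, r):
    """Estimate E_f[g(X)] by sampling Y ~ h and weighting each g(Y) by f(Y)/h(Y)."""
    total = 0.0
    for _ in range(r):
        y = h_sampler()
        total += g(y) * f(y) / h(y)
    return total / r

random.seed(4)
est = importance_sampling(
    g=lambda y: y,
    f=lambda y: math.exp(-y),                     # target density: Exp(1)
    h=lambda y: 0.5 * math.exp(-0.5 * y),         # sampling density: Exp(1/2)
    h_sampler=lambda: -2.0 * math.log(random.random()),
    r=100_000,
)
```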
Number of Simulations
Ideally, we want to carry out simulations as efficiently as possible, since the number of simulations required for accuracy can be quite large. Assume we generate $n$ samples from a known distribution. To estimate its mean we will use the sample average as an estimator:
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n}Y_i$$
This is an unbiased estimator, i.e.:
$$\mathbb{E}\left[\bar{Y}\right] = \mu$$
Its variance is given by:
$$\operatorname{var}\left(\bar{Y}\right) = \frac{\operatorname{var}\left(Y_i\right)}{n} = \frac{\sigma^2}{n}$$
However, the value of $\operatorname{var}\left(Y_i\right)$ is usually not known, so we estimate it using the sample variance from the first $k$ runs, where $k < n$ and usually $k$ is at least 30. Then the estimates of the mean and variance become:
$$\bar{Y} = \frac{1}{k}\sum_{i=1}^{k}Y_i, \qquad \widehat{\operatorname{var}}\left(Y_i\right) = \frac{1}{k-1}\sum_{i=1}^{k}\left(Y_i - \bar{Y}\right)^2$$
For large values of $n$, we know that by the Central Limit Theorem this estimator will be approximately normal:
$$\bar{Y} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
Then, we can select $n$ such that the estimate is within a desired accuracy of the true mean, i.e. a percentage of the true mean with a specified probability.
E.g. random variates are generated from a Gamma distribution:
$$f(x) = \frac{\lambda^{\gamma}}{\Gamma(\gamma)}x^{\gamma-1}e^{-\lambda x} \quad \text{for } x > 0$$
Twenty values are generated for each sample, and the mean and standard deviation of each sample are given as:
Sample 1 2 3 4 5 6 7 8 9 10
Mean 12.01 11.79 13.43 14.01 11.44 11.19 11.24 12.42 12.91 12.29
SD 6.15 4.73 6.42 7.02 4.30 3.90 3.84 4.35 3.59 4.60
Determine the number of simulated values required for the estimate of the mean to be within 5% of the true mean with 95% certainty.
We require $n$ such that:
$$\Pr\left(\left|\bar{X} - \mu\right| \le 0.05\mu\right) \ge 0.95$$
$$\Pr\left(-\frac{0.05\mu}{\sigma/\sqrt{n}} \le Z \le \frac{0.05\mu}{\sigma/\sqrt{n}}\right) \ge 0.95$$
$$2\Pr\left(Z \le \frac{0.05\mu\sqrt{n}}{\sigma}\right) - 1 \ge 0.95$$
$$\frac{0.05\mu\sqrt{n}}{\sigma} \ge 1.96$$
We do not know the true values of $\mu$ and $\sigma$, so we must use the sample estimates. The sample estimate for the mean is given by averaging the mean of each sample:
$$\bar{X} = \frac{1}{10\times 20}\sum_{i=1}^{10}\sum_{j=1}^{20}X_{ij} = \frac{1}{10}\sum_{i=1}^{10}\bar{X}_i = 12.273$$
To estimate the sample variance, use:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}X_i^2 - n\bar{X}^2\right]$$
So the estimated variance is given by:
$$s^2 = \frac{1}{10\times 20 - 1}\left[\sum_{i=1}^{10}\sum_{j=1}^{20}x_{ij}^2 - \left(10\times 20\right)\bar{X}^2\right]$$
where we have:
$$\sum_{i=1}^{10}\sum_{j=1}^{20}x_{ij}^2 = \sum_{i=1}^{10}\left[\left(20-1\right)s_i^2 + 20\bar{X}_i^2\right] = 19\sum_{i=1}^{10}s_i^2 + 20\sum_{i=1}^{10}\bar{X}_i^2 = 4790.0596 + 30285.702 = 35075.7616$$
Therefore the estimated variance and standard deviation are:
$$s^2 = \frac{1}{199}\left[35075.7616 - 200\times 12.273^2\right] = 24.8766, \qquad s = 4.98765$$
Substituting back into the previous equation, we have:
$$\frac{0.05\mu\sqrt{n}}{\sigma} \ge 1.96 \quad\Rightarrow\quad \sqrt{n} \ge \frac{1.96\times 4.98765}{0.05\times 12.273} \quad\Rightarrow\quad n \ge 253.8 \quad\Rightarrow\quad n = 254$$
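The arithmetic in this example can be checked with a few lines of Python:

```python
import math

# Sample-size condition: 0.05 * mu * sqrt(n) / sigma >= 1.96,
# using the pooled sample estimates from the example.
xbar = 12.273       # pooled sample mean
s = 4.98765         # pooled sample standard deviation
n = math.ceil((1.96 * s / (0.05 * xbar)) ** 2)   # smallest n satisfying the bound
# n comes out to 254, matching the worked answer.
```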
We can also estimate the parameters of the Gamma distribution, since we know that:
$$\mathbb{E}\left[X\right] = \frac{\gamma}{\lambda}, \qquad \operatorname{var}\left(X\right) = \frac{\gamma}{\lambda^2}$$
Using our estimated values we have:
$$\frac{\gamma}{\lambda} = 12.273, \qquad \frac{\gamma}{\lambda^2} = 24.87666$$