ACTL2003 Summary Notes - [PDF Document] (2024)

ACTL2003 Stochastic Modelling

By Zhi Ying Feng

Contents

Part 1: Stochastic Processes
    Stochastic Processes
        Increment Properties
    Markov Processes
        Transition Probability
        Chapman-Kolmogorov Equations
        Classification of States
        Irreducible Markov Chain
        Recurrent and Transient States
        Class Properties
        Limiting Probabilities
        Mean Time in Transient States
        Branching Processes
        Probability Generating Functions
        Time Reversible Markov Chains
        Exponential, Poisson and Gamma Distributions
    Counting Process
        Poisson Process
        Inter-arrival and Waiting Time
        Order Statistics
        Conditional Distribution of Arrival Time
        Thinning of Poisson Process
        Non-homogeneous Poisson Process
        Compound Poisson Process
    Continuous-time Markov Chains
        Time Spent in a State
        Transition Rates and Probabilities
        Chapman-Kolmogorov Equations
        Limiting Probabilities
        Embedded Markov Chain
        Time Reversibility
    Birth and Death Process
        Transition Rates and Embedded Markov Chain
        Examples of Birth and Death Processes
        Expected Time in States
        Kolmogorov Equations
        Limiting Probabilities
        Application: Pure Birth Process
        Application: Simple Sickness Model
        Occupancy Probabilities and Time
        First Holding Time
    Non-homogeneous Markov Jump Processes
        Residual Holding Time
        Current Holding Time

Part 2: Time Series
    Introduction to Time Series
        Classical Decomposition Model
        Moving Average Linear Filters
        Differencing
        Stationarity
        Sample Statistics
        Noise
        Linear Processes
    Time Series Models
        Moving Average Process
        Autoregressive Process
        Autoregressive Moving Average Models
        Causality
        Invertibility
        Calculation of ACF
        Partial Autocorrelation Function
    Model Building
        Model Selection
        Parameter Estimation
        Model Diagnosis
    Non-Stationarity
        Stochastic Trends
        ARIMA Model
        SARIMA Model
        Dickey-Fuller Test
        Overdifferencing
        Cointegrated Time Series
    Time Series Forecasting
        Time Series and Markov Property
        k-step Ahead Predictor
        Best Linear Predictor

Part 3: Brownian Motion
    Definitions
    Properties of Brownian Motion
    Brownian Motion and Symmetric Random Walk
    Brownian Motion with Drift
    Geometric Brownian Motion
    Gaussian Processes
    Differential Form of Brownian Motion
    Stochastic Differential Equations
    Stochastic Integration

Part 4: Simulation
    Continuous Random Variables Simulation
        Pseudo-Random Numbers
        Inverse Transform Method
        Discrete Random Variable Simulation
        Acceptance-Rejection Method
        Simulation Using Distributional Relationships
    Monte Carlo Simulation
        Expectation and Variance
        Antithetic Variables
        Control Variates
        Importance Sampling
        Number of Simulations


Part 1: Stochastic Processes

Stochastic Processes

A stochastic process is any collection of random variables, denoted as:

$\{X(t),\, t \in T\}$

- $T$ is the index set of the process, usually the time parameter
- $X(t)$ is the state of the process at time $t$
- $S$ is the state space, i.e. the set of values that $X(t)$ can take on

Stochastic processes can be classified according to the nature of the:

Index Set $T$
- Discrete time process: index set is finite or countably infinite
- Continuous time process: index set is continuous

State Space $S$
- Discrete state space: $X(t)$ is a discrete random variable
- Continuous state space: $X(t)$ is a continuous random variable

A sample path or a realisation of a stochastic process is a particular assignment of possible values of $X(t)$ for all $t \in T$.

Increment Properties

For a stochastic process, an increment is the random variable, for any $t_1 < t_2$:

$X(t_2) - X(t_1)$

A stochastic process has independent increments if $X(t_1) - X(t_0),\, X(t_2) - X(t_1),\, \ldots,\, X(t_n) - X(t_{n-1})$ are independent for all $t_0 < t_1 < \cdots < t_n$. Equivalently, the r.v. $X(t+s) - X(t)$ and $X(t)$ are independent for all $s, t > 0$, i.e. future increases are independent of the past or present.

A stochastic process has stationary increments if $X(t_2 + s) - X(t_1 + s)$ and $X(t_2) - X(t_1)$, i.e. increments over intervals of the same length, have the same probability distribution, for all $t_1, t_2$ and $s > 0$.

Markov Processes

A Markov process is a stochastic process that has the Markov property, where given the present state, the future state is independent of the past states:

$\Pr[X(t_{n+1}) = x_{n+1} \mid X(t_n) = x_n, \ldots, X(t_1) = x_1] = \Pr[X(t_{n+1}) = x_{n+1} \mid X(t_n) = x_n]$

A Markov chain is a Markov process on a discrete index set $T = \{0, 1, 2, \ldots\}$, denoted by

$\{X_n,\, n = 0, 1, 2, \ldots\}$

where $X_n = k$ means the process is in state $k$ at time $n$, with a finite or countable state space:

$S = \{0, 1, 2, \ldots, n\}$ or $S = \{0, 1, 2, \ldots\}$

The Markov property for a Markov chain is given by:

$\Pr[X_{n+1} = x_{n+1} \mid X_n = x_n, \ldots, X_0 = x_0] = \Pr[X_{n+1} = x_{n+1} \mid X_n = x_n]$


Transition Probability

The one-step transition probabilities of a Markov chain are the conditional probabilities of moving to state $j$ in one step, given that the process is in state $i$ at present:

$P_{ij}^{n,n+1} = \Pr[X_{n+1} = j \mid X_n = i]$

If the one-step transition probabilities do NOT depend on time, i.e. they are the same for every step $n$ to $n+1$, then the Markov chain is homogeneous with stationary transition probabilities:

$P_{ij}^{n,n+1} = P_{ij} = \Pr[X_{n+1} = j \mid X_n = i]$

The transition probability matrix is the matrix consisting of all one-step transition probabilities:

$P = [P_{ij}] = \begin{pmatrix} P_{00} & P_{01} & P_{02} & \cdots \\ P_{10} & P_{11} & P_{12} & \cdots \\ P_{20} & P_{21} & P_{22} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$

This matrix satisfies the properties:

$P_{ij} \ge 0$ for $i, j = 0, 1, 2, \ldots$ and $\sum_j P_{ij} = 1$ for $i = 0, 1, 2, \ldots$

Note: in the non-homogeneous case, the time must be specified.

The n-step transition probability is the probability that a process in state $i$ will be in state $j$ after $n$ steps:

$P_{ij}^{(n)} = \Pr[X_{n+m} = j \mid X_m = i]$
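A homogeneous chain can be simulated directly from its transition matrix by sampling each step from the row of the current state. A minimal sketch in Python (the two-state "weather" matrix below is a made-up example, not from the notes):

```python
import random

# Hypothetical two-state chain (0 = dry, 1 = wet) with a made-up matrix.
P = [[0.8, 0.2],
     [0.4, 0.6]]

def step(i, P, rng):
    """One transition: sample the next state from row i of P."""
    u, cum = rng.random(), 0.0
    for j, p in enumerate(P[i]):
        cum += p
        if u < cum:
            return j
    return len(P[i]) - 1    # guard against floating-point rounding

def sample_path(x0, n, P, seed=42):
    """Simulate X_0, X_1, ..., X_n for a homogeneous chain."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n):
        path.append(step(path[-1], P, rng))
    return path

path = sample_path(0, 10_000, P)
# Long-run fraction of time in state 0 should be near pi_0 = 2/3 for this P.
print(path.count(0) / len(path))
```

The long-run occupancy check previews the limiting-probability result covered later in these notes.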

Chapman-Kolmogorov Equations

The Chapman-Kolmogorov equations compute the (n+m)-step transition probabilities:

$P_{ij}^{(n+m)} = \sum_k P_{ik}^{(n)} P_{kj}^{(m)}$ for all $n, m \ge 0$

This is the sum of the transition probabilities of reaching an intermediate state $k$ after $n$ steps, then reaching state $j$ after $m$ more steps. In matrix multiplication:

$P^{(n+m)} = P^{(n)} P^{(m)} = \underbrace{P \times P \times \cdots \times P}_{n+m \text{ times}}$

where $P^{(n)} = [P_{ij}^{(n)}]$ is the matrix consisting of the n-step transition probabilities.

Consider the matrix multiplication $A \times B = AB$:
- To get the $i$th row of $AB$, multiply the $i$th row of $A$ by $B$
- To get the $j$th column of $AB$, multiply $A$ by the $j$th column of $B$

Kolmogorov Forward Equations: start in state $i$, transition into state $k$ after $n$ steps, then a one-step transition to state $j$:

$P_{ij}^{(n+1)} = \sum_k P_{ik}^{(n)} P_{kj}$ for all $n \ge 0$

Kolmogorov Backward Equations: start in state $i$, one-step transition into state $k$, then an n-step transition into state $j$:

$P_{ij}^{(n+1)} = \sum_k P_{ik} P_{kj}^{(n)}$ for all $n \ge 0$
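Numerically, the n-step probabilities are entries of matrix powers, so the Chapman-Kolmogorov identity can be verified in a few lines of NumPy (the 3-state matrix is a made-up example):

```python
import numpy as np

# Hypothetical 3-state transition matrix.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# n-step transition probabilities are entries of the n-th matrix power.
P2 = np.linalg.matrix_power(P, 2)
P4 = np.linalg.matrix_power(P, 4)

assert np.allclose(P4, P2 @ P2)          # Chapman-Kolmogorov: P^(2+2) = P^(2) P^(2)
assert np.allclose(P4.sum(axis=1), 1.0)  # each row of P^(n) is still a distribution
print(P4[0, 2])                          # Pr[X_4 = 2 | X_0 = 0]
```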


Classification of States

Absorbing state

A state $i$ is said to be an absorbing state if $P_{ii} = 1$, i.e. $P_{ij} = 0$ for all $j \ne i$. An absorbing state is a state in which, once the process arrives in it, the process will stay forever.

Accessible

State $j$ is accessible from state $i$ if $P_{ij}^{(n)} > 0$ for some $n \ge 0$. State $j$ is accessible from $i$ if there is a positive probability that the process will be in state $j$ at some future time, given that the process is currently in state $i$. This is written as:

$i \to j$

Note that:

if $i \to j$ and $j \to k$, then $i \to k$

For an absorbing state, the ONLY accessible state is the absorbing state itself.

Communicate

States $i$ and $j$ communicate if:

$i \to j$ and $j \to i$, i.e. $i \leftrightarrow j$

That is, there is a positive probability that a process currently in state $i$ will visit state $j$ at some future time and then return to state $i$. Note that:

- $i \leftrightarrow i$ for all $i$
- if $i \leftrightarrow j$, then $j \leftrightarrow i$
- if $i \leftrightarrow j$ and $j \leftrightarrow k$, then $i \leftrightarrow k$

The class of states that communicate with state $i$ is the set:

$C(i) = \{j \in S : i \leftrightarrow j\}$

That is, starting from state $i$, the process has a positive probability of visiting any state in class $C(i)$ and then returning to state $i$.
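Accessibility and communication can be computed mechanically: $i \to j$ exactly when the $(i, j)$ entry of $(I + P)^{n-1}$ is positive, and the classes are the equivalence classes of $\leftrightarrow$. A sketch on a hypothetical 4-state chain (the matrix is made up for illustration):

```python
import numpy as np

# Hypothetical 4-state chain: 0 and 1 communicate, 2 is absorbing,
# 3 leads into {0, 1} and {2} but nothing returns to it.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.3, 0.7, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.2, 0.0, 0.3, 0.5]])

n = P.shape[0]
# Reachability: i -> j iff P^(k)_{ij} > 0 for some 0 <= k <= n-1,
# iff the (i, j) entry of (I + P)^(n-1) is positive.
R = np.linalg.matrix_power(np.eye(n) + P, n - 1) > 0

# i and j communicate iff i -> j and j -> i.
communicates = R & R.T
classes = {frozenset(int(j) for j in np.flatnonzero(communicates[i]))
           for i in range(n)}
print(sorted(sorted(c) for c in classes))   # [[0, 1], [2], [3]]
```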

Irreducible Markov Chain

A Markov chain is irreducible if there is only ONE class, i.e. all states communicate with each other. Properties:

- All states in a finite, irreducible Markov chain are recurrent
- The probability of returning to the current state in the long run is positive
- The probability of being in any state in the long run is also positive

Recurrent and Transient States

Let $f_i$ be the probability that the process, starting in state $i$, will return to state $i$ sometime in the future:

$f_i = \Pr[X_n = i \text{ for some } n \ge 1 \mid X_0 = i]$

A state is:

- Recurrent if $f_i = 1$
- Transient if $f_i < 1$

If state $i$ is recurrent then, starting in state $i$, the process will return to state $i$ at some point in the future, and the process will enter state $i$ infinitely often.


If state $i$ is transient then, starting in state $i$, whether the process ever re-enters state $i$ is a Bernoulli trial:

$X \sim \text{Ber}(1 - f_i), \quad X = \begin{cases} 0 & \text{if the process returns to state } i \\ 1 & \text{if the process does not return to state } i \end{cases}$

Then the probability that the process will be in state $i$ for exactly $n$ time periods can be modelled using the geometric distribution:

$f_i^{\,n-1}(1 - f_i)$

The total number of periods that the process is in state $i$ is given by:

$\sum_{n=0}^{\infty} I_n, \quad \text{where } I_n = \begin{cases} 1 & \text{if } X_n = i \\ 0 & \text{if } X_n \ne i \end{cases}$

with the expected number of time periods in state $i$, given the process starts in state $i$:

$E\left[\sum_{n=0}^{\infty} I_n \,\Big|\, X_0 = i\right] = \sum_{n=0}^{\infty} E[I_n \mid X_0 = i] = \sum_{n=0}^{\infty} \Pr[X_n = i \mid X_0 = i] = \sum_{n=0}^{\infty} P_{ii}^{(n)} = \frac{1}{1 - f_i}$

An alternative definition for recurrent/transient states: a state $i$ is

recurrent if $\sum_{n=1}^{\infty} P_{ii}^{(n)} = \infty$; transient if $\sum_{n=1}^{\infty} P_{ii}^{(n)} < \infty$

That is, a transient state will only be visited a finite number of times.

Note:

- In a finite state Markov chain, AT LEAST one state must be recurrent, i.e. NOT ALL states can be transient. Otherwise, after a finite number of steps, no states would be visited!
- An absorbing state is recurrent, since it revisits itself infinitely often
- If state $i$ communicates with state $j$ and state $i$ is:
  - Recurrent, then state $j$ is also recurrent
  - Transient, then state $j$ is also transient

A class of states is:

- Recurrent if all states in the class are recurrent
- Transient if all states in the class are transient
- Closed if all of the states in the class can only lead to states WITHIN the class. Hence, states in a finite closed class are recurrent
- Open if a state in the class can lead to states OUTSIDE the class. Hence, states in an open class are transient


Class Properties

Period

State $i$ has period $d(i)$ if $d(i)$ is the greatest common divisor of all $n \ge 1$ for which $P_{ii}^{(n)} > 0$, i.e. it is the gcd of the numbers of steps in all possible paths back to state $i$, if the process starts in state $i$. Note that:

- If $P_{ii}^{(n)} = 0$ for all $n$, then $d(i) = 0$
- If $i \leftrightarrow j$, then $d(i) = d(j)$
- A state with period 1 is called aperiodic

Positive Recurrent

A recurrent state $i$ is positive recurrent if the expected time of return to itself is finite. In a finite state Markov chain, ALL recurrent states are positive recurrent.

Ergodic

A state $i$ is ergodic if it is positive recurrent and aperiodic.

Limiting Probabilities

Limiting probabilities are the long run probabilities that a process is in a certain state. For an irreducible (only one class, all states communicate) and ergodic Markov chain, the limiting probability exists and is independent of $i$, i.e. of where the process starts from:

$\pi_j = \lim_{n \to \infty} P_{ij}^{(n)} > 0$

where the $\pi_j$ are the unique non-negative solution to the set of equations:

$\pi_j = \sum_{i=0}^{\infty} \pi_i P_{ij} \quad \text{and} \quad \sum_{j=0}^{\infty} \pi_j = 1$

Or, in matrix form:

$\pi = \pi P$, where $\pi = (\pi_1, \pi_2, \ldots)$

This can be interpreted as:

- The probability that the process is in state $j$ at time $t$ is the same as at $t+1$, as $t \to \infty$
- $\pi_j$ is the long run proportion of time that the Markov chain is in state $j$

Note that the limiting probabilities may not exist at all, or may exist only along even/odd transitions. If state $j$ is transient, then $\lim_{n \to \infty} P_{ij}^{(n)} = 0$ for all $i$, while if state $i$ is absorbing, then $\lim_{n \to \infty} P_{ii}^{(n)} = 1$.

If the distribution of the initial state is chosen to be the limiting distribution, then the probability of being in state $j$ initially is the same as the probability of being in state $j$ at time $n$:

If $\Pr[X_0 = j] = \pi_j$, then $\Pr[X_n = j] = \pi_j$

The mean time between visits to state $j$ is the expected number of transitions $m_{jj}$ until a Markov chain which starts in state $j$ returns to state $j$, given by the mean of a geometric distribution:

$m_{jj} = \frac{1}{\pi_j}$

That is, the proportion of time in state $j$ equals the inverse of the mean time between visits to state $j$.
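In practice the limiting distribution is found by solving the linear system $\pi = \pi P$, $\sum_j \pi_j = 1$. A sketch using NumPy least squares on a made-up 3-state ergodic chain, also checking that rows of $P^{(n)}$ converge to $\pi$:

```python
import numpy as np

# Hypothetical ergodic 3-state chain.
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.4, 0.4]])

# Solve pi = pi P with sum(pi) = 1: stack (P^T - I) pi = 0 with the
# normalisation row 1^T pi = 1 and take the least-squares solution.
n = P.shape[0]
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.all(pi > 0) and np.isclose(pi.sum(), 1.0)
assert np.allclose(pi @ P, pi)                            # pi is invariant
assert np.allclose(np.linalg.matrix_power(P, 50)[0], pi)  # rows of P^(n) -> pi
print(pi)
```

The mean time between visits can then be read off as `1 / pi`.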


Mean Time in Transient States

For a finite state Markov chain, let its transient states be denoted by $T = \{1, 2, \ldots, t\}$. Let $P_T$ denote the transition matrix that ONLY contains the transient states. Note that its rows will sum to less than 1.

$P_T = \begin{pmatrix} P_{11} & P_{12} & \cdots & P_{1t} \\ P_{21} & P_{22} & \cdots & P_{2t} \\ \vdots & \vdots & & \vdots \\ P_{t1} & P_{t2} & \cdots & P_{tt} \end{pmatrix}$

The mean time spent in transient states $s_{ij}$ is the expected number of periods that the Markov chain is in state $j$, given that the process starts in state $i$:

$S = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1t} \\ s_{21} & s_{22} & \cdots & s_{2t} \\ \vdots & \vdots & & \vdots \\ s_{t1} & s_{t2} & \cdots & s_{tt} \end{pmatrix}$

Conditioning on the initial transition, we have, for $i, j \in T$:

$s_{ij} = \begin{cases} \displaystyle\sum_{k=1}^{t} P_{ik} s_{kj} & \text{if } i \ne j \\ 1 + \displaystyle\sum_{k=1}^{t} P_{ik} s_{kj} & \text{if } i = j \end{cases}$

i.e. the mean time spent in state $j$ given the process starts in state $i$ is the transition probability into state $k$ starting from state $i$, multiplied by the mean time spent in state $j$ starting from state $k$, summed over all possible $k$ (plus the initial period when $i = j$).

Then we have:

$S = I + P_T S \implies S = (I - P_T)^{-1}$

Note the 2×2 matrix inverse:

$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \implies A^{-1} = \frac{1}{\det A}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$
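The formula $S = (I - P_T)^{-1}$ is easy to evaluate numerically; the sketch below also cross-checks it against the 2×2 inverse note (the transient-state probabilities are made up):

```python
import numpy as np

# Hypothetical chain with two transient states; P_T keeps only the
# rows/columns for the transient states, so its rows sum to < 1.
P_T = np.array([[0.4, 0.3],
                [0.2, 0.5]])

# Mean time in transient states: S = (I - P_T)^(-1).
S = np.linalg.inv(np.eye(2) - P_T)

# Cross-check with the 2x2 inverse formula 1/(ad - bc) [[d, -b], [-c, a]].
(a, b), (c, d) = np.eye(2) - P_T
S_manual = np.array([[d, -b], [-c, a]]) / (a * d - b * c)

assert np.allclose(S, S_manual)
print(S)   # S[i, j] = expected periods in transient state j starting from i
```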

Branching Processes

A branching process is a Markov chain that describes the size of a population where each member of each generation produces a random number of offspring in the next generation.

- Let $p_j$, $j \ge 0$, be the probability that each individual in each generation produces $j$ offspring
- Assume that the number of offspring of each individual is independent of the numbers produced by others
- Let $X_0$ be the number of individuals initially present, i.e. the zeroth generation
- Individuals produced by the nth generation belong to the (n+1)th generation
- Let $X_n$ be the size of the nth generation


Then $\{X_n,\, n = 0, 1, \ldots\}$ is a branching process with:

- $P_{00} = 1$, i.e. state 0 is absorbing and hence recurrent
- If $p_0 > 0$, all states other than 0 are transient, i.e. the population will either die out or converge to infinity, since transient states are visited a finite number of times

After $n+1$ generations, the size of the population is a random sum:

$X_{n+1} = \sum_{i=1}^{X_n} Y_i$

where the $Y_i$ are i.i.d. r.v. representing the number of offspring of the $i$th individual of the nth generation:

$\Pr[Y_i = k] = p_k$ for $k = 0, 1, \ldots$, with $\sum_k p_k = 1$

The mean and variance of the number of offspring of an individual are:

$\mu = E[Y_i] = \sum_k k p_k, \qquad \sigma^2 = \text{var}(Y_i) = \sum_k (k - \mu)^2 p_k$

The mean and variance of the population size of the nth generation (with $X_0 = 1$) are:

$E[X_n] = \mu^n, \qquad \text{var}(X_n) = \begin{cases} \sigma^2 \mu^{n-1} \dfrac{\mu^n - 1}{\mu - 1} & \text{if } \mu \ne 1 \\ n\sigma^2 & \text{if } \mu = 1 \end{cases}$

Under the assumption that $X_0 = 1$, the probability of extinction $\pi_0$, i.e. that everyone eventually dies, satisfies:

$\pi_0 = \sum_{k=0}^{\infty} \pi_0^k\, p_k$

Since the pgf of $Y$ is $G_Y(s) = \sum_{k=0}^{\infty} p_k s^k$, we have:

$\pi_0 = G_Y(\pi_0)$

- If $\mu > 1$, then $\pi_0 < 1$, and $\pi_0$ is the smallest positive solution of $\pi_0 = G_Y(\pi_0)$
- If $\mu \le 1$, then $\pi_0 = 1$, i.e. extinction is certain
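The extinction probability can be computed by iterating $s \leftarrow G_Y(s)$ from $s = 0$, which converges to the smallest root of $s = G_Y(s)$ in $[0, 1]$. A sketch with a made-up offspring distribution:

```python
# Hypothetical offspring distribution: p_0 = 0.25, p_1 = 0.25, p_2 = 0.5,
# so mu = 0.25 + 2 * 0.5 = 1.25 > 1 and extinction is not certain.
p = [0.25, 0.25, 0.5]

def G(s, p):
    """Offspring pgf G_Y(s) = sum_k p_k s^k."""
    return sum(pk * s**k for k, pk in enumerate(p))

# Iterating s <- G_Y(s) from s = 0 converges to the smallest fixed point
# of G_Y, which is the extinction probability pi_0.
s = 0.0
for _ in range(200):
    s = G(s, p)

print(round(s, 6))   # 0.5, the smaller root of 0.5 s^2 - 0.75 s + 0.25 = 0
```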

Probability Generating Functions

Let $X$ be an integer-valued random variable with $\Pr[X = i] = p_i$ for $i = 0, 1, 2, \ldots$. The p.g.f. is:

$G_X(t) = E[t^X] = \sum_x t^x \Pr[X = x]$

Properties:

- Relationship with the probability mass function of $X$:

$p_X(x) = \frac{1}{x!} \frac{d^x}{dt^x} G_X(t) \Big|_{t=0}$

- Relationship with the moment generating function of $X$:

$m_X(t) = E[e^{Xt}] = G_X(e^t)$, i.e. $G_X(t) = m_X(\log t)$


Time Reversible Markov Chains

Consider a stationary ergodic Markov chain with transition probabilities $P_{ij}$ and stationary probabilities $\pi_i$. If we start at time $n$ and work backwards, the reversed sequence of states $X_n, X_{n-1}, \ldots$ is also a Markov chain, due to independence being a symmetric relationship.

The transition probabilities of this reversed process are:

$Q_{ij} = \Pr[X_m = j \mid X_{m+1} = i] = \frac{\Pr[X_m = j,\, X_{m+1} = i]}{\Pr[X_{m+1} = i]} = \frac{\Pr[X_m = j]\,\Pr[X_{m+1} = i \mid X_m = j]}{\Pr[X_{m+1} = i]} = \frac{\pi_j P_{ji}}{\pi_i}$

A time reversible Markov chain is one where:

$Q_{ij} = P_{ij}$ for all $i, j$

For a time reversible Markov chain:

$\pi_i P_{ij} = \pi_j P_{ji}$ for all $i, j$

Thus, the rate at which the process goes from state $i$ to state $j$ (LHS) is the same as the rate at which it goes from state $j$ to state $i$ (RHS).
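The condition $\pi_i P_{ij} = \pi_j P_{ji}$ (detailed balance) is easy to verify numerically. The sketch below uses a made-up random walk on three states; birth-death-type chains like this one are time reversible:

```python
import numpy as np

# Hypothetical random walk on {0, 1, 2}.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])   # stationary distribution of this chain

assert np.allclose(pi @ P, pi)      # pi is stationary

# Detailed balance: pi_i P_ij == pi_j P_ji for all i, j.
flows = pi[:, None] * P             # flows[i, j] = pi_i P_ij
assert np.allclose(flows, flows.T)

# Reversed-chain probabilities Q_ij = pi_j P_ji / pi_i equal P_ij here.
Q = P.T * pi[None, :] / pi[:, None]
assert np.allclose(Q, P)
print("time reversible")
```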

Exponential, Poisson and Gamma Distributions

A r.v. $X$ has an exponential distribution with parameter $\lambda$ if its probability density function has the form:

$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$

The cumulative distribution function and the survival function have the form:

$F(x) = \begin{cases} 1 - e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \qquad S(x) = 1 - F(x) = \begin{cases} e^{-\lambda x} & \text{if } x \ge 0 \\ 1 & \text{if } x < 0 \end{cases}$

The moment generating function is given by:

$M_X(t) = E[e^{tX}] = \frac{\lambda}{\lambda - t}$ for $t < \lambda$

The mean and variance of the exponential distribution are:

$E[X] = \frac{1}{\lambda}, \qquad \text{var}(X) = \frac{1}{\lambda^2}$

The hazard (or failure) rate is constant:

$\lambda(x) = \frac{f(x)}{S(x)} = \frac{-\frac{d}{dx}S(x)}{1 - F(x)} = \frac{\lambda e^{-\lambda x}}{e^{-\lambda x}} = \lambda$


A key property of the exponential distribution is that it is memoryless. $X$ is said to be memoryless if, for all $s, t > 0$:

$\Pr[X > s + t \mid X > t] = \Pr[X > s]$, or equivalently $\Pr[X > s + t] = \Pr[X > t]\,\Pr[X > s]$

Consider independent $X_i$ for $i = 1, 2, \ldots, n$, each exponential with parameter $\lambda_i$. Then:

- If all the parameters are equal, $\lambda_i = \lambda$, then $Y_n = X_1 + X_2 + \cdots + X_n$ has a gamma($n$, $\lambda$) distribution with pdf:

$f_{Y_n}(t) = \frac{\lambda^n t^{n-1}}{(n-1)!} e^{-\lambda t}$

- $X = \min(X_1, X_2, \ldots, X_n)$ also has an exponential distribution, with parameter and mean:

$\sum_{i=1}^{n} \lambda_i \quad \text{and} \quad E[X] = \frac{1}{\sum_{i=1}^{n} \lambda_i}$

- The probability that $X_i$ is the smallest is given by:

$\frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j}$

- If the parameters are not all equal, i.e. $\lambda_i \ne \lambda_j$ for $i \ne j$, then the sum of the $n$ independent exponential r.v. has a hypoexponential distribution with pdf:

$f_X(x) = \sum_{i=1}^{n} C_{i,n}\, \lambda_i e^{-\lambda_i x}, \quad \text{where } C_{i,n} = \prod_{j \ne i} \frac{\lambda_j}{\lambda_j - \lambda_i}$
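The competing-exponentials result, $\Pr[X_i = \min] = \lambda_i / \sum_j \lambda_j$, can be checked by simulation (the three rates below are made-up values):

```python
import random

# Three independent exponential r.v.s with made-up rates 1, 2, 3.
rates = [1.0, 2.0, 3.0]
rng = random.Random(0)

n_sims = 100_000
wins = [0] * len(rates)
for _ in range(n_sims):
    xs = [rng.expovariate(lam) for lam in rates]
    wins[xs.index(min(xs))] += 1

# Theory: Pr[X_i is the minimum] = lambda_i / sum(lambda_j) = 1/6, 2/6, 3/6.
props = [w / n_sims for w in wins]
print(props)
```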

Counting Process

A counting process $\{N(t), t \ge 0\}$ represents the number of events that occur up to time $t$, i.e. it is a discrete state space, continuous time stochastic process. A counting process has the properties:

- $N(t) \ge 0$
- $N(t)$ is integer-valued
- $N(s) \le N(t)$ for $s \le t$, i.e. it must be non-decreasing
- For $s < t$, $N(t) - N(s)$ is the number of events that occurred in the interval $(s, t]$

A counting process has independent increments if the number of events that occur in the interval $(s, t]$, i.e. $N(t) - N(s)$, is independent of the number of events that occur up to time $s$.

A counting process has stationary increments if $N(t_2 + s) - N(t_1 + s)$ has the same distribution as $N(t_2) - N(t_1)$, i.e. the number of events that occur in any interval depends only on the length of the interval.


Poisson Process

A Poisson process, denoted by $\{N(t), t \ge 0\}$, is a counting process that counts the number of events which occur at a rate $\lambda$ from time 0. It has the properties:

- $N(0) = 0$
- Independent and stationary increments
- The number of events in any interval of length $t$ has a Poisson distribution with mean $\lambda t$:

$\Pr[N(s + t) - N(s) = n] = \Pr[N(t) = n] \quad \text{(stationary increments)} = e^{-\lambda t}\, \frac{(\lambda t)^n}{n!}$

Inter-arrival and Waiting Time

The inter-arrival or holding times $T_n$, $n = 1, 2, \ldots$ are the times between the $(n-1)$th and $n$th events. For a Poisson process, the inter-arrival times are i.i.d. exponential r.v. with:

$f_{T_n}(t) = \lambda e^{-\lambda t}, \quad \Pr[T_n > t] = e^{-\lambda t} \quad \text{and} \quad E[T_n] = \frac{1}{\lambda}$

The waiting time $S_n$ is the time until the $n$th event, given by:

$S_n = T_1 + T_2 + \cdots + T_n = \sum_{i=1}^{n} T_i$

The sum of exponential random variables, each with the SAME parameter $\lambda$, has a gamma($n$, $\lambda$) distribution, therefore:

$f_{S_n}(t) = \frac{\lambda^n t^{n-1}}{(n-1)!}\, e^{-\lambda t}$

Note: if the process is the superposition of $m$ independent claim processes, each with rate $\lambda$, then $S_n$, the waiting time until the $n$th claim, has a gamma($n$, $m\lambda$) distribution. E.g. for a motor insurer, each insured motorist makes a claim at a rate of 0.2 per year, and there are a total of 200 insured. Then $S_{100}$, the waiting time until the 100th claim, has the distribution:

$S_{100} \sim \text{Gamma}(100,\, 0.2 \times 200)$
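The waiting time $S_n$ can be simulated as a sum of exponential inter-arrival times; with the motor-insurer numbers (combined rate $\lambda = 40$ per year), the sample mean of $S_{100}$ should be close to $n/\lambda = 2.5$ years. A sketch:

```python
import random

# Motor-insurer example: 200 policies, each claiming at rate 0.2 per year,
# so claims arrive at combined rate lam = 40 per year.
lam = 0.2 * 200
n = 100
rng = random.Random(1)

def waiting_time(n, lam, rng):
    """S_n = sum of n i.i.d. Exp(lam) inter-arrival times ~ Gamma(n, lam)."""
    return sum(rng.expovariate(lam) for _ in range(n))

samples = [waiting_time(n, lam, rng) for _ in range(20_000)]
mean = sum(samples) / len(samples)
print(mean)   # Gamma(100, 40) has mean n/lam = 2.5 years
```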

Order Statistics

Let $Y_{(i)}$ be the $i$th smallest value among $Y_1, Y_2, \ldots, Y_n$; then $(Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)})$ is called the order statistics. If the $Y_i$ are i.i.d. continuous r.v. with pdf $f(y)$, then the joint pdf of the order statistics $(Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)})$ is:

$f(y_1, y_2, \ldots, y_n) = n! \prod_{i=1}^{n} f(y_i), \quad \text{where } y_1 < y_2 < \cdots < y_n$

If the $Y_i$ are uniformly distributed over $(0, t)$, then the joint pdf of the order statistics is:

$f(y_1, y_2, \ldots, y_n) = \frac{n!}{t^n}$


Conditional Distribution of Arrival Time

Given that $N(t) = n$, i.e. up until time $t$, $n$ events have occurred, the $n$ arrival times $S_1, \ldots, S_n$ have the same distribution as the order statistics corresponding to $n$ independent r.v. uniformly distributed over $(0, t)$.

Note that the following two events are equivalent:

$$\{S_1 = s_1, S_2 = s_2, \ldots, S_n = s_n, N(t) = n\} = \{T_1 = s_1, T_2 = s_2 - s_1, \ldots, T_n = s_n - s_{n-1}, T_{n+1} > t - s_n\}$$

i.e. the inter-arrival time between the $(n-1)$th and $n$th events is equal to the waiting time for $n$ events minus the waiting time for $n-1$ events. Therefore:

$$f(s_1, s_2, \ldots, s_n \mid N(t) = n) = \frac{f(s_1, s_2, \ldots, s_n,\, N(t) = n)}{\Pr(N(t) = n)} = \frac{\lambda e^{-\lambda s_1}\cdot\lambda e^{-\lambda(s_2 - s_1)}\cdots\lambda e^{-\lambda(s_n - s_{n-1})}\cdot e^{-\lambda(t - s_n)}}{e^{-\lambda t}\dfrac{(\lambda t)^n}{n!}} = \frac{n!}{t^n}$$

The conditional marginal distribution is binomial:

$$\Pr(N(s) = m \mid N(t) = n) = \frac{\Pr(N(s) = m,\, N(t) = n)}{\Pr(N(t) = n)} = \frac{\Pr(N(s) = m)\Pr(N(t) - N(s) = n - m)}{\Pr(N(t) = n)}$$

$$= \frac{e^{-\lambda s}\dfrac{(\lambda s)^m}{m!}\cdot e^{-\lambda(t-s)}\dfrac{(\lambda(t-s))^{n-m}}{(n-m)!}}{e^{-\lambda t}\dfrac{(\lambda t)^n}{n!}} = \binom{n}{m}\left(\frac{s}{t}\right)^m\left(1 - \frac{s}{t}\right)^{n-m}$$

Thinning of Poisson Process

Consider a Poisson process $\{N(t), t \ge 0\}$ with rate $\lambda$, and suppose that each time an event occurs it is, independently of all other events, either:

- a type I event with probability $p$, or
- a type II event with probability $1 - p$

Let $N_1(t)$ and $N_2(t)$ denote the numbers of type I and type II events occurring in $[0, t]$, so that:

$$N(t) = N_1(t) + N_2(t)$$

Then:

- $\{N_1(t), t \ge 0\}$ is a Poisson process with rate $\lambda p$
- $\{N_2(t), t \ge 0\}$ is a Poisson process with rate $\lambda(1 - p)$
- the two processes are independent, and $\{N(t), t \ge 0\}$ is a Poisson process with rate equal to the sum of the two rates, i.e. $\lambda p + \lambda(1 - p) = \lambda$
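The thinning result can be illustrated by simulation, a minimal sketch (the rate, probability, and horizon are illustrative values): each event of a rate-$\lambda$ process is flagged type I with probability $p$, and the average type I count should be close to $\lambda p t$.

```python
import random

def thinned_counts(lam, p, horizon, rng):
    """Simulate one path on [0, horizon]; return (type I count, type II count)."""
    t, n1, n2 = 0.0, 0, 0
    while True:
        t += rng.expovariate(lam)      # next event time
        if t > horizon:
            return n1, n2
        if rng.random() < p:           # classify the event independently
            n1 += 1
        else:
            n2 += 1

rng = random.Random(0)
lam, p, horizon, reps = 2.0, 0.3, 10.0, 4000
mean_n1 = sum(thinned_counts(lam, p, horizon, rng)[0] for _ in range(reps)) / reps
print(round(mean_n1, 2))   # close to lam * p * horizon = 6
```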


Non-homogeneous Poisson Process

The counting process $\{N(t), t \ge 0\}$ is a non-homogeneous Poisson process with intensity function $\lambda(t)$, i.e. a non-constant rate, if:

- $N(0) = 0$
- It has independent increments
- It has unit jumps, i.e.

$$\Pr(N(t+h) - N(t) = 1) = \lambda(t)h + o(h)$$
$$\Pr(N(t+h) - N(t) \ge 2) = o(h)$$

$\lambda(t)$ itself can be deterministic or a process itself. Note that if $\lambda(t) \equiv \lambda$ then it is a homogeneous Poisson process.

The mean value function of a non-homogeneous Poisson process is:

$$m(t) = \int_0^t \lambda(y)\,dy$$

Then:

- $N(t)$ is a Poisson r.v. with mean $m(t)$
- $N(s+t) - N(s)$ is a Poisson r.v. with mean $m(s+t) - m(s)$
- $\{N(m^{-1}(t)), t \ge 0\}$ is a homogeneous Poisson process with intensity 1

Compound Poisson Process

A stochastic process $\{X(t), t \ge 0\}$ is a compound Poisson process if

$$X(t) = \sum_{i=1}^{N(t)} Y_i$$

Where:

- $\{N(t), t \ge 0\}$ is a Poisson process
- $Y_i, i = 1, 2, \ldots$ are i.i.d. random variables

This is useful for insurance companies, where the uncertainty in the total claim size $X(t)$ is due to both the number of claims $N(t)$, which follows a Poisson process, and the claim sizes $Y_i$, which can have any distribution as long as they are i.i.d. Note that the claim sizes $Y_i$ are independent of the number of claims $N(t)$.

The mean and variance of a compound Poisson process are given by:

$$E[X(t)] = \lambda t\,E[Y_i], \qquad \operatorname{var}(X(t)) = \lambda t\,E[Y_i^2]$$
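The mean and variance formulas can be checked by simulation, a minimal sketch in which the claim size distribution $Y \sim \text{Uniform}(0, 2)$ is an illustrative choice (so $E[Y] = 1$ and $E[Y^2] = 4/3$):

```python
import random

def compound_poisson(lam, t, rng):
    """One draw of X(t) = sum of N(t) i.i.d. Uniform(0, 2) claim sizes."""
    n, s = 0, rng.expovariate(lam)
    while s <= t:                  # count Poisson events in [0, t]
        n += 1
        s += rng.expovariate(lam)
    return sum(rng.uniform(0.0, 2.0) for _ in range(n))

rng = random.Random(1)
lam, t, reps = 3.0, 2.0, 5000
xs = [compound_poisson(lam, t, rng) for _ in range(reps)]
mean = sum(xs) / reps
var = sum((x - mean) ** 2 for x in xs) / reps
# theory: E[X(t)] = lam*t*E[Y] = 6, var X(t) = lam*t*E[Y^2] = 8
print(round(mean, 2), round(var, 2))
```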


Continuous-time Markov Chains

A continuous-time Markov chain, or a Markov jump process, is a Markov process in continuous time with a discrete state space. The Markov jump process $\{X(t), t \ge 0\}$ has a continuous version of the Markov property: for all $s, t \ge 0$ and states $i, j, x(u), 0 \le u < t$:

$$\Pr\big(X(t+s) = j \mid X(t) = i,\; X(u) = x(u),\, 0 \le u < t\big) = \Pr\big(X(t+s) = j \mid X(t) = i\big)$$

i.e. the process at time $t+s$ is only conditional on the state at time $t$ and independent of its past states before time $t$, i.e. of $X(u) = x(u)$ for $0 \le u < t$. This means the time path up to the state at time $t$ does not matter.

The Markov jump process is a homogeneous process, i.e. all transition probabilities are stationary, if:

$$\Pr(X(t+s) = j \mid X(t) = i) = \Pr(X(s) = j \mid X(0) = i) = P_{ij}(s), \quad \text{i.e. independent of } t$$

Hence, the transition probability over a time period only depends on the duration of the time period.

The Poisson process is a continuous-time Markov chain:

$$\begin{aligned}
\Pr(X(t+s) = j \mid X(s) = i) &= \Pr\big(j - i \text{ jumps in } (s, s+t] \mid i \text{ jumps in } (0, s]\big) \\
&= \Pr\big(j - i \text{ jumps in } (s, s+t]\big) && \text{(independent increments)} \\
&= \Pr\big(j - i \text{ jumps in } (0, t]\big) && \text{(stationary increments)} \\
&= e^{-\lambda t}\frac{(\lambda t)^{j-i}}{(j-i)!}
\end{aligned}$$

Time Spent in a State

The time spent in a state $i$ before the next jump, denoted $T_i$, for a continuous-time Markov chain has the memoryless property:

$$\begin{aligned}
\Pr(T_i > s + t \mid T_i > s) &= \Pr\big(X(v) = i,\, s \le v \le s+t \mid X(u) = i,\, 0 \le u \le s\big) \\
&= \Pr\big(X(v) = i,\, s \le v \le s+t \mid X(s) = i\big) && \text{(Markov property)} \\
&= \Pr\big(X(v) = i,\, 0 \le v \le t \mid X(0) = i\big) && \text{(stationary increments)} \\
&= \Pr(T_i > t)
\end{aligned}$$

The only continuous distribution with the memoryless property is the exponential distribution. Hence, the time spent in a state is exponentially distributed with:

$$T_i \sim \exp(v_i) \implies \Pr(T_i \le t) = 1 - e^{-v_i t}$$

With the expected time spent in state $i$ before transitioning into another state:

$$E[T_i] = \frac{1}{v_i}$$

Where $v_i$ is the transition rate when in state $i$, i.e. the transition rate out of state $i$ to another state. It is the sum of all the rates of jumping from state $i$ to another state $j$, denoted $q_{ij}$:

$$v_i = \sum_{j \ne i} q_{ij}$$


Transition Rates and Probabilities

The transition probability of the continuous-time Markov chain is defined as:

$$P_{ij}(t, t+s) = \Pr(X(t+s) = j \mid X(t) = i)$$

For a homogeneous or stationary process we have stationary transition probabilities:

$$P_{ij}(t, t+s) = \Pr(X(t+s) = j \mid X(t) = i) = \Pr(X(s) = j \mid X(0) = i) = P_{ij}(s)$$

With the initial conditions:

$$P_{ij}(s, s) = P_{ij}(0) = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases}$$

Transition Rates of Homogeneous Processes

The instantaneous transition rate $q_{ij}$ from state $i$ to state $j$ of a continuous-time Markov chain is defined as the rate of change of the transition probability over a small period of time:

$$\frac{\partial}{\partial t}P_{ij}(t)\Big|_{t=0} = \lim_{h \to 0}\frac{P_{ij}(h) - P_{ij}(0)}{h} = \begin{cases} \displaystyle\lim_{h \to 0}\frac{P_{ij}(h)}{h} = q_{ij} & \text{if } i \ne j \text{ (since } P_{ij}(0) = 0\text{)} \\[3mm] \displaystyle\lim_{h \to 0}\frac{P_{ii}(h) - 1}{h} = q_{ii} = -v_i & \text{if } i = j \end{cases}$$

Where $v_i = -q_{ii}$ denotes the transition rate out of state $i$ when the process is in state $i$. Note that:

$$\sum_j P_{ij}(h) = 1 \quad \text{(sum of all transition probabilities is 1)}$$
$$\sum_j q_{ij} = 0 \quad \text{(sum of all transition rates is 0)}$$

Therefore, we have that:

$$q_{ii} + \sum_{j \ne i} q_{ij} = 0 \implies -q_{ii} = \sum_{j \ne i} q_{ij} = v_i$$

Equivalently, the transition probability over a small time $h$ can be defined as:

$$P_{ij}(h) = \begin{cases} q_{ij}h + o(h) & \text{if } i \ne j \\ 1 - v_i h + o(h) & \text{if } i = j \end{cases}$$

Transition Rates of Non-homogeneous Processes

In the non-homogeneous case, the transition rates are also derivatives of the transition probabilities:

$$\frac{\partial}{\partial t}P_{ij}(s, t)\Big|_{t=s} = \lim_{h \to 0}\frac{P_{ij}(s, s+h) - P_{ij}(s, s)}{h} = \begin{cases} \displaystyle\lim_{h \to 0}\frac{P_{ij}(s, s+h)}{h} = q_{ij}(s) & \text{if } i \ne j \\[3mm] \displaystyle\lim_{h \to 0}\frac{P_{ii}(s, s+h) - 1}{h} = q_{ii}(s) = -v_i(s) & \text{if } i = j \end{cases}$$

Equivalently, the transition probability over a small time $h$ can be defined as:

$$P_{ij}(s, s+h) = \begin{cases} q_{ij}(s)h + o(h) & \text{if } i \ne j \\ 1 - v_i(s)h + o(h) & \text{if } i = j \end{cases}$$


Now, ignore when the transitions occur and how long is spent in each state. Consider only the series of states that the process transitions into. Let:

- $v_i$ be the transition rate OUT of state $i$ when the process is in state $i$
- $P_{ij}$ be the probability that the transition is into state $j$, conditional on the fact that a transition has OCCURRED and the process is currently in state $i$ (see embedded Markov chain)

Then $q_{ij}$, the transition rate INTO state $j$ when the process is in state $i$, is given by:

$$q_{ij} = v_i P_{ij}$$

using $\Pr(A \cap B) = \Pr(A)\Pr(B \mid A)$, where $A$ is the event of moving out of state $i$ and $B$ is the event of moving to state $j$. Then we also have:

$$\sum_{j \ne i} q_{ij} = v_i\sum_{j \ne i} P_{ij} = v_i$$

Therefore,

$$P_{ij} = \frac{q_{ij}}{v_i} = \frac{q_{ij}}{\sum_{j \ne i} q_{ij}}$$
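These relations are mechanical to compute from a generator matrix. A minimal sketch (the 3-state matrix $Q$ below is an illustrative example, not from the notes): the holding rates are $v_i = -q_{ii}$ and the embedded-chain probabilities are $P_{ij} = q_{ij}/v_i$.

```python
# Illustrative generator matrix Q: rows sum to 0, q_ii = -v_i.
Q = [
    [-3.0,  2.0,  1.0],
    [ 1.0, -4.0,  3.0],
    [ 2.0,  2.0, -4.0],
]

v = [-Q[i][i] for i in range(len(Q))]                      # v_i = -q_ii
P = [[0.0 if i == j else Q[i][j] / v[i] for j in range(len(Q))]
     for i in range(len(Q))]                               # P_ij = q_ij / v_i

print(v)       # holding rates out of each state
print(P[0])    # embedded transition probabilities from state 0
```

Each row of `P` sums to 1, as it must for the embedded (discrete-time) chain.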

Chapman-Kolmogorov Equations

For a homogeneous continuous-time Markov chain the Chapman-Kolmogorov equation is:

$$P_{ij}(t + s) = \sum_k P_{ik}(t)P_{kj}(s) \quad \text{for } t, s \ge 0 \text{ and all states } i, j$$

With the initial conditions:

$$P_{ij}(0) = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases}$$

Kolmogorov's backward equation is:

$$\frac{\partial}{\partial t}P_{ij}(t) = \sum_k q_{ik}P_{kj}(t) = \sum_{k \ne i} q_{ik}P_{kj}(t) - v_i P_{ij}(t)$$

Kolmogorov's forward equation is:

$$\frac{\partial}{\partial t}P_{ij}(t) = \sum_k P_{ik}(t)q_{kj} = \sum_{k \ne j} P_{ik}(t)q_{kj} - v_j P_{ij}(t)$$

Define the matrices:

$$\mathbf{P}(t) = \big(P_{ij}(t)\big), \qquad \mathbf{Q} = \big(q_{ij}\big), \qquad \frac{\partial}{\partial t}\mathbf{P}(t) = \left(\frac{\partial}{\partial t}P_{ij}(t)\right)$$

In matrix form we have:

$$\begin{aligned}
\text{Chapman-Kolmogorov equations:}\quad & \mathbf{P}(t+s) = \mathbf{P}(t)\mathbf{P}(s) \\
\text{Kolmogorov's backward equations:}\quad & \frac{\partial}{\partial t}\mathbf{P}(t) = \mathbf{Q}\mathbf{P}(t) \\
\text{Kolmogorov's forward equations:}\quad & \frac{\partial}{\partial t}\mathbf{P}(t) = \mathbf{P}(t)\mathbf{Q} \\
\text{Initial condition:}\quad & \mathbf{P}(0) = \mathbf{I}
\end{aligned}$$


Limiting Probabilities

The limiting probability of a continuous-time Markov chain $P_j$ is the long-run probability, or the long-run proportion of time, that the process will be in state $j$, independent of the initial state:

$$P_j = \lim_{t \to \infty} P_{ij}(t)$$

To solve for the limiting probabilities, if they exist, we solve the equations:

$$v_j P_j = \sum_{k \ne j} q_{kj}P_k \quad \text{(equivalently } \mathbf{P}\mathbf{Q} = \mathbf{0}\text{)}, \qquad \sum_k P_k = 1$$

This equation implies that the rate at which the process leaves state $j$ is equal to the rate at which the process enters state $j$:

- $v_j P_j$ is the rate at which the process leaves state $j$, since $P_j$ is the long-run proportion of time the process is in state $j$ and $v_j$ is the rate of transition out of state $j$ when the process is in state $j$
- $\sum_{k \ne j} q_{kj}P_k$ is the rate at which the process enters state $j$, since $P_k$ is the long-run proportion of time the process is in state $k$ and $q_{kj}$ is the rate of transition from state $k$ to state $j$

For the limiting probabilities $\lim_{t \to \infty} P_{ij}(t)$ to exist, the conditions are:

- All states communicate, so that starting in state $i$ there is a positive probability of being in state $j$
- The Markov chain is positive recurrent, so that starting in any state, the mean time to return to that state is finite
- When the limiting probabilities exist, the Markov chain is ergodic; note that aperiodicity is unnecessary, as periodicity does not apply to continuous-time Markov chains

If the initial state is chosen according to the limiting distribution, then the probability of being in state $j$ at time $t$ is the same for all $t$ (homogeneous):

$$\Pr(X(0) = j) = P_j \implies \Pr(X(t) = j) = P_j$$
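Solving $\mathbf{P}\mathbf{Q} = \mathbf{0}$ with $\sum_k P_k = 1$ can be done numerically. A minimal sketch using uniformization (the 3-state generator below is an illustrative example): with $\Lambda \ge \max_i v_i$, the stochastic matrix $M = I + Q/\Lambda$ has the same stationary vector as the chain, so power iteration converges to the limiting distribution.

```python
# Illustrative generator; its stationary distribution is (1/2, 1/6, 1/3).
Q = [
    [-2.0,  1.0,  1.0],
    [ 4.0, -5.0,  1.0],
    [ 1.0,  1.0, -2.0],
]
n = len(Q)
Lam = max(-Q[i][i] for i in range(n))
M = [[(1.0 if i == j else 0.0) + Q[i][j] / Lam for j in range(n)]
     for i in range(n)]                      # M = I + Q/Lambda is stochastic

P = [1.0 / n] * n                            # start from uniform
for _ in range(2000):                        # P <- P M until convergence
    P = [sum(P[i] * M[i][j] for i in range(n)) for j in range(n)]

residual = max(abs(sum(P[i] * Q[i][j] for i in range(n))) for j in range(n))
print([round(p, 4) for p in P])
```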

Embedded Markov Chain

For a continuous-time Markov chain that is ergodic with limiting probabilities $P_i$, the sequence of states visited (ignoring the time spent in each state) is a discrete-time Markov chain, known as the embedded Markov chain.

The embedded Markov chain has transition probabilities:

$$P_{ij} = \Pr(\text{transition to state } j \mid \text{transition out of state } i) = \frac{\Pr(\text{transition from state } i \text{ to } j)}{\Pr(\text{transition out of state } i)} = \frac{q_{ij}}{v_i} = \frac{q_{ij}}{\sum_{j \ne i} q_{ij}}$$

Note: in general $P_{ij} \ne P_{ij}(t)$! Do not get confused!


Assuming that the embedded Markov chain is ergodic, its limiting probabilities $\pi_i$, i.e. the long-run proportions of transitions into state $i$, are the unique solutions of the set of equations:

$$\pi_i = \sum_j \pi_j P_{ji} \quad \text{and} \quad \sum_i \pi_i = 1$$

The proportion of time the continuous-time process spends in state $i$, i.e. the limiting probability of the original continuous Markov process, can then also be found using:

$$P_i = \frac{\pi_i / v_i}{\sum_j \pi_j / v_j}$$

Note that $\pi_i$, the limiting probability of the embedded Markov chain, is the proportion of transits into state $i$; it is multiplied by $1/v_i$, the mean time spent in state $i$ during a visit.

Time Reversibility

Going backwards, given the process is in state $i$ at time $t$, the probability that the process has been in state $i$ for an amount of time greater than $s$ is $e^{-v_i s}$:

$$\begin{aligned}
\Pr\big(\text{process in state } i \text{ throughout } [t-s, t] \mid X(t) = i\big) &= \frac{\Pr\big(\text{process in state } i \text{ throughout } [t-s, t]\big)}{\Pr(X(t) = i)} \\
&= \frac{\Pr(X(t-s) = i)\Pr(T_i > s)}{\Pr(X(t) = i)} \\
&= e^{-v_i s}
\end{aligned}$$

since for large $t$, the limiting probabilities give $\Pr(X(t-s) = i) = \Pr(X(t) = i) = P_i$. Therefore, if we go back in time, the amount of time the process spends in state $i$ is also exponential with rate $v_i$. Thus the reversed continuous-time Markov chain has the same transition intensities as the forward-time process.

The sequence of states visited by the reversed process, i.e. its embedded chain, is a discrete-time Markov chain with transition probabilities:

$$Q_{ij} = \frac{\pi_j P_{ji}}{\pi_i}$$

Thus, the continuous-time Markov chain will be time reversible, i.e. have the same probability structure as the original process, if the embedded chain is time reversible, i.e.

$$\pi_i P_{ij} = \pi_j P_{ji}$$

Using the proportion of time the continuous-time chain is in state $i$, $P_i = \dfrac{\pi_i / v_i}{\sum_j \pi_j / v_j}$, so that $\pi_i \propto P_i v_i$, the condition becomes:

$$P_i v_i P_{ij} = P_j v_j P_{ji}$$

Note that $q_{ij} = v_i P_{ij}$; then we have an equivalent condition for time reversibility:

$$P_i q_{ij} = P_j q_{ji}$$

i.e. the rate at which the process goes directly from $i$ to $j$ is the same as the rate to go directly from $j$ to $i$.


Birth and Death Process

A birth and death process is a continuous-time Markov chain with states $0, 1, 2, \ldots$ for which transitions from state $n$ may only go into state $n+1$ (a birth) or state $n-1$ (a death). Suppose that the number of people in a population/system is $n$:

- New arrivals enter the population/system at a birth/arrival rate $\lambda_n$
  - The time until the next arrival is exponentially distributed with mean $1/\lambda_n$
- People leave the population/system at a death/departure rate $\mu_n$
  - The time until the next departure is exponentially distributed with mean $1/\mu_n$

Transition Rates and Embedded Markov Chain

Let $v_i$ be the transition rate out of state $i$ when the process is in state $i$. Initially, with a population of 0, there can only be a birth (the population can't be negative), so:

$$v_0 = \lambda_0, \qquad v_i = \lambda_i + \mu_i \text{ for } i \ge 1$$

For the corresponding embedded Markov chain, let $P_{ij}$ denote the transition probabilities between states. If the Markov chain is in state 0 and a transition occurs, it must transition into state 1, therefore:

$$P_{01} = 1$$

To derive the transition probabilities of the embedded Markov chain, consider a population of $i$. It will jump to $i+1$ if a birth occurs before a death, and jump to $i-1$ if a death occurs before a birth:

- The time to a birth $T_b$ is exponential with rate $\lambda_i$
- The time to a death $T_d$ is exponential with rate $\mu_i$

Therefore, the probability that a birth occurs before a death is given by (minimum of independent exponentials):

$$\Pr(T_b < T_d) = \frac{\lambda_i}{\lambda_i + \mu_i}$$

Then the transition probabilities are:

$$P_{i,i+1} = \frac{\lambda_i}{\lambda_i + \mu_i}, \qquad P_{i,i-1} = 1 - P_{i,i+1} = \frac{\mu_i}{\lambda_i + \mu_i}, \qquad P_{i,k} = 0 \text{ for } k \ne i \pm 1$$

Also, since the time to the next jump, regardless of whether it is a birth or death, is exponential with rate $v_i$, and a jump out of state $i$ is either a birth or a death, we have as before:

$$v_i = \lambda_i + \mu_i$$

Examples of Birth and Death Processes

Birth and death rates independent of n

A supermarket has one server and customers join a queue. Customers arrive at rate $\lambda$ with exponential inter-arrival times (a Poisson process). The server serves at rate $\mu$, with service times also exponentially distributed. Let $X(t)$ be the number in the queue at time $t$; then:

$$\lambda_n = \lambda \text{ for } n \ge 0, \qquad \mu_n = \begin{cases} \mu & \text{for } n \ge 1 \\ 0 & \text{for } n = 0 \end{cases}$$

$$P_{i,i+1} = \begin{cases} 1 & \text{for } i = 0 \\ \dfrac{\lambda}{\lambda + \mu} & \text{for } i = 1, 2, \ldots \end{cases} \qquad P_{i,i-1} = \begin{cases} 0 & \text{for } i = 0 \\ \dfrac{\mu}{\lambda + \mu} & \text{for } i = 1, 2, \ldots \end{cases}$$


Birth and death rates dependent on n

If there are now $s$ checkout operators serving the queue, with customers joining a single queue and going to the first available server, then:

$$\lambda_n = \lambda \text{ for } n \ge 0, \qquad \mu_n = \begin{cases} n\mu & \text{for } 1 \le n \le s \\ s\mu & \text{for } n > s \end{cases}$$

$$P_{i,i+1} = \frac{\lambda}{\lambda + \min(i, s)\mu} \text{ for } i = 0, 1, 2, \ldots \qquad P_{i,i-1} = \frac{\min(i, s)\mu}{\lambda + \min(i, s)\mu} \text{ for } i = 0, 1, 2, \ldots$$

Poisson Process

A Poisson process is a special case of a birth and death process, with no deaths:

$$\lambda_n = \lambda \text{ for } n \ge 0, \qquad \mu_n = 0 \text{ for } n \ge 0$$

$$P_{i,i+1} = 1 \text{ for } i = 0, 1, 2, \ldots \qquad P_{i,i-1} = 0 \text{ for } i = 0, 1, 2, \ldots$$

Population Growth

Each individual in a population gives birth at an exponential rate $\lambda$, plus there is immigration at an exponential rate $\theta$. Each individual dies at an exponential rate $\mu$:

$$\lambda_n = n\lambda + \theta \text{ for } n \ge 0, \qquad \mu_n = n\mu \text{ for } n \ge 0$$

$$P_{i,i+1} = \frac{i\lambda + \theta}{i\lambda + \theta + i\mu} \text{ for } i = 0, 1, 2, \ldots \qquad P_{i,i-1} = \frac{i\mu}{i\lambda + \theta + i\mu} \text{ for } i = 0, 1, 2, \ldots$$

Expected Time in States

Let $T_i$ be the time for the process, starting from state $i$, to enter state $i+1$, i.e. the time until a birth. Define the indicator function $I_i$ as:

$$I_i = \begin{cases} 1 & \text{if the first transition is } i \to i+1 \\ 0 & \text{if the first transition is } i \to i-1 \end{cases}$$

Then the expected value of $T_i$ can be calculated by conditioning on the first transition:

$$\begin{aligned}
E[T_i] &= \Pr(I_i = 1)E[T_i \mid I_i = 1] + \Pr(I_i = 0)E[T_i \mid I_i = 0] \\
&= \frac{\lambda_i}{\lambda_i + \mu_i}\cdot\frac{1}{\lambda_i + \mu_i} + \frac{\mu_i}{\lambda_i + \mu_i}\left(\frac{1}{\lambda_i + \mu_i} + E[T_{i-1}] + E[T_i]\right) \\
&= \frac{1}{\lambda_i + \mu_i} + \frac{\mu_i}{\lambda_i + \mu_i}\big(E[T_{i-1}] + E[T_i]\big) \\
\implies E[T_i] &= \frac{1}{\lambda_i} + \frac{\mu_i}{\lambda_i}E[T_{i-1}]
\end{aligned}$$

since the time to the first transition, regardless of whether it is a birth or death, is exponential with rate $\lambda_i + \mu_i$, and the time taken to reach $i+1$ if there is a death is the time from $i-1$ to $i$ plus the time from $i$ to $i+1$. Note:

$$E[T_0] = \frac{1}{\lambda_0}$$

In the case where the rates are homogeneous, i.e. $\lambda_i = \lambda$, $\mu_i = \mu$ for all $i$:

$$E[T_i] = \frac{1}{\lambda}\left(1 + \frac{\mu}{\lambda} + \left(\frac{\mu}{\lambda}\right)^2 + \cdots + \left(\frac{\mu}{\lambda}\right)^i\right)$$

Then the expected time to go from state $i$ to a higher state $j$ is:

$$E[T_{i \to j}] = E[T_i] + E[T_{i+1}] + \cdots + E[T_{j-1}]$$
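The recursion $E[T_i] = 1/\lambda_i + (\mu_i/\lambda_i)E[T_{i-1}]$ is easy to evaluate exactly. A minimal sketch (the rates $\lambda = 2$, $\mu = 1$ are illustrative) using exact rational arithmetic, checking the recursion against the homogeneous closed form above:

```python
from fractions import Fraction

def expected_times(lam, mu, n):
    """E[T_0..T_n] via E[T_i] = 1/lam_i + (mu_i/lam_i) * E[T_{i-1}]."""
    ts = [Fraction(1, 1) / lam[0]]           # E[T_0] = 1/lam_0
    for i in range(1, n + 1):
        ts.append(1 / lam[i] + (mu[i] / lam[i]) * ts[-1])
    return ts

n = 5
lam = [Fraction(2)] * (n + 1)
mu = [Fraction(0)] + [Fraction(1)] * n       # mu_0 = 0
ts = expected_times(lam, mu, n)

# homogeneous closed form: (1/lam) * sum_{k=0}^{i} (mu/lam)^k
closed = [sum(Fraction(1, 2) ** k for k in range(i + 1)) / 2
          for i in range(n + 1)]
print(ts == closed)
```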


Kolmogorov Equations

We can use the transition rates of a birth and death process to find Kolmogorov's equations, and then solve these differential equations to find the transition probabilities.

$$q_{0,1} = \lambda_0, \qquad q_{0,0} = -\lambda_0 = -v_0, \qquad q_{0,i} = 0 \text{ for all } i > 1$$
$$q_{i,i+1} = \lambda_i, \qquad q_{i,i-1} = \mu_i, \qquad q_{i,i} = -(\lambda_i + \mu_i) = -v_i, \qquad \text{otherwise zero}$$

Kolmogorov's backward equations:

$$\frac{\partial}{\partial t}P_{0,j}(t) = \lambda_0\big(P_{1,j}(t) - P_{0,j}(t)\big)$$
$$\frac{\partial}{\partial t}P_{i,j}(t) = \lambda_i P_{i+1,j}(t) + \mu_i P_{i-1,j}(t) - (\lambda_i + \mu_i)P_{i,j}(t)$$

Kolmogorov's forward equations:

$$\frac{\partial}{\partial t}P_{i,0}(t) = \mu_1 P_{i,1}(t) - \lambda_0 P_{i,0}(t)$$
$$\frac{\partial}{\partial t}P_{i,j}(t) = \lambda_{j-1}P_{i,j-1}(t) + \mu_{j+1}P_{i,j+1}(t) - (\lambda_j + \mu_j)P_{i,j}(t)$$

Limiting Probabilities

The limiting probabilities are determined by the balance equations:

$$v_j P_j = \sum_{k \ne j} q_{kj}P_k \quad \text{for all } j, \qquad \sum_k P_k = 1$$

This states that the rate at which the process leaves state $j$ (LHS) is equal to the rate at which the process enters state $j$ (RHS). For a birth and death process, this condition breaks down iteratively:

$$(\lambda_n + \mu_n)P_n = \lambda_{n-1}P_{n-1} + \mu_{n+1}P_{n+1} \implies \lambda_n P_n = \mu_{n+1}P_{n+1}$$

Then in general:

$$P_1 = \frac{\lambda_0}{\mu_1}P_0, \qquad P_2 = \frac{\lambda_1}{\mu_2}P_1 = \frac{\lambda_1\lambda_0}{\mu_2\mu_1}P_0, \qquad \ldots, \qquad P_n = \frac{\lambda_{n-1}\lambda_{n-2}\cdots\lambda_0}{\mu_n\mu_{n-1}\cdots\mu_1}P_0$$

Substituting this into the other balance equation $\sum_{n \ge 0} P_n = 1$:

$$P_0\left(1 + \sum_{n=1}^{\infty}\frac{\lambda_{n-1}\cdots\lambda_0}{\mu_n\cdots\mu_1}\right) = 1 \implies P_0 = \frac{1}{1 + \sum_{n=1}^{\infty}\dfrac{\lambda_{n-1}\cdots\lambda_0}{\mu_n\cdots\mu_1}}$$

Substituting back into the expression for $P_n$ gives:

$$P_n = \frac{\dfrac{\lambda_{n-1}\cdots\lambda_0}{\mu_n\cdots\mu_1}}{1 + \displaystyle\sum_{n=1}^{\infty}\frac{\lambda_{n-1}\cdots\lambda_0}{\mu_n\cdots\mu_1}}$$

This also gives the condition for the long-run probabilities to exist, i.e.:

$$\sum_{n=1}^{\infty}\frac{\lambda_{n-1}\cdots\lambda_0}{\mu_n\cdots\mu_1} < \infty$$
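For the single-server queue above ($\lambda_n = \lambda$, $\mu_n = \mu$), the product in the formula collapses to $(\lambda/\mu)^n$, so with $\rho = \lambda/\mu < 1$ the limiting distribution is geometric, $P_n = (1-\rho)\rho^n$. A minimal sketch evaluating the general product formula with a truncated normalising sum (the rates are illustrative):

```python
def mm1_stationary(lam, mu, nmax):
    """Truncated stationary distribution of the lam/mu single-server queue."""
    rho = lam / mu
    weights = [rho**n for n in range(nmax + 1)]   # (lam...lam)/(mu...mu) products
    p0 = 1.0 / sum(weights)                       # truncated normalisation
    return [p0 * w for w in weights]

P = mm1_stationary(lam=1.0, mu=2.0, nmax=200)
print(round(P[0], 6), round(P[1], 6))   # near 1 - rho = 0.5 and (1 - rho)*rho = 0.25
```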


Application: Pure Birth Process

The Poisson process is an example of a pure birth process, i.e.

$$\lambda_n = \lambda \text{ for } n \ge 0, \qquad \mu_n = 0 \text{ for } n \ge 0$$

In this case, rearranging the Kolmogorov backward equations:

$$\frac{\partial}{\partial t}P_{i,j}(t) = \lambda\big(P_{i+1,j}(t) - P_{i,j}(t)\big) \implies \frac{\partial}{\partial t}P_{i,j}(t) + \lambda P_{i,j}(t) = \lambda P_{i+1,j}(t)$$

Multiply both sides by $e^{\lambda t}$:

$$e^{\lambda t}\left(\frac{\partial}{\partial t}P_{i,j}(t) + \lambda P_{i,j}(t)\right) = \lambda e^{\lambda t}P_{i+1,j}(t)$$

By the product rule, this is equivalent to:

$$\frac{\partial}{\partial t}\big(e^{\lambda t}P_{i,j}(t)\big) = \lambda e^{\lambda t}P_{i+1,j}(t)$$

Integrate both sides w.r.t. $t$ from 0 to $s$:

$$e^{\lambda s}P_{i,j}(s) - P_{i,j}(0) = \int_0^s \lambda e^{\lambda t}P_{i+1,j}(t)\,dt, \qquad P_{i,j}(0) = 0 \text{ for } i \ne j$$

$$\implies P_{i,j}(s) = \int_0^s \lambda e^{-\lambda(s-t)}P_{i+1,j}(t)\,dt$$

- The LHS is the probability of moving from state $i$ to $j$ over the time period 0 to $s$
- The RHS is the probability of the process staying in state $i$ for a time $s - t$, i.e. $e^{-\lambda(s-t)}$, then jumping to state $i+1$ over the time interval $(s-t, s-t+dt)$, i.e. $\lambda\,dt$, then the probability that the process starts in $i+1$ and finishes in $j$ over the time interval $s-t$ to $s$ (of length $t$), i.e. $P_{i+1,j}(t)$, integrated over all possible jump times from 0 to $s$, which is equivalent to $t$ varying from 0 to $s$

Application: Simple Sickness Model

In a simple sickness model, an individual can be in state 0 (healthy) or state 1 (sick).

- An individual remains healthy for an exponential time with mean $1/\lambda$ before becoming sick
- An individual remains sick for an exponential time with mean $1/\mu$ before becoming healthy

In this birth and death process, we have:

$$\lambda_0 = \lambda \quad \text{and} \quad \mu_1 = \mu$$

with all other $\lambda_i, \mu_i$ being zero.

Using Kolmogorov's backward/forward equations, the transition probabilities are:

$$P_{0,0}(s) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}e^{-(\lambda+\mu)s} \qquad P_{0,1}(s) = \frac{\lambda}{\lambda + \mu}\big(1 - e^{-(\lambda+\mu)s}\big)$$

$$P_{1,0}(s) = \frac{\mu}{\lambda + \mu}\big(1 - e^{-(\lambda+\mu)s}\big) \qquad P_{1,1}(s) = \frac{\lambda}{\lambda + \mu} + \frac{\mu}{\lambda + \mu}e^{-(\lambda+\mu)s}$$
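These closed forms can be checked numerically. A minimal sketch (the rates $\lambda = 0.7$, $\mu = 1.3$ are illustrative): integrate the forward equation $\mathbf{P}'(t) = \mathbf{P}(t)\mathbf{Q}$ for the two-state generator by a fine Euler scheme, starting healthy, and compare $P_{0,1}(s)$ with the formula above.

```python
import math

lam, mu, s = 0.7, 1.3, 2.0

# closed form for P_{0,1}(s)
closed = lam / (lam + mu) * (1.0 - math.exp(-(lam + mu) * s))

# Euler integration of the forward equations for row 0 of P(t):
#   p00' = -lam*p00 + mu*p01,  p01' = lam*p00 - mu*p01
steps = 200_000
h = s / steps
p00, p01 = 1.0, 0.0            # P(0) = I, starting in state 0 (healthy)
for _ in range(steps):
    d00 = -lam * p00 + mu * p01
    d01 = lam * p00 - mu * p01
    p00, p01 = p00 + h * d00, p01 + h * d01

print(abs(p01 - closed) < 1e-4)
```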


Occupancy Probabilities and Time

In a simple sickness model, the occupancy probabilities are defined as:

- $\bar{P}_{0,0}(s)$ is the probability that a healthy individual stays healthy for a period of $s$
- $\bar{P}_{1,1}(s)$ is the probability that a sick individual remains sick for the same period

By the Markov property, the time spent in state 0 or state 1 is exponential with the memoryless property:

$$\bar{P}_{0,0}(s) = \Pr(T_0 > s) = 1 - \Pr(T_0 \le s) = 1 - (1 - e^{-\lambda s}) = e^{-\lambda s}$$
$$\bar{P}_{1,1}(s) = \Pr(T_1 > s) = 1 - \Pr(T_1 \le s) = 1 - (1 - e^{-\mu s}) = e^{-\mu s}$$

The occupancy time $O(t)$ is the total time that the process spends in each state during the interval $(0, t)$. If we define the indicator function:

$$I(s) = \begin{cases} 1 & \text{if } X(s) = 1 \\ 0 & \text{if } X(s) = 0 \end{cases}$$

Then the occupation time for being sick is:

$$O(t) = \int_0^t I(s)\,ds$$

The expected occupation time being sick, given that the initial state is healthy, is given by:

$$\begin{aligned}
E[O(t) \mid X(0) = 0] &= E\left[\int_0^t I(s)\,ds \,\Big|\, X(0) = 0\right] \\
&= \int_0^t E[I(s) \mid X(0) = 0]\,ds \\
&= \int_0^t \Pr(X(s) = 1 \mid X(0) = 0)\,ds \\
&= \int_0^t P_{0,1}(s)\,ds \\
&= \frac{\lambda}{\lambda + \mu}t - \frac{\lambda}{(\lambda + \mu)^2}\big(1 - e^{-(\lambda+\mu)t}\big)
\end{aligned}$$

First Holding Time

The first holding time $T_0$ is the first time the process jumps out of the initial state:

$$T_0 = \inf\{t : X_t \ne X_0\}$$

For a homogeneous Markov jump process, this is exponentially distributed with rate $\lambda_i$, the rate of jumping out of state $i$, previously denoted $v_i$:

$$\lambda_i = -q_{ii} = \sum_{j \ne i} q_{ij}$$

Thus, the first holding time has the c.d.f. and p.d.f.:

$$\Pr(T_0 \le t \mid X_0 = i) = 1 - e^{-\lambda_i t}, \qquad f_{T_0}(t \mid X_0 = i) = \lambda_i e^{-\lambda_i t}$$

The probability of the state to which the process jumps from $i$ is:

$$\Pr(X_{T_0} = j \mid X_0 = i) = \frac{q_{ij}}{\lambda_i} = P_{ij}$$

where $X_{T_0}$ is independent of $T_0$.


Non-homogeneous Markov Jump Processes

Let the transition rates be denoted by $\sigma_{ij}(s)$, equivalent to the $q_{ij}(s)$ previously:

$$\sigma_{ij}(s) = \frac{\partial}{\partial t}P_{ij}(s, t)\Big|_{t=s}$$

So that the non-homogeneous transition probabilities over a small time period $h$ are:

$$P_{ij}(s, s+h) = \begin{cases} h\sigma_{ij}(s) + o(h) & \text{if } i \ne j \\ 1 + h\sigma_{ii}(s) + o(h) & \text{if } i = j \end{cases} \qquad \sigma_{ii}(s) = -\sum_{j \ne i}\sigma_{ij}(s)$$

Note that this can be written as:

$$P_{ij}(s, s+h) - P_{ij}(s, s) = h\sigma_{ij}(s) + o(h)$$

So that:

$$\frac{P_{ij}(s, s+h) - P_{ij}(s, s)}{h} = \sigma_{ij}(s) + \frac{o(h)}{h}$$

Then by letting $h \to 0$ we get:

$$\lim_{h \to 0}\frac{P_{ij}(s, s+h) - P_{ij}(s, s)}{h} = \sigma_{ij}(s) = \frac{\partial}{\partial t}P_{ij}(s, t)\Big|_{t=s} \quad \text{(derivative w.r.t. } t\text{)}$$

The Chapman-Kolmogorov equations are:

$$P_{ij}(s, t) = \sum_k P_{ik}(s, u)P_{kj}(u, t), \quad s \le u \le t, \qquad \mathbf{P}(s, t) = \mathbf{P}(s, u)\mathbf{P}(u, t), \qquad \mathbf{P}(s, s) = \mathbf{I}$$

Kolmogorov's forward equations are derived by differentiating the Chapman-Kolmogorov equation w.r.t. $t$, then setting $u = t$:

$$\frac{\partial}{\partial t}P_{ij}(s, t) = \sum_k P_{ik}(s, u)\frac{\partial}{\partial t}P_{kj}(u, t)\Big|_{u=t} = \sum_{k \ne j} P_{ik}(s, t)\sigma_{kj}(t) - v_j(t)P_{ij}(s, t)$$

$$\frac{\partial}{\partial t}\mathbf{P}(s, t) = \mathbf{P}(s, t)\mathbf{Q}(t)$$

Kolmogorov's backward equations are derived by differentiating the Chapman-Kolmogorov equation w.r.t. $s$, then setting $u = s$:

$$\frac{\partial}{\partial s}P_{ij}(s, t) = \sum_k \frac{\partial}{\partial s}P_{ik}(s, u)\Big|_{u=s}P_{kj}(s, t) = -\sum_{k \ne i}\sigma_{ik}(s)P_{kj}(s, t) + v_i(s)P_{ij}(s, t)$$

$$\frac{\partial}{\partial s}\mathbf{P}(s, t) = -\mathbf{Q}(s)\mathbf{P}(s, t)$$

Where $\mathbf{Q}(t)$ is the matrix containing the $\sigma_{ij}(t)$ for all $i, j$.


Residual Holding Time

The residual holding time $R_s$ is the time between $s$ and the NEXT jump:

$$\{R_s > w,\, X_s = i\} = \{X_u = i,\; s \le u \le s + w\}$$

i.e. the process remains in the same state $i$ between times $s$ and $s+w$. Note that this is the non-homogeneous version of the first holding time. We can show that:

$$\Pr(R_s > w \mid X_s = i) = \exp\left(-\int_s^{s+w} v_i(u)\,du\right)$$

The density of $R_s \mid X_s = i$ is given by:

$$\frac{\partial}{\partial w}\Pr(R_s \le w \mid X_s = i) = v_i(s+w)\exp\left(-\int_s^{s+w} v_i(u)\,du\right)$$

Define $X_{s+R_s}$ as the state the process jumps to at the next jump. The density for this is:

$$\Pr(X_{s+R_s} = j \mid X_s = i,\, R_s = w) = \frac{\sigma_{ij}(s+w)}{v_i(s+w)}$$

Then we have the transition probability as:

$$P_{ij}(s, t) = \int_0^{t-s}\exp\left(-\int_s^{s+w} v_i(u)\,du\right)v_i(s+w)\sum_{I \ne i}\frac{\sigma_{iI}(s+w)}{v_i(s+w)}P_{Ij}(s+w, t)\,dw$$

This is the conditional probability that the process is in state $j$ at time $t$, given that it started in state $i$ at time $s$. This transition can be done by:

(1) Staying in state $i$ from time $s$ for a duration of $w$, i.e. $\exp\left(-\int_s^{s+w} v_i(u)\,du\right)$
(2) Jumping out of state $i$, i.e. $v_i(s+w)$
(3) Given that a jump has occurred, jumping to an intermediate state $I$, i.e. $\sigma_{iI}(s+w)/v_i(s+w)$
(4) Then jumping from state $I$ to $j$ over $s+w$ to $t$ and being in state $j$ at time $t$, i.e. $P_{Ij}(s+w, t)$
(5) Integrated over all possible $w$ and summed over all possible intermediate states $I$

This is the integral form of the Kolmogorov backward equation, i.e. we consider the jump ($\sigma$) first.

Current Holding Time

For non-homogeneous processes, the current holding time $C_t$ is the time between the last jump and $t$:

$$\{C_t \ge w,\, X_t = j\} = \{X_u = j,\; t - w \le u \le t\}$$

It can be shown that:

$$\Pr(C_t \ge w \mid X_t = j) = \exp\left(-\int_{t-w}^{t} v_j(u)\,du\right)$$

The density of $C_t \mid X_t = j$ is given by:

$$\frac{\partial}{\partial w}\Pr(C_t \le w \mid X_t = j) = v_j(t-w)\exp\left(-\int_{t-w}^{t} v_j(u)\,du\right)$$

Then we have the transition probability as:

$$P_{ij}(s, t) = \int_0^{t-s}\sum_{k \ne j} P_{ik}(s, t-w)\,\sigma_{kj}(t-w)\exp\left(-\int_{t-w}^{t} v_j(u)\,du\right)dw$$

This is the integral form of the Kolmogorov forward equation, i.e. we consider the jump ($\sigma$) last.


Part 2: Time Series

Introduction to Time Series

A time series is a sequence of observations that are recorded at regular time intervals, usually discrete and evenly spaced, e.g. daily or monthly. Let $x_t$ denote the observed data:

$$x_1, x_2, \ldots, x_t$$

A time series model for $x_t$ is a family of distributions to which the joint distribution of $\{X_t\}$ is assumed to belong. Let $\{X_t\}$ denote the time series, where each $X_t$ is a r.v.:

$$\ldots, X_{t-1}, X_t, X_{t+1}, \ldots$$

Therefore, $x_{t-1}, x_t, x_{t+1}$ are realisations of the r.v. $X_{t-1}, X_t, X_{t+1}$.

Classical Decomposition Model

The classical decomposition model decomposes the original data $X_t$ into 3 components:

$$X_t = T_t + S_t + N_t, \qquad E[N_t] = 0, \qquad S_{t+d} = S_t, \qquad \sum_{j=1}^{d} S_j = 0$$

Where:

- $T_t$ is a deterministic trend component that is slowly changing and perfectly predictable
- $S_t$ is a deterministic seasonal component with a known period $d$, also perfectly predictable; e.g. $d$ would be 4 if the data is quarterly. Note that the seasonal component sums to zero over a complete cycle.
- $N_t$ is a random component with expected value 0, as all information should be captured in the trend and seasonal components. $N_t$ may be correlated and hence partially predictable.

Moving Average Linear Filters

A moving average linear filter has the form:

$$\hat{T}_t = \sum_{j=-q}^{q} a_j X_{t+j} = \frac{1}{2q+1}\sum_{j=-q}^{q} X_{t+j}$$

Eliminating the Seasonality

A moving average linear filter can also be passed through $X_t$ to eliminate the seasonal component:

- If the period $d$ is odd, i.e. $d = 2q+1$, use the filter:

$$\frac{1}{d}X_{t-q} + \frac{1}{d}X_{t-q+1} + \cdots + \frac{1}{d}X_{t+q}$$

- If the period $d$ is even, i.e. $d = 2q$, use the filter:

$$\frac{1}{d}\left(\frac{1}{2}X_{t-q} + X_{t-q+1} + \cdots + X_{t+q-1} + \frac{1}{2}X_{t+q}\right)$$
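A symmetric moving average is a one-liner in plain Python. A minimal sketch: applied to a noiseless linear series, the $(2q+1)$-point filter reproduces the interior points exactly, since a linear trend passes through such a filter unchanged.

```python
def moving_average(xs, q):
    """(2q+1)-point symmetric moving average; defined for interior points only."""
    width = 2 * q + 1
    return [sum(xs[t - q : t + q + 1]) / width for t in range(q, len(xs) - q)]

xs = [2.0 + 0.5 * t for t in range(11)]   # a pure linear "trend", no noise
trend = moving_average(xs, q=2)
print(trend[0], xs[2])                    # the filter recovers the interior values
```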

Estimating the Trend

A moving average linear filter can be applied to estimate the trend. Consider $X_t = T_t + N_t$ where $T_t$ is linear in $t$; applying a $(2q+1)$-point moving average filter to $X_t$ gives:

$$\hat{T}_t = \frac{1}{2q+1}\sum_{j=-q}^{q}\big(T_{t+j} + N_{t+j}\big) = T_t + \frac{1}{2q+1}\sum_{j=-q}^{q}N_{t+j} \approx T_t$$


For the classic decomposition model:

$$\hat{T}_t = \sum_{j=-q}^{q} a_j X_{t+j} = \frac{1}{2q+1}\sum_{j=-q}^{q}T_{t+j} + \frac{1}{2q+1}\sum_{j=-q}^{q}S_{t+j} + \frac{1}{2q+1}\sum_{j=-q}^{q}N_{t+j}$$

This model works well, i.e. $\hat{T}_t \approx T_t$, if:

- The trend is approximately linear
- The sum of the $N_t$ is close to zero
- The sum of the $S_t$ is zero

The estimated noise, or the residuals, is:

$$\hat{N}_t = x_t - \hat{T}_t - \hat{S}_t$$

Differencing

The backshift operator, or the lag operator, $B$ is defined by:

$$BX_t = X_{t-1}, \qquad B^j X_t = X_{t-j}$$

The difference operator $\nabla$ is defined by:

$$\nabla X_t = (1 - B)X_t = X_t - X_{t-1}$$

The powers of $\nabla$ are defined by:

$$\nabla^j X_t = (1 - B)^j X_t$$

The difference operator with lag $d$ is defined by:

$$\nabla_d X_t = (1 - B^d)X_t = X_t - X_{t-d}$$

Eliminating the Trend

Trend can be eliminated by using differencing. E.g. consider a linear trend:

$$X_t = T_t + N_t, \qquad T_t = c_0 + c_1 t$$

Apply differencing to the power of one:

$$\nabla X_t = \nabla T_t + \nabla N_t = \big(c_0 + c_1 t\big) - \big(c_0 + c_1(t-1)\big) + \nabla N_t = c_1 + \nabla N_t$$

i.e. the trend becomes a constant $c_1$, which is stationary. This constant can be estimated by the sample average of $\nabla X_t$. In general, a trend that is a polynomial of degree $k$ can be reduced to a constant by differencing $k$ times:

$$T_t = c_0 + c_1 t + \cdots + c_k t^k \implies \nabla^k X_t = \nabla^k T_t + \nabla^k N_t$$

Which is a random process with mean $k!\,c_k$.
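The $k!\,c_k$ result is easy to see numerically. A minimal sketch (the coefficients are illustrative): differencing a noiseless quadratic trend twice leaves the constant $2!\,c_2$.

```python
def diff(xs, lag=1):
    """The nabla operator: (1 - B^lag) X_t = X_t - X_{t-lag}."""
    return [xs[t] - xs[t - lag] for t in range(lag, len(xs))]

c0, c1, c2 = 5.0, -1.0, 0.25
xs = [c0 + c1 * t + c2 * t * t for t in range(20)]   # degree-2 trend, no noise
d2 = diff(diff(xs))                                   # apply nabla twice
print(d2[0], 2 * c2)                                  # constant 2! * c2 = 0.5
```

The same `diff` call with `lag=d` removes a seasonal component of period $d$, as described in the next section.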

Differencing to Eliminate Seasonality

Seasonality can also be eliminated by differencing: difference once only, but at a lag $d$ equal to the period of the seasonality:

$$\nabla_d S_t = S_t - S_{t-d} = 0$$


Stationarity

The theory of stationary random processes is important in modelling time series, as stationarity allows parameters to be estimated efficiently, since we can treat all samples as coming from the same distribution. A non-stationary random process must be transformed into a stationary one before analysis and modelling, i.e. by removing trend and seasonality, or by applying transformations.

A random process $X_t$ is said to be:

- Integrated of order 0, i.e. $I(0)$, if $X_t$ itself is a stationary time series process
- Integrated of order 1, i.e. $I(1)$, if $X_t$ is not stationary, but the increments obtained through differencing, $Y_t = \nabla X_t = X_t - X_{t-1}$, form a stationary process
- Integrated of order $n$, i.e. $I(n)$, if $\nabla^{n-1}X_t$ is still not stationary, but the $n$th differenced series $Y_t = \nabla^n X_t$ is a stationary time series process

However, the deseasonalised, detrended residuals, i.e. the random component, may still contain information, i.e. they may be correlated over time. Assuming stationarity in the residuals, we can try to fit a probability model to the residuals for forecasting purposes. To do so, we will need to look at some sample statistics of the residuals:

Sample Statistics

Let $X_t$ be a stochastic process such that $\operatorname{var}(X_t) < \infty$ for all $t$.

The autocovariance function (ACVF) is defined by:

$$\gamma(\tau) = \operatorname{Cov}(X_t, X_{t+\tau}), \qquad \gamma(0) = \operatorname{Cov}(X_t, X_t) = \operatorname{var}(X_t)$$

The autocorrelation function (ACF) is defined by:

$$\rho(\tau) = \operatorname{Corr}(X_t, X_{t+\tau}) = \frac{\operatorname{Cov}(X_t, X_{t+\tau})}{\operatorname{Cov}(X_t, X_t)} = \frac{\gamma(\tau)}{\gamma(0)}$$

The covariance function is defined to be:

$$\operatorname{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]E[Y]$$

Some properties of the covariance function are:

$$\operatorname{Cov}(X, a) = 0$$
$$\operatorname{Cov}(aX, bY) = ab\operatorname{Cov}(X, Y)$$
$$\operatorname{Cov}(aX + bY, cU + dV) = ac\operatorname{Cov}(X, U) + ad\operatorname{Cov}(X, V) + bc\operatorname{Cov}(Y, U) + bd\operatorname{Cov}(Y, V)$$

A process $X_t$ is said to be weakly stationary if:

$$E[X_t] = \mu, \qquad \operatorname{cov}(X_t, X_{t+\tau}) = \gamma(\tau)$$

i.e. the mean is constant and the covariance of the process depends only on the time difference $\tau$.


Noise

I.I.D. Noise

$X_t$ is i.i.d. noise if $X_t$ and $X_{t+h}$ are independently and identically distributed with mean zero, i.e. no covariance:

$$X_t \sim \text{IID}(0, \sigma^2)$$

Assuming $E[X_t^2] = \sigma^2 < \infty$, as for weakly stationary series we need bounded first and second moments, then:

$$\gamma_X(h) = \begin{cases} \sigma^2 & \text{if } h = 0 \\ 0 & \text{if } h \ne 0 \end{cases}$$

White Noise

$X_t$ is white noise with zero mean if:

$$X_t \sim \text{WN}(0, \sigma^2)$$

Where:

$$\gamma_X(h) = \begin{cases} \sigma^2 & \text{if } h = 0 \\ 0 & \text{if } h \ne 0 \end{cases}$$

- IID noise is white noise, but white noise is not necessarily IID noise
- White noise is weakly stationary
- Usually, we assume that the error terms have a normal distribution for the purpose of parameter estimation, etc.:

$$X_t \sim N(0, \sigma^2)$$

Linear Processes

$\{X_t\}$ is a linear process if it can be represented as:

$$X_t = \sum_{j=-\infty}^{\infty} \psi_j Z_{t-j} = \psi(B) Z_t, \qquad \psi(B) = \sum_{j=-\infty}^{\infty} \psi_j B^j$$

where $\{\psi_j\}$ is a sequence of constants with $\sum_j |\psi_j| < \infty$ and $Z_t \sim \mathrm{WN}(0, \sigma^2)$.

Linear processes are stationary because they are a linear combination of stationary white noise terms. For stationary processes, the regularity condition $\sum_j |\psi_j| < \infty$ holds, i.e. $\sum_j \psi_j$ is absolutely convergent. This ensures that the infinite sum can be manipulated the same way as a finite sum, i.e. two absolutely convergent series can be added or multiplied together.

In general, if $\{Y_t\}$ is stationary and $\sum_j |\psi_j| < \infty$ holds, then

$$X_t = \psi(B) Y_t$$

is stationary. Essentially, all (purely non-deterministic) stationary processes can be represented by a linear process.
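As a quick numerical check of the linear-process ACVF (a numpy sketch with arbitrary illustrative weights, not part of the original notes), the snippet below filters white noise through a finite set of weights $\psi_j$ and compares the sample ACVF with $\gamma(h) = \sigma^2 \sum_j \psi_j \psi_{j+h}$:

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 1.0
psi = np.array([1.0, 0.6, 0.3])        # finite filter: X_t = Z_t + 0.6 Z_{t-1} + 0.3 Z_{t-2}

n = 200_000
z = rng.normal(0.0, sigma, size=n)
x = np.convolve(z, psi, mode="valid")  # a sample path of the linear process

def sample_acvf(x, h):
    xbar = x.mean()
    return np.mean((x[h:] - xbar) * (x[:len(x) - h] - xbar))

def theory_acvf(psi, h, sigma2=1.0):
    # gamma(h) = sigma^2 * sum_j psi_j * psi_{j+h} for a linear process
    return sigma2 * sum(psi[j] * psi[j + h] for j in range(len(psi) - h))

for h in range(3):
    assert abs(sample_acvf(x, h) - theory_acvf(psi, h)) < 0.05
```

With these weights the theoretical values are $\gamma(0) = 1.45$, $\gamma(1) = 0.78$, $\gamma(2) = 0.3$, and the sample estimates agree to within sampling error.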


Time Series Models

Moving Average Process

$\{X_t\}$ is a first-order moving average process MA(1) if there is a process $Z_t \sim \mathrm{WN}(0, \sigma^2)$ and a constant $\theta$ such that:

$$X_t = Z_t + \theta Z_{t-1}$$

i.e. the next term depends on the noise plus a proportion of the noise of the previous period.

The mean is:

$$E[X_t] = E[Z_t] + \theta E[Z_{t-1}] = 0$$

The autocovariance ACVF is:

$$\gamma(h) = \operatorname{Cov}(X_t, X_{t+h}) = \begin{cases} \operatorname{Cov}(Z_t + \theta Z_{t-1},\, Z_t + \theta Z_{t-1}) = (1 + \theta^2)\sigma^2 & \text{if } h = 0 \\ \operatorname{Cov}(Z_t + \theta Z_{t-1},\, Z_{t+1} + \theta Z_t) = \theta \sigma^2 & \text{if } |h| = 1 \\ \operatorname{Cov}(Z_t + \theta Z_{t-1},\, Z_{t+h} + \theta Z_{t+h-1}) = 0 & \text{if } |h| > 1 \end{cases}$$

The autocorrelation ACF is:

$$\rho(h) = \begin{cases} 1 & \text{if } h = 0 \\ \dfrac{\theta}{1 + \theta^2} & \text{if } |h| = 1 \\ 0 & \text{if } |h| > 1 \end{cases}$$

The conditional mean is stochastic and depends on $X_t$:

$$E\left[X_{t+1} \mid X_t, X_{t-1}, \ldots\right] = E\left[Z_{t+1} + \theta Z_t \mid X_t, X_{t-1}, \ldots\right] = 0 + \theta Z_t = \theta Z_t$$

The conditional variance is constant, i.e. independent of $X_t$:

$$\operatorname{var}\left(X_{t+1} \mid X_t, X_{t-1}, \ldots\right) = \operatorname{var}\left(Z_{t+1} + \theta Z_t \mid X_t, X_{t-1}, \ldots\right) = \operatorname{var}(Z_{t+1}) = \sigma^2$$

Therefore, the conditional and unconditional mean and variance are different.

$\{X_t\}$ is a moving average process of order q, MA(q), if the process depends on the previous q realisations of noise:

$$X_t = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}$$

Note:

Moving average processes are stationary, as $X_t$ is a linear combination of stationary white noise terms.

The ACF of an MA(q) process has non-zero values up until lag q, and near-zero values for all lags greater than q.

The conditional mean/variance is used for forecasts, whereas the unconditional mean/variance gives the long-run results.
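The MA(1) ACF above can be checked by simulation. The sketch below (illustrative parameter values, not from the notes) verifies that the sample ACF is close to $\theta/(1+\theta^2)$ at lag 1 and cuts off to near zero beyond lag 1:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n = 0.5, 1.0, 200_000

z = rng.normal(0.0, sigma, size=n + 1)
x = z[1:] + theta * z[:-1]           # MA(1): X_t = Z_t + theta * Z_{t-1}

def sample_acf(x, h):
    xbar = x.mean()
    return np.mean((x[h:] - xbar) * (x[:len(x) - h] - xbar)) / x.var()

# theoretical ACF: rho(1) = theta / (1 + theta^2) = 0.4, rho(h) = 0 for |h| > 1
assert abs(sample_acf(x, 1) - theta / (1 + theta**2)) < 0.02
assert abs(sample_acf(x, 2)) < 0.02  # near zero: the ACF cuts off after lag q = 1
```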


Autoregressive Process

$\{X_t\}$ is a first-order autoregressive process AR(1) if $\{X_t\}$ is stationary and there is a process $Z_t \sim \mathrm{WN}(0, \sigma^2)$, uncorrelated with all $X_s$ for $s < t$, and a constant $\phi$ such that:

$$X_t = \phi X_{t-1} + Z_t$$

i.e. the next term depends on the previous value of the process and slowly converges to its mean. Note that an AR(1) can also be written iteratively as:

$$X_t = \phi^h X_{t-h} + \phi^{h-1} Z_{t-h+1} + \cdots + \phi Z_{t-1} + Z_t = \phi^h X_{t-h} + \sum_{j=0}^{h-1} \phi^j Z_{t-j}$$

The mean is:

$$\mu_X = E[X_t] = \phi E[X_{t-1}] + E[Z_t] = 0$$

The autocovariance ACVF is:

$$\gamma_X(h) = \operatorname{cov}(X_t, X_{t+h}) = \begin{cases} \dfrac{\sigma^2}{1 - \phi^2} & \text{if } h = 0 \\[2mm] \dfrac{\sigma^2 \phi^{|h|}}{1 - \phi^2} & \text{if } h \neq 0 \end{cases}$$

The autocorrelation ACF is:

$$\rho_X(h) = \frac{\gamma_X(h)}{\gamma_X(0)} = \begin{cases} 1 & \text{if } h = 0 \\ \phi^{|h|} & \text{if } h \neq 0 \end{cases}$$

The conditional mean is stochastic and depends on $X_t$:

$$E\left[X_{t+1} \mid X_t, X_{t-1}, \ldots\right] = \phi X_t + E\left[Z_{t+1} \mid X_t, X_{t-1}, \ldots\right] = \phi X_t$$

The conditional variance is constant, i.e. independent of $X_t$:

$$\operatorname{var}\left(X_{t+1} \mid X_t, X_{t-1}, \ldots\right) = \operatorname{var}\left(\phi X_t + Z_{t+1} \mid X_t, X_{t-1}, \ldots\right) = \operatorname{var}(Z_{t+1}) = \sigma^2$$

Therefore, the conditional and unconditional mean and variance are different.

$\{X_t\}$ is an autoregressive process of order p, AR(p), if:

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + Z_t$$

Note: For an AR(1) process to be stationary, $|\phi| < 1$, so that it can be expressed as a linear process. When $\phi = 1$, the process is known as a random walk.

For a higher-order AR(p) process, there will also be conditions on $\phi_1, \phi_2, \ldots, \phi_p$ for stationarity.

An AR(1) process is equivalent to an MA(∞) process, and an invertible MA(1) process is equivalent to an AR(∞) process. Iterating the AR(1) recursion shows the MA(∞) form:

$$X_t = \phi^h X_{t-h} + \sum_{j=0}^{h-1} \phi^j Z_{t-j} \to \sum_{j=0}^{\infty} \phi^j Z_{t-j} \quad \text{as } h \to \infty$$

An AR(p) process has, in absolute value, a decaying non-zero ACF for all lags. The smaller $|\phi|$, the faster the ACF decays. If $\phi$ is negative, then the ACF will have alternating signs.
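The geometric decay $\rho(h) = \phi^{|h|}$ can be seen numerically; the following sketch (illustrative values, not from the notes) simulates an AR(1) path started from its stationary distribution and compares sample and theoretical ACFs:

```python
import numpy as np

rng = np.random.default_rng(1)
phi, sigma, n = 0.7, 1.0, 200_000

x = np.empty(n)
x[0] = rng.normal(0.0, sigma / np.sqrt(1 - phi**2))  # draw X_0 from the stationary dist
z = rng.normal(0.0, sigma, size=n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + z[t]                     # AR(1) recursion

def sample_acf(x, h):
    xbar = x.mean()
    return np.mean((x[h:] - xbar) * (x[:len(x) - h] - xbar)) / x.var()

# theoretical ACF rho(h) = phi^|h| decays geometrically
for h in (1, 2, 3):
    assert abs(sample_acf(x, h) - phi**h) < 0.03
```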


Autoregressive Moving Average Models

$\{X_t\}$ is a first-order autoregressive moving average model ARMA(1,1) if $X_t$ is stationary and:

$$X_t - \phi X_{t-1} = Z_t + \theta Z_{t-1}$$

where $\phi$ and $\theta$ are constants and $Z_t \sim \mathrm{WN}(0, \sigma^2)$. Alternatively, defining

$$\phi(B) = 1 - \phi B \quad \text{and} \quad \theta(B) = 1 + \theta B$$

the ARMA(1,1) equation can be written as:

$$\phi(B) X_t = \theta(B) Z_t$$

Note that for $X_t$ to be stationary, the condition is the same as for the AR(1) process, i.e. $|\phi| < 1$.

$\{X_t\}$ is an ARMA process of order (p,q), i.e. ARMA(p,q), if:

$$X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}$$

or $\phi(B) X_t = \theta(B) Z_t$, where:

$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p$$
$$\theta(z) = 1 + \theta_1 z + \theta_2 z^2 + \cdots + \theta_q z^q$$

The polynomials $\phi(z)$ and $\theta(z)$ have no common roots.

The ARMA(p,q) process $\{X_t\}$ is stationary if the equation $\phi(z) = 0$ has no roots on the unit circle; note that the roots can be complex, i.e.:

$$\phi(z) \neq 0 \quad \text{for } |z| = 1$$

$\{X_t\}$ is an ARMA(p,q) process with drift $\mu$ if it is of the form:

$$X_t = \mu + \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}$$

If we estimate and then remove the mean, then the de-meaned process is a normal ARMA(p,q) process. The drift here represents the expected change after differencing.

Causality

An ARMA(p,q) process $\{X_t\}$ is causal if there exist constants $\psi_j$ such that $\sum_j |\psi_j| < \infty$ and:

$$X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$$

i.e. $X_t$ is expressible in terms of current and past noise terms. Causal processes are a subset of stationary processes, i.e. to be causal it must be stationary first. Causality is important in practice, since if $X_t$ is not causal then it depends on future noise terms, which doesn't make sense.

Theorem:

The $\{X_t\}$ satisfying $\phi(B) X_t = \theta(B) Z_t$ is causal if and only if all the roots of the equation $\phi(z) = 0$ are outside the unit circle.


Invertibility

An ARMA(p,q) process $\{X_t\}$ is invertible if there exist constants $\pi_j$ such that $\sum_j |\pi_j| < \infty$ and:

$$Z_t = \sum_{j=0}^{\infty} \pi_j X_{t-j}$$

i.e. $Z_t$ is expressible in terms of current and past $X_t$. If $X_t$ is not invertible then $Z_t$ depends on future values of the process, which again does not make sense.

Theorem:

The $\{X_t\}$ satisfying $\phi(B) X_t = \theta(B) Z_t$ is invertible if and only if all the roots of the equation $\theta(z) = 0$ are outside the unit circle.

Note:

All MA processes are causal, but AR processes might not be.

All AR processes are invertible, but MA processes might not be.

An equivalent condition for an AR(1) process to be causal is $|\phi| < 1$.

An equivalent condition for an MA(1) process to be invertible is $|\theta| < 1$.
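The two theorems suggest a mechanical check: compute the roots of $\phi(z)$ or $\theta(z)$ and test whether they all lie outside the unit circle. A minimal numpy sketch (hypothetical parameter values):

```python
import numpy as np

def roots_outside_unit_circle(coeffs_ascending):
    """coeffs_ascending: [c0, c1, ...] for the polynomial c0 + c1*z + c2*z^2 + ..."""
    roots = np.roots(coeffs_ascending[::-1])   # np.roots wants the highest degree first
    return bool(np.all(np.abs(roots) > 1.0))

# ARMA(1,1) with phi = 0.5, theta = 2: phi(z) = 1 - 0.5z, theta(z) = 1 + 2z
assert roots_outside_unit_circle([1.0, -0.5])      # root z = 2    -> causal
assert not roots_outside_unit_circle([1.0, 2.0])   # root z = -0.5 -> not invertible
```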

Calculation of ACF

Linear Filter Method

Consider the causal ARMA(p,q) process:

$$X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}$$

This can be written as:

$$\phi(B) X_t = \theta(B) Z_t \quad \Longrightarrow \quad X_t = \frac{\theta(B)}{\phi(B)} Z_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$$

Note that the summation is only from 0 to infinity, as the process is causal.

The first step is to determine the $\psi_j$ by equating the coefficients in the equation $\phi(B)\psi(B) = \theta(B)$:

$$\left(1 - \phi_1 B - \cdots - \phi_p B^p\right)\left(\psi_0 + \psi_1 B + \psi_2 B^2 + \cdots\right) = 1 + \theta_1 B + \cdots + \theta_q B^q$$

Then calculate the ACF by replacing the $X_t$ by their linear filter form. For $h > 0$:

$$\begin{aligned}
\gamma(h) &= \operatorname{Cov}(X_{t+h}, X_t) \\
&= \operatorname{Cov}\left(\sum_{j=0}^{\infty} \psi_j Z_{t+h-j},\; \sum_{j=0}^{\infty} \psi_j Z_{t-j}\right) \\
&= \sum_{j=0}^{\infty} \psi_{j+h} \psi_j \operatorname{Cov}(Z_{t-j}, Z_{t-j}) \qquad \text{(shift the index so both terms contain } Z_{t-j}\text{)} \\
&= \sigma^2 \sum_{j=0}^{\infty} \psi_{j+h} \psi_j \qquad \text{since } \operatorname{Cov}(Z_t, Z_t) = \sigma^2 \text{ and } \operatorname{Cov}(Z_t, Z_s) = 0 \text{ for } t \neq s
\end{aligned}$$

This method is convenient for MA processes, since they are easily expressed as linear processes.
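As an illustration of the linear filter method (a sketch with assumed parameter values), equating coefficients for an ARMA(1,1) gives the recursion $\psi_0 = 1$, $\psi_1 = \phi + \theta$, $\psi_j = \phi\,\psi_{j-1}$ for $j \geq 2$; the ACVF then follows by truncating the infinite sum:

```python
import numpy as np

def psi_weights(phi, theta, n_terms=200):
    """psi_j from phi(B) psi(B) = theta(B) for a causal ARMA(1,1)."""
    psi = np.empty(n_terms)
    psi[0] = 1.0
    psi[1] = phi + theta
    for j in range(2, n_terms):
        psi[j] = phi * psi[j - 1]
    return psi

def acvf(phi, theta, h, sigma2=1.0):
    # gamma(h) = sigma^2 * sum_j psi_{j+h} psi_j, truncated (tail is geometric, negligible)
    psi = psi_weights(phi, theta)
    return sigma2 * np.sum(psi[:len(psi) - h] * psi[h:])

phi, theta = 0.5, 0.4
# closed form for ARMA(1,1): gamma(0) = sigma^2 (1 + 2 phi theta + theta^2) / (1 - phi^2)
gamma0 = (1 + 2 * phi * theta + theta**2) / (1 - phi**2)
assert abs(acvf(phi, theta, 0) - gamma0) < 1e-10
```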


Yule-Walker Equations

Consider the causal ARMA(p,q) process:

$$X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}$$

First, multiply both sides by $X_{t-h}$ and then take the covariance:

$$\operatorname{Cov}(X_{t-h}, X_t) - \phi_1 \operatorname{Cov}(X_{t-h}, X_{t-1}) - \cdots - \phi_p \operatorname{Cov}(X_{t-h}, X_{t-p}) = C_h$$

which is equivalent to:

$$\gamma(h) - \phi_1 \gamma(h-1) - \cdots - \phi_p \gamma(h-p) = C_h$$

where $C_h$ collects the moving average components on the RHS:

$$C_h = \operatorname{Cov}(X_{t-h}, Z_t) + \theta_1 \operatorname{Cov}(X_{t-h}, Z_{t-1}) + \cdots + \theta_q \operatorname{Cov}(X_{t-h}, Z_{t-q})$$

For $h = 0, 1, 2, \ldots, p$ there are a total of $p + 1$ equations, which can be expressed in matrix form:

$$\begin{pmatrix} 1 & -\phi_1 & -\phi_2 & \cdots & -\phi_p \\ -\phi_1 & 1 - \phi_2 & -\phi_3 & \cdots & 0 \\ \vdots & & & & \vdots \\ -\phi_p & -\phi_{p-1} & \cdots & & 1 \end{pmatrix} \begin{pmatrix} \gamma(0) \\ \gamma(1) \\ \vdots \\ \gamma(p) \end{pmatrix} = \begin{pmatrix} C_0 \\ C_1 \\ \vdots \\ C_p \end{pmatrix}$$

using the symmetry $\gamma(-h) = \gamma(h)$ to collect terms; for example, the $h = 1$ row reads $-\phi_1\gamma(0) + (1 - \phi_2)\gamma(1) - \phi_3\gamma(2) - \cdots = C_1$. Thus, given the $C_j$ and $\phi_j$, the ACF can be computed by solving this linear system, i.e. by taking the inverse.

For $h > p$, we can find the ACF through the recursion:

$$\gamma(h) = \phi_1 \gamma(h-1) + \cdots + \phi_p \gamma(h-p) + C_h$$

To figure out $C_0, C_1, \ldots, C_p$, if the $\psi_j$'s are available, then we can use the causal representation:

$$X_{t-h} = \sum_{j=0}^{\infty} \psi_j Z_{t-h-j} = \sum_{j=h}^{\infty} \psi_{j-h} Z_{t-j}$$

This gives $C_h$ as:

$$\begin{aligned}
C_h &= \operatorname{Cov}\left(X_{t-h},\; Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}\right) \\
&= \operatorname{Cov}\left(\sum_{j=h}^{\infty} \psi_{j-h} Z_{t-j},\; Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}\right) \\
&= \sigma^2 \sum_{j=h}^{q} \theta_j \psi_{j-h}
\end{aligned}$$

where $\theta_0 = 1$ and $\psi_j = 0$ for $j < 0$. However, if the $\psi_j$'s are available, we may as well use method 1! Therefore, this method is more convenient for AR processes.
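For a pure AR process the $C_h$ vanish except $C_0 = \sigma^2$, so the Yule-Walker system can be solved directly. A numpy sketch for an AR(2) with assumed coefficients (not values from the notes):

```python
import numpy as np

phi1, phi2, sigma2 = 0.5, 0.3, 1.0

# Yule-Walker equations for an AR(2) (C_0 = sigma^2, C_1 = C_2 = 0):
#   gamma(0) - phi1*gamma(1) - phi2*gamma(2) = sigma^2
#   gamma(1) - phi1*gamma(0) - phi2*gamma(1) = 0
#   gamma(2) - phi1*gamma(1) - phi2*gamma(0) = 0
A = np.array([
    [1.0,   -phi1,        -phi2],
    [-phi1,  1.0 - phi2,   0.0],
    [-phi2, -phi1,         1.0],
])
C = np.array([sigma2, 0.0, 0.0])
gamma = np.linalg.solve(A, C)

# the recursion gamma(h) = phi1*gamma(h-1) + phi2*gamma(h-2) holds at h = 2
assert abs(gamma[2] - (phi1 * gamma[1] + phi2 * gamma[0])) < 1e-12
# and rho(1) = phi1 / (1 - phi2) for an AR(2)
assert abs(gamma[1] / gamma[0] - phi1 / (1 - phi2)) < 1e-12
```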


Partial Autocorrelation Function

The autocorrelation function is used to determine the no. of lags for an MA(q) process, while the partial autocorrelation function is necessary to determine the no. of lags for an AR(p) process. In finding the ACF, we set up the Yule-Walker equations and converted them into matrix form to solve for the $\gamma(h)$. By omitting the first equation, we can rewrite this system to solve for the parameters $\phi_j$:

$$\begin{pmatrix} \gamma(0) & \gamma(1) & \cdots & \gamma(p-1) \\ \gamma(1) & \gamma(0) & \cdots & \gamma(p-2) \\ \vdots & & & \vdots \\ \gamma(p-1) & \gamma(p-2) & \cdots & \gamma(0) \end{pmatrix} \begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{pmatrix} = \begin{pmatrix} \gamma(1) - C_1 \\ \gamma(2) - C_2 \\ \vdots \\ \gamma(p) - C_p \end{pmatrix}$$

Or equivalently:

$$\Gamma_p \boldsymbol{\phi}_p = \boldsymbol{\gamma}_p - \mathbf{C}_p$$

where the matrix on the left is the covariance matrix $\Gamma_p$ and the $C_h$ are:

$$C_h = \operatorname{Cov}(X_{t-h}, Z_t) + \theta_1 \operatorname{Cov}(X_{t-h}, Z_{t-1}) + \cdots + \theta_q \operatorname{Cov}(X_{t-h}, Z_{t-q})$$

Therefore, the parameters are given by:

$$\boldsymbol{\phi}_p = \Gamma_p^{-1}\left(\boldsymbol{\gamma}_p - \mathbf{C}_p\right)$$

Note that for an AR(p) process the $C_h$ are all zero, since the noise terms are uncorrelated with the past:

$$\boldsymbol{\phi}_p = \Gamma_p^{-1} \boldsymbol{\gamma}_p$$

Define the vector $\boldsymbol{\phi}_h = (\phi_{h1}, \phi_{h2}, \ldots, \phi_{hh})^T = \Gamma_h^{-1} \boldsymbol{\gamma}_h$. Then for an AR(p) process, where $\boldsymbol{\phi}_p = \Gamma_p^{-1} \boldsymbol{\gamma}_p$, we have:

If $h = p$, then $\boldsymbol{\phi}_p = (\phi_1, \ldots, \phi_p)^T$, which implies that $\phi_{pp} = \phi_p$.

If $h > p$, then $\boldsymbol{\phi}_h = (\phi_1, \ldots, \phi_p, 0, \ldots, 0)^T$, which implies $\phi_{hh} = 0$, because an AR(p) process is a special case of an AR(h) process with $\phi_t = 0$ for $t > p$.

If $h < p$, then we do not know precisely what $\phi_{hh}$ is.

Therefore, for an AR(p) process:

$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + Z_t$$

we have:

$$\phi_{hh} = \begin{cases} ? & \text{if } h < p \\ \phi_p & \text{if } h = p \\ 0 & \text{if } h > p \end{cases}$$

The h-th partial autocorrelation function is defined by:

$$\alpha(h) = \begin{cases} 1 & \text{if } h = 0 \\ \phi_{hh} & \text{if } h \geq 1 \end{cases}$$

where $\phi_{hh}$ is the last element in the vector $\boldsymbol{\phi}_h = \Gamma_h^{-1} \boldsymbol{\gamma}_h$.

E.g., for an AR(2) process, the PACF at lag 2 is given by $\alpha(2) = \phi_{22}$, where:

$$\begin{pmatrix} \phi_{21} \\ \phi_{22} \end{pmatrix} = \begin{pmatrix} \gamma(0) & \gamma(1) \\ \gamma(1) & \gamma(0) \end{pmatrix}^{-1} \begin{pmatrix} \gamma(1) \\ \gamma(2) \end{pmatrix}$$
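The AR(2) example can be verified numerically: computing $\gamma(0), \gamma(1), \gamma(2)$ from the Yule-Walker equations and then solving the $2 \times 2$ system recovers $\alpha(2) = \phi_2$. A sketch with assumed coefficients:

```python
import numpy as np

phi1, phi2 = 0.5, 0.3   # assumed AR(2) coefficients

# ACVF gamma(0..2) from the Yule-Walker equations (unit noise variance)
A = np.array([
    [1.0,   -phi1,        -phi2],
    [-phi1,  1.0 - phi2,   0.0],
    [-phi2, -phi1,         1.0],
])
gamma = np.linalg.solve(A, np.array([1.0, 0.0, 0.0]))

# PACF at lag 2: last element of Gamma_2^{-1} gamma_2
Gamma2 = np.array([[gamma[0], gamma[1]],
                   [gamma[1], gamma[0]]])
phi_hat = np.linalg.solve(Gamma2, np.array([gamma[1], gamma[2]]))
alpha2 = phi_hat[-1]

assert abs(alpha2 - phi2) < 1e-12   # for an AR(2), alpha(2) = phi_2
```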


Model Building

For a stationary time series, model building can be classified into 3 stages.

Model Selection

To determine the appropriate model we need to look at the sample ACF and PACF.

The sample autocovariance function is:

$$\hat\gamma(h) = \frac{1}{n} \sum_{t=1}^{n-|h|} \left(x_{t+|h|} - \bar x\right)\left(x_t - \bar x\right)$$

The sample autocorrelation function (SACF) is:

$$\hat\rho(h) = \frac{\hat\gamma(h)}{\hat\gamma(0)}$$

Note that since the SPACF is calculated from the SACF, both measures will have estimation error.

Once we have graphed the sample ACF and sample PACF, we first check that the sample ACF quickly converges to zero, which shows that the time series is stationary.

If the sample ACF decreases slowly but steadily from a value near 1, then the data need to be differenced before fitting the model.

If the sample ACF exhibits a periodic oscillation, then there may be some seasonality remaining.

Then we compare the sample ACF and PACF to the theoretical ACF and PACF of different processes to see if there is a match.

For an AR(p) process:

Sample ACF shows exponential decay towards near-zero values.

Sample PACF shows significant values up to lag p, then near-zero values thereafter.

For an MA(q) process:

Sample ACF shows significant values up to lag q, then near-zero values thereafter.

Sample PACF shows exponential decay towards near-zero values.

If neither of these situations occurs, then consider an ARMA(p,q) process. The sample ACF and PACF of an ARMA(p,q) process are very flexible, but in general the ACF and PACF are the sum of the ACF and PACF of an AR(p) and an MA(q) process. So an ARMA(p,q) model should display:

ACF that decays towards zero after lag p, either directly or with oscillation.

PACF that decays towards zero after lag q, either directly or with oscillation.


Model Selection Criteria

When deciding the number of parameters, there is a trade-off between goodness of fit and the simplicity of the model. More parameters mean more flexibility and a better in-sample fit. However, more parameters also mean that each parameter is estimated with more uncertainty, i.e. higher standard error.

A systematic method to choose p and q is to minimise an information criterion. Information criteria have the form:

$$\mathrm{IC}(p, q) = -2 \log L(p, q) + P(n, p, q)$$

where the first term, based on the log-likelihood, always decreases with the no. of parameters. However, the second "penalty" term, a function of the no. of observations and the no. of parameters, always increases with the no. of parameters. Thus the IC seeks to balance out bias and variance.

Three common ICs are:

Information Criterion | Penalty Term
AIC | $2(p + q + 1)$
BIC | $(p + q + 1)\log n$
AICc | $\dfrac{2(p + q + 1)n}{n - p - q - 2}$
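The three criteria are simple functions of the maximised log-likelihood. The sketch below (hypothetical log-likelihood values, purely illustrative) shows how a larger model must improve the fit enough to beat its penalty:

```python
import numpy as np

def aic(loglik, p, q):
    return -2.0 * loglik + 2 * (p + q + 1)

def bic(loglik, p, q, n):
    return -2.0 * loglik + (p + q + 1) * np.log(n)

def aicc(loglik, p, q, n):
    k = p + q + 1                       # number of parameters
    return -2.0 * loglik + 2.0 * k * n / (n - k - 1)

# hypothetical maximised log-likelihoods for two candidate fits on n = 200 points
n = 200
ll_ar1, ll_arma11 = -311.0, -308.0

assert aic(ll_arma11, 1, 1) < aic(ll_ar1, 1, 0)        # gain of 3 in loglik beats +2 penalty
assert bic(ll_arma11, 1, 1, n) > aic(ll_arma11, 1, 1)  # BIC penalises harder once log n > 2
```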

Parameter Estimation

Mean Estimation

Let $\{X_t\}$ be a weakly stationary process with mean $\mu$. An estimator of $\mu$ is the sample mean:

$$\bar X = \frac{1}{n} \sum_{t=1}^{n} X_t$$

This estimator is unbiased:

$$E[\bar X] = \frac{1}{n} \sum_{t=1}^{n} E[X_t] = \mu$$

This estimator is also consistent:

$$\begin{aligned}
\operatorname{var}(\bar X) &= \frac{1}{n^2} \sum_{t=1}^{n} \sum_{s=1}^{n} \operatorname{cov}(X_t, X_s) \\
&= \frac{1}{n^2} \sum_{t=1}^{n} \sum_{s=1}^{n} \gamma(t - s) \\
&= \frac{1}{n} \sum_{h=-(n-1)}^{n-1} \left(1 - \frac{|h|}{n}\right) \gamma(h)
\end{aligned}$$

Thus, the variance $\to 0$ as $n \to \infty$, i.e. the estimator is consistent, if $\sum_h |\gamma(h)| < \infty$, which is true since for a stationary time series the ACF should eventually converge to zero regardless of whether it is AR or MA. For a more detailed proof, see ACTL2003 Proofs.


Parameter Estimation

If a sufficient number of sample ACF values is known, then one way to estimate the parameters of a model, i.e. $\theta, \phi, \sigma^2$, is by equating them to the theoretical ACF derived from the Yule-Walker equations. Use this to set up equations in terms of the parameters and solve. If there is more than one solution, choose the solution that makes the model causal and/or invertible.

Another method is to use maximum likelihood estimation. Suppose the errors are assumed to be normal; then the $X_t$ themselves are also normal, so:

$$\mathbf{X}_n = (X_1, \ldots, X_n)^T \sim N\left(\mathbf{0}, \Sigma_n(\beta)\right)$$

where:

$\beta$ is the fixed set of parameters in the model, e.g. $\beta = (\theta, \sigma^2)^T$ for an MA(1) model.

$\Sigma_n(\beta)$ is the symmetric covariance matrix of $\mathbf{X}_n$, expressed in terms of the parameters:

$$\Sigma_n = \begin{pmatrix} \gamma(0) & \gamma(1) & \gamma(2) & \cdots & \gamma(n-1) \\ \gamma(1) & \gamma(0) & \gamma(1) & \cdots & \gamma(n-2) \\ \gamma(2) & \gamma(1) & \gamma(0) & \cdots & \gamma(n-3) \\ \vdots & & & & \vdots \\ \gamma(n-1) & \gamma(n-2) & \gamma(n-3) & \cdots & \gamma(0) \end{pmatrix}$$

Assuming that the observations follow a multivariate normal distribution, the likelihood function is:

$$L(\beta) = \frac{1}{(2\pi)^{n/2} \sqrt{\det \Sigma_n}} \exp\left(-\frac{1}{2} \mathbf{X}_n^T \Sigma_n^{-1} \mathbf{X}_n\right)$$

The maximum likelihood estimator $\hat\beta$ is the value that maximises the likelihood function $L(\beta)$. Under the normality assumption, we have the asymptotic distribution of the MLE:

$$\hat\beta \overset{A}{\sim} N\left(\beta, \widehat{\operatorname{var}}(\hat\beta)\right)$$

where the variance can be estimated by:

$$\widehat{\operatorname{var}}(\hat\beta) = \left(-\frac{\partial^2 \ln L(\beta)}{\partial \beta\, \partial \beta^T}\bigg|_{\beta = \hat\beta}\right)^{-1}$$

This result can be used to compute confidence intervals for the parameters and for hypothesis testing about the parameters, e.g. whether to include certain parameters.

Model Diagnosis

The residuals of the proposed model are:

$$\hat Z_t = X_t - \hat X_t$$

where the $\hat X_t$ are the fitted values computed using the estimated parameters. If the proposed model is a good approximation to the underlying time series process, then the residuals should be approximately a white noise process. There are several methods to check this:

Plot of Residuals

If the plot of $\hat Z_t$ against $t$ shows any trend or patterns in fluctuations, then the model is inadequate.


SACF of Residuals

If $\hat Z_t \sim \mathrm{WN}(0, \sigma^2)$, then it can be shown that the sample ACF of $\hat Z_t$ has the asymptotic distribution:

$$\hat\rho_Z(h) \sim N\left(0, \frac{1}{n}\right)$$

Therefore, at the 95% level, the sample ACF should be within the range:

$$0 \pm \frac{1.96}{\sqrt{n}}$$

since for a white noise process, $\rho(h) = 0$ for $h \geq 1$. If too many values of the SACF lie outside this range, then the model does not fit the process well and more parameters will be needed.

Ljung-Box Test

We can test the null hypothesis that $\hat Z_t$ is white noise using the Ljung-Box test statistic. This tests whether, jointly, all correlations at lags greater than zero are zero. Under the null hypothesis, for large n, the Ljung-Box test statistic is:

$$Q = n(n + 2) \sum_{j=1}^{h} \frac{\hat\rho_Z^2(j)}{n - j} \sim \chi^2_{h - p - q}$$

where n is the no. of time series observations. In practice, h is chosen to be between 15 and 30, and n should be large, i.e. $n \geq 100$.

Using the Ljung-Box test, we reject the null hypothesis, i.e. conclude that the residuals are not white noise, if:

$$Q > \chi^2_{h - p - q,\, 1 - \alpha}$$
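The statistic itself is easy to compute from the residual SACF. A numpy sketch on simulated white-noise "residuals" (illustrative only; the critical value would come from $\chi^2$ tables):

```python
import numpy as np

def sample_acf(x, h):
    xbar = x.mean()
    return np.sum((x[h:] - xbar) * (x[:len(x) - h] - xbar)) / np.sum((x - xbar) ** 2)

def ljung_box_q(resid, h):
    """Q = n(n+2) * sum_{j=1}^{h} rho_hat(j)^2 / (n - j)."""
    n = len(resid)
    return n * (n + 2) * sum(sample_acf(resid, j) ** 2 / (n - j) for j in range(1, h + 1))

rng = np.random.default_rng(7)
resid = rng.normal(size=500)         # simulated white-noise residuals
q = ljung_box_q(resid, h=20)

# under H0, Q is approximately chi-square with ~20 df (mean about 20);
# values far above the tabulated critical region would reject H0
assert 0 < q < 60
```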

Non-Stationarity

In practice, a non-stationary time series may exhibit a non-stationary level of mean, variance or both. Transformations can be used to remove non-stationarity, e.g. taking the logarithm of an exponential trend can remove a non-stationary mean, or smooth out the variance.

Stochastic Trends

Apart from a deterministic trend or seasonality, a stochastic trend also causes non-stationarity. A stochastic trend arises when the noise terms have a permanent effect on the process. Consider the random walk rewritten iteratively:

$$X_t = X_{t-1} + Z_t = X_0 + \sum_{j=1}^{t} Z_j, \qquad Z_t \sim \mathrm{WN}(0, \sigma^2)$$

In this case, the effect of any $Z_t$ on $X_{t+h}$ is the SAME for all $h \geq 0$, since $\phi = 1$. This is not true for stationary processes like AR(1) or ARMA(1,1), where $|\phi| < 1$, since depending on h, $Z_t$ will have a different level of impact as the coefficient $\psi_j$ changes, e.g.:

$$X_t = \sum_{j=0}^{\infty} \phi^j Z_{t-j}$$

Since the noise terms have a lasting impact, the correlation between $X_t$ and $X_{t-h}$ is relatively high, so a distinctive feature of a random walk is a very slowly decaying positive ACF. Note that differencing the random walk once obtains a stationary series!

$$Y_t = (1 - B) X_t = X_t - X_{t-1} = Z_t$$


ARIMA Model

The process $\{Y_t\}$ is an autoregressive integrated moving average model ARIMA(p,d,q) with order of integration d if:

$$\phi(B)(1 - B)^d Y_t = \theta(B) Z_t$$

That is, $\{Y_t\}$ becomes a stationary ARMA(p,q) process after differencing d times, i.e.:

$$W_t = (1 - B)^d Y_t \sim \mathrm{ARMA}(p, q), \qquad \phi(B) W_t = \theta(B) Z_t$$

E.g. consider the process defined by:

$$X_t = 0.6 X_{t-1} + 0.3 X_{t-2} + 0.1 X_{t-3} + Z_t + 0.25 Z_{t-1}$$

This process can be re-written as:

$$\begin{aligned}
\left(1 - 0.6B - 0.3B^2 - 0.1B^3\right) X_t &= \left(1 + 0.25B\right) Z_t \\
\left(1 + 0.4B + 0.1B^2\right)\left(1 - B\right) X_t &= \left(1 + 0.25B\right) Z_t
\end{aligned}$$

Therefore, this is an ARIMA(2,1,1) process.
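The factorisation of the AR polynomial can be verified mechanically, since multiplying polynomials is a convolution of their coefficient vectors:

```python
import numpy as np

# AR polynomial of the example: 1 - 0.6B - 0.3B^2 - 0.1B^3 (ascending powers of B)
ar_poly = np.array([1.0, -0.6, -0.3, -0.1])

# proposed factorisation: (1 + 0.4B + 0.1B^2) * (1 - B)
factor_arma = np.array([1.0, 0.4, 0.1])   # the stationary ARMA part
factor_diff = np.array([1.0, -1.0])       # (1 - B), the differencing operator

product = np.convolve(factor_arma, factor_diff)  # polynomial multiplication
assert np.allclose(product, ar_poly)             # confirms the ARIMA(2,1,1) form
```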

SARIMA Model

Suppose that $\{X_t\}$ exhibits a stochastic seasonal trend, i.e. $X_t$ depends not only on $X_{t-1}, X_{t-2}, \ldots$ but also on $X_{t-s}, X_{t-2s}, \ldots$

To model this, we can use a SARIMA$(p,d,q)(P,D,Q)_s$ process, given by:

$$\underbrace{\phi(B)}_{\mathrm{AR}(p)}\, \underbrace{\Phi(B^s)}_{\mathrm{AR}(P)}\, \underbrace{(1 - B)^d}_{\mathrm{I}(d)}\, \underbrace{(1 - B^s)^D}_{\mathrm{I}(D)}\, X_t = \underbrace{\theta(B)}_{\mathrm{MA}(q)}\, \underbrace{\Theta(B^s)}_{\mathrm{MA}(Q)}\, Z_t$$

where the seasonal AR(P), MA(Q) and I(D) factors are polynomials in $B^s$, e.g. $\Phi(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps}$.

E.g. consider a SARIMA$(1,0,0)(0,1,1)_{12}$ process given by:

$$\underbrace{(1 - \phi B)}_{\mathrm{AR}(1)}\, \underbrace{(1 - B^{12})}_{\mathrm{I}(1)_{12}}\, X_t = \underbrace{(1 + \Theta B^{12})}_{\mathrm{MA}(1)_{12}}\, Z_t$$

This can be written as:

$$\left(1 - \phi B - B^{12} + \phi B^{13}\right) X_t = Z_t + \Theta Z_{t-12}$$
$$X_t = \phi X_{t-1} + X_{t-12} - \phi X_{t-13} + Z_t + \Theta Z_{t-12}$$

Note that $X_t$ depends on $X_{t-1}$, $X_{t-12}$ and $X_{t-13}$, as well as $Z_{t-12}$.

SARIMA models can always be rewritten as ARIMA models, with some constraints on the new parameters. However, converting to ARIMA always requires more parameters than the SARIMA model, which leads to better in-sample fit, but worse out-of-sample fit, i.e. predictions. Thus, when forecasting it is better to use SARIMA models than to convert them to ARIMA models.


Dickey-Fuller Test

The Dickey-Fuller test is a unit root test, i.e. it tests whether there is a unit root in the time series. Note that if the polynomial $\phi(z)$ has a unit root, then the time series is not stationary and requires differencing.

Consider a causal time series process $\{X_t\}$ with:

$$X_t = \alpha + \beta t + \phi X_{t-1} + Z_t, \qquad Z_t \sim \mathrm{WN}(0, \sigma^2)$$

To test for a stochastic trend, we test the hypothesis:

$$H_0: \phi = 1 \quad \text{against} \quad H_1: \phi < 1$$

Note that if $\phi = 1$ then there is a unit root, which leads to a stochastic trend, so $X_t$ is not stationary. We can write the above model as:

$$X_t - X_{t-1} = \alpha + \beta t + (\phi - 1) X_{t-1} + Z_t = \alpha + \beta t + \phi^* X_{t-1} + Z_t, \qquad \text{where } \phi^* = \phi - 1$$

Then the equivalent null hypothesis is:

$$H_0: \phi^* = 0$$

This is known as the Dickey-Fuller test, and the test statistic is:

$$\hat\tau = \frac{\hat\phi^*}{\operatorname{se}(\hat\phi^*)}$$

Once the parameters $\phi^*, \alpha, \beta$ have been estimated, reject the null hypothesis $\phi^* = 0$ if $\hat\tau$ is LESS than the critical value, which will be negative since $\hat\phi^*$ is negative under the alternative. Note that rejection of the null hypothesis implies that the time series is stationary, and accepting the null hypothesis implies that differencing is required.

The distribution of this test statistic is non-standard, depending on $\alpha$ and $\beta$, with asymptotic percentiles:

Probability to the left | 0.01 | 0.05 | 0.10
Standard Normal | -2.33 | -1.65 | -1.28
DF with $\alpha = 0, \beta = 0$ | -2.58 | -1.95 | -1.62
DF with $\beta = 0$ | -3.43 | -2.86 | -2.57
DF (unconstrained) | -3.96 | -3.41 | -3.12

Note that the DF distributions are much more spread out than the standard normal. When choosing whether or not to use DF with $\alpha = 0, \beta = 0$, there is a trade-off:

If $\alpha$ or $\beta$ is set to 0 when in fact the true values are nonzero, the test becomes inconsistent and the asymptotic critical values are no longer valid. Decisions based on the test are likely to be wrong, i.e. it might confuse deterministic and stochastic trends.

However, allowing $\alpha$ or $\beta$ to be non-zero reduces the power of the test, i.e. it becomes harder to detect a false null hypothesis.

How to determine $\alpha$ and $\beta$ usually depends on what type of series we have. E.g. if a linear trend exists, then we expect the differenced series to only have a constant, so $\alpha \neq 0, \beta = 0$.
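The regression behind the test is just OLS on the differenced series. The sketch below (the $\alpha = \beta = 0$ case, with simulated data) contrasts the statistic for a stationary AR(1), which is strongly negative, with that for a true random walk, which hovers near zero:

```python
import numpy as np

def df_stat(x):
    """DF statistic for the alpha = beta = 0 regression: diff(X)_t = phi* X_{t-1} + Z_t."""
    dx, lag = np.diff(x), x[:-1]
    phi_star = np.sum(lag * dx) / np.sum(lag**2)       # OLS slope, no intercept
    resid = dx - phi_star * lag
    se = np.sqrt(resid.var(ddof=1) / np.sum(lag**2))   # OLS standard error
    return phi_star / se

rng = np.random.default_rng(3)
z = rng.normal(size=2_000)
walk = np.cumsum(z)              # unit root: phi = 1
ar = np.empty(2_000)
ar[0] = 0.0
for t in range(1, 2_000):
    ar[t] = 0.5 * ar[t - 1] + z[t]   # stationary AR(1): phi = 0.5

assert df_stat(ar) < -1.95   # well below the tabulated 5% critical value: reject H0
assert df_stat(walk) > -6.0  # the random walk statistic stays far above such extremes
```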


Overdifferencing

Let $U_t$ be an ARMA(p,q) process:

$$\phi(B) U_t = \theta(B) Z_t$$

Define $V_t = (1 - B)^d U_t$, i.e. d differences. Then we have:

$$\phi(B) V_t = \phi(B)(1 - B)^d U_t = (1 - B)^d \theta(B) Z_t$$

Therefore, $V_t$ becomes an ARMA(p, q+d) process, since $(1 - B)^d \theta(B)$ is a polynomial of order q+d. However, this MA polynomial has a unit root, so the process $V_t$ is not invertible. Therefore, we should avoid overdifferencing, as it will give us a non-invertible process, even though it is still stationary.

Cointegrated Time Series

Many time series in finance and economics are non-stationary (random walks), e.g. CPI and GDP, but at the same time do not move too far apart from each other. Cointegration is used to model non-stationary series that move together.

For a bivariate process $\mathbf{X}_t = (X_{t,1}, X_{t,2})^T$, we define:

$$\nabla^d \mathbf{X}_t = \begin{pmatrix} \nabla^d X_{t,1} \\ \nabla^d X_{t,2} \end{pmatrix}$$

A bivariate process $\mathbf{X}_t = (X_{t,1}, X_{t,2})^T$ is integrated of order d, i.e. I(d), if $\nabla^d \mathbf{X}_t$ is stationary but $\nabla^{d-1} \mathbf{X}_t$ is not.

An I(d) bivariate process is cointegrated if there is a cointegrating vector $\boldsymbol{\alpha} = (\alpha_1, \alpha_2)^T$ such that $\boldsymbol{\alpha}^T \mathbf{X}_t$ becomes stationary, i.e.:

$$\alpha_1 X_{t,1} + \alpha_2 X_{t,2} \sim I(0)$$

If $X_{t,1}$ and $X_{t,2}$ are cointegrated, then:

$X_{t,1}$ and $X_{t,2}$ are both I(1).

$e_t = \boldsymbol{\alpha}^T \mathbf{X}_t$ is I(0).

Cointegration implies that there is a common stochastic trend between $X_{t,1}$ and $X_{t,2}$.

In many financial applications, the cointegrating vector is of the form:

$$\boldsymbol{\alpha} = (1, -a)^T$$

That is, $X_{t,1}$ and $X_{t,2}$ are random walks themselves, but the difference $X_{t,1} - a X_{t,2}$ is stationary. The $a$ term can be estimated by the regression:

$$X_{t,1} = a X_{t,2} + \varepsilon_t$$

Then we expect that in the long run the two processes satisfy:

$$X_{t,1} \approx a X_{t,2}$$


Time Series Forecasting

Time Series and Markov Property

Recall that a process $\{X_t, t = 0, 1, \ldots\}$ has the Markov property if all future states of the process depend on its present state alone and not on any of its past states:

$$\Pr\left(X_t \in A \mid X_{s_1} = x_1, \ldots, X_{s_n} = x_n, X_s = x\right) = \Pr\left(X_t \in A \mid X_s = x\right)$$

for all times $s_1 < s_2 < \cdots < s_n < s < t$, all states $x_1, x_2, \ldots, x_n, x \in S$, and all subsets $A$ of $S$.

AR Processes

An AR(1) process has the Markov property, since the conditional distribution of $X_{n+1}$ given all previous $X_t$ depends only on $X_n$. However, an AR(2) process does not have the Markov property, since the conditional distribution of $X_{n+1}$ given all previous $X_t$ depends on both $X_n$ and $X_{n-1}$. Thus in general, AR(p) processes do not have the Markov property for p greater than 1.

However, for an AR(2) process, if we define a vector-valued process $\mathbf{Y}$ by $\mathbf{Y}_t = (X_t, X_{t-1})^T$, then $\mathbf{Y}$ has the Markov property, since the conditional distribution of $\mathbf{Y}_{n+1}$ given all previous $\mathbf{Y}_t$ depends only on $\mathbf{Y}_n$. In general, for an AR(p) process we can define a vector-valued process with p elements that will have the Markov property.

MA Processes

An MA(q) process can never have the Markov property, even in vector form, since the distribution of $X_{n+1}$ depends on the value of $Z_n$, and in theory no knowledge of the value of $X_n$ or of any finite collection $(X_n, \ldots, X_{n-q})^T$ will ever be enough to deduce the value of $Z_n$. However, in practice we can estimate $Z_n$ very accurately, so Markov simulation techniques still apply.

ARIMA Processes

For an ARIMA(p,d,q) process, if q is zero, i.e. there is no moving average component, then the process behaves similarly to an AR(p) in terms of the Markov property, i.e. it might not be Markov but an appropriate vector form can be Markov.

If d is also zero and p is greater than 1, then it is essentially an AR(p) process. If both p and d are equal to 1, then the model can be written as:

$$\begin{aligned}
(1 - \phi B)(1 - B) X_t &= Z_t \\
\left(1 - (1 + \phi) B + \phi B^2\right) X_t &= Z_t \\
X_t &= (1 + \phi) X_{t-1} - \phi X_{t-2} + Z_t
\end{aligned}$$

This is clearly not Markov. However, it can still be written as a vector-valued process that has the Markov property. In general, the vector process needs p + d terms to be Markov.

If q is not equal to zero, i.e. there is a moving average part, then it will never be Markov, for the same reason that MA(q) is never Markov.


k-step Ahead Predictor

Assume that we have the following information:

(1) All observations of $X_t$ up until time n: $x_1, \ldots, x_n$

(2) An ARMA model has been fitted to the data

(3) All parameters of the model $(\phi, \theta, \sigma^2)$ have been estimated

(4) The process $Z_t$ is known up until time n

The k-step ahead forecast $\hat X_{n+k|n}$ is one method to forecast/predict $X_{n+k}$ using the given observations up until time n:

$$\hat X_{n+k|n} = E\left[X_{n+k} \mid X_1, \ldots, X_n\right]$$

This is obtained by:

Replacing the random variables $X_1, \ldots, X_n$ by their observed values $x_1, \ldots, x_n$

Replacing the random variables $X_{n+1}, \ldots, X_{n+k-1}$ by their forecast values $\hat X_{n+1|n}, \ldots, \hat X_{n+k-1|n}$

Replacing the random variables $Z_{n+1}, \ldots, Z_{n+k}$ by their expectations, i.e. 0

If $Z_1, \ldots, Z_n$ are unknown, replacing them by the residuals $\hat Z_1, \ldots, \hat Z_n$, where $\hat Z_i = E[Z_i \mid X_1, \ldots, X_n]$

To forecast one step ahead, $\hat X_{n+1|n}$, we have:

$$\begin{aligned}
\hat X_{n+1|n} &= E\left[X_{n+1} \mid X_1, \ldots, X_n\right] \\
&= E\left[\phi_1 X_n + \cdots + \phi_p X_{n+1-p} + Z_{n+1} + \theta_1 Z_n + \cdots + \theta_q Z_{n+1-q} \mid X_1, \ldots, X_n\right] \\
&= \phi_1 x_n + \cdots + \phi_p x_{n+1-p} + 0 + \theta_1 Z_n + \cdots + \theta_q Z_{n+1-q}
\end{aligned}$$

To forecast two steps ahead, $\hat X_{n+2|n}$, we have:

$$\begin{aligned}
\hat X_{n+2|n} &= E\left[X_{n+2} \mid X_1, \ldots, X_n\right] \\
&= E\left[\phi_1 X_{n+1} + \phi_2 X_n + \cdots + \phi_p X_{n+2-p} + Z_{n+2} + \theta_1 Z_{n+1} + \theta_2 Z_n + \cdots + \theta_q Z_{n+2-q} \mid X_1, \ldots, X_n\right] \\
&= \phi_1 \hat X_{n+1|n} + \phi_2 x_n + \cdots + \phi_p x_{n+2-p} + 0 + 0 + \theta_2 Z_n + \cdots + \theta_q Z_{n+2-q}
\end{aligned}$$

In practice we do not observe $Z_t$, so if there are MA terms in the model, then there are more values of $Z_t$ than of $X_t$ and there is no way of determining all of them from the data. Consider the MA(1) process:

$$X_t = Z_t + 0.5 Z_{t-1} \quad \Longrightarrow \quad Z_t = X_t - 0.5 Z_{t-1}$$

Then by repeated substitution we have:

$$Z_n = \sum_{j=0}^{n-1} (-0.5)^j X_{n-j} + (-0.5)^n Z_0$$

To determine $Z_n$ we need $Z_0$ first; one simple way is to assume that $Z_0 = 0$. If the process is invertible, then this assumption will have a negligible effect on $\hat X_{n+k|n}$ if n is large, since $(-0.5)^n$ will be small.


Best Linear Predictor

Under assumptions (1), (2) and (3), the best linear predictor $P_n X_{n+h}$ of $X_{n+h}$ has the form:

$$P_n X_{n+h} = a_0 + a_1 X_n + a_2 X_{n-1} + \cdots + a_n X_1$$

We need the values of $a_0, \ldots, a_n$ that minimise the mean squared error:

$$\mathrm{MSE} = E\left[\left(X_{n+h} - P_n X_{n+h}\right)^2\right]$$

The general solution is found from the n+1 first-order conditions:

$$\frac{\partial\,\mathrm{MSE}}{\partial a_0} = -2\, E\left[X_{n+h} - P_n X_{n+h}\right] = 0$$

$$\frac{\partial\,\mathrm{MSE}}{\partial a_j} = -2\, E\left[\left(X_{n+h} - P_n X_{n+h}\right) X_{n+1-j}\right] = 0, \qquad j = 1, \ldots, n$$

Note that due to the very first condition, the expected prediction error is:

$$E\left[X_{n+h} - P_n X_{n+h}\right] = 0$$

i.e. the prediction is unbiased.

The first equation can be simplified to:

$$a_0 = \mu\left(1 - a_1 - a_2 - \cdots - a_n\right)$$

While the second can be simplified using a trick (the unbiasedness condition lets us turn expectations of products into covariances):

$$\begin{aligned}
E\left[\left(X_{n+h} - P_n X_{n+h}\right) X_{n+1-j}\right] &= E\left[\left(X_{n+h} - P_n X_{n+h}\right) X_{n+1-j}\right] - \underbrace{E\left[X_{n+h} - P_n X_{n+h}\right]}_{=\,0} E\left[X_{n+1-j}\right] \\
&= \operatorname{Cov}\left(X_{n+h} - P_n X_{n+h},\; X_{n+1-j}\right) = 0
\end{aligned}$$

which expands to $\gamma(h + j - 1) = a_1 \gamma(j-1) + a_2 \gamma(j-2) + \cdots + a_n \gamma(j-n)$ for $j = 1, \ldots, n$. Applying this trick to every subsequent MSE-minimising condition, we end up with a system of n equations (excluding the first one) that is very similar to the process of finding the PACF coefficients. This system can be represented in matrix form:

$$\Gamma_n \mathbf{a}_n = \boldsymbol{\gamma}_n(h)$$

where:

$$\Gamma_n = \begin{pmatrix} \gamma(0) & \gamma(1) & \cdots & \gamma(n-1) \\ \gamma(1) & \gamma(0) & \cdots & \gamma(n-2) \\ \vdots & & & \vdots \\ \gamma(n-1) & \gamma(n-2) & \cdots & \gamma(0) \end{pmatrix}, \qquad \mathbf{a}_n = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}, \qquad \boldsymbol{\gamma}_n(h) = \begin{pmatrix} \gamma(h) \\ \gamma(h+1) \\ \vdots \\ \gamma(h+n-1) \end{pmatrix}$$

Therefore, the solution is:

$$\mathbf{a}_n = \Gamma_n^{-1} \boldsymbol{\gamma}_n(h)$$

Once this is known, $a_0$ can be found by rewriting the first equation in matrix form:

$$a_0 = \mu\left(1 - \mathbf{1}^T \mathbf{a}_n\right)$$

where $\mathbf{1}$ represents a column vector of 1's.
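As a sanity check of the matrix solution (a sketch with an assumed AR(1) ACVF, not from the notes): for an AR(1), the best linear h-step predictor is known to be $\phi^h X_n$, so solving $\Gamma_n \mathbf{a}_n = \boldsymbol{\gamma}_n(h)$ should return $(\phi^h, 0, \ldots, 0)$:

```python
import numpy as np

phi, sigma2, n, h = 0.6, 1.0, 5, 2

# ACVF of a stationary AR(1): gamma(k) = sigma^2 * phi^|k| / (1 - phi^2)
gamma = lambda k: sigma2 * phi ** abs(k) / (1 - phi**2)

Gamma = np.array([[gamma(i - j) for j in range(n)] for i in range(n)])
rhs = np.array([gamma(h + i) for i in range(n)])
a = np.linalg.solve(Gamma, rhs)

# for an AR(1), P_n X_{n+h} = phi^h * X_n, so a = (phi^h, 0, ..., 0)
assert np.allclose(a, [phi**h, 0.0, 0.0, 0.0, 0.0])
```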


Part 3: Brownian Motion

Definitions

A stochastic process $\{X_t, t \geq 0\}$ is a Brownian motion process if:

(1) $X_0 = 0$

(2) $\{X_t, t \geq 0\}$ has stationary and independent increments

(3) For all $t > 0$, $X_t \sim N(0, \sigma^2 t)$

The conditions (2) and (3) together are equivalent to:

$$X_t - X_s \sim N\left(0, \sigma^2 (t - s)\right) \quad \text{for } 0 \leq s < t$$

i.e. $\{X_t, t \geq 0\}$ has stationary increments and $\operatorname{cov}(X_s, X_t - X_s) = 0$ for $0 \leq s < t$, since if two normal r.v.'s have 0 covariance then they are independent.

A standard Brownian motion $\{B_t, t \geq 0\}$ is one where the volatility is one, i.e. $\sigma = 1$. Any Brownian motion $\{X_t, t \geq 0\}$ can be standardised by setting:

$$B_t = \frac{X_t}{\sigma}$$

Note that Brownian motion is continuous for all values of t.

A Brownian motion process exhibits some strange behaviour:

A Brownian motion is continuous w.r.t. time t everywhere, but differentiable nowhere

Brownian motion will eventually hit any and every real value no matter how large, or how

negative. Likewise, no matter how large, it will always (with probability 1) come back down

to zero at some future time

Once a Brownian motion hits a certain value, it immediately hits it again infinitely often,

and will continue to return after arbitrarily large times

Brownian motion is fractal, i.e. it looks the same regardless of what scale you examine it

Properties of Brownian Motion

Consider a Brownian motion $\{X_t, t \geq 0\}$ with volatility $\sigma$:

For any $s$ with $0 < s < t$, $X_s$ and $X_t$ are NOT independent. However, by independence of increments, $X_s$ and $X_t - X_s$ are independent.

This can be shown by finding the covariance between any $X_s$ and $X_t$:

$$\begin{aligned}
\operatorname{cov}(X_s, X_t) &= \operatorname{cov}\left(X_s,\, X_s + X_t - X_s\right) \\
&= \operatorname{cov}(X_s, X_s) + \operatorname{cov}(X_s, X_t - X_s) \\
&= \sigma^2 s
\end{aligned}$$

For any $s$ and $t$, $X_s$ and $X_t - X_s$ are both normally distributed, i.e.:

$$X_s \sim N(0, \sigma^2 s), \qquad X_t - X_s \sim N\left(0, \sigma^2 (t - s)\right)$$


Brownian Motion and Symmetric Random Walk

Define a symmetric random walk using the random variables:

Y_i = +1 w.p. 0.5, −1 w.p. 0.5

with mean E[Y_i] = 0 and var(Y_i) = 1.

Now divide the interval [0, t] into equal subintervals of length Δt, so that there are t/Δt intervals in total. Suppose that at each step of length Δt, the process can either go up or down by a step of size Δx. Now let X(t) be the position at time t, so that:

X(t) = Δx (Y_1 + Y_2 + ... + Y_{t/Δt})

Then, if we let Δx = σ√Δt and let Δt → 0, the limiting process of X(t) is a Brownian motion. This is because:

- {X_t, t ≥ 0} has independent increments, since the Y's are independent (changes in the value of the random walk over non-overlapping time intervals are independent)
- {X_t, t ≥ 0} has stationary increments, since the distribution of the change in position of the random walk over any time interval depends only on the length of the interval
- X(t) has an approximately normal distribution with mean 0 and variance σ²t as Δt converges to 0, by the Central Limit Theorem on i.i.d. random variables
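This construction can be simulated directly: with Δx = σ√Δt, the terminal value of the scaled walk should have mean ≈ 0 and variance ≈ σ²t. A minimal sketch of my own, assuming σ = 1 and t = 1:

```python
import random

random.seed(0)
sigma, t = 1.0, 1.0
n_steps = 400
dt = t / n_steps
dx = sigma * dt ** 0.5          # step size Δx = σ√Δt

def walk_terminal():
    """Terminal value X(t) of the scaled symmetric random walk."""
    pos = 0.0
    for _ in range(n_steps):
        pos += dx if random.random() < 0.5 else -dx
    return pos

samples = [walk_terminal() for _ in range(10_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
# mean ≈ 0 and var ≈ σ²t = 1, matching the Brownian limit
```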

Brownian Motion with Drift

A stochastic process {X_t, t ≥ 0} is a Brownian motion process with drift coefficient μ and variance parameter σ² if it satisfies:

- X_0 = 0
- {X_t, t ≥ 0} has stationary and independent increments
- For all t > 0, X_t ~ N(μt, σ²t)

A Brownian motion with drift can be converted to a standard Brownian motion by defining:

B_t = (X_t − μt)/σ

Similarly, a standard Brownian motion can be converted to a Brownian motion with drift by defining:

X_t = μt + σB_t

Geometric Brownian Motion

Let {X_t, t ≥ 0} be a Brownian motion process with drift μ and volatility σ²; then the process {Y_t, t ≥ 0} defined by:

Y_t = exp(X_t)

is called a geometric Brownian motion. This is useful when you don't want negative values.


Now consider the behaviour of dt·dB_t (or dB_t·dt). Writing Δt(B_{t+Δt} − B_t) for the increment over [t, t + Δt]:

E[Δt(B_{t+Δt} − B_t)] = Δt·E[B_{t+Δt} − B_t] = 0
var(Δt(B_{t+Δt} − B_t)) = (Δt)²·var(B_{t+Δt} − B_t) = (Δt)²·Δt = o(Δt)

Then in differential form:

E[dt·dB_t] = 0,  var(dt·dB_t) = o(dt)

Taking o(dt) as zero, this is again reduced to a constant:

dt·dB_t = 0

Finally, consider (dt)²:

(dt)² = o(dt) = 0

The following table summarises the results:

×     | dt | dB_t
dt    | 0  | 0
dB_t  | 0  | dt

Stochastic Differential Equations

Assume that a stochastic process X_t satisfies a stochastic differential equation (SDE) of the form:

dX_t = μ(X_t, t) dt + σ(X_t, t) dB_t

Consider another stochastic process defined as:

Y_t = F(X_t, t)

Then the stochastic differential equation satisfied by Y_t is given by Ito's formula:

dY_t = ∂F(x,t)/∂x · dX_t + ∂F(x,t)/∂t · dt + (1/2) ∂²F(x,t)/∂x² · σ²(X_t, t) dt

with all partial derivatives evaluated at x = X_t.

Proof:

The second-order Taylor series expansion for the differential of a function of two variables is:

df(x, y) = f_x dx + f_y dy + (1/2)[f_xx (dx)² + f_yy (dy)² + 2 f_xy dx dy]

Applying this to Y_t = F(X_t, t) gives:

dY_t = ∂F/∂x dX_t + ∂F/∂t dt + (1/2)[∂²F/∂x² (dX_t)² + ∂²F/∂t² (dt)² + 2 ∂²F/∂x∂t dX_t dt]

with all partial derivatives evaluated at x = X_t. However, we know from the previous section that (dt)² = dt·dB_t = 0, so:

dX_t·dt = μ(X_t, t)(dt)² + σ(X_t, t) dB_t·dt = 0


And:

(dX_t)² = [μ(X_t, t) dt + σ(X_t, t) dB_t]²
        = μ²(X_t, t)(dt)² + σ²(X_t, t)(dB_t)² + 2μ(X_t, t)σ(X_t, t) dB_t·dt
        = σ²(X_t, t)(dB_t)²
        = σ²(X_t, t) dt

Therefore, the expansion reduces to Ito's formula.

E.g. consider a Brownian motion {X_t, t ≥ 0} with 0 drift and variance σ²; find the SDE for tX_t².

We have the SDE for X_t:

dX_t = σ dB_t

Let Y_t = tX_t² = F(X_t, t), where F(x, t) = tx², so the derivatives are:

∂F/∂x = 2tx,  ∂F/∂t = x²,  ∂²F/∂x² = 2t

Then using Ito's formula, the SDE for tX_t² is given by:

d(tX_t²) = dF(X_t, t) = 2tX_t dX_t + X_t² dt + (1/2)σ²(2t) dt
         = 2σtX_t dB_t + (X_t² + σ²t) dt
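This result can be checked numerically (my own illustration, not part of the notes): simulate a path of X_t with σ = 1, integrate the derived SDE step by step with the same Brownian increments, and compare against tX_t² computed directly. The two should agree up to discretisation error.

```python
import random

random.seed(1)
sigma, T, n = 1.0, 1.0, 10_000
dt = T / n
sqrt_dt = dt ** 0.5

x, y_sde, t = 0.0, 0.0, 0.0
for _ in range(n):
    db = sqrt_dt * random.gauss(0, 1)                       # Brownian increment dB_t
    # Euler step for the derived SDE: dY = 2σtX dB + (X² + σ²t) dt
    y_sde += 2 * sigma * t * x * db + (x * x + sigma**2 * t) * dt
    x += sigma * db                                         # exact update of X_t
    t += dt

y_direct = T * x * x   # tX_t² computed directly at t = T
# y_sde and y_direct should be close for small dt
```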

Stochastic Integration

Consider a Brownian motion {X_t, t ≥ 0} with zero drift and variance σ². Let f be a function with a continuous derivative on [a, b]. The stochastic process Y_t defined by the stochastic/Ito integral is:

Y_t = ∫_a^b f(t) dX_t = lim_{max(t_k − t_{k−1}) → 0} Σ_{k=1}^n f(t_{k−1})(X_{t_k} − X_{t_{k−1}})

where a = t_0 < t_1 < ... < t_n = b is a partition of [a, b].

This stochastic process Y_t satisfies the following properties:

The mean of the Ito integral is zero, i.e.

E[∫_a^b f(t) dX_t] = lim Σ_{k=1}^n f(t_{k−1}) E[X_{t_k} − X_{t_{k−1}}] = 0

- Note that "functions of t" in this case includes stochastic integrands, e.g. ∫_0^t B_s dB_s

Thus, the variance and the expectation of the square are equal:

var(∫_a^b f(t) dX_t) = E[(∫_a^b f(t) dX_t)²]
= lim Σ_{k=1}^n var(f(t_{k−1})(X_{t_k} − X_{t_{k−1}}))
= lim Σ_{k=1}^n f(t_{k−1})² σ²(t_k − t_{k−1})
= σ² ∫_a^b f(t)² dt
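This "Ito isometry" is easy to check numerically. A minimal sketch of my own, assuming f(t) = t, σ = 1 and [a, b] = [0, 1], so the predicted variance is ∫₀¹ t² dt = 1/3:

```python
import random

random.seed(7)
n_steps, n_paths = 200, 10_000
dt = 1.0 / n_steps
sqrt_dt = dt ** 0.5

def ito_integral():
    """Approximate ∫₀¹ t dB_t with a left-point Riemann sum."""
    total, t = 0.0, 0.0
    for _ in range(n_steps):
        total += t * sqrt_dt * random.gauss(0, 1)  # f(t_{k-1}) * ΔB
        t += dt
    return total

vals = [ito_integral() for _ in range(n_paths)]
mean = sum(vals) / n_paths
var = sum((v - mean) ** 2 for v in vals) / n_paths
# mean ≈ 0, var ≈ ∫₀¹ t² dt = 1/3
```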


Part 4: Simulation

Continuous Random Variables Simulation

When simulating continuous random variables, we have the probability/cumulative distribution of the random variable. The cumulative probabilities F(X) always have a Uniform(0,1) distribution; conversely, if U ~ Uniform(0,1) then F⁻¹(U) is distributed as F. This means the probability that a cumulative probability falls in [a, b] equals the probability that it falls in [c, d] whenever the two intervals have the same length. Therefore, the first step in simulation is usually to generate random variables from a Uniform(0,1).

Pseudo-Random Numbers

The procedure to generate pseudo-random numbers, i.e. from a Uniform(0,1), is:

1. Start with a seed X_0 and specify positive integers a, c and m, which are usually given
2. Generate pseudo-random numbers recursively using:

   X_{n+1} = (aX_n + c) mod m

3. X_{n+1}/m will be an approximation to a Uniform(0,1) random variable
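The recursion above is a linear congruential generator. A minimal sketch of my own — the constants a, c, m here are illustrative choices, not from the notes:

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Linear congruential generator: yields X_{n+1}/m in [0, 1)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m

gen = lcg(seed=42)
sample = [next(gen) for _ in range(10_000)]
mean = sum(sample) / len(sample)   # should be near 1/2 for a Uniform(0,1)
```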

Inverse Transform Method

Consider a continuous random variable with cumulative distribution function F_X and a Uniform(0,1) random variable U. Define:

X = F_X⁻¹(U)

Then X will have the distribution function:

Pr(X ≤ x) = Pr(F_X⁻¹(U) ≤ x)
          = Pr(U ≤ F_X(x))   since F_X is strictly increasing
          = F_X(x)           using Pr(U ≤ y) = y

Then the inverse transform procedure to generate an r.v. from a cumulative distribution F is:

1. Compute F_X⁻¹, from the p.d.f. or c.d.f., if possible
2. Generate a Uniform(0,1) random variable U
3. Set X = F_X⁻¹(U); then X will be from the distribution F

Note that for the inverse transform method to work, we must be able to calculate the inverse of the c.d.f., i.e. F_X⁻¹ must have an explicit expression.

E.g. to simulate an exponential random variable, first find F_X⁻¹:

F_X(x) = ∫_0^x λ exp(−λs) ds = 1 − exp(−λx)  ⟹  F_X⁻¹(y) = −(1/λ)ln(1 − y)

Suppose we generate a random variable U from Uniform(0,1). Then we set:

x = F⁻¹(U) = −(1/λ)ln(1 − U) = −(1/λ)ln(U)

where the last step uses the fact that 1 − U is also Uniform(0,1). Therefore x is a random variable with an exponential distribution.
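A minimal sketch of this exponential inverse transform, with λ = 2 chosen purely for illustration:

```python
import math
import random

random.seed(3)
lam = 2.0

def exp_inverse_transform(lam):
    """Simulate Exp(λ) by inverting the c.d.f.: X = -ln(U)/λ."""
    u = random.random()
    return -math.log(u) / lam   # uses that 1-U and U are both Uniform(0,1)

sample = [exp_inverse_transform(lam) for _ in range(100_000)]
mean = sum(sample) / len(sample)   # should approach E[X] = 1/λ = 0.5
```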


Discrete Random Variable Simulation

The inverse transform method can also be applied to simulate discrete random variables. Consider one with probability mass function:

Pr(X = x_j) = P_j, for j = 1, 2, ...,  with Σ_j P_j = 1

Then the procedure to simulate an r.v. from this p.m.f. is:

1. Generate a Uniform(0,1) random variable U
2. Set:

   X = x_1 if U < P_1
   X = x_2 if P_1 ≤ U < P_1 + P_2
   ...
   X = x_j if Σ_{i=1}^{j−1} P_i ≤ U < Σ_{i=1}^{j} P_i

E.g. simulate random variables from a geometric distribution.

Consider the probability mass function for a geometric random variable:

P_j = Pr(X = j) = p(1 − p)^{j−1}, j ≥ 1

Notice that:

Σ_{i=1}^{j−1} P_i = 1 − Pr(X > j − 1)
                  = 1 − Σ_{i=j}^∞ p(1 − p)^{i−1}
                  = 1 − (1 − p)^{j−1}   (geometric sum)

Then we have:

Σ_{i=1}^{j−1} P_i ≤ U < Σ_{i=1}^{j} P_i
1 − (1 − p)^{j−1} ≤ U < 1 − (1 − p)^j
(1 − p)^j < 1 − U ≤ (1 − p)^{j−1}

Since U is a Uniform(0,1) random variable, taking logarithms this is equivalent to:

j ln(1 − p) < ln(1 − U) ≤ (j − 1) ln(1 − p)
j − 1 ≤ ln(1 − U)/ln(1 − p) < j     since ln(1 − p) < 0
ln(U)/ln(1 − p) < j                 replacing 1 − U by U, which is also Uniform(0,1)

Therefore to generate a random variable from a geometric distribution:

1. Generate a uniform random number U
2. Set X as j, where j is the first integer for which:

   j > ln(U)/ln(1 − p)
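A minimal sketch of this recipe (my own, with p = 0.3 chosen for illustration); truncation plus one picks the first integer exceeding ln U / ln(1 − p):

```python
import math
import random

random.seed(5)
p = 0.3

def geometric(p):
    """Inverse-transform a Geometric(p) r.v. on {1, 2, ...}."""
    u = random.random()
    # first integer j with j > ln(u) / ln(1 - p)
    return int(math.log(u) / math.log(1.0 - p)) + 1

sample = [geometric(p) for _ in range(100_000)]
mean = sum(sample) / len(sample)           # E[X] = 1/p ≈ 3.333
frac_one = sample.count(1) / len(sample)   # Pr(X = 1) = p = 0.3
```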


Acceptance-Rejection Method

Suppose we have a method, e.g. the inverse transform, to simulate an r.v. with density g(y). Then we can use this as the basis for simulating from a continuous distribution with density f(x).

The procedure to simulate random variables using the rejection method is:

1. Choose a distribution g for which you know you can simulate outcomes
2. Set c to be some constant such that c ≥ f(y)/g(y) for all y
3. Simulate a random variable Y with density function g(y)
4. Simulate a Uniform(0,1) random variable U
5. Accept this as the random number, i.e. set X = Y, if:

   U ≤ f(Y)/(c·g(Y))

6. Otherwise, reject and return to step 3.

Therefore, the value for X is Y_N, where N is the number of iterations until a random number is accepted. We want to be as efficient as possible, i.e. minimise the number of iterations, by:

- Choosing a density g(y) similar to f(y), e.g. an exponential for the normal
- Choosing the smallest value of c that satisfies the inequality, found using calculus, i.e.:

  c = max_y f(y)/g(y)

E.g. simulate the absolute value of a standard normal random variable, X = |Z|.

Firstly, X = |Z| has the density:

f(x) = (2/√(2π)) exp(−x²/2), x > 0

1. Let another random variable Y be from the density g(x) = exp(−x), i.e. Exp(1); note that this is comparable to f(x)
2. Choose the smallest value of c such that c ≥ f(y)/g(y):

   c = max f(x)/g(x) = √(2/π) max exp(x − x²/2) = √(2e/π)

   since x − x²/2 is maximised at x = 1; the resulting acceptance ratio f(y)/(c·g(y)) = exp(−(y − 1)²/2) is then always at most 1
3. Generate U_1 and U_2 from Uniform(0,1)
4. Compute Y from the density g using the inverse transform and U_1, i.e.:

   Y = −ln(U_1)

5. Now check if:

   U_2 ≤ f(Y)/(c·g(Y)) = exp(−(Y − 1)²/2)

6. If true, then set X = Y = −ln(U_1); if false, return to step 3 and repeat.
7. To now generate a standard normal random variable Z, generate U_3 and set:

   Z = X if U_3 ≤ 0.5,  Z = −X if U_3 > 0.5
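Steps 1–7 above can be sketched as follows (my own illustration of the recipe, not code from the notes):

```python
import math
import random

random.seed(11)

C = math.sqrt(2 * math.e / math.pi)   # c = √(2e/π), the smallest valid constant

def standard_normal_by_rejection():
    """Simulate Z ~ N(0,1) via rejection from an Exp(1) envelope."""
    while True:
        y = -math.log(random.random())            # Y ~ Exp(1) by inverse transform
        if random.random() <= math.exp(-0.5 * (y - 1.0) ** 2):
            x = y                                 # accepted: X = |Z|
            return x if random.random() <= 0.5 else -x

sample = [standard_normal_by_rejection() for _ in range(100_000)]
mean = sum(sample) / len(sample)                  # ≈ 0
var = sum(v * v for v in sample) / len(sample)    # ≈ 1
```

Note that the constant C is not needed inside the loop because f(y)/(c·g(y)) simplifies to exp(−(y − 1)²/2); it is kept only to document the envelope.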


Simulation Using Distributional Relationships

Gamma Distribution

Recall that the waiting time to the nth event in continuous Markov chains has a Gamma distribution. To simulate random variables from a Gamma(n, λ) distribution, the procedure is:

1. Generate n independent Uniform(0,1) random variables U_1, U_2, ..., U_n
2. Simulate exponential random variables using the inverse transform method:

   F⁻¹(U) = −(1/λ)ln(U)

3. Since the sum of n independent Exp(λ) random variables has a Gamma distribution, set:

   X = −(1/λ) Σ_{i=1}^n ln(U_i) ~ Gamma(n, λ)
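A minimal sketch of my own (illustrative parameters n = 5, λ = 2; the mean of a Gamma(n, λ) is n/λ):

```python
import math
import random

random.seed(9)
n_shape, lam = 5, 2.0

def gamma_rv(n, lam):
    """Gamma(n, λ) as a sum of n independent Exp(λ) variables."""
    return -sum(math.log(random.random()) for _ in range(n)) / lam

sample = [gamma_rv(n_shape, lam) for _ in range(50_000)]
mean = sum(sample) / len(sample)   # should approach n/λ = 2.5
```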

Chi-Squared Distribution

The sum of n squared standard normal r.v.s has a chi-squared distribution with n degrees of freedom:

Σ_{i=1}^n Z_i² ~ χ²_n

Alternatively, a chi-squared distribution with an even degree of freedom 2k is equivalent to a Gamma(k, 1/2) distribution. If the degree of freedom is odd, i.e. 2k + 1, we can add on an extra Z² term, where Z is standard normal. That is:

−2 Σ_{i=1}^k ln(U_i) ~ χ²_{2k}
Z² − 2 Σ_{i=1}^k ln(U_i) ~ χ²_{2k+1}

Poisson Distribution

Recall that the number of events within one period, where inter-event times follow an Exp(λ) distribution, is Poi(λ). To simulate random variables from a Poisson(λ) distribution, the procedure is:

1. Generate Gamma(n, λ) random variables using the above steps
2. Since the waiting time until the nth event has a Gamma(n, λ) distribution, which is a sum of independent Exp(λ) random variables, set:

   X = max{n : −(1/λ) Σ_{i=1}^n ln(U_i) ≤ 1} ~ Poi(λ)

   i.e. the total number of arrivals within 1 period.
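A minimal sketch of my own (illustrative λ = 4), counting Exp(λ) arrivals until the cumulative waiting time passes 1:

```python
import math
import random

random.seed(13)
lam = 4.0

def poisson_rv(lam):
    """Count Exp(λ) arrivals within one period: result ~ Poi(λ)."""
    count, total = 0, 0.0
    while True:
        total += -math.log(random.random()) / lam   # next inter-arrival time
        if total > 1.0:
            return count
        count += 1

sample = [poisson_rv(lam) for _ in range(50_000)]
mean = sum(sample) / len(sample)                          # ≈ λ = 4
var = sum((x - mean) ** 2 for x in sample) / len(sample)  # ≈ λ = 4
```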

Normal Distribution

One method of simulating random variables from a normal distribution is the Box-Muller approach:

1. Generate two random variables U_1 and U_2 from Uniform(0,1)
2. Set:

   X = (−2 ln U_1)^{1/2} cos(2πU_2)
   Y = (−2 ln U_1)^{1/2} sin(2πU_2)

   Then X and Y are a pair of independent standard normal random variables
3. A pair of uncorrelated normal random variables can then be derived using X and Y
4. Note that if X is normal, then to generate a lognormal random variable L we set L = e^X
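A minimal sketch of the Box-Muller transform (my own illustration):

```python
import math
import random

random.seed(17)

def box_muller():
    """One Box-Muller draw: a pair of independent N(0,1) variables."""
    u1, u2 = random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

pairs = [box_muller() for _ in range(50_000)]
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]
var_x = sum(v * v for v in xs) / len(xs)                 # ≈ 1
corr_xy = sum(a * b for a, b in zip(xs, ys)) / len(xs)   # ≈ 0 (uncorrelated)
```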


Monte Carlo Simulation

Suppose that X = (X_1, ..., X_n) is a random vector with a given joint density function f(x_1, ..., x_n) and we want to compute:

E[g(X)] = ∫...∫ g(x_1, ..., x_n) f(x_1, ..., x_n) dx_1 ... dx_n

but with a function g for which this is impossible to compute analytically. Then we can approximate this integral using Monte Carlo simulation.

Let X^(i) and Y^(i) be the ith simulated sample of X and Y. Then the MC simulation procedure is:

1. Generate a random vector X^(1) = (X_1^(1), ..., X_n^(1)) with joint density f(x_1, ..., x_n) and compute Y^(1) = g(X^(1))
2. Generate a second random vector X^(2) = (X_1^(2), ..., X_n^(2)), independent of step 1, with joint density f(x_1, ..., x_n) and compute Y^(2) = g(X^(2))
3. Repeat this process until r (a fixed number) i.i.d. random vectors are generated:

   Y^(i) = g(X^(i)), for i = 1, 2, ..., r

4. Estimate E[g(X)] using the arithmetic average of the generated Y's:

   Ȳ = (Y^(1) + ... + Y^(r))/r ≈ E[g(X)]

This method works due to the strong law of large numbers:

lim_{r→∞} (Y^(1) + ... + Y^(r))/r = E[g(X)]
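As a concrete illustration (mine, not from the notes), estimate θ = E[e^U] for U ~ Uniform(0,1), whose exact value is e − 1 ≈ 1.71828:

```python
import math
import random

random.seed(21)
r = 100_000

ys = [math.exp(random.random()) for _ in range(r)]   # Y_i = g(X_i) = e^{U_i}
theta_hat = sum(ys) / r                              # Monte Carlo estimate of E[e^U]
exact = math.e - 1                                   # ≈ 1.71828
```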

Expectation and Variance

Ȳ is an unbiased estimate of E[g(X)]:

E[Ȳ] = (1/r) Σ_{i=1}^r E[Y^(i)] = E[g(X_1, ..., X_n)]

since each Y^(i) is independent and identically distributed with the distribution of g(X_1, ..., X_n).

The variance of Ȳ is given by:

var(Ȳ) = var((1/r) Σ_{i=1}^r Y^(i)) = (1/r²) Σ_{i=1}^r var(Y^(i)) = var(Y^(i))/r

Note that usually we do not know var(Y^(i)), so we estimate it using the sample estimate:

var(Y^(i)) ≈ (1/(r − 1)) Σ_{i=1}^r (Y^(i) − Ȳ)²

In the next 3 sections, we will describe some techniques to REDUCE this variance.


Antithetic Variables

Reducing the variance of the estimate using antithetic variables involves generating estimates with negative correlation, then averaging these estimates to obtain a final estimate.

Assume that r is even. The antithetic variates procedure is:

1. Generate a set of n variates X_1^(1), ..., X_n^(1) and determine Y^(1) = g(X_1^(1), ..., X_n^(1))
2. Generate a second set of n variates X̃_1^(1), ..., X̃_n^(1), which are negatively correlated with X_1^(1), ..., X_n^(1), and determine Ỹ^(1) = g(X̃_1^(1), ..., X̃_n^(1))
3. Repeat steps 1 and 2 r/2 times to form Y^(1), ..., Y^(r/2) and Ỹ^(1), ..., Ỹ^(r/2)
4. Calculate the arithmetic averages of the Y's:

   Ȳ_1 = (2/r) Σ_{i=1}^{r/2} Y^(i)  and  Ȳ_2 = (2/r) Σ_{i=1}^{r/2} Ỹ^(i)

5. Use:

   Ȳ_AV = (Ȳ_1 + Ȳ_2)/2

   as the final estimate for E[g(X)]

Using this method, as long as the correlation between Ȳ_1 and Ȳ_2 is negative, the variance will be reduced. One example of this is to use:

- X ~ Uniform(0,1) for the first set of n variates, giving Ȳ_1
- 1 − X, which is also Uniform(0,1), for the second set of n variates, giving Ȳ_2

Then as long as g(X) is a monotonic increasing/decreasing function, we will have a negative correlation, since:

cov(X, 1 − X) = −var(X) < 0

To show why this method reduces the variance, consider the variance of the estimator using the antithetic variable, writing ρ = corr(Ȳ_1, Ȳ_2):

var(Ȳ_AV) = var((Ȳ_1 + Ȳ_2)/2)
= (1/4)[var(Ȳ_1) + var(Ȳ_2) + 2 cov(Ȳ_1, Ȳ_2)]
= (1/2) var(Ȳ_1)(1 + ρ)                 since var(Ȳ_1) = var(Ȳ_2)
= (1/2) · (var(Y^(i))/(r/2)) · (1 + ρ)   using var(Ȳ) = var(Y^(i))/r, but with only the first set, up to r/2
= (var(Y^(i))/r)(1 + ρ)
< var(Y^(i))/r = var(Ȳ)                 for ρ < 0
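A minimal sketch comparing plain and antithetic Monte Carlo for the monotone function g(u) = e^u (my illustration; the exact mean is e − 1):

```python
import math
import random

random.seed(23)
r = 100_000
g = math.exp   # monotone increasing, so U and 1-U give negative correlation

def plain_estimate():
    return sum(g(random.random()) for _ in range(r)) / r

def antithetic_estimate():
    total = 0.0
    for _ in range(r // 2):
        u = random.random()
        total += g(u) + g(1.0 - u)   # antithetic pair
    return total / r

# Compare the sampling variability over repeated estimates
plain = [plain_estimate() for _ in range(30)]
anti = [antithetic_estimate() for _ in range(30)]

def spread(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

# spread(anti) should be clearly smaller than spread(plain)
```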


Control Variates

If we wish to evaluate the expected value E[g(X)], and there is a function f such that the expected value of f(X) can be evaluated analytically, with:

μ_f = E[f(X)]

then we can evaluate E[g(X)] using:

E[g(X)] = E[g(X) + a(f(X) − μ_f)]     (1)
        = E[g(X)] + a·E[f(X)] − a·μ_f

where a is a parameter we can choose.

Therefore, instead of evaluating E[g(X)] directly, we evaluate (1) using a control variate:

Ŷ_cv = (1/r) Σ_{i=1}^r [g(X^(i)) + a(f(X^(i)) − μ_f)]

Then, this estimator will have the variance:

var(Ŷ_cv) = var((1/r) Σ_{i=1}^r [g(X^(i)) + a·f(X^(i))])
= (1/r) var(g(X^(i)) + a·f(X^(i)))
= (1/r)[var(g(X^(i))) + a² var(f(X^(i))) + 2a·cov(g(X^(i)), f(X^(i)))]

This variance can be minimised by solving:

∂var(Ŷ_cv)/∂a = 0

with the solution being the best choice of a:

a* = −cov(g(X^(i)), f(X^(i)))/var(f(X^(i)))

Substituting this value back, we have that the minimised variance is:

var(Ŷ_cv) = (1/r)[var(g(X^(i))) − cov(g(X^(i)), f(X^(i)))²/var(f(X^(i)))]
          ≤ var(g(X^(i)))/r = var(Ȳ)

Therefore, using a control variate has decreased the variance compared to the original estimator.
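A minimal sketch of my own: estimate E[e^U] with control variate f(U) = U, for which μ_f = 1/2 is known exactly; the optimal a is itself estimated from the sample.

```python
import math
import random

random.seed(29)
r = 100_000
mu_f = 0.5   # E[U] is known analytically

us = [random.random() for _ in range(r)]
gs = [math.exp(u) for u in us]

# Estimate the optimal a* = -cov(g, f) / var(f) from the sample
mg = sum(gs) / r
cov_gf = sum((u - mu_f) * (g - mg) for u, g in zip(us, gs)) / r
var_f = sum((u - mu_f) ** 2 for u in us) / r
a_star = -cov_gf / var_f

cv_vals = [g + a_star * (u - mu_f) for u, g in zip(us, gs)]
theta_cv = sum(cv_vals) / r                                  # control-variate estimate
var_cv = sum((v - theta_cv) ** 2 for v in cv_vals) / r       # per-sample variance
var_plain = sum((g - mg) ** 2 for g in gs) / r
# var_cv should be far smaller than var_plain; theta_cv ≈ e - 1
```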


Importance Sampling

We can write:

E[g(X)] = ∫ g(x) f_X(x) dx

where f_X is the density of X.

Now consider another probability density, denoted h(x), equivalent to f(x) in the sense that the zero-probability events agree under both densities, i.e. f(x) = 0 if and only if h(x) = 0. Let Y be the random variable with density function h(x).

Then we can write:

E[g(X)] = ∫ g(x) (f_X(x)/h(x)) h(x) dx = E[g(Y) f_X(Y)/h(Y)]

Then, we can simulate Y_i from the density h(x) and estimate the expectation E[g(X)] with:

Ŷ_h = (1/r) Σ_{i=1}^r g(Y_i) f(Y_i)/h(Y_i)

If it is possible to select a probability density h(x) so that the random variable

g(Y) f_X(Y)/h(Y)

has a smaller variance, then the estimator will be more efficient, i.e.:

var(g(Y) f_X(Y)/h(Y)) < var(g(X))

To do this, we need to select a density h(x) such that the ratio of the two densities, f(x)/h(x), is large when g(x) is small and vice versa.
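A classic use is rare-event estimation. The sketch below (my own illustration) estimates p = Pr(Z > 3) for Z ~ N(0,1) by sampling Y = 3 + Exp(1), whose density h concentrates mass exactly where the indicator g(y) = 1{y > 3} is nonzero; the exact value is about 1.35 × 10⁻³.

```python
import math
import random

random.seed(31)
r = 200_000

def phi(x):
    """Standard normal density f_X."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

total = 0.0
for _ in range(r):
    y = 3.0 - math.log(random.random())   # Y = 3 + Exp(1), density h(y) = e^{-(y-3)}
    h = math.exp(-(y - 3.0))
    total += phi(y) / h                   # g(Y) = 1{Y > 3} is always 1 here

p_hat = total / r   # importance-sampling estimate of Pr(Z > 3) ≈ 0.00135
```

Naive Monte Carlo would need millions of draws to see even a handful of exceedances of 3; here every sample contributes.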

Number of Simulations

Ideally, we want to carry out simulations as efficiently as possible, since the number of simulations required for accuracy can be quite large. Assume we generate n samples from a known distribution. To estimate its mean μ, we use the sample average as an estimator:

Ȳ = (1/n) Σ_{i=1}^n Y_i

This is an unbiased estimator, i.e.:

E[Ȳ] = μ

Its variance is given by:

var(Ȳ) = var(Y_i)/n = σ²/n


However, the value of var(Y_i) is usually not known, so we estimate it using the sample variance from the first k runs, where k < n and k is usually at least 30. Then the estimates of the mean and variance become:

Ȳ = (1/k) Σ_{i=1}^k Y_i,   var(Y_i) ≈ (1/(k − 1)) Σ_{i=1}^k (Y_i − Ȳ)²

For large values of n, we know that by the Central Limit Theorem this estimator will be approximately normal:

Ȳ ~ N(μ, σ²/n)

Then, we can select n such that the estimate is within a desired accuracy of the true mean, i.e. within a percentage of the true mean with a specified probability.

E.g. random variates are generated from a Gamma distribution:

f(x) = (λ^α/Γ(α)) x^{α−1} e^{−λx}, for x > 0

Twenty values are generated for each sample, and the mean and standard deviation of each sample are given as:

Sample | 1     | 2     | 3     | 4     | 5     | 6     | 7     | 8     | 9     | 10
Mean   | 12.01 | 11.79 | 13.43 | 14.01 | 11.44 | 11.19 | 11.24 | 12.42 | 12.91 | 12.29
SD     | 6.15  | 4.73  | 6.42  | 7.02  | 4.30  | 3.90  | 3.84  | 4.35  | 3.59  | 4.60

Determine the number of simulated values required for the estimate of the mean to be within 5% of the true mean with 95% certainty.

We require n such that:

Pr(|X̄ − μ| ≤ 0.05μ) ≥ 0.95
Pr(−0.05μ ≤ X̄ − μ ≤ 0.05μ) ≥ 0.95
Pr(−0.05μ√n/σ ≤ Z ≤ 0.05μ√n/σ) ≥ 0.95
2·Pr(Z ≤ 0.05μ√n/σ) − 1 ≥ 0.95
0.05μ√n/σ ≥ 1.96

We do not know the true values of μ and σ, so we must use the sample estimates. The sample estimate for the mean is given by averaging the mean of each sample:

X̄ = (1/(10·20)) Σ_{i=1}^{10} Σ_{j=1}^{20} X_ij = (1/10) Σ_{i=1}^{10} X̄_i = 12.273


To estimate the sample variance, use:

s² = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄)²  ⟹  Σ_{i=1}^n X_i² = (n − 1)s² + nX̄²

So the estimated variance is given by:

σ̂² = (1/(10·20 − 1)) [Σ_{i=1}^{10} Σ_{j=1}^{20} x_ij² − 10·20·X̄²]

where we have:

Σ_{i=1}^{10} Σ_{j=1}^{20} x_ij² = Σ_{i=1}^{10} [(20 − 1)s_i² + 20X̄_i²]
= 19 Σ_{i=1}^{10} s_i² + 20 Σ_{i=1}^{10} X̄_i²
= 4790.0596 + 30285.702
= 35075.7616

Therefore the estimated variance and standard deviation are:

σ̂² = (1/199)(35075.7616 − 200 × 12.273²) = 24.8766
σ̂ = 4.98765

Substituting back into the previous equation, we have:

0.05μ√n/σ ≥ 1.96
0.05 × 12.273 × √n / 4.98765 ≥ 1.96
n ≥ 253.8, so n = 254

We can also estimate the parameters of the Gamma distribution, since we know that:

E[X] = α/λ,  var(X) = α/λ²

Using our estimated values we have:

α̂/λ̂ = 12.273,  α̂/λ̂² = 24.8766  ⟹  λ̂ = 12.273/24.8766 ≈ 0.493,  α̂ = 12.273 × λ̂ ≈ 6.055
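The whole worked example can be reproduced in a few lines (my own script, using the sample means and SDs from the table above):

```python
import math

means = [12.01, 11.79, 13.43, 14.01, 11.44, 11.19, 11.24, 12.42, 12.91, 12.29]
sds = [6.15, 4.73, 6.42, 7.02, 4.30, 3.90, 3.84, 4.35, 3.59, 4.60]

x_bar = sum(means) / len(means)                      # pooled mean, 20 obs per sample
sum_sq = sum(19 * s**2 + 20 * m**2 for m, s in zip(means, sds))
var_hat = (sum_sq - 200 * x_bar**2) / 199            # pooled sample variance
sd_hat = math.sqrt(var_hat)

# Smallest n with 0.05 * x_bar * sqrt(n) / sd_hat >= 1.96
n_req = math.ceil((1.96 * sd_hat / (0.05 * x_bar)) ** 2)

# Method-of-moments estimates for the Gamma parameters
lam_hat = x_bar / var_hat       # λ̂ = mean / variance
alpha_hat = x_bar * lam_hat     # α̂ = mean² / variance
```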
