Good morning, and welcome to the 24th
lecture of the course on “Stochastic Hydrology”. In the last class, lecture number
23, we essentially discussed Markov chains. In the class before that, we introduced
Markov chains and the transition probabilities, and we also discussed steady state Markov
chains and understood how to derive the steady state probabilities of a Markov chain. Then,
in the previous lecture, we also discussed a few examples on Markov chains.
So far, the journey has been on discussing the fundamentals of stochastic processes with
a few applications through numerical examples and so on. For example, we discussed
in earlier classes the probability distributions, joint probabilities, conditional probabilities,
and then data generation, data forecasting, analysis in the frequency domain, spectral
analysis, and then time series analysis, where we discussed in detail the ARIMA type
of models and their applications to hydrologic data generation
as well as hydrologic forecasting.
From now onwards, the focus of the course will
shift slightly, in the sense that we will take up specific applications of many of the techniques
that we have learned to various problems that we encounter in hydrology, be it
flood forecasting, determination of design storms, run lengths in drought analysis, and
so forth. So, from now onwards, we will pick up a few topics of direct application
relevant to hydrology, and then discuss how we use the theory that we have learned so
far in the applications that we are interested in.
Now, some of these may also have been covered in your applied hydrology course or the basic
hydrology course. For example, we will start with flood frequency analysis today, which
may have been covered in your basic hydrology course. Still, inasmuch as these
topics use the theory that you have covered in this course on stochastic processes, we will
cover them even at the risk of repeating some of the material, so that the fundamentals
of the theory that we have covered, and their applications, will be clear.
We will start today with the topic of frequency analysis. Right at the
outset, I would like to mention that, as far as this course is concerned, we will be talking
of at-site frequency analysis. That is, we are not talking about the spatial distribution
of the data or regional frequency analysis, which is a different topic altogether and
is slightly more advanced. In today’s class, and maybe in the subsequent two
to three classes, we will be discussing the frequency analysis of data gathered
at a particular location. This we call at-site frequency analysis.
Say, for example, we have a rain gauge and we have been continuously recording the
rainfall at a particular location. That is the type of data we will be using for frequency analysis,
or the stream gauge data recorded at a particular stream
gauge location.
What do we mean by frequency analysis? Most
of the hydrologic problems are characterized by occurrence of extreme events. For example,
we may be discussing floods, which are really high extreme flows, or we may be
interested in really low flows, which either define your hydrologic droughts or may
be critical for your water quality concentrations. We may be interested in really high intensities
of rainfall, which are necessary for the design of storm water drainage in a city, and so on
and so forth.
So, we will really be interested in the extreme
values, either the high extremes or the low extremes. So, for a while now, we will
depart from our normal processes and focus on the extreme processes.
If you take any probability distribution that
we have been seeing so far, let us say we look at some probability distribution like
this, either a completely symmetrical or a non-symmetrical distribution.
If you look at the extreme high values, let us say this is your random variable, the values
that the random variable takes, and this is your f of x, in our notation. There is a small
tail here towards the right, which describes the extreme values. The probabilities associated
with these extreme values are all small. However, if you look at the distribution of these extremes
themselves, either on this side or on that side, these distributions will have some interesting
characteristics. We will be interested in looking at the distribution at the
lower extreme or the higher extreme of the distribution of the parent population. This
we call the parent population.
So, we are deriving the extreme values, or
the distribution of the extreme values, from a given parent distribution. What do I mean by
that? Let us say your random variable X is the stream flow at a particular location.
We may be interested in particular values of stream flow, or we may be interested in
the event that X is greater than or equal to a given value, let us say x 1 or some
such thing.
And we may be interested in the probability of
X being greater than or equal to x 1, for the flood flows. In that case, we would be
interested in getting the distributions of the really extreme flows, which are derived
from the parent distribution of the stream flow random variable X. So, we derive the
extreme value distributions from a given parent distribution. In other words, the frequency
analysis that we will be doing will be dependent on the
parent distribution from which we are deriving the extreme value distributions. So, with
that background, let us see what we really mean by frequency analysis and where it is
useful.
As I mentioned, we may be frequently encountering
severe storms, floods, droughts, low water quality situations, high demand situations,
low flow situations, low rainfall situations and so on. These are really extreme events,
in the sense that they do not normally occur. In fact, the more severe the event,
the less frequently it occurs. For example, we may be interested in the highest level
of flood that can occur at a particular location. This may be the least frequent compared to other
events that we may be interested in at that location. So, the magnitude of an extreme event is inversely
related to its frequency of occurrence. What do I mean by that? Let us say that we
are interested in the magnitude of a flood which, on an average, occurs once every 10 years.
This magnitude will be smaller than that of a flood event that occurs once in about 100 years.
So, as the frequency increases, the magnitude decreases. Once in 10 years is a higher
frequency compared to once in 100 years, and therefore the flood magnitude that may occur
once in about 10 years will be smaller than the flood magnitude that may occur once in about
100 years.
With this information in place, we carry out
what is called frequency analysis. Now, frequency analysis is a procedure
for estimating the frequency, or the probability of occurrence, of extreme events. That is, we
may be interested in answering questions like: once in how many years, on an average, is a flood of a given
magnitude likely to occur, or what is the probability that, in a given year,
a flood of a given magnitude will occur? These are the types of questions that we would be
interested in answering through frequency analysis.
So, the objective of frequency analysis
is to relate the magnitude of the extreme events to their frequency of occurrence. Let us
say we have a flood discharge of magnitude 50000 cubic meters per second. We would
be interested in answering once in about how many years such an event may occur. That is
how we relate the magnitude of the extreme events to the frequency of occurrence. By
frequency of occurrence, I mean once in how many years the particular event is likely
to occur.
As I mentioned earlier, we will be doing the
frequency analysis at a given site. This is different from regional frequency analysis,
where we are interested in the frequencies of these events over a large region consisting
of several data points. Let us say you are interested in the peninsular region of south India and
what kind of flood frequencies are likely to be present in that particular region. When
we are doing that, we may consider the flow data from several stream gauges located across
the entire region. That is regional frequency analysis. But in this course, for the time
being, we are focusing on at-site frequency analysis, where we would be collecting data
from a given stream gauge or a given rain gauge.
Then we will be carrying out the frequency
analysis for that particular stream gauge or rain gauge.
A basic assumption in doing this frequency analysis is that the data that we have, let us
say the stream flow data at a particular stream gauge, are stochastic,
space independent and time independent. Now, for a single gauge, we are essentially
looking at two assumptions, namely, that the data are independent
and that they are identically distributed. The data you want to analyze for frequency
analysis must satisfy these two conditions for the procedures that we are going to introduce
now. This is called IID. I also used this term earlier, in the case of time series analysis;
IID stands for independent and identically distributed.
Now, how do we ensure independence? Let us say,
we are looking at really high flows that have occurred in a year. Now, you may be gauging
the data, or observing the data, every 15 minutes, let us say, in a day, so
you have continuous monitoring of data at intervals of 15 minutes. You will
have 4 such data points per hour for 24 hours, and then in a year you will have 4 into 24 hours
into 365 days, that is, 35040 data points in a given year. Now, suppose you
simply put a threshold; let us say you take the flood values or stream flow
values exceeding a certain threshold value of stream flow,
let us say 10000 cusecs or some such thing. Then, in a given year, you may have several
such values, which all exceed the given threshold of 10000 cusecs. If you collect such values,
they may not really be independent. For example, a flood that
occurs in a given year may have saturated the soil completely, and therefore any high rainfall
that occurs subsequently may cause another flood of high magnitude. Both of these floods
that I am talking about may cross the threshold value that we have specified, and therefore
they are not independent events. However, if you choose the maximum value that
has occurred, the maximum flood that has occurred in a year, it means that out of all these values
that you collected, 4 values per hour into 24 hours per day into 365 days, out of those many
values, you pick up just one value, which is the highest value
that has occurred in that particular year. Like this, for every year, you have one value,
which is the maximum value that has occurred in that particular year.
If you choose the annual maximum values, then these can reasonably be assumed to be independent
of each other. For example, in one year the maximum value may have occurred during July.
In another year, it may have occurred during October, and these two may not have any relationship
at all. Therefore, they can be safely assumed to be independent of each other. Therefore,
when we are choosing the data for frequency analysis, it is important for us to choose
the particular type of data for which the assumption of independence can reasonably be made.
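The selection described above, many threshold exceedances within a year versus one annual maximum per year, can be sketched in a few lines of Python. This is only an illustration: the function names and the small set of readings are made up for the sketch and are not data from the lecture.

```python
# Readings per year at a 15 minute interval: 4 per hour x 24 hours x 365 days.
READINGS_PER_YEAR = 4 * 24 * 365  # 35040


def peaks_over_threshold(records, threshold):
    """Return every (year, discharge) reading that equals or exceeds the threshold."""
    return [(year, q) for year, q in records if q >= threshold]


def annual_maxima(records):
    """Return {year: largest discharge observed in that year}."""
    maxima = {}
    for year, q in records:
        if year not in maxima or q > maxima[year]:
            maxima[year] = q
    return maxima


# Hypothetical (year, discharge) readings, invented purely for illustration.
records = [
    (2001, 800.0), (2001, 10500.0), (2001, 11200.0),  # two closely spaced floods in 2001
    (2002, 900.0), (2002, 9500.0),
    (2003, 12000.0), (2003, 700.0),
]

# The threshold approach keeps both 2001 floods, which may be dependent events.
print(peaks_over_threshold(records, 10000.0))
# -> [(2001, 10500.0), (2001, 11200.0), (2003, 12000.0)]

# The annual maximum approach keeps exactly one value per year.
print(annual_maxima(records))
# -> {2001: 11200.0, 2002: 9500.0, 2003: 12000.0}
```

Note that the year 2002 never crosses the threshold but still contributes one annual maximum, so the annual maximum series always has one value per year of record.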
Then, what do we mean by identical distribution? Remember, we are talking of the same random
variable here. We are not talking about x and y being independent of each other. We
are talking about, let us say, flow at a particular location, and therefore we have a time series
of flow. In that sense, identical distribution would mean time homogeneity;
that means you have a time series, and it should be homogeneous in time. Contrast
this with space homogeneity, which we will talk about in regional frequency analysis. We are
not interested in space homogeneity insofar as frequency analysis of at-site data
is concerned; we are talking about time homogeneity. So, we select the data such that
the assumptions of independence and identical distribution are satisfied. These are the
two most important assumptions that we make in frequency analysis.
As I said, the assumption of independence is
achieved by selecting the annual maximum or minimum of the variable being analyzed, as
successive observations from year to year can be assumed to be independent. The assumption
of identical distribution, or time homogeneity, is achieved by selecting observations from
the same population. What I mean by the same population is that the data that you are analyzing
must have come from the same watershed, which has not gone through any drastic changes in
recent years or during the period of the data. For example, sudden urbanization has not taken
place, and the characteristics of the watershed have essentially remained unchanged during
the period of the data. Also, the gauge that you have used for the data has
not undergone any changes. For example, the type of gauge has not changed, and the gauge
has not been shifted from one place to another in the same watershed, and so on.
So, we are looking at the same type of gauge, the same rain or stream gauge, for the
entire period of data, and the watershed has not undergone any specific changes during the
period of data. This will ensure that time homogeneity is maintained for the data.
So, once you have ensured this, we will look
at the purposes for which we will use frequency analysis results. As I mentioned, we would
be getting answers to questions such as: what is the return period, that is, the average
time that elapses between two successive occurrences of a critical event?
Let us say that this year a flood of a given magnitude has occurred; then we will be interested
in the probability that it will occur again next year, or in the
average time that may elapse between successive occurrences of this event. Does it occur
once in five years, once in ten years, and so on?
Now, these kinds of questions are of extreme relevance or high relevance in designs of
dams, bridges, culverts and flood control devices and so on. When we build a dam, we
may want to design it for a flood of a return period of 100 years; that means you want to
build a capacity, which can pass the discharge that corresponds to a flood, which returns
once in about 100 years or once in about 200 years and so on.
So, right at the design stage, you will make a decision on the flood discharge
that you would like to pass through the spillway, and that discharge corresponds to
your design decision of the return period. And the decision on the return period itself
will depend on the type of designs that you are making.
For example, for dams, bridges, culverts, etcetera, you may be interested in a 100 year
return period, whereas for flood control devices you may be interested in more frequent
events, for example, once in 10 years, once in 15 years, and so on. And when we come to
urban flooding, your designs are based on much more frequent events and therefore
smaller discharges. For example, for urban storm water drainage, we may be interested
in return periods of 5 years, 10 years, and so on.
So, these are return periods, in the sense that the magnitudes of the discharges or the
rainfall intensities that we are talking about will appear once in about that many years
on an average. We may also be interested in drought frequency and magnitude, where we may
be talking about meteorologic drought or hydrologic drought. Meteorologic droughts are related
essentially to rainfall, and hydrologic droughts are related to soil moisture
and water availability, in terms of stream flow, reservoir levels, and so on.
So, we may be interested in the lower end of the distribution; that means, the
low stream flows, low rainfall, low soil moisture levels, and so on. This will be useful
in agricultural planning. So, frequency analysis really provides useful information
for planning, either for floods or for droughts. And in most hydrologic
designs we use the results provided by frequency analysis.
So, how do we go about this? I just talked
about the return period. Let us go through the concept of return period more formally
now. Let us say that we are interested in an extreme event, which we define as
the event that the random variable X takes on a value greater than or equal to
a threshold level x T; for example, the stream flow, the discharge, being greater than or equal to
a value of 5000 cubic meters per second. That is
the event we call an extreme event; that
means we put a threshold and say that any time the threshold is equalled or crossed, we call that
an extreme event.
Then we define the time between the occurrences
of X greater than or equal to x T. That is, the event X greater than or equal to x T has
occurred now, then some time elapses and it occurs again, another time
elapses and it occurs again, and so on. The elapsed time between two consecutive
occurrences of X greater than or equal to x T is called the recurrence interval, and
we denote the recurrence interval by tau.
In a long period, there may be several recurrences of this event. Say X greater than
or equal to x T has occurred in year number 1, again in year number
14, again in year number 15, again in year number 29, and so on. So, the recurrence
interval keeps changing within a record of the time series. The expected value of tau,
denoted by E of tau, is therefore the average number of years in which the event X greater
than or equal to x T returns. Let us say you have a record of 50 years and you are
interested in the return period corresponding to the event X being greater
than or equal to 5000 cusecs, when we are talking about the discharge. In these
50 years of record, the average recurrence interval, or the expected value of the recurrence
interval, may turn out to be, let us say, 10 years. That is what we call the return period.
So, the expected value of tau is the return period T of the event X greater than or equal to x T.
This concept of the return period is used to describe the likelihood of occurrence
of extreme events. Many times, we confuse the concept of return period with
the exact time at which the event occurs. For example, when I say an event has
a return period of five years, it does not mean that in the next five years it has to
happen, or that it has to happen exactly once. Let us say that a five year return period event has occurred
this year. It does not mean that the same five year return
period event will not occur next year. In fact, every year has the same probability
of occurrence of this five year return period event, which we will show to be one by five in the
subsequent slides. So, do not confuse the concept of return period with the time that
it takes for that particular event to return exactly. It only indicates an average recurrence
time; that means, over a long period of time, on an average, this event will return once
in about so many years, which is the return period of the event.
Let us say, to drive home that point, we will
take the annual maximum discharge of a particular river for 45 years, from 1950 to 1994.
The Q that we are indicating here is the maximum discharge; that means,
out of all the values that we have collected in a year, we have picked up one value per
year. That is our maximum discharge, and the units are cumec, which is cubic meters per
second. So, you have 45 such values. Let us say that we want to put a threshold of 1500; that
means we say that an extreme event has occurred whenever the flow has equalled or crossed 1500 cubic meters
per second.
So, we draw the time series first: on the x
axis you show the year and on the y axis the discharge Q. We draw the time series and then draw a
line at 1500 cumec, that is, cubic meters per second. Any flow that equals or exceeds this
Q of 1500 cumec, we count as a success, insofar as our definition of extreme values is
concerned; that means, whenever that event occurs we count it as a success, and whenever
it does not occur we count it as a failure. So, there are nine such events, as we can see
here. From the data, whenever the value 1500 has been exceeded, we call it a success. So,
1, 2, 3, 4, 5, 6, 7, 8, 9; 9 times in the 45 year record, the value of 1500 has been
exceeded. What do we do with this information? We will now calculate the recurrence intervals.
In 1952 it occurred once. When did it occur next? It occurred next in 1956.
So, there is a lapse of four years. Then when did it occur next? It occurred in 1957, so
there is a lapse of one year, and then again a lapse of one year. So, this event of X being
greater than or equal to 1500 had a recurrence interval of 4 years here, a recurrence
interval of one year here, and a recurrence interval of one year here. That is, there is a recurrence interval of four
years between 1952 and 1956, then one year, then one year. After
1958, it occurred next in 1972, so there is a 14 year
gap there. Then it occurred again after four years, in 1976, so there is a four year gap
here. We count the recurrence intervals in years like this and then list
them. In the record that we have seen, the recurrence interval varies
between 1 and 14 years. The event returned sometimes in one year,
that is, as soon as in one year, and sometimes as distantly as in 14
years. So, it varied between 1 and 14 years. There are 8 such intervals, because we
talked about 9 events and between them there are 8 intervals, and these recurrence
intervals together span a 40 year period. So, what is the average recurrence interval? In
these 40 years there are eight intervals. Therefore, the average
recurrence interval is 40 divided by 8, which is five years. This is the average recurrence
interval. These are eight recurrence intervals covering
a period of 40 years between the first and the last occurrence. So, that is the idea of obtaining
the recurrence interval. We leave out the first one. After the first one has occurred,
how many times has it recurred, and what is the average recurrence interval? That is what
we are talking about. And therefore, the average recurrence interval comes out to be five years.
The return period of a given magnitude is
defined as the average recurrence interval between events equalling or exceeding a specified
magnitude. So, we specified a magnitude of 1500 cubic meters per second and then we reckoned
the recurrence intervals of such an event occurring, that is, X being greater than
or equal to 1500. How frequently did it occur in the record? Having occurred once, how much
time elapsed before it occurred again? That is what a recurrence interval is, and
then we computed the average recurrence interval. In the record of 45 years, the average
recurrence interval turns out to be 5 years, and the 40 that we have used is the number
of years between the first occurrence and the last occurrence. After the first occurrence
has been noticed, we start counting the recurrence interval.
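The counting procedure above can be sketched as a small Python calculation. The years 1952, 1956, 1957, 1958, 1972 and 1976 are the ones mentioned in the lecture; the last year, 1992, is inferred from the stated 40 year span after 1952; the two years in between (1980 and 1985) are hypothetical placeholders, since the lecture does not state them. The function names are likewise only illustrative.

```python
def recurrence_intervals(exceedance_years):
    """Return the gaps, in years, between consecutive exceedance years."""
    years = sorted(exceedance_years)
    return [later - earlier for earlier, later in zip(years, years[1:])]


def average_recurrence_interval(exceedance_years):
    """Return the mean gap between consecutive exceedances (the return period estimate)."""
    intervals = recurrence_intervals(exceedance_years)
    if not intervals:
        raise ValueError("at least two exceedances are needed to form an interval")
    return sum(intervals) / len(intervals)


# Years in which the annual maximum discharge equalled or exceeded 1500 cumec.
# 1980 and 1985 are hypothetical placeholders; 1992 is inferred from the 40 year span.
exceedance_years = [1952, 1956, 1957, 1958, 1972, 1976, 1980, 1985, 1992]

print(recurrence_intervals(exceedance_years))         # [4, 1, 1, 14, 4, 4, 5, 7]
print(average_recurrence_interval(exceedance_years))  # 5.0
```

Because the intervals always add up to the span between the first and the last occurrence, the average depends only on that span (40 years) and the number of intervals (8), not on where the intermediate exceedances fall.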
This average recurrence interval is in fact the return period of that particular event.
So, for this example, the return period of the event that the annual maximum flow Q max equals or
exceeds the magnitude of 1500 cumec, cubic meters per second, is 5 years, which
means that once in about 5 years, on an average, you can expect this particular magnitude to be equalled or exceeded.
We put this more formally through probability now.
So, this is the event we are interested in: X
being greater than or equal to x T. The subscript T here denotes the return period. As your
return period changes, your x T changes, or as your flow magnitude x T changes, your
return period changes. Therefore, we denote this with a subscript of T, indicating that
the flow magnitude, or the event here, is dependent on the return period T.
So, we denote by p the probability that X is greater than or equal to x T. Then,
whenever this event, X being greater than or equal to x T, has occurred in the
record, we count it as a success. If that event has
not occurred, then we count it as a failure. So, any event X less than x T will be reckoned
as a failure. Now, because we define p as the probability of X being greater than
or equal to x T, a success occurs with probability p and a failure occurs
with probability one minus p. So, the event X greater than or equal to x T has
a probability of p, and the event X less than x T has a probability of one minus p.
Because the events are all independent, all
the observations are independent, we can work out the probability of a recurrence interval of
duration tau. What we are asking is: once the event has occurred, what is
the probability that it occurs again only after tau years? That is the concept
of return period. What is the probability that it returns after tau years? What does that
mean? It means that in the first year it does not occur, in the second year it does not occur, in the third
year it does not occur, and so on up to year tau minus one, but in the
tau th year it occurs. The probability that the event does not
occur in the first year is one minus p, in the second year
is one minus p, etcetera, up to year tau minus one, and in the tau th year it occurs with a probability
of p. Because the events are independent, the probability of such a sequence occurring
is one minus p into one minus p into etcetera, one minus p, tau minus one times, into p.
So, the probability of a recurrence interval of duration tau is
the product of the probabilities of tau minus one failures followed by
a success in the tau th time period. That will be one minus p to the power tau minus
one into p. So, this is the probability of the event recurring
after exactly tau years. Now, we have defined the return period to be the expected value of
the recurrence interval, and what we have just obtained is the probability of the recurrence interval. The expected
value of the recurrence interval is what we are interested in.
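As a rough numerical check of the probability of a recurrence interval of duration tau described above, the following Python sketch evaluates one minus p to the power tau minus one, into p, over a long but finite range of tau. The value p = 0.2 and the cut-off of 500 years are illustrative assumptions, and the function name is hypothetical.

```python
def recurrence_interval_probability(tau, p):
    """Probability of (tau - 1) failures followed by a success in year tau."""
    if tau < 1:
        raise ValueError("tau must be a positive whole number of years")
    return (1.0 - p) ** (tau - 1) * p


p = 0.2        # illustrative exceedance probability in any one year (assumption)
max_tau = 500  # truncation point for the infinite sums (assumption)

taus = range(1, max_tau + 1)
probabilities = [recurrence_interval_probability(tau, p) for tau in taus]

# The probabilities over all possible recurrence intervals should add up to about 1.
total = sum(probabilities)

# Expected value of tau: each value of tau weighted by its probability.
mean_tau = sum(tau * prob for tau, prob in zip(taus, probabilities))

print(total)
print(mean_tau)
```

With these assumed values the truncated sums should come out very close to 1 and 5 respectively, which can be compared with the closed form for the expected value that the lecture goes on to derive.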
So, let us recall how we defined the expected value. Remember, for a discrete random variable X,
we wrote the expected value as the sum of x i into p of x i, for i equal to
1 to infinity. In the same way, we now define it for the random variable tau: the expected value
of tau is the sum, over tau equal to 1 to infinity, of the value of tau multiplied by its probability,
which is one minus p to the power tau minus one into p. So, x into p of x gives the expected value,
and this is how we get the expected value of tau.
If we expand the equation, we first put tau equal to 1; that gives 1 into 1 minus
p to the power 0 into p, that is, p. Then we put tau equal to 2, which gives 2 into 1 minus p to
the power 2 minus 1, which is 1, into p; plus 3 into 1 minus p whole square into p, and so
on. So, this is how we expand the expected value of tau, and taking p
common, we write it as p into 1 plus 2 into 1 minus p plus 3 into 1 minus p whole square, and so on. So, this
is the expected value of tau. Now, the terms within the bracket here, namely 1 plus 2 into
1 minus p plus 3 into 1 minus p whole square plus 4 into 1 minus p whole cube, etcetera, can be equated
to a series. There is an expression for what is within the bracket.
We can write this particular expression as one plus x to the power n, which
is expanded as one plus n x plus n into n minus one by two factorial into x square plus
etcetera, where we put x equal to minus of one minus p, and n equal to minus
two. If you put x equal to minus of one minus p and n equal to minus
two in this expansion, you can verify that you get the same expression as the one in the bracket here.
So, the terms in the bracket are one plus x to the power n, that is, one
minus of one minus p, to the power minus 2, since x is minus of one minus p and n is minus 2.
So, we write the expected value of tau
is equal to p divided by the square of one minus of one minus p. The denominator
becomes p square, so this is p by p square, which is equal to 1 by p. Therefore,
we have shown the expected value of tau to be one by p. Recall that p is the probability of
X being greater than or equal to x T, or the probability of success. Because the expected value
of the recurrence interval tau is, in fact, the return period, we write T is equal to
one by p. Now, this is an important and very useful result.
That is, the return period is equal to one divided by the probability of X being greater
than or equal to x T.
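For reference, the spoken derivation above can be written out compactly as follows. The symbols are the ones used in the lecture: p is the probability that X is greater than or equal to x T, tau is the recurrence interval, and T is the return period.

```latex
\begin{align*}
P(\tau) &= (1-p)^{\tau-1}\,p, \qquad \tau = 1, 2, 3, \dots \\
E(\tau) &= \sum_{\tau=1}^{\infty} \tau\,(1-p)^{\tau-1}\,p \\
        &= p\left[\,1 + 2(1-p) + 3(1-p)^{2} + 4(1-p)^{3} + \cdots\right] \\
        &= p\,\bigl[\,1-(1-p)\bigr]^{-2}
           \quad \text{using } (1+x)^{n} = 1 + nx + \frac{n(n-1)}{2!}\,x^{2} + \cdots
           \text{ with } x = -(1-p),\; n = -2 \\
        &= \frac{p}{p^{2}} = \frac{1}{p} \\
T &= E(\tau) = \frac{1}{p} = \frac{1}{P(X \ge x_T)}
\end{align*}
```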
Mostly, we write it the other way round; we write p is equal to one by T. That is, the
probability of occurrence of the event is equal to one divided by T. So, specify the
return period and you immediately get the probability, and that probability we use for
all our design and so on. So, we are saying here that the probability of occurrence of an event
in any observation is the inverse of its return period; that is, the probability of X being greater
than or equal to x T is equal to one by T. Therefore, the T year return period event
is X greater than or equal to x T, and it occurs on an average once in T years.
Say, for example, we look at the same example
one that we considered, the same 45 years of data and the same threshold value of
1500 cubic meters per second, and we know that the return period of the event is five years.
We just obtained this return period to be five years, as I showed here. So, this is the average
recurrence interval, or return period, and it is five years.
The return period is associated with a particular event. So, we are saying that the event X greater than
or equal to 1500 cumec has a return period of five years. Now, we will use that and answer the
question: what is the probability that the annual maximum discharge in the river will
equal or exceed 1500 cumec, for the data in example one?
So, we simply put p equal to the probability of X greater
than or equal to 1500, which is one by T, which is equal to 1 by 5, which is 0.2. So, this is how we
obtain the probability for a given return period. Let us say our return period is forty years;
then, obviously, this 1500 will be different. As the return period increases, your magnitude
will also increase. That is, a 40 year flood will be greater than a five year flood, a
100 year flood will be greater than a 40 year flood, and so on. So, this 1500 we have written
as x T, which is associated with the return period T. So, from the return period you can
obtain the probability. Now, related to this, an interesting and useful
question that we need to answer is: what is the probability that a T year return period
event will occur at least once in N years? There is a certain understanding that is necessary
here. Let us say we are designing a critical structure, say a nuclear reactor or
some such thing, and you want to protect this structure against flooding. This nuclear
reactor may last for, let us say, 1000 years or so, and the flood discharge that you
want to protect it from may be a 1500 year flood
or a 2000 year flood and so on, or a PMF, a probable maximum flood.
So, we may fix the return period for the flood, which may be smaller than or higher
than the time for which the structure is required. So, we may
be interested in getting the probability that, in the life span of the particular structure
that we are planning, a T year return period event will occur.
To give you another, simpler example, let us say that you are building a dam. The dam is
meant to last for 100 years, but the structure of the dam you may have designed for a 1000
year flood event, or the forces that correspond to a 1000 year flood, which is different from
the spillway discharge capacity. The spillway discharge capacity may be a 100 year flood capacity,
but the structure itself may be designed to withstand the forces that are caused by
a 1000 year flood, the dynamic forces and so on.
And therefore, you will be interested in answering the question: what is the probability that
a 1000 year flood event will occur in the span of the 100 year lifetime of the reservoir?
Therefore, this question is relevant: what is the probability that a T year return period
event will occur at least once in N years? So, let us try to answer this question.
We will take the complementary event. The complementary event to "at least once in N years"
is that it never happens in the N year period. So, we will
consider the complementary event that in N years the event X greater than or equal to x
T does not occur at all. If it occurs even once, we say that it has occurred at
least once, and therefore, it should not occur at all for the complementary event.
So, we first consider the situation where the event X greater than or equal to x T does
not occur in the N year period. What is the probability of this? That is, in
the first year it does not occur, in the second year it does not occur, in the third year it does
not occur and so on. For all the N years this event does not occur. What is the probability
that it does not occur in a given year? It is one minus p, because p is the probability
that it occurs in a given year. The events in the different years are all independent. Therefore,
the probability that X greater than or equal to x T does not occur in any of the N years
is one minus p into one minus p etcetera, N times, because one minus p is the probability
that in a year X greater than or equal to x T does not occur. So, this will be equal to (1 - p) to the
power N. The probability that it occurs at least once in the N year period
is simply one minus this particular probability, that is, one minus the probability that it does
not occur in any of the N years. So, that will be 1 - (1 - p)^N. And
what is p? p is the probability of X being greater than or equal to x T, which is simply
equal to one by T as we have seen. So, the required probability is 1 - (1 - p)^N,
which is 1 - (1 - 1/T)^N.
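The argument above can be written compactly as follows. This is only a restatement of the steps just described, under the same assumptions that the years are independent and that the annual exceedance probability p is the same in every year.

```latex
\begin{align*}
p &= P(X \ge x_T) = \frac{1}{T} \\
P(\text{no occurrence in } N \text{ years}) &= \underbrace{(1-p)(1-p)\cdots(1-p)}_{N \text{ times}} = (1-p)^N \\
P(\text{at least one occurrence in } N \text{ years}) &= 1 - (1-p)^N = 1 - \left(1 - \frac{1}{T}\right)^N
\end{align*}
```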
So, this is the probability that a T year return period event occurs at least once in
N years, and this is an important result, because we will be using this kind of result when
we want to assess the risk that we are taking in building hydrologic structures and
in hydrologic designs. That is, we may have built a structure for a 1000 year return period, but we would
also be interested in getting the probability that the 1000 year return period event in fact occurs
in the next 100 years, and that will give you a measure of the risk that you
are taking in designing such a structure. So, we will use a simple example to drive
home this point. Let us say we have got a five year return period; that is, the return period
corresponding to a discharge of 1500 cumec is five years. We will answer the question:
what is the probability that this five year return period event occurs at least once in the
next five years? Intuitively, you may feel that this probability is one, because we are
saying that this is a return period of five years and therefore, in the next five years
it will occur once. But, as you can see here, the probability is 1 - (1 - 1/T)^N.
Here T is five years, the return period, and N is also five, because we are interested in
the event occurring at least once in
the next five years. Therefore, this turns out to be 1 - (1 - 1/5)^5 = 1 - (0.8)^5, which is 0.672.
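A small Python sketch of this calculation is given below. The function name prob_at_least_once is only illustrative and is not from the lecture. It evaluates 1 - (1 - 1/T)^N for the five year example, and also for the dam case mentioned earlier, a 1000 year flood in a 100 year lifetime, assuming independent years with the same annual exceedance probability.

```python
def prob_at_least_once(T, N):
    """Probability that a T-year event occurs at least once in N years.

    Assumes independent years, each with exceedance probability p = 1/T.
    """
    if T < 1:
        raise ValueError("return period T must be at least one year")
    if N < 0:
        raise ValueError("number of years N must be non-negative")
    p = 1.0 / T                    # annual exceedance probability
    return 1.0 - (1.0 - p) ** N    # complement of "never occurs in N years"


# Five year event in the next five years (the example in the lecture).
risk_5_in_5 = prob_at_least_once(5, 5)

# 1000 year flood during a 100 year lifetime (the dam illustration).
risk_1000_in_100 = prob_at_least_once(1000, 100)

print("5 year event in 5 years:", round(risk_5_in_5, 3))
print("1000 year event in 100 years:", round(risk_1000_in_100, 3))
```

The first value rounds to 0.672, matching the result above. The second comes to about 0.095, that is, roughly a one in ten chance that the 1000 year flood occurs at least once during the 100 year lifetime.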
So, the fact that the return period is five years does not mean that the event will occur at
least once in the next five years. The probability that it will occur at least once in the next five
years is 0.672. So, to summarize then, we have introduced the concept of frequency
analysis, and we are talking about at-station frequency analysis, in the sense that we are
not concerned about spatial or regional frequency analysis here. We are talking about
frequency analysis with respect to a given data time series. The data that
we use for frequency analysis should be independent and identically distributed.
The requirement that they are identically distributed may be ensured by using data
from the same population, because we are talking about time homogeneity here. By same population,
I mean that the watershed should not have been disturbed during the period of record, nor should the
gauge that you are interested in, either the rain gauge or the stream gauge, have been shifted
or changed, nor should major changes have taken place in the way the data are recorded.
The requirement of independence may be ensured by looking at the annual maximum series or the annual
minimum series, where every year you pick up one value out of the large number of values
that have been recorded, be it the maximum or the minimum. Then, from year to year, you
can assume them to be independent. Once we have the correct data, we do the frequency
analysis. We introduced the concept of average recurrence interval.
We call an event X greater than or equal to x T an extreme event, and then we look
at the time that elapses between consecutive
occurrences of such an event and reckon it as the recurrence interval. We compute the average
recurrence interval and then call it the return period. The return period indicates
that, on average, such an event will occur once in about T years, where T is the
return period. Then we also examined the relationship between the
probability and the return period, and we showed that the probability of X being greater than
or equal to x T is in fact one by T, where T is the return period. Then we answered the question
of what is the probability that in a given N year period the T year event will occur at
least once, and that is the solution that we have obtained here.
So, in the next class, we will continue this discussion further and use the frequency
factor, in turn, to carry out the frequency analysis. We will
also see how the frequency analysis differs, or uses different values, for different parent
distributions. As I mentioned in the beginning of the class, the frequency analysis is dependent
on how we define the extreme values, and these extreme values are derived from a parent
distribution. We are trying to look at extreme value distributions, which are derived from
a given parent distribution. So, thank you for your attention. We will continue the discussion
in the next class.