Fukushima and Macondo are parallel events in two major energy industries, with similar themes and consequences
Character: extreme events with large public reaction
Risk and Impacts: all underestimated beforehand
Perceived: major disasters harming the environment
Consequence: massive damage to reputations and companies
Reaction: new safety requirements and inspections
Enhanced: emergency preparedness
Implications: widespread
We need to be able to explain data, improve safety, reduce risk and make predictions
Some simple questions to pose and try to answer:
• What is the risk of a major accident?
• When technology or designs change, how does safety change?
• What do the past events imply?
• How can predictions be made?
• What risks are tolerable or acceptable?
• How are these risks to be managed?
• How safe are the operating crews?
• What is a cost-effective improvement?
• What about unknowns?
• What is the present knowledge?
• What, if anything, should be done differently in the future?
• How can or should the industry operate?
• ……………..?
Fundamental idea and postulate
• Risk is caused by uncertainty, and the measure of uncertainty is probability
• Modern systems and structural failures do not just involve mechanics, components and statistics
• All modern systems include people, whose contribution dominates, thus making failures complex, while barriers will be penetrated
• To understand and predict failures it is essential to include people: their actions, mistakes, skills, decisions, responses, learning and motivation
• Therefore, we must explicitly include learned behavior(s) with increasing experience and risk exposure
• Based on systems outcome data, we developed a unifying emergent theory of learning, thus avoiding excessive complication
• Treat all outcomes as occurring with some uncertainty (probability) and hence predictability
• Also treat rare events, “fat tails” and unknowns as a minimum attainable
• Aim is to predict and hence manage future risks and their consequences
UNRESTRICTED / ILLIMITé
Managing Risk:
Elements of a General Emergent Theory
• All failures include the human contribution, and we all (systems and individuals) follow a learning curve
• “Rare” events occur or re-occur on average at about the same maximum interval achieved by all other modern systems (universality of failure)
• It is all about predicting probability, where the “Fat Tail” is due to the human contribution
• Failure predictions, including rare or unknown events, can be described with the same methods and measures used for all existing and known homo-technological systems
• With future (increasing) risk exposure/experience, extrapolations of standard statistical “power laws” and Pareto distributions will grossly underpredict risk (missing unknowns, black swans and the risk plateau)
• The relevant risk exposure and experience measures must be chosen to provide relative predictions of risk (uncertainty), failure and learning trends
What about catastrophic failures: Random? Human? Tolerable? Avoidable? Predictable?
What do such unexpected failures all have in common, apart from costing billions?
All failures include the inseparable human element: we design systems to assumed failure modes, safety margins and accident scenarios, with added safety precautions, and then operate them until some unforeseen failure occurs. Why are we then surprised?
The black balls are observed outcomes: what can we learn from the rare and the unexpected?
Risk is measured by our uncertainty, and the measure of uncertainty is probability
Are there “Tolerable Risk” boundaries?
The Learning Hypothesis
Humans learn from their mistakes, continually correcting errors and their mental “rules” based on experience, as an inseparable part of the total system.
The rate of decrease of the outcome rate, λ, with experience or risk exposure, τ, is taken as proportional to the rate itself, so
$d\lambda/d\tau \propto -(\lambda - \lambda_m)$
with always a finite minimum rate, $\lambda_m$, and a learning constant, k.
Integrating gives the solution to the Minimum Error Rate Equation (MERE) as a rate that decreases exponentially from its initial value, $\lambda_0$:
$\lambda(\tau) = \lambda_m + (\lambda_0 - \lambda_m)\exp(-k\tau)$
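As a minimal sketch of the learning hypothesis, the outcome rate can be written as a function that decays exponentially from an initial rate toward a finite minimum, λ(τ) = λm + (λ0 - λm) exp(-kτ). The function name `mere_rate` and the numerical values of λ0, λm and k below are hypothetical, chosen only to show the shape of the curve, and are not taken from any data set in these slides.

```python
import math


def mere_rate(tau, lam_0, lam_m, k):
    """Outcome rate after accumulated experience tau.

    Implements lam(tau) = lam_m + (lam_0 - lam_m) * exp(-k * tau):
    the rate starts at lam_0 and decays toward the minimum lam_m.
    """
    return lam_m + (lam_0 - lam_m) * math.exp(-k * tau)


if __name__ == "__main__":
    # Hypothetical, purely illustrative parameter values.
    lam_0, lam_m, k = 1.0, 0.01, 3.0
    for tau in (0.0, 0.5, 1.0, 5.0):
        rate = mere_rate(tau, lam_0, lam_m, k)
        print(f"tau = {tau:4.1f}   rate = {rate:.4f}")
```

The rate never falls below λm, however much experience is accumulated, which is the "finite minimum rate" of the hypothesis.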
The Paradoxes of Learning Lessons
• Paradox 1
Without having the events we want to avoid, we cannot learn
• Paradox 2
All events are preventable, but only afterwards
• Paradox 3
All events are acceptable to society, until they actually occur
• Paradox 4
Rare events do not allow prior learning, so surprise us all
Corollary
Systems and societies “behave” and reflect the humans learning, rule revising and error correcting within them, but regrettably having no “perfect learning” affects risk perception
Predicting failure: measure for experience and risk exposure varies with the system
System/Technology              | Experience or Risk Exposure   | Outcomes
Commercial Aircraft            | Flight hours                  | Fatal crashes and near misses
Offshore Oil Rigs              | Production amounts            | Spills, fires and explosions
Power Grids                    | Outage duration               | Probability and time of non-recovery
Rocket Launches                | Launch count × burn time      | Launch failure
Software/Procedures            | Testing number or time        | Faults and errors
Manufacturing and Market Share | Production or sales quantity  | Product cost or price reduction
Commercial Aircraft Near Miss Rates
[Figure: Reported Near Miss Rates. Vertical axis: rate per 100,000 h (IR). Horizontal axis: Accumulated Experience (MFhrs). Series: US NMAC (1987-1998), Canada Airprox (1989-1998), UK Airprox (1990-1999), and the NMAC learning curve model. Annotation: 1 per 200,000 h.]
Data Sources: FAA, CAA and TSB
The Learning Hypothesis describes the Universal Learning Curve that the Data Show
$E^* = \exp(-3N^*)$
See paper and references for details and list of systems studied
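The universal learning curve E* = exp(-3N*) can be evaluated directly. The sketch below assumes that E* is a non-dimensional error rate and N* a non-dimensional accumulated experience, both running from 0 to 1. That reading of the symbols is an assumption, since the slide does not define them, and the function name is hypothetical.

```python
import math


def universal_learning_curve(n_star, k=3.0):
    """Evaluate E* = exp(-k * N*), with k = 3 as quoted on the slide.

    n_star is assumed to be a non-dimensional experience in [0, 1].
    """
    return math.exp(-k * n_star)


if __name__ == "__main__":
    # Tabulate the curve at a few non-dimensional experience values.
    for n_star in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"N* = {n_star:.2f}   E* = {universal_learning_curve(n_star):.4f}")
```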
Need surrogate for experience and risk measure
Reflects what we know about our risk exposure and learning
Identical to Laws of Practice, so systems reflect people within them
[Figure: individual practice data for syllables, ball tosses, upside-down writing, typesetting and coding. Horizontal axis: non-dimensional practice, t*.]
Therefore, Practice = Experience, and repeated trials, t ≡ ε
Hence, external system outcomes reflect individual learning
Learning from experience: knowing the failure rate, the prior probability of failure uses standard reliability definitions
The outcome probability is just the cumulative distribution function, CDF, conventionally written as F(τ), the fraction that fails by τ, so:
$p(\tau) \equiv F(\tau) = 1 - \exp\left(-\int \lambda \, d\tau\right)$
where the failure rate $\lambda(\tau) = h(\tau) = f(\tau)/R(\tau) = \{1/(1-F)\}\, dF/d\tau$, where $f(\tau) = dF/d\tau$.
Carrying out the integration from an initial experience, $\varepsilon_0$, to any interval, $\varepsilon$, we obtain the probability of an outcome as the double exponential:
$p(\tau) = 1 - \exp\{(\lambda - \lambda_m)/k - \lambda_m(\tau - \tau_0)\}$
where, from the minimum error rate equation (the MERE), the failure rate is
$\lambda(\tau) = \lambda_m + (\lambda_0 - \lambda_m)\exp(-k\tau)$
Now $\lambda_m$ is the lowest achievable rate, and $\lambda(\tau_0) = \lambda_0$ at the initial experience, $\varepsilon_0$, accumulated up to or at the initial outcome(s), and $\lambda_0 = 1/\tau$ for the very first, rare or initial outcome, like an inverse “power law”.
In the usual engineering reliability terminology, for n failures out of N total:
Failure probability, $p(\tau) = (1 - R(\tau))$ = # failures/total number = n/N, and the frequency is known if n and N are known (and generally N is not known).
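The reliability definition p(τ) = 1 - exp(-∫λ dτ) can be checked numerically for any assumed failure rate. The sketch below integrates an exponentially decaying MERE-type rate, λ(τ) = λm + (λ0 - λm) exp(-kτ), from zero experience with a simple trapezoidal rule. It works from the definition only, not from any closed-form result, and the function names and parameter values are hypothetical.

```python
import math


def mere_rate(tau, lam_0, lam_m, k):
    """Failure rate lam(tau) = lam_m + (lam_0 - lam_m) * exp(-k * tau)."""
    return lam_m + (lam_0 - lam_m) * math.exp(-k * tau)


def outcome_probability(tau, lam_0, lam_m, k, steps=10_000):
    """CDF p(tau) = 1 - exp(-integral of lam from 0 to tau).

    The integral is approximated with the trapezoidal rule on a
    uniform grid of `steps` intervals.
    """
    if tau <= 0.0:
        return 0.0
    h = tau / steps
    total = 0.5 * (mere_rate(0.0, lam_0, lam_m, k) + mere_rate(tau, lam_0, lam_m, k))
    for i in range(1, steps):
        total += mere_rate(i * h, lam_0, lam_m, k)
    return 1.0 - math.exp(-total * h)


if __name__ == "__main__":
    # Hypothetical, purely illustrative parameter values.
    lam_0, lam_m, k = 1.0, 0.01, 3.0
    for tau in (0.1, 1.0, 10.0, 100.0):
        p = outcome_probability(tau, lam_0, lam_m, k)
        print(f"tau = {tau:6.1f}   p = {p:.4f}")
```

When no learning occurs (λ0 = λm), the result reduces to the familiar constant-rate form 1 - exp(-λm τ).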
The prior learning Human Bathtub
$p(\tau) = 1 - \exp\{(\lambda - \lambda_m)/k - \lambda_m(\tau - \tau_0)\}$
Probability of an organizational failure or an individual error
[Figure: the Human Bathtub curve. Vertical axis (log scale): Accident Probability (increasing risk). Horizontal axis: Increasing Experience (for the homo-technological system (HTS)). Annotations along the curve:
• The initial or first event has a purely random occurrence
• We descend the curve by learning from experience, thus reducing the chance or risk
• The bathtub bottom or minimum risk is eventually achieved
• Eventually, when very large experience is attained, we climb up the curve again because we must have an event]
© R B Duffey & J W Saull 2004
Challenge: Predicting failures with little or no data
Now, as experience is gained and learning occurs, the failure rate falls to the minimum achievable, λm, and eventually we reach a lifetime probability or service lifetime as the risk exposure measure, τ → T.
For illustrative convenience, we take the maximum lifetime, T, as corresponding to an equal, 50:50, chance of failure or survival. This half lifetime is given when p(T) ~ 0.5, where p(T) = 1 - exp(-λm T).
The equal chance of failure or survival then occurs when exp(-λm T) = 0.5, or λm T = -ln 0.5 = 0.693, or at a service half-life or accumulated risk exposure of
T ~ 0.69/λm
The maximum half lifetime, T, or “likely service life” before failure, is therefore inversely proportional to the minimum attainable failure rate per unit of past experience.
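The half-life relation T = ln 2/λm ≈ 0.69/λm is a one-line calculation. The sketch below uses a hypothetical function name and a purely illustrative minimum rate of one outcome per 200,000 hours, and confirms that the resulting T gives p(T) = 0.5.

```python
import math


def service_half_life(lam_m):
    """Risk exposure T at which p(T) = 1 - exp(-lam_m * T) equals 0.5."""
    if lam_m <= 0.0:
        raise ValueError("minimum failure rate must be positive")
    return math.log(2.0) / lam_m


if __name__ == "__main__":
    # Illustrative value only: one outcome per 200,000 hours.
    lam_m = 1.0 / 200_000.0
    T = service_half_life(lam_m)
    p_at_T = 1.0 - math.exp(-lam_m * T)
    print(f"T = {T:.0f} hours, p(T) = {p_at_T:.3f}")
```

If later data revise the estimate of λm, calling the function again with the new value gives the updated half-life.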
So all we have to do is provide a lower bound estimate for the failure rate, λm, and if and as additional failure data are gathered, the known achievable or attainable failure rate can always be updated to reflect this additional experience and/or risk exposure.
To determine the minimum failure rate, λm, we can adopt the classical approach of using data from analogous systems with human involvement.
Because of the common and dominant human contribution, the failure rate of modern systems is inherently applicable to other similar systems, and can be used to make predictions based on what we already know.
[Figure: tanker spill rates. Series: Tanker Spills (S/Mtoe), Spills > 1000 gals (S/Mtoe), and the MERE fit.]
MERE, IR (S/Mtoe) = 0.018 + 0.4 exp(-accMtoe/21600)
Data sources: US Coast Guard Polluting Incident Compendium, 2003; US DOE EIA, US Trade Overview, 2003
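The fitted curve quoted for the tanker spill data, IR = 0.018 + 0.4 exp(-accMtoe/21600), can be evaluated as a short sketch. It assumes that accMtoe is the accumulated quantity shipped in Mtoe and that IR is the spill rate in spills per Mtoe, which is how the labels read. The function name and the sample values of accumulated experience are hypothetical.

```python
import math


def spill_rate(acc_mtoe):
    """Fitted spill rate (S/Mtoe) from the quoted MERE curve.

    0.018 is the minimum rate approached at large experience, and
    21600 Mtoe is the e-folding scale of the learning term.
    """
    return 0.018 + 0.4 * math.exp(-acc_mtoe / 21600.0)


if __name__ == "__main__":
    # Hypothetical sample points of accumulated experience (Mtoe).
    for acc in (0.0, 10_000.0, 21_600.0, 50_000.0, 100_000.0):
        print(f"accMtoe = {acc:9.0f}   IR = {spill_rate(acc):.4f} S/Mtoe")
```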