Wednesday, December 1, 2021

The Seal Failure in the SRB that Doomed Challenger

Update 10-29-2023:  recently,  I have received "comments" on this article that are nothing but ad solicitations from the makers of O-ring products,  usually from overseas.  They picked this article precisely because it mentions O-rings,  which tells me this was the result of a keyword search of published articles on the internet.  My blog site is not a monetized,  commercial site.  I do not accept any advertising from anyone.  When I find them,  I delete them.

-------

I have real difficulty with the fact that,  even after all these years,  it is still necessary to explain to people what really destroyed Space Shuttle Challenger and killed her crew,  back in January 1986.  It was really two very seriously-bad upper management decisions at NASA,  one long before the launch: 

(1) to insist on poorly-designing the O-ring seal joints with 3 interacting serious errors,  and

(2) to fly soaked-out colder than had ever been tested,  when everybody’s engineers did not want to.

Background

First,  you have to understand what really happens in federal government contracting.  There is only one customer,  and he thinks he is always right about every decision that he makes.  If you do not do it exactly the way he wants,  no matter how wrong he might be,  then you lose the contract and you don’t get paid.  And,  the government is quite often wrong about how best to do things!  That’s not to say the contractors are always right,  but they are wrong a lot less often than the government.

You also have to understand that NASA never did know,  and still does not know,  the art of building reliable solid propellant rockets.  Essentially,  no one at NASA ever did that kind of work.  They buy these things from contractors who (by definition) know much more of the science,  and especially the art,  than anyone at NASA knows.  The “science” is that knowledge which was written down.  The “art” is the knowledge that was not written down,  usually because no one wanted to pay for the writing.

I can tell you from experience as an insider within the business,  that “rocket science” isn’t really “science”!  It is only about 40% science,  about 50% art,  and about 10% blind dumb luck.  And that’s in production work!  In new product development work,  the art and luck percentages are even higher. 

Further,  this same sentiment applies to pretty much any type of engineering effort,  not just rocket work.  That explains a lot,  about a lot of things,  doesn’t it?

Poorly-Designed O-Ring Seal Joints

What I show in Figure 1 is how such joints should be designed and built.  This is the design that most solid rocket motors use,  very successfully,  whether large or small.  In most rocket motors,  you need only join the aft and forward closures to the case cylinder.  Only in some of the really large motors is the case cylinder itself divided into segments that must be joined,  usually to limit the size of the case-bonded propellant masses that must be cast and cured within them.

The sketch in the figure is what mechanical engineers call a “radial static seal”.  It is “radial” because the O-ring lies between an inner and an outer surface,  that must include a gap of tightly-controlled size between the two parts,  for assembly.  One part stabs into/inside the other,  in order to join them,  in this case by a row of pins.  It is “static”,  because the parts,  once joined,  do not move anymore.  There are strict but well-published guidelines and procedures for sizing the O-ring groove dimensions,  the gap for assembly,  and the size of the O-ring,  as well as its material composition and its hardness.  These guidelines and procedures are used precisely because they work so very well.   Examples:  Refs. 1 and 2.
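Two of the quantities that such guidelines ask you to check are the "squeeze" on the O-ring cross-section and the fraction of the groove that the ring fills.  Here is a minimal sketch of that arithmetic,  assuming a simple rectangular groove.  The function name and the dimensions are hypothetical,  chosen only for illustration;  the acceptable ranges for each quantity must come from the published handbooks (Refs. 1 and 2),  not from this sketch.

```python
import math


def radial_gland_check(cross_section, gland_depth, groove_width):
    """Return (squeeze_fraction, fill_fraction) for an O-ring in a rectangular groove.

    cross_section: free (uncompressed) O-ring cross-section diameter
    gland_depth:   radial space the ring sits in (groove depth plus assembly gap)
    groove_width:  axial width of the groove
    All three must be in the same length unit.
    """
    if min(cross_section, gland_depth, groove_width) <= 0:
        raise ValueError("dimensions must be positive")
    # Squeeze: how much of the cross-section is compressed at assembly.
    squeeze = (cross_section - gland_depth) / cross_section
    # Fill: ring cross-sectional area divided by the rectangular gland area.
    ring_area = math.pi * cross_section ** 2 / 4.0
    fill = ring_area / (gland_depth * groove_width)
    return squeeze, fill


# Hypothetical dimensions in inches, for illustration only.
squeeze, fill = radial_gland_check(
    cross_section=0.275, gland_depth=0.230, groove_width=0.375
)
print(f"squeeze = {squeeze:.1%}, gland fill = {fill:.1%}")
```

A fill fraction below 100% matters here:  the ring needs room in the groove to move across and seat.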

Something also shown in the figure is peculiar to solid rocket motors,  especially those that are segmented-case designs.  There is a joint in the insulation (and thus also the propellant) that leads to the sealing surface gap,  that in turn leads directly to the O-ring in its groove.  You DO NOT obstruct this path with sealants,  putties,  greases,  or anything else!  But there does need to be a right-angle bend,  to stop radiant heat transfer from the flame in the motor from heating the O-ring directly.

The air in this path is what gets suddenly compressed upon motor pressurization,  and which in turn forces the O-ring to the far side of its groove,  where it gets squeezed against that surface to seal.  This is called “seating the O-ring”,  and until it is properly seated,  it CANNOT seal,  and so it briefly leaks!

Figure 1 – A Properly Designed O-Ring Seal Joint

It is the air in the path that gets compressed against the O-ring,  with hot booster gases and hot solids filling most of the path volume that the air formerly occupied.  But the air cools by convection to the steel much more effectively and faster than to the O-ring itself.  THAT is how the O-ring is not damaged by the hot air,  or the hot gases!  The hot solids are stopped by the right-angle bend.  This is a rapid transient on a time scale equal to,  or shorter than,  the motor pressurization event. 
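To put rough numbers on that compression,  here is a minimal sketch that treats the trapped air as an ideal gas compressed adiabatically,  which is to say before it has lost any heat to the steel.  The 60-atmosphere final pressure and the 290 K starting temperature are illustrative assumptions,  not SRB design figures,  and the function name is made up for this example.

```python
def trapped_air_after_compression(p_initial, p_final, t_initial_k, gamma=1.4):
    """Adiabatic ideal-gas compression of the air trapped in the path.

    p_initial, p_final: absolute pressures in the same unit
    t_initial_k:        starting air temperature in kelvin
    gamma:              ratio of specific heats (1.4 assumed for air)
    Returns (volume_fraction_remaining, final_temperature_k).
    """
    if p_initial <= 0 or p_final <= 0 or t_initial_k <= 0:
        raise ValueError("pressures and temperature must be positive")
    ratio = p_final / p_initial
    # p * V**gamma is constant, so V2/V1 = (p2/p1)**(-1/gamma).
    volume_fraction = ratio ** (-1.0 / gamma)
    # T2/T1 = (p2/p1)**((gamma - 1)/gamma) for the same process.
    t_final = t_initial_k * ratio ** ((gamma - 1.0) / gamma)
    return volume_fraction, t_final


# Assumed values: 1 atm of air at 290 K, compressed to 60 atm.
vol_frac, t_final = trapped_air_after_compression(1.0, 60.0, 290.0)
print(f"air occupies {vol_frac:.1%} of its original volume at about {t_final:.0f} K")
```

Under those assumptions the air ends up squeezed into a small fraction of the path,  right against the O-ring,  and it starts out hot.  The adiabatic temperature is an upper bound:  the cooling to the steel described above pulls it down quickly.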

What you DO NOT want is contact of the hot gases (and especially the hot solids) upon the O-ring!  The “hot sandblast” effect of that outcome would cut through the O-ring almost instantaneously.

Note that these two design requirements,  (1) one O-ring and (2) an unobstructed pressurization path,  interact very strongly with how one verifies proper assembly of the motor!  You must do a pressure leak check of the motor to verify sealing,  but you must do it by pressurizing the entire motor.  However,  you NEED NOT pressurize the motor to its full operating pressure to do this verification! 

You only need an atmosphere or so of pressure difference to seat any O-ring and then verify its sealing.  If it holds at that low pressure,  and you followed the design guidelines correctly,  it will hold at full motor operating pressure!  THAT is what you verify when you do motor case hydroburst testing,  long before you ever cast propellant to make a live motor!  That’s the way the real solid rocket motor manufacturers prefer to do it.  And it works to very high reliability levels,  as indicated in the figure.

However,  that is NOT what NASA insisted upon doing!  In the mistaken belief that a second back-up O-ring increases sealing reliability,  they insisted upon the two O-ring design indicated in Figure 2.  Thiokol complied,  lest they lose the contract.  In the mistaken belief that they had to pressure leak check at full motor operating pressure,  NASA did not want to risk fully pressurizing a live loaded motor (and rightly so).  And so NASA insisted on a way to apply air pressure at full motor operating pressure,  between each pair of O-rings at every joint,  instead of any motor pressurization.   This is shown in the figure.

What this does is drive the downstream (backup) O-ring to the correct side of the groove,  thus seating it for motor operation.  But,  it also drives the upstream (primary) O-ring to the wrong side of its groove,  from which motor pressurization upon ignition must unseat it,  drive it across its groove,  and re-seat it on the correct side!  Until and unless it re-seats on that correct side,  the upstream (primary) seal ALWAYS leaks!  Period!  There is NO WAY AROUND that outcome!  And THAT lets hot gases and solids reach the primary O-ring,  simply because the re-seating process takes a longer time than pressurization!

Figure 2 – The Improperly-Designed 2 O-ring Joint That Flew,  Up Through Challenger

NASA made a third mistake:  in the mistaken belief that it would prevent hot gases and hot solids from reaching the O-ring,  they insisted on obstructing the pressurization path by filling the insulation joint with “heat protective” putty (zinc chromate putty actually).  This is also shown in the figure.

This last mistake makes a bad risk far worse,  because high pressure gases always (ALWAYS!!!) “wormhole-through” a not-solid material (like putty or grease) at a single point!  THIS effect is also shown in the figure.  That re-distributes the “push” of the gas from a broad front all around the O-ring,  to a single point upon the O-ring,  as indicated in the figure.  The delay in unseating the ring,  pushing it to the other side of the groove,  and reseating it,  almost guarantees that the compressed air leaks past it,  so that booster hot gases and solids can reach the O-ring.  And those will cut a hole right through it.

“Half-moon slices” right through the primary upstream O-ring were seen,  upon SRB motor disassembly,  in a rather significant percentage of the SRB’s recovered and refurbished.  That verifies what I just said about the upstream O-ring being cut!  There is no surprise there,  once you understand the process!

The difference between this point load problem,  and what NASA analyzed in its structural calculations for the O-ring seal is quite stark!  The structural analysts were assuming pressurization on a broad front.  They did not model the point load effect of the hot gases and solids wormholing-through the putty obstructing the pressurization path.  Quite simply,  what was built was NOT what was analyzed!

Unnecessary Risk to Fly Too Cold

If the motor is sufficiently cold-soaked,  the primary upstream O-ring loses its flexibility and resilience (as do all of them).  Pushing the entire embrittled O-ring across its groove all at once is risky enough,  but if you concentrate the “push” at one single location by the wormhole effect,  you essentially guarantee snapping the O-ring apart at that point!  This cold brittleness effect was amply demonstrated by Dr. Feynman at the Rogers Commission hearings (assisted by Gen. Kutyna),  when he stirred his sample of the O-ring material in his glass of ice water,  and then demonstrated its non-resilience.

Any failure of the primary upstream O-ring,  whether by hot sandblast cutting,  or by cold brittle fracture from the point jet force load,  then puts a single-point hot sandblast jet impacting onto the downstream O-ring,  simply because it is nearby.  Thus,  a sort of “cascade failure” is a very high risk indeed!

The post-Challenger “fix” was a third O-ring in every joint.  This just set up the cascade failure as a longer chain,  as indicated in Figure 3.  The only reason the Challenger disaster did not repeat is that they never flew that cold again.  But the failure rate demonstrated by the loss of Challenger,  1 in 25 flights (Challenger was lost on the 25th Shuttle mission,  designated STS-51-L),  speaks for itself!
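A rough way to see why a demonstrated loss rate like that matters is to ask how likely a long run of further flights without a repeat would be,  if the same per-flight odds had stayed in force.  The sketch below assumes every flight is an independent trial at one constant loss probability,  which is a simplification (the whole point above is that the risk depends strongly on temperature).  The rates and flight counts are placeholders for illustration,  and the function name is invented here.

```python
def prob_no_loss(per_flight_loss_rate, flights):
    """Probability of zero losses in `flights` independent flights.

    Assumes a constant per-flight loss probability, which is a
    simplification made only for this illustration.
    """
    if not 0.0 <= per_flight_loss_rate <= 1.0:
        raise ValueError("loss rate must be between 0 and 1")
    if flights < 0:
        raise ValueError("flights must be non-negative")
    return (1.0 - per_flight_loss_rate) ** flights


# Placeholder rates and flight counts, for illustration only.
for rate_label, rate in (("1 in 25", 1 / 25), ("1 in 50", 1 / 50), ("1 in 100", 1 / 100)):
    for flights in (50, 100):
        p = prob_no_loss(rate, flights)
        print(f"loss rate {rate_label}, {flights} flights: P(no repeat) = {p:.1%}")
```

If a long run without a repeat comes out improbable at the demonstrated rate,  then something must have changed the odds.  That is consistent with the argument here that avoiding cold-soaked launches is what made the difference.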

Figure 3 – The Cascade Failure Risk Was Compounded By the Redesigned Joint

Fatal Consequences We All Saw

The photography obtained during the launch and loss of Challenger confirms everything claimed here.  The seal failed upon motor ignition and pressurization,  as shown quite clearly in Figure 4.  The dark grey plume is carbon soot-bearing hot gases spewing through the two failed O-rings at the aft segment joint. 

Figure 4 – Seal Leak Upon Ignition Seen In Photography

This leak miraculously “cured” itself by plugging-up with aluminum oxide-carbon slag from the metallized propellant.  This slag-plugging just happened to hold pressure like that,  until the Challenger encountered a wind shear while at “max-Q”,  where it was also most highly stressed by aerodynamic forces.  The slag plug failed,  letting the hot motor gases and solids rush through the hole again.  This is shown quite clearly as the anomalous bright-but-small extra plume in Figure 5 below.

This jet of leaking hot gases and solids finally got so big that it cut through one of the aft struts holding the SRB to the center tank.  There is always hydrogen leaking from the center tank’s hydrogen tank,  and in this case the leaked plume probably burned a hole in that hydrogen tank.  With the strut cut,  the bottom of the SRB moved outboard.  That pushed the nose of the SRB inboard,  such that the nose of the SRB poked a hole in the side of the center tank’s oxygen tank. 

Suddenly dumping oxygen into a base-burning hydrogen-air fire caused an explosion in the wake behind the center tank that both overheated and structurally overloaded it.  The tank collapsed,  letting both SRB’s and the orbiter fly free.   The released propellants burned explosively as this happened.  All this happened in an instant,  so it looks like just the one sudden explosion.

Figure 5 – Leakage Resumed After Being Shaken By Wind Shear at Max Q

The released SRB’s continued to “fly” out-of-control under their own thrusts,  as we all saw.  This is shown in Figure 6.  The orbiter’s engines were pointed through a center of gravity that suddenly no longer existed,  so they forced the orbiter to pitch-up violently,  before starving for lack of propellant from the suddenly-missing center tank.  The pitched-up orbiter went broadside to the supersonic wind,  which tore it to pieces.  This is how those pieces,  that we all saw fall into the sea,  came to be.  

Figure 6 – The SRB Did Not Explode,  But It Punched a Hole In the Center Tank

Final Remarks

The two O-ring joint was a NASA-mandated design mistake,  compounded by mandating putty obstructing the O-ring pressurization paths.  The “customer is always right” in government contracting,  except that he was fatally wrong about this one.  See also Ref. 3.

The decision to fly cold-soaked colder than the SRB’s had ever been tested,  was also a NASA management decision.  Both NASA and Thiokol engineers objected,  but were over-ruled.  Thiokol upper management also over-ruled their own engineers,  and told NASA to go ahead and launch.  Thus emboldened by Thiokol management,  NASA launched the thing,  thus killing its crew.

The stand-down to “correct” this problem was nearly 2 years long and horribly expensive.  Which just goes to prove what I like to say to anyone who will listen:  “there is nothing as expensive as a dead crew,  especially one dead from a bad management decision”. 

The only problem with that return-to-flight effort is that it did not correct the real problems:  they actually made them worse with a 3-O-ring joint,  and by keeping the putty obstructions.  The ONLY thing they did “right” was never to fly that cold again,  which is very likely the ONLY reason that the Challenger disaster did not repeat itself before the Shuttle got retired,  since there were more than 51 more flights after the Challenger disaster!

By the way,  the crew did not die in the tank explosion and subsequent ripping-apart of the orbiter by air loads.  The telemetry showed no high-gee accelerations at all!  The crew was still alive in the orbiter cabin until it finally hit the sea,  which is about a 200-gee stop,  since it hit dead broadside.  See Figure 7.
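For a sense of where a number like 200 gees comes from,  here is a minimal sketch using constant-deceleration kinematics,  a = v^2 / (2 d).  The impact speed of 93 m/s (a little over 200 mph) and the 2.2 m stopping distance are assumptions picked for illustration,  not measured values,  and the function name is made up for this example.

```python
G0 = 9.80665  # standard gravity, m/s^2


def average_stop_gees(impact_speed_m_s, stopping_distance_m):
    """Average deceleration in gees for a stop from impact speed to rest.

    Uses constant-deceleration kinematics: a = v**2 / (2 * d).
    """
    if impact_speed_m_s < 0 or stopping_distance_m <= 0:
        raise ValueError("speed must be non-negative and distance positive")
    decel = impact_speed_m_s ** 2 / (2.0 * stopping_distance_m)
    return decel / G0


# Assumed inputs, for illustration only: 93 m/s impact, 2.2 m to stop.
print(f"{average_stop_gees(93.0, 2.2):.0f} gees")
```

The result scales with the square of the speed and inversely with the stopping distance,  so a broadside hit on water,  which stops the cabin in a very short distance,  is about the worst case.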

Figure 7 – The Crew Was Still Alive In This Cabin Section (Arrow) That Is Falling Back

I say what I said about the crew because the flight deck back-seaters leaned forward and flipped on the breathing-air packs for the front-seater pilots.  They would not have done that unless they knew the cabin had depressurized,  and that would have been significantly AFTER the explosion and ripping-apart of the orbiter.  They were tumbling clear of the explosion cloud by that time,  as illustrated in the figure. 

Those two flight deck pilots had breathed-up all the oxygen in their breathing packs by the time they hit the sea,  something confirmed by the empty breathing packs that were recovered.  Which means they were alive when they hit the sea!  By extension,  so were the back-seaters,  plus the three down on the mid-deck.

They did not have pressure suits,  parachutes,  breathing bottles,  or a hatch they could blow open (basic bail-out gear).  More importantly,  there was no way to take the spin off the tumbling cabin.  Spinning like that,  there was no way to reach and exit the hatch,  even if they had the other basic bail-out gear!  But a small drogue parachute from the nose of the cabin section would have taken off the spin!  That plus the basic bail-out gear just listed could have saved that crew!  It took almost 5 minutes to hit the sea.  They had the time to bail out.

I submitted that means for bail-out to NASA,  but I was ignored.  Coming from an outsider,  my idea was “not invented here”,  as far as NASA was concerned.  Yet,  something rather like it might even have worked for Columbia some years later:  the 3 mid-deck occupants were still alive inside a tumbling cabin section as it approached impact near Tyler,  Texas,  well after the breakup during re-entry.   Time was short for a bail-out,  but without the de-spin drogue,  they could not reach the hatch at all. 

References

#1. Parker O-ring Handbook ORD 5700,  copyright 2021,  original release 1957,  Parker O-Ring and Engineered Seals Division,  Lexington,  KY, available from parkerorings.com 

#2. Seal Design Guide,  Apple Rubber Products,  Lancaster NY,  available from AppleRubber.com

#3. Wikipedia article “Rogers Commission Report”,  in this case accessed 11-26-2021

Final Notes

There are different design rules for static radial and static face seals,  and different rules yet for dynamic radial seals (as on a piston moving inside a cylinder,  like a syringe or a hydraulic cylinder).  The Shuttle SRB joints fall into the static radial classification. 

The appropriate set of rules specifies O-ring sizes and hardness,  groove dimensions,  and when to use back-up rings.  You just follow the design rules,  and make sure that only compressed air reaches the O-ring (and on a broad front),  upon solid propellant motor ignition.  

You accomplish that broad-front pressurization with the 90-degree bend geometry to stop the hot solids and radiant heat transfer,  and by NEVER obstructing the O-ring pressurization path with anything!  Even too close a fit between the hard parts,  can cause problems with the transient pressurizing flow.

You verify your seal design,  your case structural design,  and your leak check procedure,  during case hydroburst testing,  long before you ever cast a live motor!  You NEVER delete the hydroburst testing step in your development effort.  Never!  Not for any reason at all! 

Then you test live motors at every environmental extreme condition in which you think you might possibly operate.  If any redesigns (of anything) are needed,  you go back and verify them in all the tests,  from hydroburst all the way forward.  No motor goes to production,  until its exact design configuration has been verified in every test at every test condition!

Once your design has passed all those tests,  you stick with your verified leak check procedure as if it were a religious mandate!  You add rigorous quality control (of the “total quality management” type),  for production.  That includes X-raying every single item,  to verify that there are no casting voids in the propellant,  no unbonds between propellant and case liner,  and no other propellant grain cracks or other problems.  And then you NEVER operate a motor outside the conditions for which it was tested! 
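That last rule can be written down as a very simple go/no-go check.  The sketch below is only an illustration of the idea:  the class name,  the function name,  and the 40 F to 90 F tested range are all hypothetical,  and a real envelope would cover far more than soak temperature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestedEnvelope:
    """Range of soak temperatures (deg F) over which the motor was test-fired."""

    min_soak_temp_f: float
    max_soak_temp_f: float

    def allows(self, soak_temp_f: float) -> bool:
        # Inside the envelope only if within both tested limits.
        return self.min_soak_temp_f <= soak_temp_f <= self.max_soak_temp_f


def launch_commit(envelope: TestedEnvelope, soak_temp_f: float) -> str:
    """Return 'GO' only when the soak temperature lies inside the tested range."""
    if envelope.allows(soak_temp_f):
        return "GO"
    return (
        f"NO-GO: {soak_temp_f:.0f} F is outside the tested range "
        f"{envelope.min_soak_temp_f:.0f} to {envelope.max_soak_temp_f:.0f} F"
    )


# Hypothetical tested range, for illustration only.
envelope = TestedEnvelope(min_soak_temp_f=40.0, max_soak_temp_f=90.0)
print(launch_commit(envelope, 65.0))
print(launch_commit(envelope, 29.0))
```

The point of making the check this blunt is that there is no "waiver" branch in it:  an untested condition is a no-go until the motor has actually been tested there.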

THAT is the way to achieve no-more-than-1-in-a-million failure rates,  with solid propellant rocket motors!

The “bean counters” and “management professionals” will absolutely hate that prescription as “too expensive”,  but killing a crew with a bad design just costs a whole lot more,  than the cost of following that prescription.  We’ve already seen that with Apollo-1,  Challenger,  and Columbia.

Simple as that. 

And just as hard to sell to the “bean counters” and “management professionals”,  as you might fear.  


11 comments:

  1. Late comment on your post. The solid propellant in the boosters was based on PBAN polymer. The old-timers I worked with commented to me, when I was working on the HARM inert missile, that "PBAN is a great polymer system, but it cracks when it gets cold". I'm convinced that the cold soak that morning caused a cracked grain that boosted pressures above the design criteria, which added to your scenario.

    1. Hi Larry! The telemetry from the vehicle showed no pressure anomalies from the boosters, which would have also shown up as a thrust anomaly. That suggests that the PBAN did not get cold enough to crack at +29 F. Colder, it might have. GW

  2. Do you agree with NASA that there was joint rotation when the SRB was pressurized after ignition?

    1. Not necessarily. I think the O-rings were both burnt through at ignition in the cascade failure described in the article. A piece of crud stopped the leak until it jostled loose 73 seconds up.

    2. So you don't believe the Space Shuttle's SRBs had joint rotation?

  3. The right answer is "I don't know" because I do not have access to what NASA did. There's always a little distortion upon pressurization, but why would that flight have a different distortion than the rest? It would seem unlikely for joint rotation to have killed that bird and not another.

  4. I agree with just about everything you wrote here, except "There is always hydrogen leaking from the center tank’s hydrogen tank" I don't think there was significant leakage from the ET H2 tank.

    This does not change any of your conclusions.

    I was a 30 year NASA contractor; I worked on simulations of the shuttle main propulsion system, including the ET.

    1. I saw the tank base hydrogen fire on multiple launches. It's why there was insulation on the base. -- GW

  5. The entire tank was covered in insulation. The base heating was because of the SRB plumes. There was no significant H2 leaking from the ET.

    1. Then why did I see what certainly appeared to be the flameholding of incandescent gases on the tank bases?

    2. And why did some report running low on LH2? -- GW
