Why It Breaks

11 Jun 2026 | 9 min read | Christopher Deen
Macro photograph of a small cylindrical metal component showing a fatigue fracture surface with classic beach marks and crack propagation radiating from a central initiation point, illustrating fatigue failure mechanisms in precision metal components.

Why It Breaks: This article is part of ATL Medical's Hard-Won Engineering series, which explores the engineering and clinical challenges of medical device development — and how rigorous analysis, honest failure investigation, and design discipline produce better outcomes for patients and programs alike.

The Smith Who Didn't Know Why

Materials science is one of the oldest engineering disciplines. For most of its history, it wasn't engineering at all. People hammered metal into shape for thousands of years before anyone could tell you what the hammering actually did to it. Work hardening — the reason a cold-worked edge holds — was exploited for millennia before it was understood. The smith knew that it worked. He had no idea why.

Try running an effectiveness verification on that. How do you prove you've made it hard enough when you have no concept of hardness as a measurable property, no microstructure, no idea what the carbon is even doing in the iron? You don't. You make the sword, you hand it to someone, and the result comes back as a binary: they lived, or they didn't. Design validation by attrition, with the feedback loop running through a battlefield.

The Japanese sword is the example everyone reaches for, and it's the one that usually gets told wrong. The steel from traditional clay furnaces came out heterogeneous: uneven carbon, riddled with slag. Folding and forge-welding the billet homogenized the carbon and worked the inclusions out. The differential hardening, with clay packed along the spine so the edge quenched hard while the back stayed tough enough to survive impact, is the part actually worth understanding. Not one of the smiths doing it could have told you why it worked. They had a process refined across generations by the only validation available: which blades came back broken.

The Gap Between Passing a Test and Being Safe

Do anything for a few thousand years, and you start to assume you've understood it.

By the early twentieth century, we were confident we understood steel. The Titanic was built from plates that passed every test anyone knew to run — the tensile numbers met spec. What nobody knew was that if you get steel cold enough, it stops being malleable and ductile and instead shatters like brittle glass — and that this plate crossed that line somewhere north of +30°C, while the North Atlantic that night was -2°C. The hull didn't fail because the steel was defective by 1912 standards. It failed because it was being used well outside an envelope nobody knew existed. Brittle fracture wasn't a concept yet, so there was no test for it, and the steel passed. We didn't confirm it until the 1990s, when the steel pulled off the wreck was finally put through impact tests that hadn't existed when she was built.

That gap stayed open until something forced it shut. In the Second World War, the United States mass-produced thousands of Liberty ships — welded rather than riveted, turned out at a pace nobody had attempted. Nearly fifteen hundred suffered brittle fractures. Several broke clean in half. A Liberty-style tanker, the SS Schenectady, split across the deck while sitting at the fitting-out dock in calm water. The investigation that followed is among the most consequential in engineering history: the welds weren't the culprit, the steel was. It turned brittle in the cold. The square hatch corners concentrated stress. And because the hull was one continuous welded piece, a crack that started at a corner could run the length of the ship, where a riveted hull would have stopped it at the next plate. Three things — the steel, the design, the joining method — that only made sense once you understood all three at once. That investigation helped establish the entire field of fracture mechanics. The tools that finally explained the Titanic were the ones the Liberty ships forced us to invent.

That is the whole point. Once you understand a failure — the mechanism, not just the symptom — you stop reacting to failures and start designing against them. The smith put a hard edge and a tough spine into one blade by packing clay and trusting the outcome. We now put fine grain in the bore of a turbine disc for fatigue strength and coarse grain at the rim for creep resistance — two microstructures in one component, the same principle, except done to a drawing, because we understand exactly why each behaves as it does. Know the use environment and how the material actually behaves, and you can judge whether a design is fit before you cut metal. That is the thread that runs from the forge to the finite element mesh.

What This Looks Like in Practice

Historical examples make the principle legible. Real engineering work is where it becomes useful. Here are two failure investigations from that work: neither glamorous, both instructive.

The Green Tint

We had stainless steel parts coming back with a green tint. Green, from the literature, points to chromium oxide, which requires oxygen to form. That was the part that didn't fit because the parts were supposed to be gas-fan-quenched under argon. An inert atmosphere has no oxygen to give.

Rather than send samples straight to the lab, I asked the heat treatment plant to walk me through their process. They were wrapping the parts in a foil envelope, crimping it shut in open air, then loading that into the furnace and quenching under argon. There it was. They had folded a pocket of ordinary air — oxygen and all — into a sealed envelope and held it against the parts at temperature. The inert furnace atmosphere never reached them. And these parts are tiny, so the trapped air was more than enough to do the damage.

The fix didn't require a research program: the tool room made a simple mesh basket so the parts sat exposed to argon, with no captive air. The plant trialed it, the green disappeared, and the parts have conformed ever since. That last part matters — the green going away and staying away is the verification of effectiveness that tells you the root cause was probably right.

The Test in the Wrong Place

The second was a tungsten alloy that snapped during production. The parts were delaminating, and suspicion fell on a flame treatment used to burn off organic residue, which the investigation subsequently confirmed was running hot enough to anneal the surface. Annealing a heavy alloy that's otherwise tough embrittles the skin, and an embrittled skin is exactly what delaminates under load.

The sting was in the inspection. An earlier CAPA had already put a mandrel-wrap ductility test in place to catch exactly this kind of brittleness — but it sat before the flame treatment. It passed the parts as ductile, then the flame treatment embrittled them afterward, downstream of the one check meant to catch it. A test in the wrong place in the sequence is no test at all.

We had it moved to after the flame treatment, so finished-condition ductility became the gate, and the production delamination stopped. That's the verification of effectiveness — and the real lesson sits in where you put the check. A part can pass every test you run and still be unfit if the test never looks at the thing that actually fails.
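The sequencing rule in this story can be written down as a simple check. The sketch below is a hypothetical illustration, not ATL's tooling: the `Step` model, its `affects` and `checks` fields, the step names, and the function `misplaced_inspections` are all assumptions made for the example. It flags any inspection that runs before a later process step able to change the property it measures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One operation in a process route (hypothetical model)."""
    name: str
    affects: frozenset = frozenset()  # properties this step can change
    checks: frozenset = frozenset()   # properties this step inspects


def misplaced_inspections(route):
    """Return (inspection, property, later_step) triples where a step that
    can change a property runs after the inspection meant to gate it."""
    findings = []
    for i, step in enumerate(route):
        for prop in sorted(step.checks):
            for later in route[i + 1:]:
                if prop in later.affects:
                    findings.append((step.name, prop, later.name))
    return findings


# Illustrative routes only; the step names are assumptions, not a real router.
original_route = [
    Step("form part", affects=frozenset({"ductility"})),
    Step("mandrel-wrap test", checks=frozenset({"ductility"})),
    Step("flame treatment", affects=frozenset({"ductility"})),
]
# Same steps, with the ductility test moved to after the flame treatment.
corrected_route = [original_route[0], original_route[2], original_route[1]]

print(misplaced_inspections(original_route))
print(misplaced_inspections(corrected_route))
```

Run against the original ordering, the check reports the mandrel-wrap test as sitting upstream of the flame treatment; against the corrected ordering it reports nothing. A real router would need a richer model of which steps touch which properties, and that model is where the engineering judgment goes.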

Reactive Is Not Enough

Neither of these is glamorous. A mesh basket and an inspection moved to the right place in a process sequence are not the stuff of legend. But set them next to the blacksmith, and the gap between them is the whole of modern engineering. He had a process that worked and no idea why, and it was validated by those who lived. We have a process that works, a mechanism we can name, and a verification that closes the loop without anyone getting hurt, least of all our patients. What has changed isn't the steel. What has changed is that we can finally answer the question the smith never could: why. And a why you understand is a why you can verify, predict, and design against.

All of that, though, is still reactive — root cause after something has already broken, the corrective half of CAPA. The preventive half is where the value compounds, and it runs on the same understanding: a mechanism you can explain is a mechanism you can anticipate. At ATL, that understanding feeds the front of the line, not just the investigation file — design of experiments to map a process window and find its edges before a part reaches production, prototyping to identify failure modes while they're still cheap to fix, on a bench rather than in the field.

It is the same verification discipline, aimed forward instead of back, and it pays twice: patients are protected by devices whose failure modes were designed out rather than inspected for, and those devices reach the market faster, because the surprises that wreck a timeline get predicted and removed before validation, not discovered during it. Catching the failure protects this batch. Preventing it protects every batch after — and the patient who never knew there was a risk.
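As a minimal sketch of what mapping a process window with design of experiments can start from, the snippet below enumerates a two-level full-factorial run plan. Everything in it is an assumption for illustration: the factor names, the level values, and the helper `full_factorial` are hypothetical and are not taken from any ATL process.

```python
from itertools import product

# Hypothetical factors and low/high levels; real ranges would come from
# process knowledge and equipment limits, not from this example.
factors = {
    "flame_temp_C": (600, 800),
    "dwell_s": (2, 6),
    "standoff_mm": (10, 20),
}


def full_factorial(levels_by_factor):
    """Yield one dict per run, covering every combination of factor levels."""
    names = list(levels_by_factor)
    for combo in product(*(levels_by_factor[name] for name in names)):
        yield dict(zip(names, combo))


runs = list(full_factorial(factors))
print(len(runs))  # 2 levels for each of 3 factors gives 8 runs
for run in runs:
    print(run)
```

Each run would then be built and measured for the property of interest, such as finished-condition ductility, so the edges of the window are found on the bench. Fractional designs and centre points are common refinements that this sketch leaves out.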

Conclusion

The gap between the blacksmith and the modern engineer isn't talent or attention — it's understanding. The smith worked with materials he couldn't fully explain, validated by outcomes he couldn't always control. We work with mechanisms we can name, test conditions we can define, and verification loops we can close before any patient is exposed to a risk.

That understanding is the foundation of everything that follows in this series — from materials selection and process design to the manufacturing tolerances that determine whether a device performs as designed. The next article explores one of the most demanding examples: ceramic isolator design and the precision tooling required to manufacture RF ceramic components reliably at scale.

FAQ

Verification of effectiveness (VoE) is the step in a CAPA process that confirms a corrective action actually solved the problem — and that the solution holds. It's not enough to implement a fix; you have to demonstrate that the root cause has been addressed and the failure mode no longer occurs under the same conditions.

Root cause analysis moves beyond the visible failure to identify the underlying mechanism driving it. In a regulated environment, correctly identifying the root cause is essential — corrective actions built on a misdiagnosis will fail to prevent recurrence and may introduce new risks.

Reactive engineering addresses failures after they occur — investigating root causes, implementing corrections, and verifying effectiveness. Preventive engineering applies the same understanding of failure mechanisms earlier in the development process, using design of experiments, prototyping, and failure mode analysis to identify and eliminate risks before a device reaches production or validation.

CAPA stands for Corrective and Preventive Action — a formal process required under ISO 13485 and FDA regulations for identifying, investigating, and resolving quality issues. The corrective half addresses existing failures; the preventive half uses that understanding to anticipate and eliminate potential failures before they occur.

Surface finish directly influences where fatigue cracks initiate. Stress concentrations are strongly influenced by surface quality and feature geometry — a rough surface or a sharp feature transition creates local stress amplification that can significantly reduce the number of cycles a component can survive before failure, even when bulk material properties are within specification.
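As a rough illustration of that amplification, the classical Inglis result for an elliptical notch in a wide plate under remote tension relates peak local stress to notch depth and root radius. The symbols are assumptions introduced for this example and do not come from the article: $a$ is the notch depth (the ellipse half-length perpendicular to the load), $\rho$ is the root radius, and $\sigma_{\text{nom}}$ is the nominal stress away from the notch.

```latex
K_t = \frac{\sigma_{\max}}{\sigma_{\text{nom}}} = 1 + 2\sqrt{\frac{a}{\rho}}
```

For a circular hole ($a = \rho$) this gives $K_t = 3$, while a sharp, scratch-like feature with $a/\rho = 25$ gives $K_t = 11$. It is an idealized elastic result, so real surface features and a material's fatigue notch sensitivity will shift the numbers, but it shows why a small radius matters more than the bulk properties suggest.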

A test placed at the wrong point in a manufacturing sequence can pass components that are unfit in their finished condition. If a process step that affects a critical property occurs after the inspection designed to catch that property change, the inspection provides false assurance. Test sequencing must reflect the actual sequence of process steps that affect the property being measured.