The Tragic Story That Demands a Better System

Earlier this week, the NZ Herald published a damning report under the headline "'People didn't do their job': Hospital failures linked to young woman's preventable death". The article details the case of a young woman who died after multiple breakdowns in hospital care: missed handoffs, ignored alerts, and a culture that placed individual blame over systemic improvement. As a software engineer who has spent years building and debugging complex distributed systems, I read the report with a mix of anger and recognition. This wasn't a story about a few lazy or incompetent workers. It was a story about a system designed to fail.

The NZ Herald investigation revealed that at no point did any single caregiver intend harm. Yet a cascade of small misses (a lab result not checked, a change in vital signs not escalated, a shift change without proper handoff) led to an outcome that was entirely preventable. The phrase "people didn't do their job" was used repeatedly by hospital leadership, but that framing is dangerous. It masks the deeper, structural problems that made failure not only possible but likely.

In my own field, we see the same pattern. When a production outage costs millions, the first instinct is often to ask "who pushed that commit?" or "who forgot to test that path?" But as the tech industry has learned, painfully, blaming individuals is the fastest way to ensure the same failure repeats. The NZ Herald story is a mirror held up to every engineer who has ever worked in a culture of fear. Let's examine what healthcare and software engineering can teach each other about preventing these disasters.

Medical team discussing patient chart in a hospital, highlighting systemic communication failures

The "People Didn't Do Their Job" Fallacy: Why Blame Is Counterproductive

Hospital administrators told the NZ Herald that individuals had failed to perform their duties. But this explanation ignores decades of research in safety engineering. The Swiss Cheese Model, first formalised by James Reason in 1990, shows that major accidents rarely result from a single error. Instead, they require multiple holes in the defence layers to align. In this case, the holes included a missing escalation protocol, a poorly designed shift-change process, and an electronic health record (EHR) system that buried critical alerts.

I've seen the same in software. In February 2017, an Amazon S3 outage that disrupted a large part of the internet was widely attributed to "human error": an engineer mistyped a command. But Amazon's own post-incident summary pointed to deeper issues: the tool allowed too much capacity to be removed too quickly, and the affected subsystems had not been fully restarted in years, so recovery took far longer than expected. Blaming the engineer would have been easy, and useless. Instead, AWS changed the tool to remove capacity more slowly and added safeguards that block any removal that would take a subsystem below its minimum required capacity. The fix targeted the system, not the person.

In healthcare, the equivalent would be redesigning the EHR to require double confirmation before dismissing a critical lab result, or implementing a "timeout" protocol that forces a second clinician to review handoff notes. The NZ Herald story makes it clear that no such systematic fixes were in place. The hospital's response, blaming individuals, guarantees that the next family will hear the same apology.

Lessons from Software Engineering: Postmortems and Blameless Cultures

One of the most important practices in modern tech is the blameless postmortem. Pioneered by companies like Google, Etsy, and Netflix, this process treats every incident as an opportunity to improve the system, not punish the operator. The golden rule: "If you can't imagine the failure happening to you, you haven't done enough to prevent it."

For example, after the 1996 Ariane 5 rocket explosion, which was caused by a software overflow, the investigation didn't blame the programmers. It found that the team had reused code from the Ariane 4, which assumed different flight dynamics. The fix was to add runtime bounds checking and to require independent validation of reused components. Today, many major tech companies expect postmortems to end with specific, actionable engineering changes, not HR memos.

If the hospital in the NZ Herald report had conducted a blameless postmortem, they might have asked: Why did the nurse not see the lab result? Was it buried by other alerts? Why wasn't there an automatic escalation when the patient's vitals crossed a threshold? What cognitive biases were at play during the 12-hour night shift? These are systems questions, and they lead to systems answers. The phrase "people didn't do their job" isn't an answer; it's a refusal to look deeper.

The Swiss Cheese Model in Healthcare and Code Deployments

I often draw the Swiss Cheese Model on whiteboards during incident reviews. Each layer of defence (code review, unit tests, integration tests, staging environment, monitoring) has holes that represent potential failures. The goal is to reduce the size of the holes and increase the number of layers. In the NZ Herald case, the layers were paper-thin. The attending physician believed the nurse would check the lab. The nurse believed the physician had already seen it. The EHR system displayed the alert in a section that wasn't part of the default dashboard. No layer caught the gap.
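
The arithmetic behind "more layers, smaller holes" can be sketched in a few lines of Python. This is a simplification that assumes each layer misses a given problem independently of the others, and the miss rates are invented purely for illustration.

```python
from math import prod

def breach_probability(miss_rates):
    """Probability that a problem slips through every layer,
    assuming the layers fail independently of one another."""
    return prod(miss_rates)

# Hypothetical miss rates, chosen only to illustrate the arithmetic.
two_layers = breach_probability([0.1, 0.1])
four_layers = breach_probability([0.1, 0.1, 0.1, 0.1])

print(f"two layers:  {two_layers:.4f}")
print(f"four layers: {four_layers:.6f}")
```

The independence assumption is the fragile part. When the physician and the nurse each assume the other has checked the result, two layers on paper behave like a single layer in practice.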

In software, we use defence in depth. Kubernetes deployments, for example, use liveness probes, readiness probes, resource limits, network policies, and pod disruption budgets, all of which are independent layers. If one fails, the others catch it. Hospitals could learn from this: an automated system that alerts both the attending physician and a backup clinician when a lab value enters a critical range; a mandatory checklist that must be acknowledged before shift handoff; a second-person review for any discharge order. These aren't extra steps; they are safety nets.

One concrete technique from software that maps well to healthcare is the "pre-mortem" (coined by Gary Klein). Before a major release, the team imagines it has already failed and writes a story about what went wrong. Then they fix those vulnerabilities preemptively. A hospital could run a pre-mortem for every high-risk pathway: "A young woman died after discharge because…" and trace the hypothetical failures.

Swiss cheese model diagram showing layers of safety with holes aligning

How AI and Clinical Decision Support Could Have Prevented This

Artificial intelligence isn't a silver bullet, but it can plug holes that humans consistently miss. Clinical decision support systems (CDSS) powered by machine learning have been shown to reduce diagnostic errors by up to 40% in some studies (e.g., this meta-analysis in JAMA). In the NZ Herald case, an AI that continuously monitored lab values, vital signs, and the patient's history could have issued a high-confidence warning that the patient was deteriorating, even if no single human had noticed.

But the key is explainability and integration. Simply adding another alert to an already noisy EHR is counterproductive, because it contributes to alarm fatigue. Instead, the AI should be designed to adjust its threshold based on context: a young woman with no comorbidities requires less aggressive intervention than an elderly patient with multiple conditions. This is similar to how anomaly detection systems in software use baseline profiling: a sudden spike in error rates on a rarely used endpoint is flagged differently than on a high-traffic API.
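
Baseline profiling is easier to see in code. The sketch below is a minimal, hypothetical example on the software side only: it flags a reading when it spikes well above that source's own recent history, using a simple z-score. The function name, threshold, and sample numbers are assumptions for illustration, not a production detector and not clinical guidance.

```python
from statistics import mean, stdev

def is_spike(history, latest, z_threshold=3.0):
    """Flag `latest` if it is more than `z_threshold` standard
    deviations above this source's own recent baseline."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest > mu
    return (latest - mu) / sigma > z_threshold

# Hypothetical errors-per-minute for two endpoints.
quiet_endpoint = [0, 1, 0, 0, 1, 0, 1, 0]
busy_endpoint = [40, 55, 48, 60, 45, 52, 58, 47]

# The same reading means different things against different baselines.
print(is_spike(quiet_endpoint, 12))
print(is_spike(busy_endpoint, 12))
```

A reading of 12 is far above the quiet endpoint's baseline and well within the busy endpoint's normal range, so only the first is flagged. A context-aware clinical alert would apply the same idea to a patient's own baseline instead of a single global cutoff.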

More importantly, the AI shouldn't just flag a problem; it should suggest the next action. For instance: "Lab result K+ 6.5 (critical). Action: Notify attending physician. Administer calcium gluconate per protocol, and start insulin-dextrose infusion." This reduces cognitive load and ensures that the human knows exactly what to do. The NZ Herald story makes clear that information existed but wasn't translated into timely action. That's where AI can bridge the gap between data and decision.

Building Redundancy and Safety Nets: From Kubernetes to Hospital Protocols

In distributed systems, redundancy isn't optional; it's the core guarantee. Kubernetes controllers constantly reconcile desired state with actual state. If a pod crashes, the controller spins up a replacement. If a node fails, pods are rescheduled. This is automated error recovery, and healthcare lacks this kind of proactive resilience. In the NZ Herald case, the only recovery mechanism was human vigilance, and it failed.
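
The reconcile idea fits in a short Python sketch. This is a toy model of one control-loop pass, not the Kubernetes API: the `reconcile` function, the pod dictionaries, and the `replacement-N` names are all made up for illustration.

```python
def reconcile(desired_replicas, running):
    """One pass of a control loop: compare desired state with
    observed state and return the actions needed to converge."""
    actions = []
    healthy = [pod for pod in running if pod["healthy"]]
    # Remove anything that is running but unhealthy.
    for pod in running:
        if not pod["healthy"]:
            actions.append(("delete", pod["name"]))
    # Create replacements until the healthy count matches the target.
    # (This toy version never scales down.)
    for i in range(desired_replicas - len(healthy)):
        actions.append(("create", f"replacement-{i}"))
    return actions

observed = [
    {"name": "web-1", "healthy": True},
    {"name": "web-2", "healthy": False},  # crashed
    {"name": "web-3", "healthy": True},
]
for action in reconcile(desired_replicas=3, running=observed):
    print(action)
```

The important property is that nobody has to notice the crash. The loop runs continuously, so the gap between "something is wrong" and "something is being done" does not depend on a person looking at the right screen.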

What would a "healthcare Kubernetes" look like? It would include:

  • Automated escalation paths: If a critical lab value isn't acknowledged within 15 minutes, the system pages the next-level clinician automatically.
  • Forced checklists: Before discharge, a mandatory checklist is unskippable, similar to a CI/CD pipeline that fails if tests don't pass.
  • Handoff protocols enforced by software: The EHR requires both shift nurses to sign off on a structured summary, modeled after the SBAR (Situation-Background-Assessment-Recommendation) format.
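
As a rough sketch of the first item, here is how a timed escalation path might be modelled in Python. The 15-minute window comes from the list above; the role names, the `CriticalAlert` class, and the `who_to_page` function are hypothetical and would be replaced by whatever the hospital's rota and paging system provide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

ACK_WINDOW = timedelta(minutes=15)

# Hypothetical escalation chain; real roles would come from the rota.
ESCALATION_CHAIN = ["bedside nurse", "attending physician", "on-call consultant"]

@dataclass
class CriticalAlert:
    description: str
    raised_at: datetime
    acknowledged_at: Optional[datetime] = None

def who_to_page(alert, now):
    """Return the role that should currently be paged, or None if the
    alert has been acknowledged. Each missed window moves one step up
    the chain, stopping at the top."""
    if alert.acknowledged_at is not None:
        return None
    windows_missed = int((now - alert.raised_at) / ACK_WINDOW)
    level = min(windows_missed, len(ESCALATION_CHAIN) - 1)
    return ESCALATION_CHAIN[level]

raised = datetime(2024, 1, 1, 2, 0)
alert = CriticalAlert("critical lab value", raised)
for minutes in (5, 20, 90):
    print(minutes, who_to_page(alert, raised + timedelta(minutes=minutes)))
```

The design choice worth noting is that silence is treated as a signal. An unacknowledged alert does not sit in a queue; it climbs the chain on its own until someone responds.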

These aren't pie-in-the-sky ideas. They have been implemented in leading hospitals like Johns Hopkins and Mayo Clinic, and they drastically reduce adverse events. The hospital in the NZ Herald article lacked even basic redundancy, and the tragedy was inevitable under those conditions.

The Role of Communication Silos: When Handoffs Fail

The investigation highlighted that handoffs between shifts were "rushed" and often omitted key details. This is a classic communication silo problem, exactly what software teams overcame with tools like Slack, Jira, and structured incident management. In a well-run DevOps team, an on-call engineer uses a shared incident log that captures every action taken, every decision, and every pending alert. The incoming engineer can review the entire timeline in minutes.

In healthcare, the equivalent would be a real-time digital whiteboard that tracks all pending tasks, lab results, and medication changes for each patient. Several EHR vendors now offer these, but they're only effective if the culture mandates their use. The NZ Herald report suggests that at this hospital, the culture tolerated, even encouraged, informal handoffs. That's like running a production database without replication: it works until it doesn't.
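
To show what "enforced by software" could mean for a handoff, here is a minimal Python sketch built around the SBAR format mentioned earlier. The `Handoff` class and its fields are hypothetical, not any EHR vendor's data model, and the sample text is placeholder content.

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    situation: str
    background: str
    assessment: str
    recommendation: str
    pending_items: list = field(default_factory=list)
    acknowledged: set = field(default_factory=set)

    def blocking_issues(self):
        """List everything that must be resolved before sign-off."""
        issues = []
        for name in ("situation", "background", "assessment", "recommendation"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        for item in self.pending_items:
            if item not in self.acknowledged:
                issues.append(f"pending item not acknowledged: {item}")
        return issues

    def can_sign_off(self):
        return not self.blocking_issues()

# Placeholder content; the background section was skipped in a rush.
handoff = Handoff(
    situation="patient summary goes here",
    background="",
    assessment="current assessment goes here",
    recommendation="next steps go here",
    pending_items=["repeat lab due 06:00"],
)
print(handoff.can_sign_off())
for issue in handoff.blocking_issues():
    print("-", issue)
```

Like a failing CI check, the handoff cannot be marked complete while a section is blank or a pending item has not been explicitly acknowledged by the incoming clinician.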

The fix requires both technology and training. Every clinician should undergo simulation training where a simulated patient deteriorates unless handoff information is correctly transmitted. This is how the aviation industry trains pilots under Crew Resource Management (CRM), which emerged in the late 1970s and is widely credited with reducing accidents caused by crew miscommunication. Healthcare needs the same mindset: treat every handoff as a critical system boundary.

A Template for System-Wide Incident Analysis (Using Real Techniques)

After reading the NZ Herald report, I wanted to translate its findings into a structured analysis that any tech leader could apply to their own systems, and that healthcare administrators could use too. Here's a simplified template based on the Five Whys and the Heinrich Triangle:

  1. Define the event: Young woman dies from preventable causes after hospital discharge.
  2. Immediate cause: Lab result not acted upon; vitals not escalated; no one followed up.
  3. Why did the lab result not trigger action? Because it was displayed in a secondary dashboard that the physician did not check.
  4. Why was it displayed there? Because the EHR defaulted to a "normal" view that hid critical alerts unless the user scrolled.
  5. Why wasn't the default changed? Because IT had never run a user experience review with clinicians. The system was designed for billing compliance, not clinical safety.
  6. Why wasn't there a fail-safe? Because the institution had no system-level risk assessment for high-risk pathways.

This progression reveals that the root cause is not a person but a design failure: the EHR's information architecture and the lack of a monitoring feedback loop. In my own experience leading incident responses, the fifth "why" almost always points to a missing process or a poorly designed tool. The phrase "people didn't do their job" is usually an artifact of stopping at the first or second why.

Engineer analyzing system architecture diagram, representing root cause analysis in complex systems

The Cost of "Normalization of Deviance" in Both Industries

Diane Vaughan, a sociologist who studied the Challenger space shuttle disaster, coined the term normalization of deviance: when a system repeatedly accepts small failures, those failures become the new normal. The O-rings on the Challenger had shown damage in previous flights, but NASA had repeatedly accepted the risk. Over time, the deviation became acceptable, until it wasn't.

The NZ Herald report describes a similar pattern: missed labs, rushed handoffs, and ignored alerts had likely happened before without catastrophic outcomes. Each near-miss reinforced the belief that the system was safe. This is exactly what we see in software when a team "fixes" a bug without adding a regression test, or when a deployment proceeds despite a failing smoke test. The failure is normalized until it hits the wrong patient, or the wrong customer, and causes real damage.

Breaking this cycle requires a cultural shift. In software, we use chaos engineering (pioneered by Netflix) to intentionally inject failures and test the system's resilience. A hospital could simulate an overloaded scenario where multiple alarms trigger simultaneously, and then measure how the care team responds. The goal is to find weak points before they align catastrophically. The hospital in the NZ Herald article clearly didn't run such simulations, and the result was a life lost.

FAQ: Five Common Questions

1. How can software engineering methodologies apply to healthcare?

Healthcare is a socio-technical system, just like a large-scale distributed application. Principles such as blameless postmortems, defense in depth, automated fail-safes, and structured handoffs are directly transferable. Many hospitals are already adopting them.
