consuming unknown input as state

Bugs of Unstructured State

Bugs caused by consuming unknown or inadequately validated input as state, without type checking or structural contracts.

Data without shape, type, or contract; the mismatch cascades downstream.

01 the defects

The defects in detail

02 the smell

In practice

Each defect here starts the same way: input consumed as state with no shape or contract to hold it. It shows up as data that doesn't fit its container and bugs that linger no matter how many tests you add. The tools that make it easy are dynamic types, missing validation, and eval/parse that trust whatever they're handed.

How it shows up

  • Information doesn't fit its expected container
  • Latent, lingering bugs even with more tests
  • Poor big-O performance with respect to input size
  • The problem moves around or persists
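
A minimal sketch of how the first two symptoms can arise, assuming a hypothetical settings payload. The names `load_settings` and `total_wait_seconds` and the field names are illustrative, not taken from any real system. One field arrives as a string instead of a number, nothing raises, and the wrong value flows downstream.

```python
import json


def load_settings(raw: str) -> dict:
    # No shape, no contract: whatever the JSON holds becomes program state.
    return json.loads(raw)


def total_wait_seconds(settings: dict) -> int:
    # The annotation promises an int, but nothing upstream enforced that
    # "retries" and "timeout" are numbers.
    return settings["retries"] * settings["timeout"]


# "timeout" arrives as the string "30" instead of the number 30.
settings = load_settings('{"retries": 3, "timeout": "30"}')
result = total_wait_seconds(settings)
print(repr(result))  # prints '303030': a string, not the integer 90
```

Because `int * str` is valid Python, a test that only checks "no exception" passes. The failure surfaces later, wherever the value is finally used as a number.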

Tools that hurt

  • dynamic & duck types
  • lack of validation
  • garbage in, garbage out
  • Any typing
  • nullability without handling
  • mixed types in collections
  • unbounded shapes & sizes
  • eval() or parse()

03 control flow

goto was the original unstructured state

“The go to statement as it stands is just too primitive; it is too much an invitation to make a mess of one's program.” “The quality of programmers is a decreasing function of the density of go to statements in the programs they produce.”

Edsger W. Dijkstra, “Go To Statement Considered Harmful” (1968)

04 case file

When it ships anyway

Knight Capital Group

August 1, 2012 · 45 minutes · $440 million

A new deployment reached every server in the cluster but one: a technician missed a node when installing the new trading software. That node still held retired code, and a flag the new release had repurposed switched it back on. For 45 minutes the cluster sent millions of unintended orders to the market before anyone could stop it.

  • System miscommunication snowballed into destruction
  • Reused data structures meant a dead flag now triggered live trades
  • No input validation on the orders being fired
  • No architectural discipline to stop a partial deploy

Fintech at scale means no backsies.

05 rules of thumb

Input Validation Rules of Thumb

Reading

Read values are correct, up-to-date, and trustworthy only after you have validated their structure and type. Until then, assume none of that is true.
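
A minimal sketch of the reading rule, assuming a hypothetical order payload that arrives as JSON. The `Order` type, its field names, and the `parse_order` helper are assumptions for illustration only. The idea is to check structure and type once, at the boundary, and pass only a typed value downstream.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    # Hypothetical shape; the field names are assumptions for illustration.
    symbol: str
    quantity: int


def parse_order(raw: str) -> Order:
    """Validate structure and type at the boundary, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    # Reject unknown fields so the shape stays bounded.
    unexpected = set(data) - {"symbol", "quantity"}
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")

    symbol = data.get("symbol")
    quantity = data.get("quantity")

    if not isinstance(symbol, str) or not symbol:
        raise ValueError("symbol must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int):
        raise ValueError("quantity must be an integer")
    if quantity <= 0:
        raise ValueError("quantity must be positive")

    return Order(symbol=symbol, quantity=quantity)
```

Code after `parse_order` can rely on `Order` without re-checking it. A payload such as `{"symbol": "ABC", "quantity": "10"}` is rejected at the edge instead of drifting into state.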

Writing

Written values are structured correctly, atomic, consistent, idempotent, and durable. Falsey and NULL values are handled safely, not by accident.
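
A minimal sketch of the difference between handling falsy values safely and handling them by accident, assuming a hypothetical optional rate limit where 0 is a legitimate setting. Both function names are illustrative.

```python
from typing import Optional


def limit_by_accident(configured: Optional[int], default: int) -> int:
    # Bug: `or` treats every falsy value as missing, so an explicit 0
    # is silently replaced by the default.
    return configured or default


def limit_on_purpose(configured: Optional[int], default: int) -> int:
    # Only None means "not set"; 0 is a real value and is preserved.
    if configured is None:
        return default
    return configured
```

With a configured value of 0 and a default of 100, `limit_by_accident` returns 100 and `limit_on_purpose` returns 0. Both return the default when the value is `None`.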

Executing

An executed value is code, and code for the correct interpreter — and it can be interpreted correctly. (Wait, why are we executing input again?)
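
When the input is meant to be data, one common alternative is to parse it instead of executing it. This is a minimal sketch using Python's standard `ast.literal_eval`, which accepts only literals such as numbers, strings, lists, and dicts. The `parse_literal` wrapper and the `MAX_LEN` cap are assumptions for illustration; the cap is there because very large or deeply nested input can still exhaust memory or the stack.

```python
import ast

MAX_LEN = 1_000  # Assumed cap for illustration; tune it for the real input.


def parse_literal(text: str):
    """Parse a Python literal without executing code, or raise ValueError."""
    if len(text) > MAX_LEN:
        raise ValueError("input too large")
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError) as exc:
        raise ValueError(f"not a plain literal: {text!r}") from exc
```

Passing `"[1, 2, 3]"` yields a list. Passing `"__import__('os').getcwd()"` raises `ValueError`, whereas `eval()` would run the call.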

06 antidotes

Philosophies & antidotes

This family maps to the CWE pillar: CWE-707 — Improper Neutralization