shared × unstructured — bad data written permanently, and it spreads

Encoding / Charset Corruption (Mojibake)

Bytes written in one charset and read back assuming another corrupt the stored text permanently.

01 the recipe

In the wild

compound of: File & Network Access | CWE-176 Unicode Handling | Version & Library Mismanagement | compound: CWE-172 Encoding Error

example.py

# SMELL: write bytes in one charset, read them back assuming another.
# (file / network access x version / library mismanagement)
name = "José"
with open("names.csv", "w") as f:        # no encoding=: platform default (varies by host and version)
    f.write(name)
...
with open("names.csv", encoding="utf-8") as f:   # assumes UTF-8, whatever was actually written
    name = f.read()                      # mismatch -> mojibake, or UnicodeDecodeError on invalid bytes
# every round-trip mangles the text further; once re-saved, the store is corrupt.

# RIGHT: pin the encoding on both ends; never rely on the default.
with open("names.csv", "w", encoding="utf-8") as f:
    f.write(name)
with open("names.csv", encoding="utf-8") as f:
    name = f.read()

Relying on a default encoding that differs across versions and hosts means bytes written in one charset get decoded as another. Once the misdecoded text is saved again, the corruption is written in place, permanently, and the damage compounds on each further re-save.
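The compounding can be reproduced in memory, without touching a file. The sketch below is illustrative only: the helper name `misdecode_round_trip` is hypothetical, and the UTF-8 / cp1252 pairing is an assumed (if common) mismatch, not something the recipe above specifies.

```python
def misdecode_round_trip(text: str,
                         written_as: str = "utf-8",
                         read_as: str = "cp1252") -> str:
    """Encode text in one charset, then decode the bytes assuming another."""
    return text.encode(written_as).decode(read_as)


original = "Jos\u00e9"                 # 'José', 4 characters
once = misdecode_round_trip(original)  # first mismatched read
twice = misdecode_round_trip(once)     # garbled text re-saved and read again

# Each pass turns every non-ASCII character into two or more characters,
# so the string keeps growing and drifts further from the original.
print(len(original), len(once), len(twice))
```

ASCII characters survive because both charsets agree on them, which is why this defect often goes unnoticed until a name with an accent shows up.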

// observed

mojibake: 'Jose' with an accent reads back garbled, then worse
pinned:   utf-8 on both ends -- bytes survive the round-trip
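A minimal sketch of the pinned variant follows, assuming a single agreed charset for the store. The names `ENCODING`, `save_name`, and `load_name` are hypothetical helpers introduced for illustration. It also shows that strict decoding rejects bytes written in another charset instead of silently garbling them.

```python
import tempfile
from pathlib import Path

ENCODING = "utf-8"  # assumption: the one charset every reader and writer agrees on


def save_name(path: Path, name: str) -> None:
    # Explicit encoding: the bytes on disk no longer depend on the host default.
    path.write_text(name, encoding=ENCODING)


def load_name(path: Path) -> str:
    # Strict decoding raises on invalid bytes rather than substituting characters.
    return path.read_bytes().decode(ENCODING, errors="strict")


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "names.csv"

    save_name(store, "Jos\u00e9")
    raw = store.read_bytes()          # exact bytes written to disk
    round_tripped = load_name(store)  # same text comes back

    # Simulate a writer that used a different charset for the same file.
    store.write_bytes("Jos\u00e9".encode("cp1252"))
    try:
        load_name(store)
        rejected = False
    except UnicodeDecodeError:
        rejected = True               # bad bytes are refused, not re-saved
```

Failing loudly at read time is the design choice here: a raised error stops the garbled text from being written back, whereas a lenient decode would let it spread.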

02 weakness catalog

Mapped weaknesses (CWE)

On its own, this defect is catalogued by MITRE as one or more of these weaknesses. The exploitable vulnerability usually appears only when it chains or combines with another weakness.