Researcher fully recovers text from pixels: how to reverse redaction


Using pixelation to redact images? Those pixels may not actually be hiding anything.

A researcher has demonstrated how he was able to recover text that had been redacted using pixelation. He has also released a tool on GitHub that anyone can use to reconstruct text from obscured, pixelated images.

Reversing obscured pixels

This week, Dan Petro, Lead Researcher at offensive security firm Bishop Fox, demonstrated how he was able to completely recover text from an image redacted via the pixelation method.

Media outlets and researchers alike often use pixelation or blurring to redact sensitive images before publishing them online.

But Petro shows why it might be safer to just stick good old opaque bars over the text you want to hide, rather than chancing it with alternate techniques—especially with pixelation.
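To make the difference concrete, here is a minimal sketch, assuming pixelation means replacing each tile of the image with its average intensity. The tiny 4x4 "images" and the `pixelate` and `black_bar` helpers are hypothetical, made up for illustration only. The point is that two different originals still produce two different pixelated results, while an opaque bar produces the same output no matter what was underneath.

```python
def pixelate(img, block):
    """Replace every block x block tile with its average intensity."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            ys = range(y0, min(y0 + block, h))
            xs = range(x0, min(x0 + block, w))
            avg = sum(img[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = avg
    return out

def black_bar(img):
    """Overwrite every pixel with 0, discarding the original content."""
    return [[0] * len(row) for row in img]

# Two different made-up "secrets" (0 = dark, 255 = bright).
a = [[255, 255, 0, 0], [255, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 255]]
b = [[0, 0, 255, 255], [0, 0, 0, 255], [0, 0, 0, 0], [255, 0, 0, 0]]

print(pixelate(a, 2) != pixelate(b, 2))  # True: the averages still depend on the secret
print(black_bar(a) == black_bar(b))      # True: the bar output carries no trace of it
```

Because the pixelated output is a function of the hidden pixels, anyone who can reproduce the same averaging can test guesses against it. An opaque bar gives them nothing to test against.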

Last year, Jumpsec Labs shared an open challenge for anyone to decipher the text present in the following image:

Pixelated image sent to the researcher (Bishop Fox)

“How could I refuse such a challenge?” says Petro, who after studying various pixelation and deobfuscation techniques arrived at a solution.

After applying his technique, the researcher managed to fully recover the text in the image. At Jumpsec’s request, he has only partially disclosed the solution:

Unredacted text recovered from pixels — only first 4 characters shown (Bishop Fox)

“I reached out to Caleb Herbert at Jumpsec, and they confirmed that my guess was correct!” states Petro.

“Caleb also asked me to not disclose the solution, so you reading this can have a go at it yourself. (It’s blurred out above, and there’s no way you can read blurred text, right?).”

Interestingly, Petro’s partially disclosed solution is blurred rather than pixelated, so others can continue to experiment with the challenge and solve it in their own ways without being able to recover the answer from Petro’s blurred image.

Similar solutions exist for ‘enhancing’ low res images

Although similar solutions have existed for enhancing pixelated photos of people or landscapes, no practical tool released so far has promised accurate recovery of text from pixelated images while also handling noise.

Google Brain has previously been able to provide “zoom and enhance” functionality for photos, based on extensive research [PDF].

Google Brain reconstructs images (center) from low res images provided (left).
The right column shows the actual images before their pixelation.

Existing tools like Depix do provide similar functionality for pixelated text blocks, but fall short in real-world scenarios, according to the researcher.

“I like the theory of this tool a lot, but… perhaps it doesn’t work as well in practice as you’d like,” says Petro, pointing to Jumpsec’s aforementioned research.

“In real world examples, you’re likely to get minor variations and noise that throws a wrench into the gears.”

Text partially recovered by Depix in second row from the pixelated input (GitHub)

New GitHub tool recovers text from pixels

The researcher’s success in solving Jumpsec’s challenge prompted him, along with Bishop Fox, to release a new open-source tool on GitHub called Unredacter.

A test run below shows Unredacter reconstructing original text in its entirety and correctly from the given pixelated input: 

Live running example of Unredacter predicting original text from pixelated image (GitHub)
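As a rough illustration of why this kind of recovery is possible at all, the following toy sketch shows a generic guess-and-compare attack: render a candidate string, pixelate it the same way, and keep the guess whose result is closest to the redacted image. It is not taken from Unredacter's source. The three-letter alphabet, the 2x3 glyphs, and the `render`, `pixelate`, `distance`, and `recover` helpers are all hypothetical. It also assumes the attacker knows the font, the block size, and the text length.

```python
from itertools import product

# Toy 2-row, 3-column glyphs ('#' = ink). Purely illustrative, not a real font.
GLYPHS = {
    "A": [".#.", "###"],
    "B": ["##.", "##."],
    "C": ["#.#", "#.."],
}

def render(text):
    """Render text as a 2-row grid of 0 (blank) / 255 (ink) pixels."""
    rows = ["".join(GLYPHS[ch][r] for ch in text) for r in range(2)]
    return [[255 if c == "#" else 0 for c in row] for row in rows]

def pixelate(img, block=2):
    """Return the average intensity of each block x block tile, row-major."""
    h, w = len(img), len(img[0])
    tiles = []
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            tile = [img[y][x]
                    for y in range(y0, min(y0 + block, h))
                    for x in range(x0, min(x0 + block, w))]
            tiles.append(sum(tile) / len(tile))
    return tiles

def distance(a, b):
    """Sum of absolute differences between two tile lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

def recover(redacted, length, alphabet="ABC"):
    """Try every string of the given length and return the closest match."""
    guesses = ("".join(p) for p in product(alphabet, repeat=length))
    return min(guesses, key=lambda g: distance(pixelate(render(g)), redacted))

secret = "CAB"
redacted = pixelate(render(secret))      # what the "publisher" releases
print(recover(redacted, len(secret)))    # prints "CAB"
```

Scoring guesses by distance rather than exact equality means that, in this toy setup, the search still lands on the right string when the tile values are slightly off. Exhaustive search only works here because the example is tiny; a practical tool would need a smarter search and would have to deal with real fonts, offsets, and compression artifacts.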

Suffice it to say, when publishing sensitive images online, using opaque shapes for redaction provides far more assurance than pixelation.

Petro’s detailed research findings are explained in his Bishop Fox blog post.

“The bottom line is that when you need to redact text, use black bars covering the whole text. Never use anything else. No pixelization, no blurring, no fuzzing, no swirling,” warns Petro.

Redacted text should also be edited as an image, warns the researcher, rather than obscured using simple HTML/CSS styling. For example, text set in the same color as its background can be trivially revealed by highlighting it.
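The styling problem is easy to demonstrate. In the minimal sketch below, the HTML snippet and the value `hunter2` are made up for illustration, and the `TextExtractor` class is a hypothetical helper built on Python's standard `html.parser` module. The "hidden" text is still present in the markup, so any program that reads the page recovers it without even needing to highlight anything.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect every text node, ignoring tags and their styling."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

# Hypothetical page: white text on a white background looks blank in a browser.
page = ('<p style="background:#fff">Password: '
        '<span style="color:#fff">hunter2</span></p>')

extractor = TextExtractor()
extractor.feed(page)
extractor.close()
recovered = "".join(extractor.chunks)
print(recovered)  # prints "Password: hunter2"
```

Styling changes how the text is displayed, not whether it is delivered to the reader's machine.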

“The last thing you need after making a great technical document is to accidentally leak sensitive information because of an insecure redaction technique,” concludes the researcher.