Steganography — LSB Introduction with Python — Part 1

Published in ITNEXT · 5 min read · Oct 7, 2019

Hiding plain text messages inside the RGB channels of the pixels of a PNG image file

A friend of mine introduced me to hackthebox.eu a few weeks ago and since then, even though it’s not the main category of the challenges, I’ve been fascinated by the world of steganography. The whole concept of a seemingly innocuous container hiding secret data inside of it takes me back to when I was a kid and first discovered that I could write secret messages with lemon juice.

Steganography (/ˌstɛɡəˈnɒɡrəfi/, STEG-ə-NOG-rə-fee) is the practice of concealing a file, message, image, or video within another file, message, image, or video. The word steganography combines the Greek words steganos (στεγᾰνός), meaning “covered or concealed”, and graphe (γραφή), meaning “writing”.

It’s like all of a sudden you have this super power, even if you’ll never need to use it. You picture yourself as a spy, carrying hidden data that even the best minds wouldn’t be able to find.

Of course the reality is not exactly like that and there are a myriad of tools that will help a curious mind to find the data. It was the blind usage of these tools that made me try to understand the process and, while I do it, write some tools that will help us hide and retrieve data inside other files.

Let me emphasize the word find because, even though it’s usually used in combination with encryption, steganography in its most basic form only hides data inside a container. It does nothing to protect that data once someone knows where to look.

I want to focus today on LSB steganography inside an image holder. LSB stands for least significant bit and it refers to the process of replacing the least significant bit of the bytes that create a container file with the bits that form the data we want to hide.

The initial goal will be to hide the string “ledger” inside a 12x12 image. After we do this I’ll follow up in a separate post on how to hide an image inside another one, and improve on the code we write today to provide some failsafes.

If we read the pixels from this image with something like PIL, we get a tuple for each pixel containing the values (0–255) of its RGB channels (plus alpha, if available). Since we only touch the least significant bit of each channel, each pixel gives us 3–4 bits to work with. The binary representation of the string “ledger” I want to hide inside this image is 011011000110010101100100011001110110010101110010, which is 48 bits long. Assuming no alpha channel, that means we need at least 16 pixels.
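To double-check that bit string, here is a minimal sketch of how the text could be turned into bits. The helper name text_to_bits is my own, and I'm assuming the message is encoded as UTF-8, which for plain ASCII text like “ledger” gives one byte per character.

```python
def text_to_bits(text: str) -> str:
    """Return the bytes of text as one string of 0s and 1s, 8 bits per byte."""
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))


bits = text_to_bits("ledger")
pixels_needed = -(-len(bits) // 3)  # ceiling division: 3 bits (R, G, B) per pixel

print(bits)
print(len(bits), "bits,", pixels_needed, "pixels")  # prints: 48 bits, 16 pixels
```

The output should match the string used in the rest of this post.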

How does this work, visually?

Each pixel of the image contains three channels representing the red, green and blue values for that pixel’s color. That value can be represented in a single byte and combining them (you can do this with lighten blend mode in your image editor) will, of course, return the source color.

Replacing the least significant bit in this case means turning, for instance, the 01100100 representing the rosewood color (the red channel) into 01100101. But since the first three bits of our data are 011, and the red and blue values already end in 0 and 1, we only need to replace the green 10010000 with 10010001. This means that in most cases we won’t even have to modify all channels.

Believe me, the bottom rectangle is two different colors

While the numbers don’t lie, the visual difference between #6490F1 and #6491F1 is negligible. Also, we need to keep in mind that when this is done in the real world it would be hosted on an image with a lot more variety of colors which would make it almost impossible for the naked eye to see any difference whatsoever.
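The same change can be checked in a few lines of Python. This is only a sketch of the idea on a single pixel; the function name embed_bits_in_pixel is made up for this example and it works on a plain tuple, not on an actual image.

```python
def embed_bits_in_pixel(pixel, bits):
    """Return a copy of pixel with the LSB of R, G and B replaced by the given bits."""
    channels = list(pixel)
    for n, bit in enumerate(bits[:3]):
        # clear the least significant bit, then set it to the data bit
        channels[n] = (channels[n] & ~1) | int(bit)
    return tuple(channels)


original = (0x64, 0x90, 0xF1)  # #6490F1
modified = embed_bits_in_pixel(original, "011")

print("#{:02X}{:02X}{:02X}".format(*modified))  # prints #6491F1
```

Only the green channel changes, because the red and blue values already ended in the bits we wanted to store.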

The simplest code I could come up with to write arbitrary binary data into a PNG image, without any data length checks, prompts or help, is as follows. The code should read as pretty straightforward, with perhaps the only exception being the bit manipulation. pixel[n] & ~1 clears the LSB of the channel, and then we only need to set it to 1 when our data says so: | 1 turns that LSB into a 1 and | 0 keeps it as is.

from PIL import Image

i = 0  # index of the next bit of data to write
data = "011011000110010101100100011001110110010101110010"  # "ledger"
with Image.open("source.png") as img:
    width, height = img.size
    for x in range(width):
        for y in range(height):
            pixel = list(img.getpixel((x, y)))
            for n in range(3):  # R, G, B only; an alpha channel is left alone
                if i < len(data):
                    # clear the LSB, then set it to the next data bit
                    pixel[n] = (pixel[n] & ~1) | int(data[i])
                    i += 1
            img.putpixel((x, y), tuple(pixel))
    img.save("source_secret.png", "PNG")

I’m sure there are other ways of doing this and, as I mentioned above, I’m learning while I write this, so feel free to provide some input if I got something wrong :)

How to get the data back?

The code to extract is practically the same as the code to write, but instead of modifying the LSB we simply read it. To read the LSB of a byte in Python we do a bitwise AND with 1 (& 1).

from PIL import Image

extracted_bin = []
with Image.open("source_secret.png") as img:
    width, height = img.size
    for x in range(width):
        for y in range(height):
            pixel = img.getpixel((x, y))
            for n in range(3):  # same pixel and channel order as when writing
                extracted_bin.append(pixel[n] & 1)

# one long string of 0s and 1s; the hidden bits come first
data = "".join(str(bit) for bit in extracted_bin)

Since we are ignoring the length of the hidden data, the output will have our message at the beginning, followed by the LSBs of the rest of the image, which were never modified. Decoded as text, that tail looks like noise, sort of what would happen if you opened the image in a text editor.
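To turn those bits back into text we have to know, or guess, how many characters to read. Here is a minimal sketch assuming we already know the message is 6 characters long. The helper bits_to_text is my own name, and the trailing bits in the example are made up to stand in for the untouched part of the image.

```python
def bits_to_text(bits: str, n_chars: int) -> str:
    """Decode the first n_chars bytes of a string of 0s and 1s as text."""
    usable = bits[: n_chars * 8]
    raw = bytes(int(usable[i:i + 8], 2) for i in range(0, len(usable), 8))
    return raw.decode("utf-8", errors="replace")


# the hidden message followed by a few invented bits of "image noise"
extracted = "011011000110010101100100011001110110010101110010" + "1011001110"

print(bits_to_text(extracted, 6))  # prints ledger
```

Without the length, we would have to decode everything and pick the message out by eye.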

What’s the limit of data that can be stored like this?

Assuming we replace the LSB of every RGB channel of every single pixel of the image, we can in theory store W*H*3/8 bytes of data. For the 12x12 pixel holder image we are using, that is 12*12*3 = 432 bits, which gives us 54 bytes of storage in the LSBs.
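As a quick sanity check of that arithmetic, here is a small sketch. The function name lsb_capacity_bytes is made up for this post, and it assumes one hidden bit per channel and three usable channels per pixel.

```python
def lsb_capacity_bytes(width: int, height: int, channels: int = 3) -> int:
    """Whole bytes that fit when one bit is stored per channel per pixel."""
    return (width * height * channels) // 8


print(lsb_capacity_bytes(12, 12))  # prints 54
```

Our 6 byte “ledger” fits comfortably, using only 16 of the 144 pixels.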

This image contains a secret message, but Medium probably reprocesses uploaded images, so here’s a link to the original: https://raw.githubusercontent.com/juan-cortes/steg/master/test_secret.png

I’ll get back to you with something more advanced.

Written by Juan Cortés