Audio Steganography: The art of hiding secrets within earshot (part 2 of 2)

Sumit Kumar Arora
9 min read · Jun 17, 2018

In the previous installment of this piece on Audio Steganography, I described the fundamentals of Audio Steganography along with some applications and related concepts. If you haven’t yet read the earlier post, I recommend reading it before going further with this post.

Audio Steganography Methods

In this article, we will take a look at some of the popular methods to embed “secret” text, images, and audio inside a “public” sound file. Our purpose is to enable a sender to secretly send their data, hidden within a song.

LSB (Least Significant Bit) Algorithm

The LSB algorithm is a classic Steganography method used to conceal the existence of secret data inside a “public” cover. The LSB, or “Least Significant Bit”, is the bit at the units place in the binary representation of a number.

For example, the decimal number 170 is written in binary as 10101010. Bit positions are counted from the right, so the rightmost bit is the least significant one. As shown in the figure, the least significant bit in this case is 0.

In its simplest form, the LSB algorithm replaces the LSB of each byte in the “carrier” data with one bit from the “secret” message. This concept is visualized in the diagram below.

The sender “embeds” the bits of the secret message into the carrier data, one bit per carrier byte. The receiver performs the “extraction” procedure by reading the LSB of each byte of the received data, and in this way reconstructs the secret message.

Isn’t this corrupting the carrier signal?

Yes, but the main idea is that we are exploiting the limits of human perception of the carrier signal. LSB steganography is very popular for Image Steganography, i.e. hiding secrets in images, because a change in the LSB alters the color so slightly that it is generally not perceptible to the human eye. The human ear, however, is more sensitive to slight changes in sound, so the “noise” that we add has a higher chance of being noticed. To overcome this weakness of the basic LSB algorithm, many researchers have suggested variants that increase robustness in the audio domain.

LSB algorithm implementation in Python

Let’s implement this method with some sound data as the carrier signal for our secret text. As our sound data, I am taking a sample from a song by Indian electronic music producer Udyan Sagar (better known as Nucleya). This song will be the carrier of our secret text message, “Peter Parker is the Spiderman!”.

Please extend your support to the artist here.

The underlying bit manipulation in LSB is pretty straightforward. We will perform a bitwise AND operation between each byte of the carrier audio (the “song”) and a bit mask that clears the LSB of the carrier byte.

Then we will perform a simple bitwise OR operation between the modified carrier byte and the next bit (0 or 1) from the secret message.
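The two steps above can be sketched in a few lines of Python. The function names here are my own, chosen for illustration; the mask 254 (binary 11111110) is simply the byte with every bit set except the LSB.

```python
def set_lsb(carrier_byte: int, secret_bit: int) -> int:
    """Return carrier_byte with its least significant bit replaced by secret_bit."""
    cleared = carrier_byte & 0b11111110  # AND with the mask 254 clears the LSB
    return cleared | (secret_bit & 1)    # OR writes the secret bit into the LSB


def get_lsb(byte: int) -> int:
    """Return the least significant bit of a byte (what the receiver reads)."""
    return byte & 1
```

For the earlier example, set_lsb(170, 1) turns 10101010 into 10101011 (decimal 171), a change of one unit in the byte value.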

We will use the .wav audio file format for our carrier song. WAV is one of the most popular lossless audio formats and typically stores uncompressed PCM samples. Python has a standard-library module called “wave” that provides basic tools to read and write such audio data.

The code below is used by the sender to embed the secret text message. The code is sufficiently commented to explain the process step-by-step.
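A minimal sketch of such an embedding step, using only the standard “wave” module, might look like the following. It is not necessarily identical to the code used to produce the audio in this post. In particular, the 4-byte length header and the function names are assumptions of this sketch, chosen so that a receiver knows where the message ends.

```python
import wave


def text_to_bits(text: str) -> list:
    """Encode text as UTF-8 and return its bits, most significant bit first."""
    data = text.encode("utf-8")
    # Assumption of this sketch: a 4-byte big-endian length header
    # tells the receiver how many message bytes follow.
    payload = len(data).to_bytes(4, "big") + data
    return [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]


def embed_bits(frames: bytes, bits: list) -> bytes:
    """Write one secret bit into the LSB of each leading carrier byte."""
    if len(bits) > len(frames):
        raise ValueError("carrier is too small for this message")
    out = bytearray(frames)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 254) | bit  # clear the LSB, then set it to the secret bit
    return bytes(out)


def embed_text(carrier_path: str, output_path: str, secret: str) -> None:
    """Read a WAV carrier, embed the secret text, and write a new WAV file."""
    with wave.open(carrier_path, "rb") as song:
        params = song.getparams()
        frames = song.readframes(song.getnframes())
    stego = embed_bits(frames, text_to_bits(secret))
    with wave.open(output_path, "wb") as out:
        out.setparams(params)  # same channels, sample width, and frame rate
        out.writeframes(stego)
```

With hypothetical file names, the sender would call embed_text("song.wav", "song_embedded.wav", "Peter Parker is the Spiderman!"). Note that if the carrier uses 16-bit samples, every second byte is the high-order byte of a sample, and changing its LSB shifts the sample by 256 steps rather than 1, which makes the change much easier to hear.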

As the output of the above code, we get the audio file below, which has the secret text embedded in it. There is some noticeable noise in this audio file. As an experiment, you could try embedding one bit from the secret in only every 2nd or 3rd byte of the carrier audio, and see if that yields a less noisy result.

To extract the secret from this audio, the receiver runs the Python code below.
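As a rough sketch of the extraction step, the receiver reads the LSB of each carrier byte and regroups the bits into bytes. This sketch assumes the sender stored a 4-byte big-endian length header in front of the message; that header and the function names are conventions I am assuming here for illustration, so the sender and receiver must agree on them.

```python
import wave


def bits_to_bytes(bits: list) -> bytes:
    """Pack a list of bits (most significant bit first) into bytes."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def extract_text_from_frames(frames: bytes) -> str:
    """Recover the secret text from the LSBs of raw audio bytes."""
    # The first 32 LSBs hold the assumed 4-byte message length.
    header = bits_to_bytes([b & 1 for b in frames[:32]])
    length = int.from_bytes(header, "big")
    body_bits = [b & 1 for b in frames[32:32 + 8 * length]]
    if len(body_bits) < 8 * length:
        raise ValueError("audio is too short for the announced message length")
    return bits_to_bytes(body_bits).decode("utf-8")


def extract_text(stego_path: str) -> str:
    """Read a WAV file and return the text hidden in its LSBs."""
    with wave.open(stego_path, "rb") as song:
        frames = song.readframes(song.getnframes())
    return extract_text_from_frames(frames)
```

Because only the LSBs are read, any processing that alters the samples on the way (lossy compression, resampling, volume changes) is likely to destroy the message.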

Other popular techniques for Audio Steganography include Phase Coding, Echo Hiding, and Spread Spectrum. These classic techniques are designed to introduce far less perceptible noise into the carrier signal than plain LSB substitution, and hence are more robust methods for achieving Steganography. I encourage you to read the related literature if you want to study them in more detail, or let me know in the comments if you would like me to write about these methods.

A note on SSTV

As you can imagine, we can embed any text, document, audio, or video within the carrier audio by simply encoding the bits of the secret data within each byte of the carrier audio. However, the carrier has to have enough data bytes to carry all the bits of the secret message. In the next section, we will take a look at a frequency-modulation-based method to hide secret data in the inaudible frequency range. But before that, I would like to talk about a method that you can use to encode your image data as audio, and then use the Steganography method described in the next section to embed and extract the image from the carrier audio.
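To make the capacity requirement for LSB embedding concrete: with one secret bit per carrier byte, a carrier can hold one eighth of its own size in secret data. The helper below is a hypothetical illustration of that arithmetic, not part of any library.

```python
def lsb_capacity_bytes(n_frames: int, n_channels: int, sample_width_bytes: int) -> int:
    """Return how many secret bytes fit when one bit is stored per carrier byte."""
    carrier_bytes = n_frames * n_channels * sample_width_bytes
    return carrier_bytes // 8  # eight carrier bytes carry one secret byte
```

For instance, a 30-second stereo clip with 16-bit samples at 44100 Hz has 1,323,000 frames and 5,292,000 data bytes, so lsb_capacity_bytes(1323000, 2, 2) gives 661,500 bytes of secret data.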

SSTV is an acronym for Slow-Scan Television, a very popular method in radio transmission for sending image data over long distances via ionospheric propagation. SSTV enables transmission of images where very little bandwidth is available, for example over a Plain Old Telephone Service (POTS) line. In fact, the Apollo 11 moon mission used SSTV to transmit images back to Earth.

Apollo 11 moonwalk unconverted slow-scan television image [Source]

SSTV is based on analog frequency modulation: it looks at the brightness of each pixel and allocates an audio frequency to it accordingly. SSTV is usually used to transfer greyscale images, but we can also use it to transfer color images with some loss in image resolution.
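The brightness-to-frequency idea can be sketched as a simple linear mapping. The 1500 Hz (black) to 2300 Hz (white) range used below is the one commonly quoted for SSTV modes, but treat the exact values and the function names as assumptions of this illustration; real modes also add sync pulses and per-line timing that are not shown here.

```python
BLACK_HZ = 1500.0  # assumed tone for a black pixel
WHITE_HZ = 2300.0  # assumed tone for a white pixel


def brightness_to_frequency(brightness: int) -> float:
    """Map an 8-bit brightness value (0..255) to an audio frequency in Hz."""
    if not 0 <= brightness <= 255:
        raise ValueError("brightness must be in the range 0..255")
    return BLACK_HZ + (WHITE_HZ - BLACK_HZ) * brightness / 255


def frequency_to_brightness(frequency_hz: float) -> int:
    """Inverse mapping, as a decoder would apply it; clamps to 0..255."""
    value = round((frequency_hz - BLACK_HZ) / (WHITE_HZ - BLACK_HZ) * 255)
    return max(0, min(255, value))
```

A decoder measures the instantaneous tone for each pixel slot and applies the inverse mapping to rebuild the image line by line.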

The Amateur-Radio wiki has a list of some of the available SSTV software on different platforms. András Szentkirályi has made a Python package called PySSTV that generates SSTV-modulated WAV files from image files. You can use pip to install this package.

pip install pysstv 

After installing the package, you can run the command below to convert an image file (image.jpg) to an SSTV-modulated audio file (audio.wav).

python -m pysstv /path/to/image.jpg /path/to/audio.wav

This package also implements several SSTV modes, including the popular Martin, Robot, and Scottie. You can specify the mode and other parameters while converting the image.

Now you can play this audio on your computer using any media player, and that is all you need to do at the sender end. To receive this audio, I would recommend installing an application on your mobile phone that can extract the image by listening to our “audio.wav” through the phone’s mic. One such Android app is Robot36, available on the Play Store.

As an example, I encoded the image below using PySSTV and got the encoded audio that follows it.

“secret” image
image converted to SSTV audio

The resulting audio is not pleasant to the ear, but you can modulate it to the inaudible range and have it transferred covertly within a “public” audio.

In the video below, I demonstrate encoding and decoding an image by transferring it as SSTV-modulated audio.

[Video to be added soon.]

Frequency Modulation based Steganography

The frequency-modulation-based method that we are going to discuss here is very powerful and doesn’t add audible noise to the carrier signal. Using this method, we will embed one “secret” audio onto another “public” audio. The secret audio will be imperceptible to the human ear, and the receiver will be able to extract the secret audio at their end.

The main idea of this algorithm is that we will conceal our secret audio in the near-ultrasound range while keeping our public audio data in the normal hearing range. As explained in the previous installment of this post, near-ultrasound audio is inaudible to most humans above a certain age. You can decide which carrier frequency to use based on the age and even gender of your target.

To modulate the secret audio onto the inaudible range, we will use the method described by Oliver M. Lowery in the patent below.

We will use near-ultrasound frequencies rather than infrasound to encode the secret because, as we saw in the last post, long exposure to infrasound can be harmful, whereas there is no scientific evidence of harm to the human ear from brief exposure to near-ultrasound.

We can use the method described above for a demonstration on the regular sound card present in most computers, which supports a 44.1 kHz sampling rate. Audacity user edgar-rft has provided an implementation of the method in the above patent in the Nyquist programming language. This code sample can be imported directly into Audacity as a Nyquist plugin.

Points to note:

  • This plugin produces a single-side-band modulated signal with a suppressed carrier.
  • The sample rate must be more than double the modulation carrier frequency. So, for example, if you wish to use a carrier frequency of 17500 Hz, then the sample rate of the secret audio must be more than 35000 Hz. If this condition is not satisfied for your secret audio, you can use the “Tracks menu > Resample” option in Audacity.
  • Most speakers and headphones reliably reproduce audio only up to about 20,000 Hz. This is one reason why, in my example, I use a carrier frequency in the near-ultrasound range (17.5 kHz).
  • Make sure that when you export the processed audio, the sampling rate in “Project rate” is more than double the carrier frequency.
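The sample-rate condition in the points above is easy to check before processing. The helper below is a hypothetical illustration (the name and error message are mine): it verifies that the carrier lies below half the sample rate and reports how much spectrum is left above the carrier for the modulated secret.

```python
def modulation_headroom_hz(sample_rate_hz: float, carrier_hz: float) -> float:
    """Return the bandwidth available between the carrier and half the sample rate.

    Raises ValueError when the sample rate is not more than double the carrier.
    """
    nyquist_hz = sample_rate_hz / 2  # highest frequency the sample rate can represent
    if carrier_hz >= nyquist_hz:
        raise ValueError("sample rate must be more than double the carrier frequency")
    return nyquist_hz - carrier_hz
```

For a 44100 Hz sample rate and a 17500 Hz carrier, this gives 22050 - 17500 = 4550 Hz of room for the secret audio.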

For this demonstration, let’s assume that we want to transfer the same song by the artist Nucleya, that we used in the earlier example. This will be our “secret” audio.

Frequency distribution for our secret audio shows that the data is well spread in the spectrum.

We will hide this audio within the below music by The Indian Jam Project. This music is their rendition of the Interstellar theme music. This will be the “public” cover for our secret audio.

Please extend your support to the artist here.

This is what frequency distribution for public audio looks like.

Next, we use the above Nyquist code to convert our secret into a subliminal audio. Here I am using a carrier frequency of 17500 Hz. You can use a different frequency depending on your requirements and the capability of the sound card. Listening to the output audio confirms that the secret is inaudible (it may still be audible to young people with a healthy hearing range).

Frequency distribution for the subliminal confirms the modulation.

Next, we mix the public audio and the modulated secret audio to create a mixed output that carries our secret in the inaudible range. This embedded audio file is what we send to the intended receiver.
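Mixing two tracks amounts to adding their samples and keeping the result within the valid sample range. The sketch below assumes both tracks are already lists of signed 16-bit samples at the same sample rate; the function name and the default gains are my own choices for illustration, and an audio editor can do the same job.

```python
def mix_samples(public: list, secret: list,
                public_gain: float = 0.5, secret_gain: float = 0.5) -> list:
    """Add two signed 16-bit sample sequences, padding the shorter with silence."""
    length = max(len(public), len(secret))
    mixed = []
    for i in range(length):
        p = public[i] if i < len(public) else 0
        s = secret[i] if i < len(secret) else 0
        value = int(round(public_gain * p + secret_gain * s))
        # Clip to the 16-bit range so loud passages do not wrap around.
        mixed.append(max(-32768, min(32767, value)))
    return mixed
```

Scaling both inputs before adding them leaves headroom, so the sum rarely needs to be clipped.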

The frequency distribution for the mixture looks great.

At the receiver end, simply run the code below to extract the original secret.
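One common way to recover a single-side-band signal is product detection: multiply the received samples by a tone at the carrier frequency, then low-pass filter the result. The pure-Python sketch below illustrates that idea and is not necessarily the code used for this post. It assumes the secret occupies the upper sideband just above the carrier, and that the input contains only the modulated secret (in practice, the mixed audio would first be high-pass filtered at the carrier frequency). All names and the filter length are assumptions of this sketch.

```python
import math


def lowpass_kernel(cutoff_hz: float, sample_rate_hz: float, taps: int = 101) -> list:
    """Windowed-sinc low-pass FIR kernel (Hamming window), unit gain at 0 Hz."""
    fc = cutoff_hz / sample_rate_hz
    mid = (taps - 1) / 2
    kernel = []
    for n in range(taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        kernel.append(sinc * window)
    total = sum(kernel)
    return [k / total for k in kernel]


def demodulate_usb(samples: list, sample_rate_hz: float,
                   carrier_hz: float, cutoff_hz: float) -> list:
    """Shift an upper-sideband signal back to baseband by product detection."""
    # Step 1: multiplying by the carrier moves the sideband down to 0..cutoff.
    shifted = [2 * s * math.cos(2 * math.pi * carrier_hz * n / sample_rate_hz)
               for n, s in enumerate(samples)]
    # Step 2: the low-pass filter removes the unwanted high-frequency image.
    kernel = lowpass_kernel(cutoff_hz, sample_rate_hz)
    half = len(kernel) // 2
    out = []
    for n in range(len(shifted)):
        acc = 0.0
        for k, coeff in enumerate(kernel):
            idx = n + k - half
            if 0 <= idx < len(shifted):
                acc += coeff * shifted[idx]
        out.append(acc)
    return out
```

With a 44100 Hz sample rate and a 17500 Hz carrier, a cutoff of about 4500 Hz keeps the recovered band and rejects the image. The nested loops are slow for a full song; an FFT-based filter from a numerical library would be the practical choice.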

Below is the extracted secret audio data.

The sound quality is a bit poor because, for a sample rate of 44100 Hz and a carrier frequency of 17500 Hz, the available audio bandwidth is at most 22050 − 17500 = 4550 Hz, which is not enough for high-quality audio. Hence, you should expect a somewhat distorted signal: there is not much “room” for sound quality in the range between 17500 and 22050 Hz on a sound card with a sample rate of only 44100 Hz. I encourage you to repeat this experiment with a higher sample rate and carrier frequency if you have a better sound card at your disposal.

That’s all from my side for this post on Audio Steganography. I hope you enjoyed learning about this cool concept, and I would love to hear your feedback in the comment section below.