Getting Started with Audio Design in Embedded Systems

Adding audio capabilities to embedded systems can significantly improve the user experience, whether you are building home automation, wearable, or industrial devices. Even simple sound prompts or alerts can make interaction noticeably better.

This article walks through how embedded devices process, store, and play back audio, without getting too deep into the subjective realm of "sound quality."

Audio Begins as Analog

In the real world, sound is analog, while embedded systems process digital data. Any audio we wish to record and play back must therefore first be converted from analog to digital.

This conversion to digital involves two important parameters: sampling rate and bit depth.

Sampling rate is the number of times the sound signal is measured per second. The Nyquist–Shannon sampling theorem states that your sampling rate must be at least twice the highest frequency you intend to capture.

Bit depth is the precision with which each of those samples is recorded: more bits mean more detail, but also more memory consumption.

A real-world example: Telephone audio uses just a 300–3400 Hz band, far narrower than the full range of human hearing (20 Hz to 20 kHz), yet it is good enough to understand speech and even recognize a person's voice.


Choosing the Right Bit Depth

Bit depth determines how finely the amplitude of each audio sample can be represented. For instance, an 8-bit sample can take 256 distinct levels, while a 16-bit sample can take 65,536.

In embedded systems, ADCs (Analog-to-Digital Converters) perform this conversion. In practice, though, the usable resolution is usually somewhat less than the datasheet figure because of imperfections such as noise and distortion. A useful rule of thumb: subtract 2 bits from the advertised bit depth to get a realistic expectation (e.g., treat a 12-bit ADC as if it were effectively 10-bit).


Storing and Compressing Audio

Most embedded systems store audio as raw PCM (Pulse Code Modulation), often wrapped in a WAV container. Both are straightforward and convenient but consume a lot of memory. For example, CD-quality audio at 44.1 kHz with 16-bit depth occupies about 86 KB for a single second of mono sound, and roughly double that for stereo.

To conserve space, developers usually:

  • Compress audio with MP3 (although this requires more processing power).
  • Pre-process the sound to restrict bandwidth and dynamic range using software such as Audacity.
  • Lower the sampling rate and bit depth to better match the hardware's capabilities.

If processing power is limited, an external decoder chip can handle MP3 decoding and take that workload off the main processor.


Playing the Audio

Once the audio is ready, it has to be converted back to analog for playback. This is where DACs (Digital-to-Analog Converters) come in. PCM data goes directly to a DAC, while compressed formats need to be decoded first.

You’ll also need a low-pass filter after the DAC to remove high-frequency noise caused by sampling. If your system handles stereo output, you’ll need two DACs and filters.

Alternatively, many microcontrollers use I2S (Inter-IC Sound)—a digital audio protocol designed for efficient transmission of stereo sound using just three wires. I2S is flexible with sampling rates and bit depths, making it ideal for embedded applications.


Amplifying the Sound

Whether using DAC or I2S, the output signal is too weak to drive a speaker directly. That’s where audio amplifiers come in.

There are three main types:

  • Class-A: Great quality, but inefficient—rarely used in embedded systems.
  • Class-AB: More efficient, commonly used in chip form.
  • Class-D: Highly efficient, compact, and perfect for embedded devices.

Class-D amplifiers work by converting the signal to PWM (Pulse Width Modulation), then driving a transistor (like a MOSFET) on and off rapidly. This approach saves energy and reduces heat.

Just like with DACs, a low-pass filter is needed to clean up the output before it reaches the speaker.


Speaker Output

A speaker produces sound by converting the electrical signal into motion: the signal drives a voice coil suspended in a magnetic field, and the coil in turn moves a diaphragm that pushes air. Depending on your application, you might need different kinds of speakers, such as woofers for low frequencies or tweeters for high frequencies; high-fidelity systems often combine both for better coverage.


In Summary

Audio design in embedded systems involves a series of careful trade-offs—balancing storage, processing power, and playback quality. Whether you’re building simple voice alerts or adding rich audio playback, understanding how digital audio works from input to output is key to making smart design choices.


Ready to Add Audio to Your Embedded Product?

At Silicon Signals, we specialize in integrating high-performance audio solutions into embedded systems—whether it’s playback via I2S, class-D amplification, or optimizing audio storage for your platform.

🔊 Let’s build the future of sound, together.

📩 Reach out today at www.siliconsignals.io

or connect with us directly to explore our custom audio design services.
