Audio filtering in Python with SciPy

1. Intro / Motivation

Recently, I've become weirdly fascinated with audio, and more specifically with how audio processing works, filters included. I've been exploring it to work toward a final goal: getting hired by Dolby. To help reach that goal, I'm building projects that teach me how audio works and give me relevant experience. This project is the first step.

I built a Python script that uses SciPy and NumPy to apply filters to audio entirely from the command line. This project is primarily a learning exercise, so its practical applications are limited. The code is also not public at this time, though it can be shared on request.

2. Project Overview

For a quick overview of features, this Python script:

Contains a CLI that applies bass, treble, or telephone filters to WAV files.
Can chain multiple filters in one run to explore combined effects.
Accepts stereo audio (only the first channel is processed) and normalizes automatically.
Outputs a standard 16-bit WAV file, ready to listen to.

3. Technical Approach / Key Concepts

When building the script, a primary focus (as always) was to minimize the number of external libraries and dependencies. The complete script relies on only two external libraries, NumPy and SciPy (including its scipy.io submodule), plus argparse from Python's standard library.

I used scipy.io for its simple support for reading and writing .wav files. Specifically, scipy.io.wavfile.read() converts the WAV file into a numeric NumPy array, returned along with the file's sample rate.

Once the WAV file was converted to an array, processing became straightforward. The first step was to handle stereo files, which the script does by keeping only the first channel. It then converts the samples to floating point and divides by the peak absolute value: audio_data = audio_data.astype(np.float32) / np.max(np.abs(audio_data)). This normalizes the data so its values fall between -1.0 and 1.0.
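Below is a minimal sketch of the loading and normalization steps described above, written as a hypothetical helper named load_first_channel (the original script's function names aren't shown). One deliberate difference from the one-liner: it converts to float before taking the absolute value, because np.abs on an int16 sample of -32768 overflows and stays negative. It also skips the division for a silent file, where the peak would be zero.

```python
import numpy as np
from scipy.io import wavfile


def load_first_channel(path):
    """Hypothetical helper: read a WAV file, keep the first channel, peak-normalize.

    Returns (sample_rate, samples) with samples as float32 in [-1.0, 1.0].
    """
    rate, data = wavfile.read(path)  # data is (n,) for mono, (n, channels) otherwise
    if data.ndim > 1:
        data = data[:, 0]  # keep only the first channel, as the script does
    # Convert to float first so abs(-32768) doesn't overflow in int16.
    samples = data.astype(np.float32)
    peak = np.max(np.abs(samples))
    if peak > 0:
        samples = samples / peak  # loudest sample now has magnitude 1.0
    return rate, samples
```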

For filtering, I used SciPy's Butterworth filters. I chose Butterworth filters because they have a maximally flat response in the passband, so the frequencies we want to keep stay intact. They also roll off unwanted frequencies gradually, which helps avoid unnatural artifacts or harsh-sounding results.
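To make the "flat passband, gradual roll-off" behavior concrete, the sketch below evaluates a Butterworth low-pass using scipy.signal.butter and scipy.signal.freqz. The fourth-order design and 44.1 kHz sample rate are assumptions for illustration; the post doesn't state which order the script uses. Because SciPy pre-warps its digital Butterworth designs, the gain at the cutoff comes out at 1/sqrt(2), about -3 dB.

```python
import numpy as np
from scipy.signal import butter, freqz

FS = 44100  # assumed sample rate in Hz


def butterworth_gain(cutoff_hz, freqs_hz, order=4, fs=FS):
    """Magnitude response of a Butterworth low-pass at the given frequencies (Hz)."""
    b, a = butter(order, cutoff_hz, btype="lowpass", fs=fs)
    # With fs set, worN values are interpreted in Hz.
    _, h = freqz(b, a, worN=np.asarray(freqs_hz, dtype=float), fs=fs)
    return np.abs(h)


# Gain stays near 0 dB well inside the passband, is about -3 dB at the cutoff,
# and falls off smoothly above it.
for f, g in zip([100, 1000, 4000], butterworth_gain(1000, [100, 1000, 4000])):
    print(f"{f:>5} Hz: {20 * np.log10(g):7.2f} dB")
```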

For the bass boost filter, I chose to simply attenuate frequencies above roughly 1 kHz. A side effect is that most audio passed through it comes out sounding muffled.
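A minimal sketch of the bass filter as a Butterworth low-pass at 1 kHz. The fourth-order design and the use of sosfiltfilt (zero-phase, forward-and-backward filtering) are assumptions; the script could just as well use a single causal pass such as sosfilt. bass_filter is a hypothetical name.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bass_filter(samples, fs, cutoff_hz=1000.0, order=4):
    """Low-pass sketch: keep content below ~1 kHz, attenuate everything above it."""
    # Second-order sections (sos) are numerically safer than b/a coefficients.
    sos = butter(order, cutoff_hz, btype="lowpass", fs=fs, output="sos")
    # Forward-backward filtering gives zero phase shift but squares the
    # magnitude response, so stopband attenuation is doubled in dB.
    return sosfiltfilt(sos, samples)
```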

For the treble boost filter, audio above roughly 2 kHz is emphasized, while everything below it is attenuated. The result sounds (as weird and specific as this description is) like a recording of sound playing out of one phone, captured on another phone. If you've heard this effect, it was most likely in a music video from the 2010s (I don't know why, but that's the only place I've ever heard it).
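The treble filter can be sketched the same way, as a Butterworth high-pass at 2 kHz (again assuming fourth order and sosfiltfilt; treble_filter and peak_normalize are hypothetical names). Note that a high-pass filter on its own only attenuates. In this sketch, the audible "boost" of the highs comes from re-normalizing the peak afterwards, similar to the division by np.max(np.abs(...)) the script performs when saving.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def treble_filter(samples, fs, cutoff_hz=2000.0, order=4):
    """High-pass sketch: attenuate content below ~2 kHz, pass content above it."""
    sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, samples)


def peak_normalize(x):
    """Scale so the loudest sample has magnitude 1.0 (silence is returned unchanged)."""
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x
```

For example, peak_normalize(treble_filter(x, 44100)) on a mix that is mostly 100 Hz with a quiet 5 kHz tone leaves only the 5 kHz tone, now at full scale.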

The goal for the telephone effect was a little stranger than the other two: to mimic the way older phones cut out low and high sounds, leaving only the midrange frequencies. To do this, the filter reduces or removes sound outside the 300 to 3400 Hz range. This makes the audio sound a little more compressed, almost like you're hearing it through an old phone handset.
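The telephone effect maps naturally onto a Butterworth band-pass between 300 and 3400 Hz. As with the other sketches, the filter order, the use of sosfiltfilt, and the telephone_filter name are assumptions rather than the script's actual code.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def telephone_filter(samples, fs, low_hz=300.0, high_hz=3400.0, order=4):
    """Band-pass sketch: keep roughly the 300-3400 Hz voice band, attenuate the rest."""
    # For btype="bandpass", butter doubles the order: order=4 yields an 8th-order filter.
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, samples)
```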

Thanks to argparse, the script can also run multiple filters in one pass. The script is run as main.py input_file.wav output_file.wav --modes MODE [MODE ...], where each MODE is treble, bass, or telephone. If more than one mode is given, the audio runs through every filter in the order the modes were passed. For example, with --modes bass telephone, the script runs the audio through the bass filter, then passes that output through the telephone filter before saving the result.
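The argument handling and chaining described above could be sketched like this. build_parser and apply_chain are hypothetical names, and passing the filters in as a dictionary of callables is an assumption made to keep the sketch independent of any particular filter implementation. nargs="+" is what lets --modes accept one or more values, and choices rejects anything other than the three supported modes.

```python
import argparse

MODES = ("bass", "treble", "telephone")


def build_parser():
    """CLI sketch: two positional paths plus one or more --modes values."""
    parser = argparse.ArgumentParser(description="Apply Butterworth filters to a WAV file.")
    parser.add_argument("input_file")
    parser.add_argument("output_file")
    parser.add_argument("--modes", nargs="+", choices=MODES, required=True,
                        help="one or more filters, applied in the order given")
    return parser


def apply_chain(samples, modes, filters):
    """Run samples through each named filter in order.

    filters maps a mode name to a callable taking and returning samples.
    """
    for mode in modes:
        samples = filters[mode](samples)  # each stage's output feeds the next
    return samples
```

For example, python main.py in.wav out.wav --modes bass telephone yields args.modes == ["bass", "telephone"]. An unknown mode makes argparse print an error and exit.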

To save the output, the filtered float array is first converted back into 16-bit integer samples with one line of code: filtered_int16 = np.int16(filtered / np.max(np.abs(filtered)) * 32767). This rescales the peak to 32767, the largest value a 16-bit sample can hold. The result is then passed to scipy.io.wavfile.write, which writes it out as a WAV file.
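Here is a sketch of that save step as a hypothetical save_int16 helper, assuming a 1-D float array and a known sample rate. It follows the same scaling as the one-liner, but adds a guard for silent input, where the peak is zero and the original division would divide by zero.

```python
import numpy as np
from scipy.io import wavfile


def save_int16(path, samples, fs):
    """Hypothetical helper: peak-normalize float samples and write a 16-bit PCM WAV."""
    samples = np.asarray(samples, dtype=np.float64)
    peak = np.max(np.abs(samples))
    if peak == 0:
        scaled = np.zeros_like(samples)  # silence: nothing to scale
    else:
        scaled = samples / peak * 32767  # loudest sample maps to int16 full scale
    # astype truncates toward zero, matching np.int16(...) in the original line.
    wavfile.write(path, fs, scaled.astype(np.int16))
```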

Applying a filter completes in just a few seconds; the longest run I observed was 14 seconds, with all three filters applied. Performance will of course vary between machines, but this is a range I consider perfectly acceptable.

4. Challenges / Lessons Learned

I did run into some challenges while working on this, but nothing that couldn't be figured out with some time and effort.

Balancing the telephone filter so that it was noticeable on both voices and music, without being too in-your-face or significantly degrading the audio, was tricky. I believe I struck a good balance, though there is always room for improvement, and I may come back to fine-tune the numbers used in the effect.

Filter chaining was a challenge at first, and to a degree still is. Because the filters' frequency ranges don't overlap, running the treble and bass filters back to back, for example, doesn't sound very different from running just one of them. That isn't necessarily a problem, but it threw me off a little during testing, so overall it was a good lesson.

This was my first time working with audio programmatically, in Python or otherwise. Going in, I didn't know how to normalize audio, which caused some problems with clipping. Once I understood how normalization works, the output improved noticeably.

5. Next Steps / Future Experiments

There are several ways I could improve this script. The key ones are:

Allow custom cutoff frequencies via command line.
Handle stereo properly by processing both channels independently.
Add additional effects (reverb, EQ, compression).
Visualize frequency response of single and stacked filters.
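For the frequency-response item, one possible starting point is to compute the magnitude response of stacked filters: when filters are applied one after another, their responses multiply. The sketch below assumes fourth-order Butterworth designs at the cutoffs described earlier and a single causal pass per filter (a forward-backward pass would square each response). cascade_response is a hypothetical name, and its output could be plotted with any plotting library.

```python
import numpy as np
from scipy.signal import butter, freqz


def cascade_response(sos_list, freqs_hz, fs):
    """Magnitude response of filters applied one after another (one pass each)."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    h = np.ones(len(freqs_hz), dtype=complex)
    for sos in sos_list:
        for section in sos:  # each row is one biquad: [b0, b1, b2, a0, a1, a2]
            _, hs = freqz(section[:3], section[3:], worN=freqs_hz, fs=fs)
            h *= hs  # cascaded responses multiply
    return np.abs(h)


fs = 44100
freqs = np.array([100.0, 1000.0, 1500.0, 3000.0])
bass = butter(4, 1000, btype="lowpass", fs=fs, output="sos")
treble = butter(4, 2000, btype="highpass", fs=fs, output="sos")
stacked = cascade_response([bass, treble], freqs, fs)
```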

Implementing these would open the project up to more advanced and practical applications.

6. Conclusion / Takeaways

This project was a small but meaningful step in learning audio manipulation and processing with Python. Even with a basic script of 80 lines, written in roughly an hour and a half, I was able to implement frequency manipulation and build an understanding of filter design.

Overall, this script was written as a learning exercise to help me understand audio pipelines, codecs, and other aspects of audio, and I believe it proved an effective way to start. In the future, I intend to build on what I learned here to develop more advanced, practical applications.

Thanks for reading!
- Heath / MinoDab492

(Note: the code used in this project is not open-source or available on GitHub or other platforms. If you'd like to request access to the code for whatever reason, you can email me at heath.garvin@minodabproductions.dev, or contact me on Discord @minodab492.)