A spectrogram is a great way to visualize the frequencies present in a given waveform, such as an audio recording, radio transmission or electronic signal. The spectral intensity of the signal is plotted (usually on a logarithmic scale) against time (x-axis) and frequency (y-axis).
Processing the data
The plot below shows the left and right (green overlaid on blue) channels from an audio recording. We want to decompose it at a series of time steps into its constituent frequencies.
The following python code is a fairly basic implementation of function to make spectrograms. We parcel the input data into several sections of a given width (N_fft), each of which will be Fourier transformed. It's useful to make sure that there is some overlap between these sections, so the output is fairly smooth. Since most inputs are real (as in the examples above - music, speech etc), we can do a purely real transform. The last thing we need is some envelope function, so that the edges of each section are not sharp, but rather the signal in each section smoothly decays to zero; in this case, we multiply by the Hamming function, which is commonly used for this purpose.
from numpy import *
def make_spectrogram(data,N_fft,N_overlap):
N_net=N_fft-N_overlap
total_iterations=(len(data)-N_fft)//N_net
output=zeros((N_fft//2+1,total_iterations),complex)
for idx in range(0,total_iterations):
output[:,idx]=fft.rfft(data[idx*N_net:idx*N_net+N_fft]*hamming(N_fft))
return output
The spectrogram
After we apply the function to our nice data set (with an appropriate choice of the size and overlap of the sections to be Fourier transformed), we get the following plot:
There is a usual humdrum of noise close to the bottom, around 100 Hz, which corresponds to human speech and other noise. But we can also see an intense spectrum of harmonics emerge and then rise and fall in frequency (usually called chirping) in two events at 20 and 21 seconds. The harmonics are either real constituents of the audio clip, or due to sampling, or most likely both.
The input
In this case I took the raw signal from the following clip of delightful Twitch streamer Oddish (OddishPlaysGames @StephOddish) as she ambushes a Brightwing player in Heroes of the Storm. (By the way, I challenge any young reader to explain that sentence to someone over 50)
(High-res original here)
The full code
If you'd like to try this yourself, the following python script should work with only fairly basic libraries. You will need to have an appropriate .wav file, which I obtained by downloading the clip from Twitch and extracting the audio with ffmpeg, but any other sinusoidal signal will also do.
from numpy import *
import matplotlib.pyplot as Plot
Plot.rc('text', usetex=True) #To make the plot text look nicer
Plot.rc('font', family='serif')
import wave as wave
import struct
def make_spectrogram(data,N_fft,N_overlap):
N_net=N_fft-N_overlap
total_iterations=(len(data)-N_fft)//N_net
output=zeros((N_fft//2+1,total_iterations),complex)
for idx in range(0,total_iterations):
output[:,idx]=fft.rfft(data[idx*N_net:idx*N_net+N_fft]*hamming(N_fft))
return output
#Read in .wav file and convert to array of floats for each channel; .wav generated by the following command:
#ffmpeg -i Oddish.mp4 -ab 160k -ac 2 -ar 44100 -vn Oddish.wav
wave_file=wave.open("Oddish.wav",'r')
params=wave_file.getparams()
channels=params[0]
frame_rate=int(params[2]/channels)
frame_number=params[3]
frames_raw=wave_file.readframes(frame_number)
time_window=[18.0,23.0]
frequency_window=[0,10000]
even_idx=list(range(0,frame_number,2)); odd_idx=list(range(1,frame_number,2));
frames=[array(struct.unpack("%ih" % frame_number*channels,frames_raw))[even_idx],array(struct.unpack("%ih" % frame_number*channels,frames_raw))[odd_idx]]
#Plot sinusoidal signal
Plot.plot(linspace(time_window[0],time_window[1],num=int(time_window[1]-time_window[0])*frame_rate),frames[0][int(time_window[0]*frame_rate):int(time_window[1]*frame_rate)])
Plot.plot(linspace(time_window[0],time_window[1],num=int(time_window[1]-time_window[0])*frame_rate),frames[1][int(time_window[0]*frame_rate):int(time_window[1]*frame_rate)])
Plot.xlabel(r'$t$ [s]',fontsize=28)
Plot.ylabel(r'Amplitude [a.u.]',fontsize=28)
Plot.savefig('OddishSignal.png', format='png', dpi=100,bbox_inches='tight',pad_inches=0.1)
#Create spectrogram
frequency_points=500
time_points=500
required_timepoints=(time_window[1]-time_window[0])*frame_rate
N_FFT=int(frame_rate/(frequency_window[1]-frequency_window[0])*frequency_points)
N_overlap=int((N_FFT*time_points-required_timepoints)/(time_points-1))
spectrogram=abs(make_spectrogram(frames[0][int(time_window[0]*frame_rate):int(time_window[1]*frame_rate)],N_FFT,N_overlap))
#Plot spectrogram
plot_time=linspace(time_window[0],time_window[1],len(spectrogram[0,:]))
plot_freq=linspace(frequency_window[0],frequency_window[1],len(spectrogram[:,0]))
fig=Plot.figure("Spectrogram")
ax=fig.gca()
colour_plot=ax.pcolor(plot_time,plot_freq/1000.0,log(spectrogram/amax(spectrogram)),vmin=-2.5*log(10), vmax=0,cmap='hot')
Plot.xlabel(r'$t$ [s]',fontsize=28)
Plot.ylabel(r'$f$ [kHz]',fontsize=28)
Plot.ylim(0,2)
Plot.xlim(time_window[0],time_window[1])
Plot.savefig('OddishSpectrogram.png', format='png', dpi=100,bbox_inches='tight',pad_inches=0.1)
Plot.show()