Wednesday 14 December 2016

Spectrograms: a ferocious example

A spectrogram is a great way to visualize the frequencies present in a given waveform, such as an audio recording, radio transmission or electronic signal. The spectral intensity of the signal is plotted (usually on a logarithmic scale) against time (x-axis) and frequency (y-axis).

Processing the data

The plot below shows the left and right (green overlaid on blue) channels from an audio recording. We want to decompose it at a series of time steps into its constituent frequencies.



The following python code is a fairly basic implementation of function to make spectrograms. We parcel the input data into several sections of a given width (N_fft), each of which will be Fourier transformed. It's useful to make sure that there is some overlap between these sections, so the output is fairly smooth. Since most inputs are real (as in the examples above - music, speech etc), we can do a purely real transform. The last thing we need is some envelope function, so that the edges of each section are not sharp, but rather the signal in each section smoothly decays to zero; in this case, we multiply by the Hamming function, which is commonly used for this purpose.

 from numpy import *
 def make_spectrogram(data,N_fft,N_overlap):  
      N_net=N_fft-N_overlap  
      total_iterations=(len(data)-N_fft)//N_net            
      output=zeros((N_fft//2+1,total_iterations),complex)  
      for idx in range(0,total_iterations):  
           output[:,idx]=fft.rfft(data[idx*N_net:idx*N_net+N_fft]*hamming(N_fft))  
      return output  


The spectrogram

After we apply the function to our nice data set (with an appropriate choice of the size and overlap of the sections to be Fourier transformed), we get the following plot:





















There is a usual humdrum of noise close to the bottom, around 100 Hz, which corresponds to human speech and other noise. But we can also see an intense spectrum of harmonics emerge and then rise and fall in frequency (usually called chirping) in two events at 20 and 21 seconds. The harmonics are either real constituents of the audio clip, or due to sampling, or most likely both.

The input

In this case I took the raw signal from the following clip of delightful Twitch streamer Oddish (OddishPlaysGames @StephOddish) as she ambushes a Brightwing player in Heroes of the Storm. (By the way, I challenge any young reader to explain that sentence to someone over 50)



(High-res original here)

The full code

If you'd like to try this yourself, the following python script should work with only fairly basic libraries. You will need to have an appropriate .wav file, which I obtained by downloading the clip from Twitch and extracting the audio with ffmpeg, but any other sinusoidal signal will also do.

 from numpy import *  
 import matplotlib.pyplot as Plot  
 Plot.rc('text', usetex=True)     #To make the plot text look nicer  
 Plot.rc('font', family='serif')  
 import wave as wave  
 import struct  
   
 def make_spectrogram(data,N_fft,N_overlap):  
      N_net=N_fft-N_overlap  
      total_iterations=(len(data)-N_fft)//N_net            
      output=zeros((N_fft//2+1,total_iterations),complex)  
      for idx in range(0,total_iterations):  
           output[:,idx]=fft.rfft(data[idx*N_net:idx*N_net+N_fft]*hamming(N_fft))  
      return output  
   
 #Read in .wav file and convert to array of floats for each channel; .wav generated by the following command:  
 #ffmpeg -i Oddish.mp4 -ab 160k -ac 2 -ar 44100 -vn Oddish.wav  
 wave_file=wave.open("Oddish.wav",'r')  
 params=wave_file.getparams()  
 channels=params[0]  
 frame_rate=int(params[2]/channels)  
 frame_number=params[3]  
 frames_raw=wave_file.readframes(frame_number)  
 time_window=[18.0,23.0]  
 frequency_window=[0,10000]  
 even_idx=list(range(0,frame_number,2)); odd_idx=list(range(1,frame_number,2));  
 frames=[array(struct.unpack("%ih" % frame_number*channels,frames_raw))[even_idx],array(struct.unpack("%ih" % frame_number*channels,frames_raw))[odd_idx]]  
   
 #Plot sinusoidal signal  
 Plot.plot(linspace(time_window[0],time_window[1],num=int(time_window[1]-time_window[0])*frame_rate),frames[0][int(time_window[0]*frame_rate):int(time_window[1]*frame_rate)])  
 Plot.plot(linspace(time_window[0],time_window[1],num=int(time_window[1]-time_window[0])*frame_rate),frames[1][int(time_window[0]*frame_rate):int(time_window[1]*frame_rate)])  
 Plot.xlabel(r'$t$ [s]',fontsize=28)  
 Plot.ylabel(r'Amplitude [a.u.]',fontsize=28)  
 Plot.savefig('OddishSignal.png', format='png', dpi=100,bbox_inches='tight',pad_inches=0.1)  
   
 #Create spectrogram  
 frequency_points=500  
 time_points=500  
 required_timepoints=(time_window[1]-time_window[0])*frame_rate  
 N_FFT=int(frame_rate/(frequency_window[1]-frequency_window[0])*frequency_points)  
 N_overlap=int((N_FFT*time_points-required_timepoints)/(time_points-1))  
 spectrogram=abs(make_spectrogram(frames[0][int(time_window[0]*frame_rate):int(time_window[1]*frame_rate)],N_FFT,N_overlap))  
   
 #Plot spectrogram  
 plot_time=linspace(time_window[0],time_window[1],len(spectrogram[0,:]))  
 plot_freq=linspace(frequency_window[0],frequency_window[1],len(spectrogram[:,0]))  
 fig=Plot.figure("Spectrogram")  
 ax=fig.gca()  
 colour_plot=ax.pcolor(plot_time,plot_freq/1000.0,log(spectrogram/amax(spectrogram)),vmin=-2.5*log(10), vmax=0,cmap='hot')  
 Plot.xlabel(r'$t$ [s]',fontsize=28)  
 Plot.ylabel(r'$f$ [kHz]',fontsize=28)  
 Plot.ylim(0,2)  
 Plot.xlim(time_window[0],time_window[1])  
 Plot.savefig('OddishSpectrogram.png', format='png', dpi=100,bbox_inches='tight',pad_inches=0.1)  
   
 Plot.show()  

No comments:

Post a Comment