Wednesday 14 December 2016

Spectrograms: a ferocious example

A spectrogram is a great way to visualize the frequencies present in a given waveform, such as an audio recording, radio transmission or electronic signal. The spectral intensity of the signal is plotted (usually on a logarithmic scale) against time (x-axis) and frequency (y-axis).

Processing the data

The plot below shows the left and right (green overlaid on blue) channels from an audio recording. We want to decompose it at a series of time steps into its constituent frequencies.



The following python code is a fairly basic implementation of a function to make spectrograms. We parcel the input data into sections of a given width (N_fft), each of which will be Fourier transformed. It's useful to make sure that there is some overlap between these sections, so that the output varies fairly smoothly in time. Since most inputs are real (as in the examples above: music, speech and so on), we can do a purely real transform. The last thing we need is an envelope function, so that the edges of each section are not sharp; instead, the signal in each section smoothly decays to zero. In this case, we multiply by the Hamming window, which is commonly used for this purpose.

 from numpy import *
 def make_spectrogram(data,N_fft,N_overlap):  
      N_net=N_fft-N_overlap  
      total_iterations=(len(data)-N_fft)//N_net            
      output=zeros((N_fft//2+1,total_iterations),complex)  
      for idx in range(0,total_iterations):  
           output[:,idx]=fft.rfft(data[idx*N_net:idx*N_net+N_fft]*hamming(N_fft))  
      return output  
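As a quick sanity check, here is the routine applied to a synthetic pure tone (the function is repeated so the snippet runs on its own; the 8 kHz sample rate and 440 Hz tone are just arbitrary choices, not part of the data set above):

```python
# Sanity check of make_spectrogram on a pure 440 Hz sine wave,
# sampled at an (arbitrary) 8 kHz for one second.
from numpy import *

def make_spectrogram(data, N_fft, N_overlap):
    N_net = N_fft - N_overlap
    total_iterations = (len(data) - N_fft) // N_net
    output = zeros((N_fft // 2 + 1, total_iterations), complex)
    for idx in range(total_iterations):
        output[:, idx] = fft.rfft(data[idx * N_net:idx * N_net + N_fft] * hamming(N_fft))
    return output

rate = 8000
t = arange(0, 1.0, 1.0 / rate)
tone = sin(2 * pi * 440.0 * t)
spec = abs(make_spectrogram(tone, 256, 128))
# One column per section; N_fft//2 + 1 frequency bins spanning 0 to rate/2
print(spec.shape)                      # (129, 60)
peak_bin = spec[:, 0].argmax()
print(peak_bin * rate / 256.0)         # 437.5, i.e. 440 Hz to within one bin
```

Each column is the spectrum of one windowed section, so the peak lands within one frequency bin (rate/N_fft = 31.25 Hz here) of the true tone.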


The spectrogram

After we apply the function to our nice data set (with an appropriate choice of the size and overlap of the sections to be Fourier transformed), we get the following plot:


There is the usual humdrum of noise close to the bottom, around 100 Hz, which corresponds to human speech and other background noise. But we can also see an intense comb of harmonics emerge, then rise and fall in frequency (usually called chirping), in two events at 20 and 21 seconds. The harmonics are either real constituents of the audio clip, or artefacts of the sampling, or most likely both.

The input

In this case I took the raw signal from the following clip of delightful Twitch streamer Oddish (OddishPlaysGames @StephOddish) as she ambushes a Brightwing player in Heroes of the Storm. (By the way, I challenge any young reader to explain that sentence to someone over 50)



(High-res original here)

The full code

If you'd like to try this yourself, the following python script should work with only fairly basic libraries. You will need an appropriate .wav file, which I obtained by downloading the clip from Twitch and extracting the audio with ffmpeg, but any other recorded signal will also do.

 from numpy import *  
 import matplotlib.pyplot as Plot  
 Plot.rc('text', usetex=True)     #To make the plot text look nicer  
 Plot.rc('font', family='serif')  
 import wave  
 import struct  
   
 def make_spectrogram(data,N_fft,N_overlap):  
      N_net=N_fft-N_overlap  
      total_iterations=(len(data)-N_fft)//N_net            
      output=zeros((N_fft//2+1,total_iterations),complex)  
      for idx in range(0,total_iterations):  
           output[:,idx]=fft.rfft(data[idx*N_net:idx*N_net+N_fft]*hamming(N_fft))  
      return output  
   
 #Read in .wav file and convert to array of floats for each channel; .wav generated by the following command:  
 #ffmpeg -i Oddish.mp4 -ab 160k -ac 2 -ar 44100 -vn Oddish.wav  
 wave_file=wave.open("Oddish.wav",'r')  
 params=wave_file.getparams()  
 channels=params[0]  
 frame_rate=params[2]     #frames per second; each frame holds one sample per channel  
 frame_number=params[3]  
 frames_raw=wave_file.readframes(frame_number)  
 time_window=[18.0,23.0]  
 frequency_window=[0,10000]  
 #Samples are interleaved (left, right, left, ...), so de-interleave by slicing  
 samples=array(struct.unpack("%ih" % (frame_number*channels),frames_raw))  
 frames=[samples[0::2],samples[1::2]]  
   
 #Plot sinusoidal signal  
 Plot.plot(linspace(time_window[0],time_window[1],num=int(time_window[1]-time_window[0])*frame_rate),frames[0][int(time_window[0]*frame_rate):int(time_window[1]*frame_rate)])  
 Plot.plot(linspace(time_window[0],time_window[1],num=int(time_window[1]-time_window[0])*frame_rate),frames[1][int(time_window[0]*frame_rate):int(time_window[1]*frame_rate)])  
 Plot.xlabel(r'$t$ [s]',fontsize=28)  
 Plot.ylabel(r'Amplitude [a.u.]',fontsize=28)  
 Plot.savefig('OddishSignal.png', format='png', dpi=100,bbox_inches='tight',pad_inches=0.1)  
   
 #Create spectrogram  
 frequency_points=500  
 time_points=500  
 required_timepoints=(time_window[1]-time_window[0])*frame_rate  
 N_FFT=int(frame_rate/(frequency_window[1]-frequency_window[0])*frequency_points)  
 N_overlap=int((N_FFT*time_points-required_timepoints)/(time_points-1))  
 spectrogram=abs(make_spectrogram(frames[0][int(time_window[0]*frame_rate):int(time_window[1]*frame_rate)],N_FFT,N_overlap))  
   
 #Plot spectrogram  
 plot_time=linspace(time_window[0],time_window[1],len(spectrogram[0,:]))  
 plot_freq=linspace(frequency_window[0],frequency_window[1],len(spectrogram[:,0]))  
 fig=Plot.figure("Spectrogram")  
 ax=fig.gca()  
 colour_plot=ax.pcolor(plot_time,plot_freq/1000.0,log(spectrogram/amax(spectrogram)),vmin=-2.5*log(10), vmax=0,cmap='hot')  
 Plot.xlabel(r'$t$ [s]',fontsize=28)  
 Plot.ylabel(r'$f$ [kHz]',fontsize=28)  
 Plot.ylim(0,2)  
 Plot.xlim(time_window[0],time_window[1])  
 Plot.savefig('OddishSpectrogram.png', format='png', dpi=100,bbox_inches='tight',pad_inches=0.1)  
   
 Plot.show()  
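Incidentally, matplotlib has a built-in specgram function which does much the same job in a single call, and makes a useful cross-check of the hand-rolled version above. A minimal sketch on a synthetic 1 kHz tone (the Agg backend is selected only so that it runs without a display):

```python
# Cross-check with matplotlib's built-in spectrogram routine
from numpy import *
import matplotlib
matplotlib.use('Agg')                  # headless: no window needed
import matplotlib.pyplot as Plot

rate = 44100
t = arange(0, 1.0, 1.0 / rate)
signal = sin(2 * pi * 1000.0 * t)      # synthetic 1 kHz tone
# Pxx has one row per frequency bin and one column per time slice
Pxx, freqs, bins, im = Plot.specgram(signal, NFFT=512, Fs=rate,
                                     noverlap=256, cmap='hot')
print(Pxx.shape[0])                    # 512//2 + 1 = 257
print(freqs[-1])                       # rate/2 = 22050.0
```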

Monday 5 December 2016

Brent and Levenberg-Marquardt: the bread and butter algorithms for postgrads

There are two extremely useful numerical algorithms I end up using almost all the time when processing data. This was particularly true during my final year undergraduate project and my PhD, when, like most people, I had raw data which had to be processed and conclusions drawn from it. Here I will outline how to use them in python (which I think is the most useful all-purpose language, and the fastest to code in) with a specific example, which is explained at the end.

Levenberg-Marquardt: Curve fitting


It is often useful to fit an analytical function to some raw, messy data. In this case, a numerical algorithm iterates through different values of the input parameters until a mathematical function is as close as possible to the data's dependent variable. The aim here might be to extract some hypothesized physical parameters; for example, Planck discovered the constant named after him when fitting a function to black-body spectra.

In python, suppose we have two lists (of the same length) x and y. First, we define the function we think best describes the data, or the one with the most scientific relevance. Here I have chosen a linear function combined with a shifted hyperbolic tangent:

y(x) = m x + c + B tanh( (x - x0) / w )

We want to use scipy's well-optimized curve_fit routine (which we import into our script) to fit this to our data. First, we define the right hand side of the above equation as fit_function, where the first argument is always the independent variable (here it's x) and the rest are the fit parameters. We then provide this function and the data as arguments to curve_fit:

 from numpy import *; from scipy.optimize import curve_fit  
 def fit_function(x,m,c,B,x0,w):  
      return m*x+c+B*tanh((x-x0)/w)  
 p,covariance=curve_fit(fit_function,x,y,p0=[10.0,10.0,10.0,20.0,10.0])  
 y_fit=fit_function(x,p[0],p[1],p[2],p[3],p[4])  

We have made the algorithm's job a lot easier by specifying an initial guess p0 for the parameters. This is generally useful: otherwise the fit may converge slowly, or get stuck in a basin of attraction far from the optimal result. The fitted parameters are returned in the array p, along with a covariance matrix. In order to plot the result, we now use the parameters to define y_fit, which allows us to plot the following:


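The covariance matrix is worth keeping: the square roots of its diagonal give one-sigma uncertainties on the fitted parameters. A minimal sketch on synthetic data (the data set used in this post isn't reproduced here, so the numbers below are invented):

```python
# One-sigma parameter uncertainties from curve_fit's covariance matrix,
# demonstrated on invented data with known true parameters.
from numpy import *
from scipy.optimize import curve_fit

def fit_function(x, m, c, B, x0, w):
    return m * x + c + B * tanh((x - x0) / w)

x = linspace(0.0, 50.0, 200)
random.seed(0)
# True parameters: m=0.05, c=12, B=6, x0=20, w=5, plus a little noise
y = fit_function(x, 0.05, 12.0, 6.0, 20.0, 5.0) + 0.2 * random.randn(len(x))
p, covariance = curve_fit(fit_function, x, y, p0=[1.0, 10.0, 5.0, 18.0, 4.0])
sigma = sqrt(diag(covariance))         # standard error on each parameter
print(abs(p[3] - 20.0) < 1.0)          # x0 recovered close to its true value
```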
However, the linear coefficient m = -0.11 has come out negative. This may well be the best fit to our data, but suppose that we demand that it be positive (for physical reasons). We can then trick the algorithm by making our function return an absurd value if m is negative:

 def fit_function2(x,m,c,B,x0,w):  
      if m<0.0:  
           return ones((len(x)))*1E100  
      else:  
           return m*x+B*tanh((x-x0)/w)+c  

This does indeed give a positive (albeit small) value of m = 1.53E-8 and the following figure:

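As an aside, recent versions of scipy (0.17 and later, if memory serves, so check yours) offer a cleaner alternative to the 1E100 trick: a bounds argument to curve_fit that constrains parameters directly. A sketch on invented data:

```python
# Constraining m >= 0 with curve_fit's bounds argument (scipy >= 0.17)
# instead of returning an absurd value from the fit function.
from numpy import *
from scipy.optimize import curve_fit

def fit_function(x, m, c, B, x0, w):
    return m * x + c + B * tanh((x - x0) / w)

x = linspace(0.0, 50.0, 200)
random.seed(1)
y = fit_function(x, 0.02, 12.0, 6.0, 20.0, 5.0) + 0.2 * random.randn(len(x))
lower = [0.0, -inf, -inf, -inf, -inf]  # only m is bounded below, by zero
upper = [inf, inf, inf, inf, inf]
p, covariance = curve_fit(fit_function, x, y,
                          p0=[1.0, 10.0, 5.0, 18.0, 4.0],
                          bounds=(lower, upper))
print(p[0] >= 0.0)                     # the linear coefficient is non-negative
```

Note that with bounds supplied, scipy switches from Levenberg-Marquardt to a trust-region method internally, since plain Levenberg-Marquardt does not handle constraints.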

Brent: root finding

Now that we have an analytical model for our data, we can draw some conclusions from it. For example, suppose we want to predict the value of x for which y = 17.75. There is no analytic inverse function for such a mix of linear and hyperbolic tangent terms, so we solve the equation numerically: we rearrange it so that we are finding the root of an expression, namely

17.75 - fit_function2(x, m, c, B, x0, w) = 0

The most efficient generic algorithm to do this was invented by Richard Brent. It combines the method of bisection with interpolation, which means it requires two bounds (a,b) on the root; for it to work, the expression should pass through zero once and only once in this interval. Strictly, the algorithm looks for a change in sign of the function, which can also happen when the function passes through infinity, as 1/x does at x = 0.

Brent's algorithm is implemented in scipy in a function called brentq, which takes as its inputs a function and the two bounds. The function argument must itself take only a single input; this is a problem in our case, because all the parameters which we calculated in the previous section have to be supplied to our fit function as well. We get around this by using python's lambda notation, which lets us wrap a function of several variables into a function of just one. Our implementation is as follows:

 from scipy.optimize import brentq  
 threshold_1775=brentq(lambda x: 17.75-fit_function2(x,p[0],p[1],p[2],p[3],p[4]), 10.0,30.0)  

We have checked the graphs above to get the bounds (10,30) between which our result lies, and have also checked that y = 17.75 has only a single solution in this region (otherwise there would be multiple roots and the algorithm would fail). The result is x = 24.7.

The data

In fact, the data shown corresponds to gender diversity in physics: here x+1950 is the year and y is the percentage of Physics A-levels awarded to women. Many thanks to @KarenLMasters for providing the following graph (the raw data is available at [1] [2]):
 
I'm using it here as an example, without getting into a discussion of its societal and cultural meaning. This is the curve fit we obtained with curve_fit:

 

Depressingly, as we see above, the best fit is a hyperbolic tangent (meaning that the percentage has "plateaued" at around 23%) with a negative, or at best an insignificantly small, linear trend on top. It is up to teachers, politicians and the rest of society to get rid of any negative trend and push B towards 50%.

Extracting numerical data from screenshots

As the raw data was difficult to obtain, I ended up using a python script to extract it from the JPEG figure itself, which is also an extremely useful trick for scientists.

The script takes a minimalist version of the graph in question, meaning that the image file must end at the axes of the graph: all white space and text has to be removed. I provide here the script to extract the data from the following minimalist image, RawData.png:


The broad idea is to find the image co-ordinates of (in this case) any blue pixels and then transform to graph co-ordinates. In this case, given that there is one star per year, this allows the center of each star to be identified easily (rather than, say, its points).

 #Needed for PNG  
 from PIL import Image  
 from numpy import *  
 #Other stuff  
 import matplotlib.pyplot as Plot  
 from scipy.optimize import curve_fit  
 #~~# Graph data acquisition  
 imageFile="RawData.png"          #The image here has to be the axes only (no padding, no labels)  
 x_limits=array([1950.0,2020.0])     #Fill in these limits from the graph labels  
 y_limits=array([10.0,26.0])  
 im1=Image.open(imageFile)  
 rgb_im = im1.convert('RGB')  
 dimensions=rgb_im.size # 0 - x, 1 - y  
 X=linspace(x_limits[0],x_limits[1],dimensions[0])  
 Y=-ones((dimensions[0]))  
 for idx_x in range(0,dimensions[0]):  
      convergence=0  
      while convergence==0 and Y[idx_x]<dimensions[1]-1:  
           Y[idx_x]+=1  
           if rgb_im.getpixel((idx_x,int(Y[idx_x])))[2]>220 and rgb_im.getpixel((idx_x,int(Y[idx_x])))[0]<40:  
                convergence=1  
 for idx_y in range(0,dimensions[0]):  
      Y[idx_y]=(y_limits[1]-y_limits[0])*(dimensions[1]-Y[idx_y])/dimensions[1]+y_limits[0]  
 #**# Analysis  
 years=linspace(1951,2016,num=66)  
 percentage=zeros((len(years)))  
 def find_nearest(x,x_desired):  
      nearest_idx=0  
      nearest_difference=abs(x[0]-x_desired)  
      for idx in range(1,len(x)):  
           difference=abs(x[idx]-x_desired)  
           if difference<nearest_difference:  
                nearest_idx=idx  
                nearest_difference=difference  
      return nearest_idx  
 for idx in range(0,len(years)):  
      percentage[idx]=Y[find_nearest(X,years[idx])]  
 #Output  
 output_file=open('ExtractedData.txt', 'w')  
 for output_idx in range(0,len(years)):  
      output_file.write("%s %s\n" % (years[output_idx],percentage[output_idx]))  
 output_file.close()  
 Plot.figure(1)  
 Plot.plot(X,Y)  
 Plot.plot(years,percentage,'o')  
 Plot.show()  
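Incidentally, the find_nearest loop above is exactly what numpy's argmin gives in one line; the loop version just makes the logic explicit. For example (with a made-up target year):

```python
# One-line equivalent of the find_nearest loop, using numpy's argmin
from numpy import *

X = linspace(1950.0, 2020.0, 701)       # axis values, 0.1-year spacing
nearest_idx = abs(X - 1975.3).argmin()  # index of the closest entry to 1975.3
print(X[nearest_idx])                   # approximately 1975.3
```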

References

Thanks again to Karen Masters for making me aware of the following data, accessed at the time of writing (6th December 2016):

[1] http://www.iop.org/policy/statistics/gender/page_67095.html

[2] http://www.gatsby.org.uk/uploads/education/reports/pdf/18-physics-in-schools-patterns-and-policies-august-2006.pdf