Michel van der Mark
 

Retrieving Pitch and Rhythm from Archive Recordings


        Why would we want it?

        When listening to original recordings of folk music, I found that the intonation does not always follow the modern equal-tempered scale. For certain countries this is quite obvious. In Turkish music, for example, scales are used that have complicated but well-documented intervals. For other countries, like Bulgaria and Romania, the ethnomusicological books mainly use 'classical' notation without any statement about deviating intonation. So here is something to discover.
        In the same manner, the rhythms of dance music differ in many cases from the mathematically exact time signature that they are said to have. The musicians themselves may tell us that they play a 2/4 or 7/8 measure, yet the counts within these measures apparently do not all have the same duration. This difference is part of the 'style' and is often typical of the music of a particular region.

        When I was trying to perform Balkan music with a few other musicians, we sometimes had discussions about the correct rhythmical ‘feeling’. For example, one said that the first count should be emphasised by making it longer, while another thought that it should be emphasised by delaying it, thus stretching the preceding count.
        At that time, I felt the need for a way to find out exactly what happens. So I tried several ways to measure the exact timings on a recording. Later, the same tools could be used to practice and get some useful feedback.
For similar reasons, I also wanted to know more about the pitches and intervals of some recordings.
 

        The Pitch problem

        The general task is to retrieve the pitch of a number of notes of a song, then to average the frequencies of all the notes that are meant to be the same, and finally to calculate the intervals between these averages.
        The first step, retrieving the pitch, was the subject of some experiments. The most logical technical approach appeared to be the Fast Fourier Transform, for which many programs are available. In some cases it worked, but in most cases it failed because of the complexity of the sound. If the singer uses vibrato, trills or other ornaments, it becomes very hard to tell which part of the note should be measured to find the frequency that determines the 'intonation' or subjective pitch impression of the note.
        When there are other instruments, voices or other sounds, so many lines appear in the spectrum that it becomes impossible to find a reasonably accurate frequency.
        In other words, this method only works for very pure recordings of solo voices or instruments.

        So I started searching in a different direction. As a musician, I trust my ears better than results of measurements that have not proven to be useful. Is it not the ear of the musician that makes him decide about the intonation?
        Things such as timbre, modulations (trills, vibrato) and harmony may influence the subjective pitch for the listener, but also for the singer himself. The singer will produce a tone of a correct subjective pitch, rather than a tone with the mathematically correct frequency.
This also made me decide about another matter:  if we use the ear for comparing pitch to a reference tone, should the tones sound simultaneously or in sequence? This comparison obviously is needed to find the numbers (frequencies) for the statistical processing.
In principle, listening to a looped note and a tone generator simultaneously makes it possible to zero-beat the two. In other words: we tune the generator until the beats slow down and a completely stable tone remains.
        In practice, this method does not work at all, since the tones produced by a singer are never stable enough. Also, if there is an instrument playing in unison with the singer, or if there are other instruments or voices, the beats become inaudible. The best we can do is to tune the generator so that it ‘fits’ best into the sound. But this does not guarantee that its frequency matches the pitch impression of the original tone.

        In my opinion, alternating the sample of the song with the tone generator gives the best guarantee of a frequency that corresponds to the subjective pitch.
        To date, I have used the program ‘Cool Edit Pro’ to make a sample of a note.
This is simply done by dragging the mouse over the waveform, between the boundaries of what we expect to be one note. Then we can listen to it by pressing the play button or the loop button. It is easy to find the ‘body’ of the note; if there are still remains of other notes at the edges, we can correct the boundaries of the selection with the mouse and the shift key.
In the first experiment, I interrupted this listening to generate a tone burst with a freeware tone generator. This sounded promising. It was quite easy to adjust the tone generator to within 1 or 2 Hz (at frequencies around 300 Hz), depending on the stability of the note in the sample.

        To achieve more accuracy, many samples need to be measured and averaged. This required automation of the process. Therefore, I made a little program to play the sample alternating with the tone burst.  The sample is transferred from Cool Edit Pro to the Pitch Compare program using the Windows clipboard.
        While listening to the loop, we can adjust the generator frequency until the pitches match. Then the frequency of the generator can be written to a text file with a button click. This text file can be imported in Excel or any other program for analysis.
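The alternation of sample and tone burst can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Pitch Compare program itself: the function names, the burst length, the 10 ms fades, the silent gap and the log file format are all hypothetical, and actual audio playback is left out.

```python
import math

def tone_burst(freq_hz, duration_s, sample_rate=44100, amplitude=0.5, fade_s=0.01):
    """Return a sine burst as a list of floats, with linear fade-in and fade-out."""
    n = int(round(duration_s * sample_rate))
    fade_n = max(1, min(int(fade_s * sample_rate), n // 2))
    burst = []
    for i in range(n):
        # The fades avoid audible clicks at the edges of the burst.
        envelope = min(1.0, i / fade_n, (n - 1 - i) / fade_n)
        burst.append(amplitude * envelope
                     * math.sin(2 * math.pi * freq_hz * i / sample_rate))
    return burst

def alternating_loop(note, burst, gap_s=0.1, sample_rate=44100):
    """One cycle of the loop: note sample, silence, reference burst, silence."""
    gap = [0.0] * int(round(gap_s * sample_rate))
    return list(note) + gap + list(burst) + gap

def log_match(path, freq_hz, label=""):
    """Append one matched generator frequency (and a tone name) to a text file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{freq_hz:.1f}\t{label}\n")
```

The listener would rebuild the loop with a new `freq_hz` after every adjustment and call `log_match` once the two pitches sound equal.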

        Testing this program, I noted some important details:
        First, the dynamic level (amplitude) of the generator should be matched to the level of the sample. If the levels are too different, it is much harder to compare the pitches. It is a well-known psycho-acoustic phenomenon that loudness influences the pitch experience of a sinusoidal tone. This means that a bad match of the levels leads to considerable errors.
        Second, the difference in timbre causes some doubt when comparing. So it would be nice if we could change the waveform of the tone generator in order to make the timbre more similar.
        It would not surprise me if the first problem were also solved to a large extent by this modification.
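One simple way to match the levels automatically is to give the generator the same RMS (root mean square) value as the sample. This is an assumption for illustration only: RMS is a rough proxy for perceived loudness, and the function names are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def matched_sine_amplitude(note):
    """Peak amplitude of a sine tone whose RMS equals that of the note sample.

    A sine with peak amplitude A has an RMS of A / sqrt(2).
    """
    return rms(note) * math.sqrt(2.0)
```

The result can be used as the amplitude of the reference tone, and then fine-tuned by ear.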
 

        Interpretation of the results

        Here is a sample list of results from the song ‘Tri Bjulbjuli’ (Bulgaria), with frequencies in Hz. The samples are in their order of appearance, reading down each column.

298 do    279 si    278 Si    292 do
325 re    249 la    249 La    328 re
293 do    274 si    276 Si    277 si
321 re    294 do    298 Do    293 do
299 do    300 do    297 Do    273 si
281 si    298 do    274 Si    271 si
296 do    276 si    251 La    248 la
295 do    247 la    246 La
263 si    252 la    275 Si
254 la    251 la    291 Do
253 la    252 la    325 Re

        Since I am not good at mathematics and statistics, I started to do some interpretation by hand.
        I simply sorted the frequencies into one list per tone name and then calculated the average for each tone.
 
254 la    281 si    298 do    325 re
253 la    263 si    293 do    321 re
249 la    279 si    299 do    325 re
247 la    276 si    296 do    328 re
252 la    278 si    295 do
251 la    276 si    294 do
252 la    274 si    300 do
249 la    275 si    298 do
251 la    277 si    298 do
246 la    274 si    297 do
248 la    273 si    291 do
          271 si    292 do
                    293 do

250,1818  274,75    295,6923  324,75    Average frequency (Hz)
     162,171   127,1727  162,2794       Interval in cents (la-si, si-do, do-re)
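The sorting and averaging by hand can be reproduced with a short Python sketch. The measurements are transcribed from the list of results; the reading order (column by column) is an assumption and has no effect on the averages. The names `RAW` and `average_per_tone` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# "frequency name" pairs transcribed from the list of results.
RAW = """
298 do 325 re 293 do 321 re 299 do 281 si 296 do 295 do 263 si 254 la 253 la
279 si 249 la 274 si 294 do 300 do 298 do 276 si 247 la 252 la 251 la 252 la
278 si 249 la 276 si 298 do 297 do 274 si 251 la 246 la 275 si 291 do 325 re
292 do 328 re 277 si 293 do 273 si 271 si 248 la
"""

def average_per_tone(raw):
    """Group the frequencies by tone name and return the mean of each group."""
    tokens = raw.split()
    groups = defaultdict(list)
    for freq, name in zip(tokens[0::2], tokens[1::2]):
        groups[name.lower()].append(float(freq))
    return {name: mean(values) for name, values in groups.items()}

if __name__ == "__main__":
    for name, avg in sorted(average_per_tone(RAW).items(), key=lambda kv: kv[1]):
        print(f"{name}: {avg:.4f} Hz")
```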

        The intervals are calculated by taking the logarithm of the ratio between two tones and dividing this by the logarithm of the ratio that corresponds to one cent. The latter equals log(2) / 1200.
        Thus the whole calculation is:

Interval in cents = log(si / la) / log(2) * 1200

        As you can see, the intervals in this song differ quite a lot from the ‘modern’ intervals (100 cents for a semitone, 200 cents for a whole tone).  As a next step one could compare these intervals with the intervals in other, known scales.
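The cents formula can be written as a small Python function. The sketch below applies it to the four average frequencies from the table; the function name `cents` is hypothetical.

```python
import math

def cents(f_low, f_high):
    """Interval between two frequencies in cents (1200 cents = one octave)."""
    return 1200.0 * math.log2(f_high / f_low)

# Average frequencies per tone, taken from the table of averages.
averages = [("la", 250.1818), ("si", 274.75), ("do", 295.6923), ("re", 324.75)]

if __name__ == "__main__":
    for (name_a, f_a), (name_b, f_b) in zip(averages, averages[1:]):
        interval = cents(f_a, f_b)
        # Deviation from the nearest equal-tempered interval (multiple of 100 cents).
        deviation = interval - 100.0 * round(interval / 100.0)
        print(f"{name_a}-{name_b}: {interval:.1f} cents ({deviation:+.1f})")
```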

        It can also be seen that the deviations are not completely random. This is most probably because the intonation of the singer drifts a little during the song. Averaging the values compensates partly for this, but an error can remain when, for example, some notes appear only later in the song. I am thinking of a way to compensate for this.
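One possible compensation, offered as a sketch and not as a tested method, is to assume that the overall pitch drifts linearly (in cents) with the position of the note in the song. A common drift slope is then estimated from all notes together, while each tone keeps its own level, so that tones appearing only late in the song do not bias the result. The linear model and all names are assumptions.

```python
import math
from collections import defaultdict

def estimate_drift(notes):
    """notes: list of (position, tone_name, freq_hz) tuples.

    Returns (drift in cents per position unit,
             {tone_name: mean frequency in Hz referred to position 0}).
    """
    groups = defaultdict(list)
    for pos, name, freq in notes:
        groups[name].append((pos, 1200.0 * math.log2(freq)))  # pitch in cents

    num = den = 0.0
    means = {}
    for name, items in groups.items():
        pos_mean = sum(p for p, _ in items) / len(items)
        cent_mean = sum(c for _, c in items) / len(items)
        means[name] = (pos_mean, cent_mean)
        # Only deviations within one tone contribute to the slope estimate.
        for p, c in items:
            num += (p - pos_mean) * (c - cent_mean)
            den += (p - pos_mean) ** 2
    slope = num / den if den else 0.0

    # Remove the drift accumulated up to the mean position of each tone.
    corrected = {name: 2.0 ** ((c - slope * p) / 1200.0)
                 for name, (p, c) in means.items()}
    return slope, corrected
```

The position can be the index of the note in the list or its time in seconds. Intervals calculated from the corrected means are then free of the linear part of the drift.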
 

        Retrieving rhythm and timing

        This problem is far less straightforward.
        The easiest way to get an impression of the timing is to imitate the rhythm by tapping it on the spacebar of a computer while the music plays. It is not difficult to make the computer measure the times of the taps.
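A minimal sketch of such a tap timer in Python follows. It is an illustration only: with the standard `input()` function the Enter key is used instead of the spacebar, and the function name and parameters are hypothetical. The clock and the key handler can be replaced, which also makes the function easy to test.

```python
import time

def record_taps(n_taps, wait_for_tap=input, clock=time.perf_counter):
    """Record n_taps key presses; return their times relative to the first tap."""
    stamps = []
    for _ in range(n_taps):
        wait_for_tap()          # blocks until the user presses Enter
        stamps.append(clock())  # high-resolution timer, in seconds
    return [t - stamps[0] for t in stamps]

if __name__ == "__main__":
    print("Press Enter on every count (9 taps)...")
    for t in record_taps(9):
        print(f"{t:.3f}")
```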
        In fact, in my first experiment, I tapped the rhythm with a pencil on the back of the running analogue tape. Then I measured the spacing between the marks. This had an extra advantage: I could ‘scrub’ the tape to find the exact start of a note, and then mark it (just like analogue editing). It is important to check the beginning of each note in this way, since the imitation of the rhythm is partly a result of our subjective interpretation of it; quite dangerous for an experiment that should give objective results!

        Today, with the help of computers, it should be much easier to do a job like this.
        Most editing programs have features for placing and moving marks of different kinds. So I tried to do it with Sonic Station. Here it is possible to find the start of a note in a graphic plot of the waveform. Then you can play up to and from this point, to check whether the note really starts there. There are also shuttle and scrub functions. Or you can place markers ‘on the fly’, by hitting the marker key while the sound file plays.
        All these tools are very useful for placing and checking the mark positions.
        As an extra check on the positions, I experimented with a little ‘beep’ or other suitable sound, starting on each marker. This gives a very good indication of whether the timing is correct.
Unfortunately, this had to be done manually by copying and pasting time values. Quite a lot of work, so some automation would be useful here too.
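Placing the beeps is easy to automate once the marker times are available as a list. The following Python sketch mixes a short sine beep into the audio at each marker. It assumes mono audio as a list of floats between -1 and 1; the beep frequency, length and level are arbitrary choices and the function name is hypothetical.

```python
import math

def add_beeps(samples, marker_times, sample_rate=44100,
              beep_hz=1000.0, beep_s=0.03, level=0.3):
    """Return a copy of `samples` with a short beep mixed in at each marker time."""
    out = list(samples)
    beep_len = int(round(beep_s * sample_rate))
    for t in marker_times:
        start = int(round(t * sample_rate))
        for i in range(beep_len):
            j = start + i
            if 0 <= j < len(out):
                value = out[j] + level * math.sin(
                    2 * math.pi * beep_hz * i / sample_rate)
                out[j] = max(-1.0, min(1.0, value))  # clip to the valid range
    return out
```

Listening to the result, if possible at reduced speed, shows immediately which markers are early or late.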
        The next problem: how to get a list of time values in a form suitable for further processing.
I could not find a feature in the program to do this, nor to copy values from a dialog within Sonic to a text program. There was, however, a little trick. The marks are saved by Sonic in the Edit Decision List. This list is in text form, so with some tools I could copy the time list to a text file.

        The most logical order of work seems to me:
Select a representative part of the music and put it in the editor.
Practice a few times, then place marks by tapping on the appropriate key.
Check the marks by looking at the waveform and by using play to / play from mark.
Make beeps on the marks (this is actually a rather laborious job; it must be automated).
Listen whether the beeps coincide with the original rhythm, if possible at half speed.
Make corrections (by dragging the marks, if that is possible!).
Save the time values to a text file for processing.

        I am working on a software tool that has all the features for this job.

        Processing of the results

        Again, this is less straightforward than the pitch problem.
The most logical thing to start with is to order the list so that there will be one line for each bar:
 
 
Original time values (seconds)
 7,7319     8,041573   8,496975   9,107429
 9,631401   9,968829  10,39623   10,86864
11,38091   11,74346   12,14466   12,61522
13,15739   13,52054   13,94612   14,44601
15,03339   15,37082   15,79561   16,3455
16,90163   17,21083   17,62949   18,12314
18,73551   19,04794   19,44161   19,94452
20,47566   20,82558   21,23175   21,75664
22,30328

Next, two tables can be produced.
In the first table, the time of the first count of each bar (the first value in each row) is subtracted from the other values in that row and from the first count of the next bar. In the second table, the difference between each pair of successive counts is calculated.
 
Time values relative to first count (seconds)
0    0,309673  0,765075  1,375529  1,899501
0    0,337428  0,76483   1,237239  1,749513
0    0,362541  0,763745  1,234306  1,776475
0    0,363151  0,788727  1,288621  1,875997
0    0,337429  0,762228  1,312113  1,868245
0    0,309199  0,727861  1,221507  1,833877
0    0,312435  0,706102  1,20901   1,740148
0    0,349926  0,75609   1,280979  1,827628
0
Average values
     0,335223  0,754332  1,269913  1,821423
Standard Deviations
     0,021218  0,023952  0,052196  0,056189

Time between successive counts
0,309673 0,455402 0,610454 0,523972
0,337428 0,427402 0,472409 0,512274
0,362541 0,401204  0,470561 0,542169
0,363151 0,425576 0,499894 0,587376
0,337429 0,424799  0,549885 0,556132
0,309199 0,418662 0,493646 0,61237
0,312435 0,393667 0,502908 0,531138
0,349926 0,406164 0,524889 0,546649
Average values
0,335223 0,41911 0,515581 0,55151
Standard Deviations
0,021218 0,018013 0,043363 0,031296
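Both tables can be generated from the list of original time values with a short Python sketch. It assumes four counts per bar and that the list ends with the first count of the next bar. The standard deviations in the tables match the population formula (`pstdev`), which is what the sketch uses; all function names are hypothetical.

```python
from statistics import mean, pstdev

# Original time values in seconds (decimal points instead of commas).
TIMES = [
    7.7319, 8.041573, 8.496975, 9.107429,
    9.631401, 9.968829, 10.39623, 10.86864,
    11.38091, 11.74346, 12.14466, 12.61522,
    13.15739, 13.52054, 13.94612, 14.44601,
    15.03339, 15.37082, 15.79561, 16.3455,
    16.90163, 17.21083, 17.62949, 18.12314,
    18.73551, 19.04794, 19.44161, 19.94452,
    20.47566, 20.82558, 21.23175, 21.75664,
    22.30328,
]

def split_into_bars(times, counts_per_bar):
    """Return one row per bar: its counts plus the first count of the next bar."""
    return [times[i:i + counts_per_bar + 1]
            for i in range(0, len(times) - counts_per_bar, counts_per_bar)]

def relative_to_first(bar):
    """Times of the later counts, measured from the first count of the bar."""
    return [t - bar[0] for t in bar[1:]]

def successive_durations(bar):
    """Duration of each count: difference between successive onsets."""
    return [b - a for a, b in zip(bar, bar[1:])]

def column_stats(rows):
    """Mean and population standard deviation of every column."""
    columns = list(zip(*rows))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]

if __name__ == "__main__":
    bars = split_into_bars(TIMES, 4)
    for title, func in (("relative to first count", relative_to_first),
                        ("between successive counts", successive_durations)):
        averages, deviations = column_stats([func(bar) for bar in bars])
        print(title)
        print("  averages:  ", [round(v, 6) for v in averages])
        print("  deviations:", [round(v, 6) for v in deviations])
```

Because the listed times are rounded, the last digits of the output can differ slightly from the tables.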

 
        As you can see, the second table in particular shows an interesting result. After the first, shortest count, the duration of each count increases, but there is no simple ratio between them!
(The music fragment was taken from a Romanian folk dance from the Mureș region.)

        The Standard Deviation tells us if there are counts in the bar that are more variable than other counts. One may expect that for example the last count has a more or less undetermined extra pause or prolongation.

        A small problem with the first table (times relative to the first count) is that an error in the measured time of the first count affects the whole row. To cure this problem, we could use interpolated starting times for each bar instead of the measured times.
 
 
 
Times from interpolated bar start (seconds)
-0,02142   0,288257  0,743659  1,354113  1,878085
 0,056662  0,39409   0,821492  1,293901  1,806175
-0,01525   0,347293  0,748497  1,219058  1,761227
-0,0602    0,302955  0,728531  1,228425  1,815801
-0,00562   0,331807  0,756606  1,306491  1,862623
 0,0412    0,350399  0,769061  1,262707  1,875077
 0,053654  0,366089  0,759756  1,262664  1,793802
-0,02762   0,322305  0,728469  1,253358  1,800007
-0,02142
Averages
 0,002677  0,3379    0,757009  1,27259   1,8241
Standard Deviations
 0,04006   0,031942  0,02781   0,041279  0,04006

 
        For this example, there seems to be little difference, but that may vary in other cases.
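The first column of the interpolated table is reproduced, to rounding accuracy, by an evenly spaced grid of bar starts. The spacing of this grid is the mean bar duration, and the grid is shifted so that the measured bar starts deviate from it by zero on average. This is inferred from the numbers, so the Python sketch below is only one possible reading of the interpolation; the names are hypothetical.

```python
from statistics import mean

# Measured first counts of the nine bars, in seconds.
BAR_STARTS = [7.7319, 9.631401, 11.38091, 13.15739, 15.03339,
              16.90163, 18.73551, 20.47566, 22.30328]

def interpolated_starts(bar_starts):
    """Evenly spaced bar starts fitted to the measured ones.

    The spacing is the mean bar duration; the origin is chosen so that the
    mean deviation of the measured starts from the grid is zero.
    """
    n = len(bar_starts)
    spacing = (bar_starts[-1] - bar_starts[0]) / (n - 1)
    origin = mean(bar_starts) - spacing * (n - 1) / 2
    return [origin + i * spacing for i in range(n)]

def start_offsets(bar_starts):
    """Measured minus interpolated start of every bar."""
    grid = interpolated_starts(bar_starts)
    return [measured - ideal for measured, ideal in zip(bar_starts, grid)]

if __name__ == "__main__":
    for offset in start_offsets(BAR_STARTS):
        print(f"{offset:+.5f}")
```

A least-squares line through the bar starts would be an alternative; it gives a slightly different spacing when the first or last mark is inaccurate.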

        In this music fragment, the duration of each bar was more or less equal (constant tempo). Of course, there can be a change in tempo like ‘accelerando’. In these cases, we should calculate in percentages of the bar duration rather than in absolute seconds.
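Expressing the counts as percentages of the bar duration takes only a few lines of Python. The sketch assumes that each row holds the onsets of one bar followed by the first count of the next bar; the function name is hypothetical.

```python
def as_bar_percentages(bar):
    """Position of each count as a percentage of the duration of its own bar.

    `bar` holds the onset times of all counts in the bar, followed by the
    onset of the first count of the next bar.
    """
    duration = bar[-1] - bar[0]
    return [100.0 * (t - bar[0]) / duration for t in bar[:-1]]

if __name__ == "__main__":
    first_bar = [7.7319, 8.041573, 8.496975, 9.107429, 9.631401]
    print([round(p, 1) for p in as_bar_percentages(first_bar)])
```

Averages and standard deviations can then be calculated on these percentages in the same way as on the absolute times.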

        A quite different application would be the analysis of songs without a regular measure. All the calculations described above are almost useless in this case, but we could still use the listing. Categorising the (shorter) notes makes it easier to make a precise notation of the song.
 

        Things to do

        The first challenge will be programming a computer tool for the rhythm and timing retrieval. In my current plans it will look like a sound editor. It can read a sound file from disk or from the Windows clipboard. Then you will see the waveform on the display. Zoom features are available, and you can play the part that is visible. You can put marks with the mouse or by hitting a key during playback. Marks can be moved with the mouse or with keys. They can be made audible as short beeps during playback. You can also play at half the speed.
        Finally the list of marker times can be saved in a text file.

        Processing of the results (both pitch and rhythm) is done in Excel, in a very primitive way. Of course, this can be automated to a large degree.

        The pitch list can be analysed so that the tones of the scale are separated automatically. Then the program will calculate averages and standard deviations of the tones.
It would be very nice to find a compensation mechanism for a possible ‘drift’ in overall pitch.

        The rhythm analyser program could have the following features:
Recognizing the bar pattern and automatically dividing into bars
Putting the count values in place even when counts are missing
Producing listings as above
Trying to find a suitable time signature for complex rhythms, and calculating the stress factor
To demonstrate the last item, imagine a bar that has two counts of different duration (like the Romanian ‘Hora’). Does it fit better in 3+2 or in 4+3?
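The question can be turned into a small calculation: compare the measured proportions of the counts with the proportions of each candidate grouping and take the closest one. The Python sketch below does this with an RMS difference as the measure of misfit. That measure, the function name and the example durations are all hypothetical and are not measurements from a recording.

```python
import math

def best_grouping(durations, candidates):
    """Return (misfit, grouping) for the candidate closest to the durations.

    Measured and ideal counts are both expressed as fractions of the bar.
    The misfit is the RMS difference between the two sets of fractions.
    """
    total = sum(durations)
    measured = [d / total for d in durations]
    scored = []
    for grouping in candidates:
        if len(grouping) != len(durations):
            continue  # a grouping must have one part per count
        ideal = [g / sum(grouping) for g in grouping]
        misfit = math.sqrt(
            sum((m - i) ** 2 for m, i in zip(measured, ideal)) / len(grouping))
        scored.append((misfit, grouping))
    return min(scored)

if __name__ == "__main__":
    # Hypothetical average durations of a long and a short count, in seconds.
    misfit, grouping = best_grouping([0.58, 0.42], [(3, 2), (4, 3)])
    print(grouping, round(misfit, 4))
```

A small misfit for one grouping and a clearly larger one for the others suggests a suitable time signature; similar values mean that the rhythm lies between the two.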
 

        Conclusion

        The results of the experiments have proved to be very useful, as the above examples hopefully demonstrate.
A disadvantage of the methods described is that they still depend on the trained ears and musical skills of the person doing the measurements, in spite of the efforts to make the test as objective and structured as possible.

        The BIG advantage is that these tools and methods can be used on any recording, even if the sounds are complex and the technical quality is bad. During the process you can judge the reliability by estimating the amount of doubt in the decisions you make, and afterward by looking at the standard deviations.
        In some cases the results may be very accurate; in more difficult cases they are at least useful.
The results can be a guide for musicians who want to learn an ethnic style and intonation, or simply a source of information for scientific analysis.