Why would we want it?
When listening to original recordings of folk music, I found that the intonation
does not always follow the modern equal-tempered scale. For certain
countries this is quite obvious. In Turkish music, for example, scales
are used that have complicated but well-documented intervals. For other
countries, like Bulgaria and Romania, the ethno-musicological books use
mainly the 'classical' notation without any statement about different intonation.
So here is something to discover.
In the same manner, the rhythms of dance music differ in many cases from
the mathematically correct time signature that they are said to have.
The musicians themselves may tell us that they play a 2/4 or 7/8 measure,
yet the counts within these measures apparently do not all have the same duration.
This difference is part of the 'style' and often typical for the music
of some regions.
When I was trying to perform Balkan music with a few other musicians, we
sometimes had discussions about the correct rhythmical ‘feeling’.
For example, one said that the first count should be emphasised by making
it longer, but another thought that the first count should be emphasised
by delaying it, that is, by stretching the preceding count.
At that time, I felt the need for a way to find out exactly what happens.
So I tried several ways to measure the exact timings on a recording. Later,
the same tools could be used to practice and get some useful feedback.
For similar reasons, I also
wanted to know more about the pitches and intervals of some recordings.
The Pitch problem
The general task is to retrieve the pitch of a number of notes of a song,
then to average the frequencies of all the notes that are meant to be the
same, and finally to calculate the intervals.
The first action, retrieving the pitch, was the subject of some experiments.
The most logical, technical way to do it appeared to be the use of Fast
Fourier Transform. Many programs are available for doing that. In some
cases it worked, but in most cases it failed because of the complexity
of the sound. If the singer makes vibrato, trills or other ornaments, it
becomes very hard to tell what part of the note should be measured to find
the frequency that determines the 'intonation' or subjective pitch impression
of the note.
When there are other instruments, voices or other sounds, so many lines
appear in the spectrum that it becomes impossible to find a reasonably
accurate frequency.
In other words, this method only works for very pure recordings of solo
voices or instruments.
So I started searching in a different direction. As a musician, I trust
my ears better than results of measurements that have not proven to be
useful. Is it not the ear of the musician that makes him decide about the
intonation?
Things such as timbre, modulations (trills, vibrato) and harmony may influence
the subjective pitch for the listener, but also for the singer himself.
The singer will produce a tone of a correct subjective pitch, rather than
a tone with the mathematically correct frequency.
This also made me decide about
another matter: if we use the ear for comparing pitch to a reference
tone, should the tones sound simultaneously or in sequence? This comparison
obviously is needed to find the numbers (frequencies) for the statistical
processing.
Listening to a looped note and a tone generator simultaneously should, in
principle, make it possible to zero-beat the two. In other words: we tune the
generator until the beats slow down and a completely stable tone remains.
In practice, this method does not work at all, since the tones produced by a singer
are never stable enough. Also, if there is an instrument playing in unison
with the singer, or if there are other instruments or voices, the beats
become inaudible. The best we can do is to tune the generator so that it
‘fits’ best into the sound. But this does not guarantee that its frequency
matches the pitch impression of the original tone.
In my opinion, alternating the sample of the song with the tone generator
gives the best guarantee of a frequency that corresponds to the subjective
pitch.
To date, I have used the program ‘Cool Edit Pro’ to make a sample of a
note.
This is simply done by dragging
the mouse over the waveform, between the boundaries of what we expect to
be one note. Then we can listen to it by pressing the play button or the
loop button. It is easy to find the ‘body’ of the note; if there are still
remains of other notes at the edges, we can correct the boundaries of the
selection with the mouse and the shift key.
In the first experiment, I interrupted the playback of the note and generated
a tone burst with a freeware tone generator. The result was promising: it was
quite easy to adjust the tone generator to within 1 or 2 Hz, depending on the
stability of the note in the sample (frequencies around 300 Hz).
To achieve more accuracy, many samples need to be measured and averaged.
This required automation of the process. Therefore, I made a little program
to play the sample alternating with the tone burst. The sample is
transferred from Cool Edit Pro to the Pitch Compare program using the Windows
clipboard.
While listening to the loop, we can adjust the generator frequency until
the pitches match. Then the frequency of the generator can be written to
a text file with a button click. This text file can be imported in Excel
or any other program for analysis.
Testing this program, I noted some important details:
First, the dynamic level (amplitude) of the generator should be matched
to the level of the sample. If the levels are too different, it is much
harder to compare. It is a well-known psycho-acoustic phenomenon that loudness
influences the pitch experience of a sinusoidal tone. This means that a
bad match of the levels can lead to considerable errors.
Second, the difference in timbre causes some doubt when comparing. So it
would be nice if we could change the waveform of the tone generator in
order to make the timbre more similar.
It would not surprise me if the first problem were also solved to a
large extent by this modification.
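The level matching described above can be sketched in a few lines. This is only an illustration, not the Pitch Compare program itself: the function names (`rms`, `tone_burst`, `alternate`), the burst and gap lengths, and the 44100 Hz sample rate are all assumptions. The sketch generates a plain sine burst whose level equals the RMS level of the note sample and places it after the sample, so that the two can be heard in alternation.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def tone_burst(freq_hz, duration_s, target_rms, sample_rate=44100, fade_s=0.01):
    """Sine burst whose steady-state RMS level equals target_rms.

    A short linear fade-in and fade-out avoids clicks at the burst edges.
    """
    n = int(duration_s * sample_rate)
    fade_n = max(1, int(fade_s * sample_rate))
    peak = target_rms * math.sqrt(2.0)  # a sine's peak is sqrt(2) times its RMS
    burst = []
    for i in range(n):
        envelope = min(1.0, i / fade_n, (n - 1 - i) / fade_n)
        phase = 2.0 * math.pi * freq_hz * i / sample_rate
        burst.append(peak * envelope * math.sin(phase))
    return burst

def alternate(sample, freq_hz, burst_s=0.5, gap_s=0.1, sample_rate=44100):
    """One loop cycle: the sample, a short silence, the level-matched tone, a silence."""
    gap = [0.0] * int(gap_s * sample_rate)
    burst = tone_burst(freq_hz, burst_s, rms(sample), sample_rate)
    return list(sample) + gap + burst + gap
```

A cycle built with `alternate(sample, 298.0)` could be looped while the frequency argument is adjusted by ear. Changing the waveform to bring the timbre closer to the sample, as suggested above, is not covered by this sketch.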
Interpretation of the results
Here is a sample list of results for the song ‘Tri Bjulbjuli’ (Bulgaria).
The samples are in their order of appearance, reading down each column; each
frequency (in Hz) is followed by the name of the tone.
| 298 | do | 279 | si | 278 | Si | 292 | do |
| 325 | re | 249 | la | 249 | La | 328 | re |
| 293 | do | 274 | si | 276 | Si | 277 | si |
| 321 | re | 294 | do | 298 | Do | 293 | do |
| 299 | do | 300 | do | 297 | Do | 273 | si |
| 281 | si | 298 | do | 274 | Si | 271 | si |
| 296 | do | 276 | si | 251 | La | 248 | la |
| 295 | do | 247 | la | 246 | La | | |
| 263 | si | 252 | la | 275 | Si | | |
| 254 | la | 251 | la | 291 | Do | | |
| 253 | la | 252 | la | 325 | Re | | |
Since I am not good at mathematics and statistics, I started to do some
interpretation by hand.
I simply sorted the frequencies into one list per tone name. Then an average
is calculated per tone.
| 254 | la | 281 | si | 298 | do | 325 | re | |
| 253 | la | 263 | si | 293 | do | 321 | re | |
| 249 | la | 279 | si | 299 | do | 325 | re | |
| 247 | la | 276 | si | 296 | do | 328 | re | |
| 252 | la | 278 | si | 295 | do | | | |
| 251 | la | 276 | si | 294 | do | | | |
| 252 | la | 274 | si | 300 | do | | | |
| 249 | la | 275 | si | 298 | do | | | |
| 251 | la | 277 | si | 298 | do | | | |
| 246 | la | 274 | si | 297 | do | | | |
| 248 | la | 273 | si | 291 | do | | | |
| | | 271 | si | 292 | do | | | |
| | | | | 293 | do | | | |
| 250,1818 | | 274,75 | | 295,6923 | | 324,75 | | Average frequency (Hz) |
| | 162,171 | | 127,1727 | | 162,2794 | | | Interval in cents |
The intervals are calculated by taking the logarithm of the ratio between two
tones and dividing it by the logarithm of one cent. The latter equals log(2)
/ 1200.
Thus the whole calculation is:
Interval in cents = log(si / la) / log(2) * 1200
As you can see, the intervals in this song differ quite a lot from the ‘modern’ intervals (100 cents for a semitone, 200 cents for a whole tone). As a next step one could compare these intervals with the intervals in other, known scales.
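As a cross-check of the hand calculation, the averaging and the cents formula can be written as a short script. This is a minimal sketch: the variable and function names are illustrative, and the frequencies are simply copied from the per-tone lists above.

```python
import math

# Generator frequencies (Hz) per tone name, copied from the per-tone lists above.
measurements = {
    "la": [254, 253, 249, 247, 252, 251, 252, 249, 251, 246, 248],
    "si": [281, 263, 279, 276, 278, 276, 274, 275, 277, 274, 273, 271],
    "do": [298, 293, 299, 296, 295, 294, 300, 298, 298, 297, 291, 292, 293],
    "re": [325, 321, 325, 328],
}

def cents(f_low, f_high):
    """Interval from f_low up to f_high in cents (1200 cents per octave)."""
    return 1200.0 * math.log(f_high / f_low) / math.log(2.0)

# Average frequency per tone.
averages = {name: sum(v) / len(v) for name, v in measurements.items()}

# Intervals between neighbouring tones, in the order la, si, do, re.
names = list(measurements)
intervals = {
    f"{low}-{high}": cents(averages[low], averages[high])
    for low, high in zip(names, names[1:])
}

for name, avg in averages.items():
    print(f"{name}: {avg:.4f} Hz")
for pair, value in intervals.items():
    print(f"{pair}: {value:.2f} cents")
```

The same `cents` function can be used to compare the measured intervals with those of other, known scales by feeding it the reference frequencies or ratios of those scales.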
What can also be seen is that the deviations are not completely random.
This is most probably because the intonation of the singer drifts a little
during the song. Averaging the values partly compensates for this, but an
error can remain when, for example, some notes appear only later in the song.
I am thinking of a way to compensate for this.
Retrieving rhythm and timing
This problem is far less straightforward.
The easiest way to have an impression of the timing is to imitate the rhythm
by tapping it on the spacebar of a computer while the music plays. It is
not difficult to make the computer measure the times of the tapping.
In fact, in my first experiment, I tapped the rhythm with a pencil on the
back of the running analogue tape. Then I measured the spacing between
the marks. This had an extra advantage: I could ‘scrub’ the tape to find
the exact start of a note, and then mark it (just like analogue editing).
It is important to check the beginning of each note in this way, since
the imitation of the rhythm is partly a result of our subjective interpretation
of it; quite dangerous for an experiment that should give objective results!
Today, with the help of computers, it should be much easier to do a job
like this.
Most editing programs have features for placing and moving marks of different
kinds. So I tried to do it with Sonic Station. Here it is possible to find
the start of a note in a graphic plot of the waveform. Then you can listen
up to and from this point, to check whether the note really starts there. There are
also shuttle and scrub functions. Or you can put markers ‘on the fly’,
by hitting the marker key while the sound file plays.
All these tools are very useful for placing and checking the mark positions.
As an extra check on the positions, I experimented with a little ‘beep’
or other suitable sound, starting on each marker. This gives a very good
indication if the timing is correct or not.
Unfortunately, this had to be done manually by copying and pasting time
values. That is quite a lot of work, so some automation would be useful here too.
The next problem is how to get a list of time values in a form suitable
for further processing.
I could not find a feature in
the program to do this, nor to copy values from a dialog within Sonic to
a text program. There was, however, a little trick. The marks are saved
by Sonic in the Edit Decision List. This list is in text form, so with
some tools I could copy the time list to a text file.
The most logical order to do this job seems to me:
1. Select a representative part of the music and put it in the editor.
2. Practice a few times, then place marks by tapping on the appropriate key.
3. Check the marks by looking at the waveform and by play to / play from mark.
4. Make beeps on the marks (this is actually a rather difficult job; it must be automated).
5. Listen if the beeps correspond with the original rhythm; if possible at half the speed.
6. Make corrections (by dragging the marks, if that is possible!).
7. Save the time values to a text file for processing.
I am working on a software tool that has all the features for this job.
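The step of making beeps on the marks is the one that most needs automation. A minimal sketch of how it could be done is given below, assuming the audio is available as a list of mono float samples between -1.0 and 1.0 and the marks as times in seconds. The function name `add_beeps` and its default values (a 1000 Hz beep of 30 ms) are hypothetical choices, not features of any of the programs mentioned.

```python
import math

def add_beeps(samples, marker_times_s, sample_rate=44100,
              beep_hz=1000.0, beep_s=0.03, beep_level=0.5):
    """Return a copy of samples with a short sine beep mixed in at each marker.

    samples: mono audio as floats in the range -1.0..1.0.
    marker_times_s: marker positions in seconds from the start of the audio.
    """
    mixed = list(samples)
    beep_n = int(beep_s * sample_rate)
    for t in marker_times_s:
        start = int(round(t * sample_rate))
        for i in range(beep_n):
            pos = start + i
            if 0 <= pos < len(mixed):
                # A linear decay lets the beep die away without a click.
                decay = 1.0 - i / beep_n
                phase = 2.0 * math.pi * beep_hz * i / sample_rate
                value = mixed[pos] + beep_level * decay * math.sin(phase)
                # Keep the mixed signal inside the valid sample range.
                mixed[pos] = max(-1.0, min(1.0, value))
    return mixed

# Example: one second of silence with beeps at 0.25 s and 0.5 s.
silence = [0.0] * 44100
with_beeps = add_beeps(silence, [0.25, 0.5])
```

Because the beeps are mixed into a copy, the original samples stay untouched and the marks can be moved and the beeps regenerated as often as needed.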
Processing of the results
Again, this is less straightforward than the pitch problem.
The most logical thing to start
with is to order the list so that there will be one line for each bar:
| 7,7319 | 8,041573 | 8,496975 | 9,107429 |
| 9,631401 | 9,968829 | 10,39623 | 10,86864 |
| 11,38091 | 11,74346 | 12,14466 | 12,61522 |
| 13,15739 | 13,52054 | 13,94612 | 14,44601 |
| 15,03339 | 15,37082 | 15,79561 | 16,3455 |
| 16,90163 | 17,21083 | 17,62949 | 18,12314 |
| 18,73551 | 19,04794 | 19,44161 | 19,94452 |
| 20,47566 | 20,82558 | 21,23175 | 21,75664 |
| 22,30328 |
Next, two tables can be produced.
In the first table, the time of the first count (the first of each row) is
subtracted from the other values. In the second table, the difference between
successive counts is calculated. In both tables, the last value of each row
uses the first count of the next bar.
First table: time since the first count of the bar, in seconds.
| 0 | 0,309673 | 0,765075 | 1,375529 | 1,899501 |
| 0 | 0,337428 | 0,76483 | 1,237239 | 1,749513 |
| 0 | 0,362541 | 0,763745 | 1,234306 | 1,776475 |
| 0 | 0,363151 | 0,788727 | 1,288621 | 1,875997 |
| 0 | 0,337429 | 0,762228 | 1,312113 | 1,868245 |
| 0 | 0,309199 | 0,727861 | 1,221507 | 1,833877 |
| 0 | 0,312435 | 0,706102 | 1,20901 | 1,740148 |
| 0 | 0,349926 | 0,75609 | 1,280979 | 1,827628 |
| 0 |
| 0 | 0,335223 | 0,754332 | 1,269913 | 1,821423 | Average |
| | 0,021218 | 0,023952 | 0,052196 | 0,056189 | Standard deviation |

Second table: duration of each count, in seconds.
| 0,309673 | 0,455402 | 0,610454 | 0,523972 |
| 0,337428 | 0,427402 | 0,472409 | 0,512274 |
| 0,362541 | 0,401204 | 0,470561 | 0,542169 |
| 0,363151 | 0,425576 | 0,499894 | 0,587376 |
| 0,337429 | 0,424799 | 0,549885 | 0,556132 |
| 0,309199 | 0,418662 | 0,493646 | 0,61237 |
| 0,312435 | 0,393667 | 0,502908 | 0,531138 |
| 0,349926 | 0,406164 | 0,524889 | 0,546649 |
| 0,335223 | 0,41911 | 0,515581 | 0,55151 | Average |
| 0,021218 | 0,018013 | 0,043363 | 0,031296 | Standard deviation |
As you can see, the second method, in particular, has an interesting result.
After the first, shortest count, the duration of each count increases,
but there is no simple ratio!
(The music fragment was taken
from a Romanian folk dance from the Mureș region.)
The Standard Deviation tells us if there are counts in the bar that are more variable than other counts. One may expect that for example the last count has a more or less undetermined extra pause or prolongation.
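The two tables above, with their averages and standard deviations, can be reproduced with a short script instead of a spreadsheet. The sketch below makes two assumptions: every bar of the fragment has four counts (`COUNTS_PER_BAR`), and the standard deviation is the population form (`statistics.pstdev`), which appears to match the values listed in the tables. All names are illustrative.

```python
import statistics

# Marker times in seconds as saved to the text file (decimal commas),
# one value per count; the last value is the first count of a ninth bar.
raw = """7,7319 8,041573 8,496975 9,107429 9,631401 9,968829 10,39623 10,86864
11,38091 11,74346 12,14466 12,61522 13,15739 13,52054 13,94612 14,44601
15,03339 15,37082 15,79561 16,3455 16,90163 17,21083 17,62949 18,12314
18,73551 19,04794 19,44161 19,94452 20,47566 20,82558 21,23175 21,75664
22,30328"""

COUNTS_PER_BAR = 4  # assumption: every bar in this fragment has four counts

times = [float(v.replace(",", ".")) for v in raw.split()]
n_bars = (len(times) - 1) // COUNTS_PER_BAR  # the last value only closes the final bar

# Each row holds the counts of one bar plus the first count of the next bar.
bars = [times[b * COUNTS_PER_BAR:(b + 1) * COUNTS_PER_BAR + 1] for b in range(n_bars)]

# First table: time elapsed since the first count of the bar.
offsets = [[t - bar[0] for t in bar] for bar in bars]
# Second table: duration of each count (difference between successive counts).
durations = [[b - a for a, b in zip(bar, bar[1:])] for bar in bars]

def column_stats(table):
    """Mean and population standard deviation of every column."""
    columns = list(zip(*table))
    return ([statistics.mean(c) for c in columns],
            [statistics.pstdev(c) for c in columns])

mean_offsets, sd_offsets = column_stats(offsets)
mean_durations, sd_durations = column_stats(durations)
print([round(m, 6) for m in mean_durations])
print([round(s, 6) for s in sd_durations])
```

Columns with a noticeably larger standard deviation point to the counts that vary most from bar to bar.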
A small problem with the first table is that an error in the value of the
first count influences the whole row. To cure this problem, we could
use interpolated starting times for each bar instead of the measured times.
| -0,02142 | 0,288257 | 0,743659 | 1,354113 | 1,878085 |
| 0,056662 | 0,39409 | 0,821492 | 1,293901 | 1,806175 |
| -0,01525 | 0,347293 | 0,748497 | 1,219058 | 1,761227 |
| -0,0602 | 0,302955 | 0,728531 | 1,228425 | 1,815801 |
| -0,00562 | 0,331807 | 0,756606 | 1,306491 | 1,862623 |
| 0,0412 | 0,350399 | 0,769061 | 1,262707 | 1,875077 |
| 0,053654 | 0,366089 | 0,759756 | 1,262664 | 1,793802 |
| -0,02762 | 0,322305 | 0,728469 | 1,253358 | 1,800007 |
| -0,02142 |
| 0,002677 | 0,3379 | 0,757009 | 1,27259 | 1,8241 | Average |
| 0,04006 | 0,031942 | 0,02781 | 0,041279 | 0,04006 | Standard deviation |
For this example, there seems to be little difference, but that may vary
in other cases.
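The text does not say how the interpolated starting times were obtained. One reading that reproduces the first column of the table above is sketched here: the bars are placed on an evenly spaced grid whose spacing is the average bar duration, and the grid is shifted so that the deviations of the measured bar starts average to zero. This is an assumption about the method, and the names are illustrative.

```python
import statistics

# Measured time (s) of the first count of each bar;
# the last value is the first count of the ninth bar.
bar_starts = [7.7319, 9.631401, 11.38091, 13.15739, 15.03339,
              16.90163, 18.73551, 20.47566, 22.30328]

# Assumption: constant tempo, so bar k ideally starts at start0 + k * bar_length.
bar_length = (bar_starts[-1] - bar_starts[0]) / (len(bar_starts) - 1)

# Deviation of each measured start from an even grid anchored at the first bar.
raw_dev = [t - (bar_starts[0] + k * bar_length) for k, t in enumerate(bar_starts)]

# Shift the grid so that the deviations average to zero over all bar starts.
shift = statistics.mean(raw_dev)
interpolated = [bar_starts[0] + shift + k * bar_length
                for k in range(len(bar_starts))]
deviation = [t - g for t, g in zip(bar_starts, interpolated)]

for k, (g, d) in enumerate(zip(interpolated, deviation), start=1):
    print(f"bar {k}: interpolated start {g:.6f} s, measured minus interpolated {d:+.6f} s")
```

Subtracting `interpolated[k]` instead of the measured first count from the counts of bar k then gives the remaining columns of the table. A least-squares line through the bar starts would be an alternative, but it would not give exactly these numbers.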
In this music fragment, the duration of each bar was more or less equal (constant tempo). Of course, there can be a change in tempo like ‘accelerando’. In these cases, we should calculate in percentages of the bar duration rather than in absolute seconds.
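Expressing the counts as percentages of the bar can be sketched as follows. The first bar is taken from the fragment above; the second bar is hypothetical (the same pattern played 10 % slower) and only serves to show that the percentages do not change with tempo. The function name is illustrative.

```python
def count_percentages(bar):
    """Duration of each count as a percentage of the whole bar.

    bar: onset times of the counts, followed by the onset of the next bar.
    """
    bar_length = bar[-1] - bar[0]
    return [100.0 * (b - a) / bar_length for a, b in zip(bar, bar[1:])]

# First bar of the fragment above (seconds), closed by the start of the second bar.
first_bar = [7.7319, 8.041573, 8.496975, 9.107429, 9.631401]

# Hypothetical bar: the same rhythmic pattern, starting at 20 s and played 10 % slower.
slower_bar = [20.0 + 1.1 * (t - first_bar[0]) for t in first_bar]

print([round(p, 1) for p in count_percentages(first_bar)])
print([round(p, 1) for p in count_percentages(slower_bar)])
```

Both lines show the same proportions, although the absolute durations of the second bar are longer.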
A quite different purpose would be the analysis of measureless songs. All
the calculations as described above are almost useless in this case, but
we could still use the listing. Categorising the (shorter) notes makes
it easier to make a precise notation of the song.
Things to do
The first challenge will be programming a computer tool for the rhythm
and timing retrieval. In my current plans it will look like a sound editor.
It can read a sound file from disk or from the Windows clipboard. Then
you will see the waveform on the display. Zoom features are available,
and you can play the part that is visible. You can put marks with the mouse
or by hitting a key during playback. Marks can be moved with the mouse
or with keys. They can be made audible as short beeps during playback.
You can also play at half the speed.
Finally the list of marker times can be saved in a text file.
Processing of the results (both pitch and rhythm) is done in Excel, in a very primitive way. Of course, this can be automated to a large degree.
The pitch list can be analysed so that the tones of the scale are separated
automatically. Then the program will calculate averages and standard deviations
of the tones.
It would be very nice to find
a compensation mechanism for possible ‘drift’ in overall pitch.
The rhythm analyser program could have the following features:
- Recognizing the bar pattern and automatically dividing into bars
- Putting the count values in place even when counts are missing
- Producing listings as above
- Trying to find a suitable time signature for complex rhythms, and calculate the stress factor
To demonstrate the last item,
imagine a bar that has two counts of different duration (like the Romanian
‘Hora’). Does it fit better in 3+2 or in 4+3?
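One simple way to answer such a question is to compare the measured share of each count in the bar with the ideal share in each candidate grouping. The sketch below does this. The error measure (largest difference in share, in percent of the bar) is an assumption chosen for illustration, not the ‘stress factor’ mentioned above, which is not defined here, and the measured durations are hypothetical.

```python
def best_split(durations, candidates):
    """Rank candidate beat groupings by how well they fit the measured counts.

    durations: measured duration of each count in the bar (any time unit).
    candidates: groupings in beat units, e.g. (3, 2) for 3+2.
    Returns (grouping, error) pairs sorted by error, where error is the largest
    difference between measured and ideal share of the bar, in percent of the bar.
    """
    total = sum(durations)
    shares = [d / total for d in durations]
    results = []
    for grouping in candidates:
        if len(grouping) != len(durations):
            continue  # only compare groupings with the same number of counts
        ideal = [g / sum(grouping) for g in grouping]
        error = 100.0 * max(abs(s - i) for s, i in zip(shares, ideal))
        results.append((grouping, error))
    return sorted(results, key=lambda r: r[1])

# Hypothetical measurement: a long count of 0.52 s followed by a short one of 0.38 s.
ranking = best_split([0.52, 0.38], [(3, 2), (4, 3)])
for grouping, error in ranking:
    print("+".join(map(str, grouping)), f"{error:.1f} % of the bar")
```

For these hypothetical durations, 4+3 fits more closely than 3+2. With real data, the averaged count durations from several bars would be used as input.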
Conclusion
The results of the experiments have proved to be very useful, as the above
examples hopefully demonstrate.
A disadvantage of the methods described is that they still depend on the
trained ears and musical skills of the person doing the measuring, in spite
of the efforts to make the test as objective and structured as possible.
The BIG advantage is that these tools and methods can be used on any recording,
even if the sounds are complex and the technical quality is bad. During
the process you can judge the reliability by estimating the amount of doubt
in the decisions you make, and afterward by looking at the standard deviations.
In some cases, the results may be very accurate, but in more difficult
cases at least useful.
The results can be a guide for
musicians who want to learn an ethnic style and intonation, or simply
a source of information for scientific analysis.