The distribution of the secondary data in the Firfov Collection
In September 2003,
the database with the secondary data of the Firfov Collection was
completed. The main purpose of this paper is to present, analyse and comment
on the distribution of the results in this database. This exposes the
structure of the audio material, which is of crucial importance to
any future research on this collection.
Bearing in
mind that we faced many problems while entering the data, and later during
its analysis, this paper will concentrate on those issues. Thus, the paper
is composed of two sections:
- Errors made in the entering of data
- Distribution of the data
Both problems
presented in this study are directly related to the structure of the database,
i.e. the definitions of the fields. Since most of the fields hold nominal
variables, apart from the ratio variable of the duration of the sound
samples, the problem arises of clearly defining the categories within
each of these fields.
The database consists
of 18 main fields, some of which divide into several subfields. A sample
of the database is included at the end of this text. Since the names of most
fields make their definition clear, this text will only examine
the problems in the definition of the fields and the errors made in the
entering of data, followed by a presentation of the analysis of the distribution
of data.
I. Errors made in the entering of data
To establish
the errors made in the entering of the data, we made use of the advantages
of computer technology and began by sorting the data in each field.
Many different errors appeared immediately.
In order to examine the types of errors, we separated
them into two larger groups:
- Errors made by subjects
- Errors due to inadequate definition of the fields,
i.e. categories
The errors made by subjects can be further classified:
- errors made by the collector
- incorrect duration of the sound samples
- incorrect listing of title data
- incorrect listing of order
- incorrect listing of personal data about the performers
- incorrectly registered structure of the performers’
ensemble
The errors made due to inadequate definition of the fields may also be divided into:
- errors concerning the performer’s personal data
- incorrectly registered structure of the performers’
ensemble
1. Errors made by subjects
1.1. Errors made by the collector
During the
construction of the secondary database, the main source of information
was the notebook that Zhivko Firfov left together with his cassette
collection. Several errors appeared immediately upon the sorting of the
data. The collector made these errors while ordering the data.
We divided
these errors into two categories:
- incorrect titles
- incorrect entering of the performers’ personal data
1.1.1. Titles
This category
covers errors made in the titles.
For example,
in number 335 the title “25 mina se madonci” is noted, which is
evidently an error: on re-listening to the song, one
can clearly hear the text “Dvaesetpetmina se mladi momci”.

Firfov noted sound sample number 401 as “Grushka shekjer”, even though the title is evidently “Grutka shekjer”.

Song no. 820: Sarievski Aleksandar
Song no. 1052: Aco Sarievski
Song no. 739: stolevska Vera
Song no. 1211: Vera Stoleva
This raises the question of how to note the names in cases such as those above:
- to leave the names as different ones (Stoleva/Stolevska),
or
- to note them in one way, as one name, since we know
that it is the same performer
In solving this problem, we took the following steps:
- if we knew the correct name of the performer from other
sources, we noted the official name (for instance, Sarievski’s official
name is Aleksandar, not Aco)
- if the performer was unknown to us (for instance, Vera
Stoleva or Vera Stolevska), we left the name as it was originally noted.
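The two rules above can be sketched in Python. The table `OFFICIAL_NAMES` and the function `normalise_performer` are hypothetical names introduced for illustration, and the choice of “Sarievski Aleksandar” as the canonical spelling is an assumption; the database itself may be organised differently.

```python
# Hypothetical alias table: variant spelling -> official name.
# Only performers whose official name is known from other sources are listed.
OFFICIAL_NAMES = {
    "Aco Sarievski": "Sarievski Aleksandar",
}


def normalise_performer(name: str) -> str:
    """Return the official name if it is known; otherwise keep the entry as noted."""
    cleaned = " ".join(name.split())  # collapse stray whitespace, nothing else
    return OFFICIAL_NAMES.get(cleaned, cleaned)


print(normalise_performer("Aco Sarievski"))  # known performer: official name
print(normalise_performer("Vera Stoleva"))   # unknown performer: left unchanged
```

Because unknown names pass through untouched, variants such as Stoleva/Stolevska remain visible in the data until an external source settles them.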
The next problem
we faced while entering the data about the performers was how to write abbreviations.
In sound samples no. 80 and 82, in the instrumental group column, the
abbreviations “Rasp. Gostivarci” and “Rasp. Kavadarchani” appear.
1.2. Duration
The sorting
of this field revealed the errors made in the entering of data in the previous
two fields: time of beginning and time of ending. After the formatting
of this database, this field functioned as a formula that returned the
difference between the previous two fields. After further inspection
and listening, these errors were corrected.
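How a computed duration field exposes typing errors in the two time fields can be sketched as follows. The `h:mm:ss` time format, the function names and the three example rows are assumptions made for illustration; they are not taken from the database.

```python
from datetime import timedelta


def parse_time(text: str) -> timedelta:
    """Parse an 'h:mm:ss' position on the carrier into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def duration(start: str, end: str) -> timedelta:
    """Duration field: time of ending minus time of beginning."""
    return parse_time(end) - parse_time(start)


def suspicious(rows):
    """Return the order numbers whose computed duration is zero or negative."""
    return [number for number, start, end in rows
            if duration(start, end) <= timedelta(0)]


# Hypothetical rows: (order number, time of beginning, time of ending).
rows = [
    (1, "0:00:05", "0:02:50"),
    (2, "0:02:55", "0:02:15"),  # ending before beginning: a typing error
    (3, "0:05:30", "0:08:02"),
]
print(suspicious(rows))
```

Sorting the duration column has the same effect, since zero and negative values collect at one end of the list.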
1.3. Titles
As with the
errors in the collector’s sound sample data, we found errors in the titles.
These errors were made while the titles were being entered in the database.
The sound
sample no. 409 has been noted as “Done, Done, semln Done”, whereas in Firfov’s
notebook the title of this song is “Done, Done, sejmen Done”.
The Firfov Collection
includes sound samples not registered by Firfov. In this case, errors were
made when the title was put into the “Collector’s title” field, instead
of the “Other title” field.
For instance,
no. 248 in the field “name of collector” carries the registered title “Bukite
razvivaat brakja, gore na balkanot”, whereas in Firfov’s notebook it hasn’t
been registered at all.
1.4 Order
The greatest
number of errors was found in the order of the sound samples, where one
sound sample had been registered in the database twice. For
instance, the sound samples from C 25 1 (cassette 25, first side)
were listed as continuations of C 24 2/1, and later repeated in the
C 25 1 section, where they actually belong. After these errors were
corrected, the overall number of sound samples decreased by 19 units.
The second
instance of double listing of the same sound sample occurs when a sound sample
has not been recorded fully at the end of one cassette side, and continues
on the other side from the point where it was stopped.
For instance, the sound sample no. 574 at the end of
C 21 1 is not fully recorded, and continues on the other side (C
21 2). During the registering of data the order number was
repeated: both sides list the sample as number 574.
In such cases, should
the sample be listed with the same number of order, or should the continuing
part be listed as a new number? On the other hand, it is nevertheless a
separate sound sample, and it occupies new space on the cassette, or the
CD.
We believe that it should be listed under a separate number,
with a note in the “comment” field that it is a continuation of the previous
song. After the correction of this error, the overall number of sound samples
increased by one unit.
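Repeated order numbers of the kind described in this section can also be located mechanically. The sketch below is only an illustration: the function name and the entries for numbers 573 and 575 are hypothetical, and only the double listing of no. 574 on C 21 1 and C 21 2 comes from the text.

```python
from collections import defaultdict


def repeated_order_numbers(entries):
    """Map each order number that occurs more than once to its cassette sides."""
    sides_by_number = defaultdict(list)
    for number, side in entries:
        sides_by_number[number].append(side)
    return {number: sides
            for number, sides in sides_by_number.items()
            if len(sides) > 1}


# (order number, cassette side); 573 and 575 are invented neighbours.
entries = [
    (573, "C 21 1"),
    (574, "C 21 1"),
    (574, "C 21 2"),  # continuation registered under the same number
    (575, "C 21 2"),
]
print(repeated_order_numbers(entries))
```

Each hit still has to be checked by listening, because a repeated number may be either a true duplicate to delete or a continuation that needs a number of its own.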
1.5 Personal data about the performers
In the listing
of data in the database, we next encountered the problem of unintelligible
handwriting.
For example, in the collector’s notebook, the surname
of the performer in the sound samples no. 290, 291, 288 and 289 is unintelligible.

The “age” column (age of the performer) includes the dates of birth of the performers. These data exist only partially in Firfov’s notebook.
The errors
we faced while examining the database were related to the field “age of
the performer”, where two different ages were noted for one performer.
In cases such as these, we erased the question marks and replaced
them with “born” for the appropriate year.
Examples are no. 206/7, 8, 9, 10 and 211:
In fact, this is
a result of incorrect registering of the informant’s age by the collector.
The recording itself proves that the informant mentions another age. Thus,
the subject registering the data noted both ages. We consider this an error
made by the subject, because the incorrect age (now noted in parentheses)
should be noted in the comment column. Such registering of data may create
confusion in the further analysis of information.
In this field
we also encountered registering of data about the date of birth of the
performer. This data doesn’t belong in this section, since the field should
only mention the age of the performer at the time of recording. Apart from
this database, there is another secondary database with the title “individuals”,
which is actually a detailed database on the performers in the Firfov Collection.
In any case, in the future this field will have to be reorganised, and
all the data will be included in the field “age of the performer”.
1.6. Incorrectly registered structure of the performers’ ensemble
This field
included many errors in the registering of the structure of the performers’
ensemble. For instance, two performers accompanied by a folk instrument orchestra
were incorrectly registered as “1M + Folk” (meaning one male vocalist and
folk instrument orchestra).
2. Errors in the defining of the categories and fields
It was of
course impossible to foresee, during the formatting of the database, the
situations arising from the individual characteristics of the primary material.
Sorting the data in the secondary database therefore brought to light
a number of errors that stem from the undefined content of the fields.
We have already noted the two types of errors that frequently occur:
- errors in the performers’ personal data
- incorrect structure of the performers’ ensemble
As we shall see later in this text, we had solutions for some of these errors and we used them in the correction of the database. However, we left the part concerning the performers as it is now, until a precise definition of the fields is established.
2.1. Errors in the performers’ personal data
The main problem
in the definition of this field occurred with the separation of the categories
“group” and “soloist”. Our sound samples often involve two singers performing
in unison, in the same melodic line. This poses the question of whether
this is solo or group singing. Since in such situations the collector noted
the names of the singers, meaning he treated them as soloists, they were
entered as such in the appropriate field. This is a serious problem, because
it distorts the distribution between the fields “soloist”
and “group”. Because several subjects participated in registering the data,
an examination and correction of the inconsistent data will follow.
The situation of two or more soloists performing with
accompaniment (vocal group, orchestra, instruments, etc) is somewhat clearer.
In this case, it is evident that the performers are soloists.
On the other
hand, the field “vocal/instrumental soloist” has not been precisely defined,
for it often includes more than two names. Even if this field were defined
as “vocal/instrumental soloists”, we would again face a problem during
the processing of data, because of the absence of a consistent principle
in their sorting, especially when it comes to nominal variables.
2.2. Incorrect structure of the performers’ ensemble
This is probably
one of the most unclear fields in the database in terms of definition.
We faced many problems:
- registering the drone
- treating the issue group-soloist
2.2.1. Drone
In the initial
registering in the database, the drone samples were registered in the structure
of the performers’ ensemble. This doesn’t correspond to the definition
of this field, because the drone is not a characteristic of the structure
of the performers’ ensemble, but a characteristic of the piece of music.
Consequently, we decided to add another field indicating the presence of
a drone. This avoids distorting the results of the ensemble field.
2.2.2. Treating the issue group-soloist
We have already
discussed this issue in connection with the unclear definition of the personal
data fields, where the use of parentheses only adds to the vagueness of the categories.
Thus, for example, the sound sample no. 226 has been
registered:
II. Analysis of the distribution of data
This text will only present the frequency distribution of data in the fields whose structure permits such analysis. The fields which created problems in terms of definition will also be omitted here. Their analysis will follow once they have been precisely defined and their data corrected. Thus, the following fields will be analysed:
- duration
- titles
- lyric incipit
- language
- author
- performer
- original recording
- digitisation
- comments
After the last
examination of the database and the elimination of the incorrect data,
we are able to determine the total number of sound samples in the Firfov
Collection.
The primary
database contains 1366 sound samples. The last three do not belong to the
Firfov Collection, but are radio shows dedicated to Zhivko Firfov and his
life. Furthermore, one sound sample is a demonstration of
the range and features of several Macedonian folk instruments. Excluding
these four, we can conclude that the database contains 1362 music samples.
1. Duration
The digitised
material in this collection amounts to 29.47 GB (31,646,244,866 bytes),
distributed in 129 audio files, with a total duration of 48 hours, 36 minutes and
41 seconds. Nevertheless, this isn’t an accurate presentation of the duration
of the sound samples, since the complete duration includes the pauses at
the beginning and end of the sound samples, i.e. between them. Thus, apart
from the complete duration of the entire audio database, we shall present
the precise duration of the sound samples:
- 1366 sound samples with a duration of 47 hours, 20 minutes
and 18 seconds
- 1362 music samples with a duration of 45 hours, 38
minutes and 18 seconds
For
the music samples, the mean is 2.47 minutes, i.e. on average
one “song” lasts for 2.47 minutes. This indicates that not all
“songs” have been recorded with their complete text. Additional
research on these parameters, indicating the deviations and variations,
would take us a step further in the statistical analysis of this database.
For the moment, we have grouped the music samples into classes by
duration, from up to one minute to up to twelve minutes. The following
results were achieved:
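The grouping by duration into one-minute classes can be sketched as follows. This is only an illustration of the technique: the function names are hypothetical, and the durations in the example are invented and are not the results for the collection.

```python
import math
from collections import Counter


def duration_class(seconds: float) -> int:
    """Return n for the class 'up to n minutes' that contains the duration."""
    return max(1, math.ceil(seconds / 60))


def frequency_distribution(durations_in_seconds):
    """Count how many samples fall into each one-minute duration class."""
    counts = Counter(duration_class(s) for s in durations_in_seconds)
    return dict(sorted(counts.items()))


# Invented durations in seconds, for illustration only.
example = [45, 60, 61, 130, 170, 700]
print(frequency_distribution(example))
```

A sample of exactly 60 seconds is counted in the class “up to one minute”; whether the boundary belongs to the lower or the upper class is a convention that has to be fixed before the distribution is read.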
4. Lyric incipit
This field
includes 1092 lyric incipits from a total of 1362 music samples. This is
not the final number of vocal (i.e. vocal-instrumental) music samples, since some
samples contain unintelligible lyrics. Upon comparing the fields “comments”
and “structure”, we concluded that ten music samples are commented as vocal
samples with unintelligible lyrics. Four of these samples do not include
any text in the appropriate field. Of the remaining 266 samples, 162 are
vocal samples without a registered lyric incipit and without explanations
in the “comment” section, whereas 104 samples are instrumental music samples.
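The counts in this paragraph can be checked against each other with a few lines of arithmetic. All figures below are taken from the text; the variable names, and the reading that the “remaining 266” are the samples without an incipit minus the four unintelligible ones, are our assumptions.

```python
music_samples = 1362
with_incipit = 1092
unintelligible_without_text = 4   # vocal, commented as unintelligible, no text entered
vocal_without_incipit = 162       # vocal, no incipit and no explanation
instrumental = 104

# Samples with no entry in the lyric incipit field.
without_incipit = music_samples - with_incipit

# The "remaining" samples once the four unintelligible ones are set aside.
remaining = without_incipit - unintelligible_without_text

# Every music sample that is not instrumental is vocal.
vocal_total = music_samples - instrumental

print(without_incipit, remaining, vocal_total)
assert remaining == vocal_without_incipit + instrumental
```

Under this reading the figures are mutually consistent, and the resulting total of vocal samples equals the 1258 used in the analysis of the language field.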
6. Language
The examination
of the distribution of data in this field showed that 1230 music samples
in the database are registered as sung in a particular language. The
entered data comprise 1177 sound samples in Macedonian, 22 in Aroumanian,
11 in Romany, 9 in Turkish, 5 in Serbian and 5 in Albanian. Bearing in
mind that the total number of vocal-instrumental samples, as we have already
noted, is 1258, data for 28 samples are missing: because of the bad quality
of the recording, the language of singing is unintelligible.
7. Author
Bearing in
mind that the collection refers above all to music folklore, one would
expect the samples to be of collective authorship, as the definition of music
folklore implies. However, the distribution of data showed that this field
contains a number of original songs (24). The authorship was noted by Firfov
(and registered with ZAMP), but the name of the author was not stated.
We entered these songs in the column “unknown” author. In addition, data
are missing for 42 samples.
8. Performer
We have already
stated that serious problems appeared because of the inadequate defining
of this field. Thus, it requires additional processing in order to show the
results of the distribution. As previously mentioned, the field “drone”
was added to this database. An examination of the distribution in this field
showed that, out of 1258 vocal music samples, 52 contain a drone.
9. Original recording
This column
contains several subdivisions referring to data concerning the original
recordings: date of recording, location, method, environment, technology
of recording, carrier, in which archive they exist, etc.
Since some
of the fields held identical data for every sample, the results obtained
for them were constant:
- equipment of music recording (Philips Cassette Deck)
- carrier of the original material (compact cassette)
- speed
- mono or stereo (S)
- sound engineer (Zhivko Firfov)
- producer
- director
- publisher
- title
- archive (FC)
- number (C1-1, C1-2, etc)
For the remaining subfields, data exist about:
- the date of recording of 40 music samples
- the location of recording of 104 music samples
- the environment of recording of 1241 music
samples
The existing data yielded the following results:
11. Digitisation
This field
involved constants, like the field “original recording”, owing to the fact
that the entire collection was digitised at IRAM in 2001. The following
was used:
- Logic Audio 4.0 software
- Hard Disk and Compact Disk carriers
- format: SDII; 44.1kHz/16Bit; Stereo
- the collection is located at the IRAM archives
- Firfov’s notebook is an additional source for secondary
data for this database.
12. Comments
The category
of this field includes the comments of the collector, as well as the comments
of the subjects entering the data in the database. All comments were in
Macedonian in their initial state.
In order to
simplify the analysis of this field, we designed a code for part of the
comments: we formed abbreviations of the comments in the English
language. Nevertheless, we were not able to design codes for all comments
(such as the ones that explain the song or informants in detail, or ones
where Firfov was not certain of the title of the song and used a question
mark instead, etc.). A brief outline of the abbreviations used follows:
Legend:
CC - collector’s comment
OC - our comment
IC - informant’s comment
CC:
v - village
r - region
ens - ensemble
dc - circular dance
instr - instrumental
rf - regional festival
OC:
ul - unclear lyrics
I - incomplete
mn - missing notes
cont - continuation
br - bad recording
conv - conversation
int - interruption
rep - repetition
IC:
IP before - informer presentation before the music sample
IP after - informer presentation after the music sample
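For processing, the legend can be held as a lookup table keyed by the source of the comment. The sketch below is a possible representation, not the structure of the database: the names `COMMENT_CODES` and `expand` are hypothetical, while the codes and their meanings are those of the legend.

```python
# Comment codes from the legend, grouped by the source of the comment.
COMMENT_CODES = {
    "CC": {  # collector's comment
        "v": "village",
        "r": "region",
        "ens": "ensemble",
        "dc": "circular dance",
        "instr": "instrumental",
        "rf": "regional festival",
    },
    "OC": {  # our comment
        "ul": "unclear lyrics",
        "I": "incomplete",
        "mn": "missing notes",
        "cont": "continuation",
        "br": "bad recording",
        "conv": "conversation",
        "int": "interruption",
        "rep": "repetition",
    },
    "IC": {  # informant's comment
        "IP before": "informer presentation before the music sample",
        "IP after": "informer presentation after the music sample",
    },
}


def expand(source: str, code: str) -> str:
    """Return the full comment for a code; uncoded free text is returned as is."""
    return COMMENT_CODES[source].get(code, code)


print(expand("OC", "br"))
print(expand("CC", "title uncertain"))  # hypothetical free-text comment, unchanged
```

Returning uncoded text unchanged reflects the fact that not all comments could be given a code.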
Conclusion
This study
presented the initial results of the distribution of the secondary data
from the Firfov Collection. As shown, some of the fields in which we
faced problems concerning the definition of their content have been set
aside for further examination. Nevertheless, the existing results are sufficient
for the presentation of the basic characteristics of the digitised primary
audio material. They will help further searches of this database and the
location of sections that may be of theoretical interest from different perspectives.
Furthermore, they will serve as a basis for the formation of the Firfov
Collection tertiary analytical database.