Irena Mitevska

The distribution of the secondary data in the Firfov Collection

        In September 2003, the database with the secondary data of the Firfov Collection was completed. The main purpose of this paper is to present, analyse and comment on the distribution of the data in this database. This exposes the structure of the audio material, which is of crucial importance to any future research on this collection.
        Because we faced many problems while entering the data, and later while analysing it, this paper concentrates on those issues. It is composed of two sections:

- Errors made in the entering of data
- Distribution of the data

        Both problems presented in this study are directly related to the structure of the database, i.e. the definitions of the fields. Most of the fields hold nominal variables, the only exception being the ratio variable of the duration of the sound samples, so the categories within each field need to be clearly defined.
        The database consists of 18 main fields, some of which branch into several subfields. A sample of the database is included at the end of this text. Since the names of most fields make their definitions clear, this text examines only the problems in the definition of the fields and the errors made in entering the data, followed by an analysis of the distribution of the data.

I. Errors made in the entering of data
        To identify the errors made in entering the data, we first sorted the data in each field. Many different errors appeared at once.
To examine the types of errors, we separated them into two larger groups:

- Errors made by subjects
- Errors due to inadequate definition of the fields, i.e. categories

The errors made by subjects can be further classified:

- errors made by the collector
- incorrect duration of the sound samples
- incorrect listing of title data
- incorrect listing of order
- incorrect listing of personal data about the performers
- incorrectly registered structure of the performers’ ensemble

The errors made due to inadequate definition of the fields may also be divided into:

- errors concerning the performer’s personal data
- incorrectly registered structure of the performers’ ensemble
 

        1. Errors made by subjects

1.1. Errors made by the collector
        While the secondary database was being constructed, the main source of the information collected by Zhivko Firfov was the notebook he left together with his cassette collection. Several errors appeared as soon as the data were sorted. The collector made these errors while ordering the data.
        We divided these errors into two categories:
- incorrect titles
- incorrect entering of the performers’ personal data

1.1.1. Titles
        This category covers errors in the titles as the collector noted them.
        For example, sound sample no. 335 is given the title “25 mina se madonci”, which is evidently an error. On listening to the song again, one can clearly hear the text “Dvaesetpetmina se mladi momci”.
 


        Firfov gave sound sample no. 401 the title “Grushka shekjer”, although the title is evidently “Grutka shekjer”.


 
 
        1.1.2. Incorrect filing of the performers’ personal data
        In the performers’ data field, one performer has been noted in different ways in five different places.

 

Song no. 820: Sarievski Aleksandar


Song no. 1052: Aco Sarievski



Song no. 739: Stolevska Vera


Song no. 1211: Vera Stoleva

        This raises the question of how to note the names in cases such as those above:

- to leave the names as different ones (Stoleva/Stolevska), or
- to note them in one way, as one name, since we know that it is the same performer

        While solving this problem, we took the following steps:

- if we knew the correct name of the performer from other sources, we noted the official name (for instance, Sarievski’s official name is Aleksandar, not Aco)
- if the performer was unknown to us (for instance, Vera Stoleva or Vera Stolevska), we left the name as it was originally noted.

        The next problem we faced while entering the data about the performers was how to write abbreviations. For sound samples no. 80 and 82, the following abbreviations appeared in the instrumental group column: “Rasp. Gostivarci” and “Rasp. Kavadarchani”.
 


        Here we had to choose between leaving the name of the instrumental group as the collector had originally written it and writing it out in full.
        We were certain of the meaning of the abbreviations, so we decided to correct them and use the full version. We consider that this makes further searching and analysis of the data easier.
        To make the database easy to search, we used the following order for all names: 1) surname, 2) given name, regardless of the order in Firfov’s notebook.
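
        The rule for known and unknown performers described above can be sketched in Python. This is only an illustration: the lookup table KNOWN_PERFORMERS, the function name and the exact-match lookup are assumptions made for the example and are not part of the database itself. The sketch does not attempt to reorder surname and given name automatically.

```python
# Hypothetical table of name variants whose official form is known from other sources.
# The values follow the "surname, given name" order used in the database.
KNOWN_PERFORMERS = {
    "Aco Sarievski": "Sarievski Aleksandar",
    "Sarievski Aleksandar": "Sarievski Aleksandar",
}


def normalise_performer(noted_name: str) -> str:
    """Return the official name if it is known, otherwise the name as originally noted."""
    cleaned = " ".join(noted_name.split())  # collapse stray whitespace
    return KNOWN_PERFORMERS.get(cleaned, cleaned)
```

        With such a table, both notations of Sarievski resolve to one entry, while a performer unknown to us, such as Vera Stoleva, stays exactly as noted.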

1.2. Duration
        Sorting this field revealed the errors made in entering data in the two preceding fields: time of beginning and time of ending. Once the database had been formatted, this field worked as a formula that gives the difference between the two preceding fields. After further inspection and listening, these errors were corrected.
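
        A minimal sketch of this kind of check follows. It assumes that the times of beginning and ending are stored as "mm:ss" or "h:mm:ss" strings; the function names and the sample records in the usage note are hypothetical.

```python
def to_seconds(timestamp: str) -> int:
    """Convert an 'h:mm:ss' or 'mm:ss' string into a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def duration_seconds(begin: str, end: str) -> int:
    """Duration as the difference between time of ending and time of beginning."""
    return to_seconds(end) - to_seconds(begin)


def suspicious(records):
    """Return the order numbers whose computed duration is zero or negative."""
    return [number for number, begin, end in records
            if duration_seconds(begin, end) <= 0]
```

        For example, suspicious([(1, "0:00", "2:30"), (2, "5:10", "4:50")]) points to record 2, whose ending precedes its beginning; such records are the ones to inspect and listen to again.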

1.3. Titles
        As with the collector’s own errors, we found errors in the titles. These errors, however, were made while the titles were being entered in the database.
        Sound sample no. 409 was entered as “Done, Done, semln Done”, whereas in Firfov’s notebook the title of this song is “Done, Done, sejmen Done”.


        The Firfov Collection includes sound samples not registered by Firfov. For these, an error was made whenever the title was put into the “collector’s title” field instead of the “other title” field.
        For instance, no. 248 carries the title “Bukite razvivaat brakja, gore na balkanot” in the “collector’s title” field, whereas in Firfov’s notebook it has not been registered at all.

1.4. Order
        The greatest number of errors was found in the order of the sound samples, where one sound sample had been registered in the database twice. For instance, the sound samples from C 25 - 1 (cassette 25, first side) were listed as continuations of C 24 - 2/1, and later repeated in the C 25 - 1 section, where they actually belong. After these errors were corrected, the overall number of sound samples decreased by 19.
 




K 24 - 2/1
K 25 - 1

        The second kind of double listing occurs when a sound sample was not fully recorded at the end of one cassette side and continues on the other side from the point where it stopped.
For instance, sound sample no. 574 at the end of C 21 - 1 is not fully recorded, and continues on the other side (C 21 - 2). When the data were registered, the order number was repeated: on both sides the sample is numbered 574.
 



        In such cases, should the sample be listed under the same order number, or should the continuation be given a new number? The continuation is, after all, a separate sound sample, and it occupies separate space on the cassette or the CD.
We believe that it should be listed under a separate number, with a note in the “comment” field that it is a continuation of the previous song. After this error was corrected, the overall number of sound samples increased by one.
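
        Repeated order numbers of both kinds can be found mechanically once the data are sorted. The following sketch assumes each entry is an (order number, cassette side) pair; the function name and the sample entries in the usage note are illustrative only.

```python
from collections import defaultdict


def repeated_order_numbers(entries):
    """Map each order number that occurs more than once to the sides where it occurs."""
    sides_by_number = defaultdict(list)
    for number, side in entries:
        sides_by_number[number].append(side)
    return {number: sides
            for number, sides in sides_by_number.items()
            if len(sides) > 1}
```

        Applied to entries such as (574, "C 21 - 1") and (574, "C 21 - 2"), the function reports number 574 together with both sides. Whether a repeated number is a true duplicate or a continuation still has to be decided by listening.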

        1.5. Personal data about the performers
        While entering data in the database, we next encountered the problem of illegible handwriting.
For example, in the collector’s notebook, the surname of the performer in sound samples no. 288, 289, 290 and 291 is illegible.
 

        When entering the data, we put a question mark in place of the illegible letters. It remains open to discussion whether it would be better to omit the name of the performer, or the data altogether, and to note in the “comment” field that this information was illegible.
        For sound samples with more than one soloist where Firfov noted only some of them, we registered only the existing information in the “name/vocal soloist” subfield of the “performer” field, and added a question mark next to it to signal that information is missing.
        Such is the case with samples no. 57, 60 and others, where the field “structure” and the recording itself prove that there are two performers, but only one of them is listed, in this case Nikolovski Blagoja.
 



The “age” column (age of the performer) includes the dates of birth of the performers. These data exist only partially in Firfov’s notebook.


        The errors we found while examining the database concerned the field “age of the performer”, where two different ages were noted for one performer. In such cases, we erased the question marks and replaced them with “born” for the appropriate year.
Such are examples no. 206/7, 8, 9, 10 and 211:
 



        This is a result of the collector registering the informant’s age incorrectly. The recording itself proves that the informant mentions another age, so the subject registering the data noted both ages. We consider this an error made by the subject, because the incorrect age (now noted in parentheses) should be noted in the comment column. Registering data in this way may create confusion in further analysis.
        In this field we also found dates of birth entered for performers. These data do not belong here, since the field should give only the age of the performer at the time of recording. Apart from this database, there is another secondary database, titled “individuals”, which is a detailed database of the performers in the Firfov Collection. In any case, this field will have to be reorganised in the future, and all the data will be included in the field “age of the performer”.

1.6. Incorrectly registered structure of the performers’ ensemble
        This field contained many errors. For example, two performers accompanied by a folk instrument orchestra were incorrectly registered as “1M + Folk” (meaning one male vocalist and a folk instrument orchestra).

2. Errors in the defining of the categories and fields
        When the database was being formatted, it was of course impossible to foresee every situation arising from the individual characteristics of the primary material. We may therefore assume that a number of the errors that appeared when the data in the secondary database were sorted are due to the undefined content of the fields. We have already noted the two types of error that occur frequently:

- errors in the performers’ personal data
- incorrect structure of the performers’ ensemble

        As we shall see below, we had solutions for some of these errors and applied them in correcting the database. However, we have left the part concerning the performers as it is until the fields are precisely defined.

2.1. Errors in the performers’ personal data
        The main problem in the definition of this field arose with the separation of the categories “group” and “soloist”. Our sound samples often involve two singers performing in unison, on the same melodic line. This raises the question of whether this is solo or group singing. Since in such situations the collector noted the names of the singers, meaning that he treated them as soloists, they were entered as such in the corresponding field. This is a serious problem, because it distorts the distribution of the fields “soloist” and “group”. Because several subjects took part in registering the data, the inconsistent data will have to be examined and corrected.
The situation of two or more soloists performing with accompaniment (vocal group, orchestra, instruments, etc.) is somewhat clearer. In this case, it is evident that the performers are soloists.
        On the other hand, the field “vocal/instrumental soloist” has not been precisely defined, for it often includes more than two names. If this field were defined as “vocal/instrumental soloists”, we would again face a problem when processing the data, because there is no consistent principle for sorting the names, especially where nominal variables are concerned.

        2.2. Incorrect structure of the performers’ ensemble
        This is probably one of the least clearly defined fields in the database. We faced problems with:
- registering the drone
- treating the group-soloist issue

2.2.1. Drone
        When the database was first filled in, samples with a drone were registered in the field for the structure of the performers’ ensemble. This does not correspond to the definition of the field, because the drone is a characteristic not of the structure of the performers’ ensemble but of the piece of music. Consequently, we decided to add another field indicating the presence of a drone. This avoids distorting the results for the structure field.

2.2.2. Treating the group-soloist issue
        We have already discussed this issue in connection with the unclear definition of the personal data fields. Entries in these fields include parentheses, which only add to the vagueness of the categories.
Thus, for example, the sound sample no. 226 has been registered:


        The recording clearly shows a female vocal soloist and a female group, i.e. two accompanying female vocalists.
We registered it as “F + FGr” (signifying a female vocal soloist and a female group).
With this example, the problem of defining the category “group” arose yet again.
 

II. Analysis of the distribution of data

        This text presents only the frequency distribution of data in the fields whose structure permits such analysis. The fields whose definition caused problems are also omitted here; they will be analysed once they have been precisely defined and their data corrected. The following fields are analysed:

- duration
- titles
- lyric incipit
- language
- author
- performer
- original recording
- digitisation
- comments

        After the last examination of the database and the elimination of the incorrect data, we are able to determine the total number of sound samples in the Firfov Collection.
        The primary database contains 1366 sound samples. The last three do not belong to the Firfov Collection, but are radio shows dedicated to Zhivko Firfov and his life. Furthermore, one sound sample is a demonstration of the range and features of several Macedonian folk instruments. Thus, we can conclude that the database contains 1362 music samples (1366 - 3 - 1 = 1362).

1. Duration
        The digitised material in this collection amounts to 29.47 GB (31,646,244,866 bytes), distributed in 129 audio files, with a total duration of 48 hours, 36 minutes and 41 seconds. Nevertheless, this is not an accurate measure of the duration of the sound samples, since the total duration includes the pauses at the beginning and end of the sound samples, i.e. the pauses between them. Thus, apart from the total duration of the entire audio database, we also present the precise duration of the sound samples:

- 1366 sound samples with a duration of 47 hours, 20 minutes and 18 seconds
- 1362 music samples with a duration of 45 hours, 38 minutes and 18 seconds

        For the music samples, the mean duration is 2,47 minutes; that is, one “song” lasts 2,47 minutes on average. This suggests that not all “songs” were recorded with their complete text. Additional research into these parameters, covering deviation and variation, would take us a step further in the statistical analysis of this database. For the moment, we have grouped the music samples by duration, from up to one minute to up to twelve minutes. The following results were obtained:
 

 



3. Titles
        The field “titles” includes three subdivisions:
- author’s/performer’s title
- collector’s title
- other title
        The author’s/performer’s title has 13 entries in our database, too few for further analysis, especially since the collection includes songs bearing famous titles. In the field “collector’s title”, where the titles registered by Firfov are entered, 1055 titles have been noted. Firfov did not note the titles of the remaining 307 sound samples.
        The last field, “other title”, contains 86 entries, which come mainly from the personal knowledge of the subjects working on the secondary database.

4. Lyric incipit
        This field contains 1092 lyric incipits for a total of 1362 music samples. This is not the final number of vocal-instrumental music samples, since some samples contain unintelligible lyrics. On comparing the fields “comments” and “structure”, we found that ten music samples are commented as vocal samples with unintelligible lyrics. Four of these have no text at all in the corresponding field. Of the remaining 266 samples, 162 are vocal samples with no registered lyric incipit and no explanation in the “comment” field, whereas 104 are instrumental music samples.
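
        The figures in this section can be cross-checked with simple arithmetic. The sketch below uses only the counts stated above; it assumes that the four unintelligible samples without any text are counted among the samples lacking a lyric incipit, and the variable names are illustrative.

```python
total_music_samples = 1362
with_incipit = 1092
unintelligible_without_text = 4
vocal_without_incipit = 162
instrumental = 104

# Samples with no lyric incipit at all.
without_incipit = total_music_samples - with_incipit

# Samples left once the four unintelligible ones are set aside.
remaining = without_incipit - unintelligible_without_text

# The remaining samples split into vocal samples without incipit and instrumentals.
assert remaining == vocal_without_incipit + instrumental

# Every sample that is not instrumental is a vocal (vocal-instrumental) sample.
vocal_samples = total_music_samples - instrumental
```

        Under this reading the counts are consistent: 270 samples lack an incipit, 266 remain after the four unintelligible ones, and the resulting 1258 vocal samples is the total used in the sections on language and performer.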
 



 



6. Language
        The examination of the distribution of data in this field showed that 1230 music samples in the database are registered as sung in a particular language. The entered data comprise 1177 sound samples in Macedonian, 22 in Aromanian, 11 in Romany, 9 in Turkish, 5 in Serbian and 5 in Albanian. Since the total number of vocal-instrumental samples is 1258, data are missing for 28 samples: because of the poor quality of the recording, the language of singing cannot be made out.
 
 


7. Author
        Since the collection consists above all of music folklore, one would expect the samples to be of collective authorship, as the definition of music folklore implies. However, the distribution of data showed that this field contains a number of original songs (24). The authorship was noted by Firfov (and registered with ZAMP), but the name of the author was not stated. We entered these songs under “unknown” author. In addition, data are missing for 42 samples.
 
 

 



8. Performer 
        We have already stated that serious problems appeared because of the inadequate definition of this field. It therefore requires additional processing before the results of the distribution can be shown. As previously mentioned, the field “drone” was added to this database. Examination of the distribution in this field showed that 52 of the 1258 vocal music samples contain a drone.

9. Original recording
        This column contains several subdivisions referring to data concerning the original recordings: date of recording, location, method, environment, technology of recording, carrier, in which archive they exist, etc.
        Since some of the fields hold identical data for every sample, the results for them are constants:
- equipment of music recording (Philips Cassette Deck)
- carrier of the original material (compact cassette)
- speed
- mono or stereo (S)
- sound engineer (Zhivko Firfov)
- producer
- director
- publisher
- title
- archive (FC)
- number (C1-1, C1-2, etc)

For the remaining subfields of this field, we have data about:

- the date of recording of 40 music samples
- the location of recording of 104 music samples
- the environment of recording of 1241 music samples

The existing data yielded the following results:
 


 

 
 
 



 11. Digitisation
        Like the field “original recording”, this field consists of constants, because the entire collection was digitised at IRAM in 2001. The following was used:

- Logic Audio 4.0 software
- Hard Disk and Compact Disk carriers
- format: SDII; 44.1kHz/16Bit; Stereo
- the collection is located at the IRAM archives
- Firfov’s notebook is an additional source for secondary data for this database.

        12. Comments
        This field includes the comments of the collector, as well as the comments of the subjects entering the data in the database. All comments were originally in Macedonian.
        To simplify the analysis of this field, we designed codes for some of the comments, forming abbreviations from the English versions of the comments. We were not able to design codes for all comments, however (for example, those that describe the song or the informants in detail, or those where Firfov was not certain of the title of the song and used a question mark instead). A brief outline of the abbreviations used follows:
 
 

Legend:

CC - collector’s comment
OC - our comment
IC - informant’s comment

CC:
v - village
- region
ens - ensemble
dc - circular dance
instr - instrumental
rf - regional festival

OC:
ul - unclear lyrics
I - incomplete
mn - missing notes
cont - continuation
br - bad recording
conv - conversation
int - interruption
rep - repetition

IC:
IP before - informant presentation before the music sample
IP after - informant presentation after the music sample
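
        For processing, the legend above could be held as a simple lookup table. The following Python sketch is only an illustration: the names COMMENT_CODES, expand and tally are assumptions, and the entry for “region” is left out because its abbreviation is not given in the legend.

```python
from collections import Counter

# Codes grouped by the source of the comment (CC, OC, IC), as in the legend.
COMMENT_CODES = {
    "CC": {
        "v": "village",
        "ens": "ensemble",
        "dc": "circular dance",
        "instr": "instrumental",
        "rf": "regional festival",
    },
    "OC": {
        "ul": "unclear lyrics",
        "I": "incomplete",
        "mn": "missing notes",
        "cont": "continuation",
        "br": "bad recording",
        "conv": "conversation",
        "int": "interruption",
        "rep": "repetition",
    },
    "IC": {
        "IP before": "informant presentation before the music sample",
        "IP after": "informant presentation after the music sample",
    },
}


def expand(source: str, code: str) -> str:
    """Return the meaning of a code, or the code itself if it is not in the legend."""
    return COMMENT_CODES.get(source, {}).get(code, code)


def tally(coded_comments):
    """Count how often each (source, code) pair occurs in a list of coded comments."""
    return Counter(coded_comments)
```

        Comments for which no code was designed pass through expand unchanged, so free-text comments and coded comments can be stored in the same field.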
 
 

Conclusion
        This study has presented the initial results on the distribution of the secondary data of the Firfov Collection. As shown, some of the fields whose content proved problematic to define have been set aside for further examination. Nevertheless, the existing results are sufficient to present the basic characteristics of the digitised primary audio material. They will help in further searching of this database and in locating the sections that may be of interest from different theoretical perspectives. Furthermore, they will serve as a basis for forming the tertiary analytical database of the Firfov Collection.