Branka Kostic

The Categorisation of Ethnomusicological Data
in Multimedia Databases

The long-standing problem of categorising ethnomusicological data also applies to contemporary digital multimedia databases. These databases, which allow different types of data to be searched and linked, owe their existence to contemporary digital technology.

 The same technology opens up new perspectives in the archiving and searching of ethnomusicological data. Determining the structure of each database and defining its elements is therefore of crucial importance to its future use and value. This essay focuses on that problem: the categorisation of ethnomusicological data in multimedia databases.

 This interest originates from the practical experience we gained while structuring the multimedia database of the Firfov Collection. This sizeable project, which is still under way, began at the School of Music in Skopje in 2001 (Kostic 2001, 2002).

 The arrival of the digitised recording data and other data from this collection demanded a clearly defined methodology and a precise definition of the data fields entering the multimedia base. Once the architecture of the base and its components had been set and worked through, the process of entering the data began (Kostic 2002:80).

 While defining the macrostructure of this multimedia base, two things were taken into consideration:

- the features of the archived material;
- the technical abilities and capacities of the medium in which the archiving will take place.


Therefore, the previously-mentioned Firfov Collection multimedia base contains the following sections:

- audio section (where primary data of strictly sonic nature is stored);
- text section (where secondary data of descriptive (textual) nature is stored);
- graphic section (where secondary data of visual nature is stored).


 Even though it was initially thought that all the sections of the base were of equal importance, the audio section is de facto its basic part and the reason for its formation. The remaining sections are merely its attributes, or result from analytical steps in the processing of the archived material. In this new approach, guided by the recommendations of IASA 2001, the data can be classified into primary (data that refers exclusively to the audio, i.e. the strictly sonic aspects of the works) and secondary (the remaining data, i.e. metadata for the primary data). According to IASA 2001, the secondary data can take many forms (text, music and video graphics), and together with the primary data it forms the concept of ‘cultural inheritance’. In some cases the secondary data is part of the work itself (for example, the label on a CD), while in others it has to be compiled separately. The importance of the secondary data depends on the content, the type of carrier and the future needs of the users, i.e. the use (IASA 2001).

 Because the categorisation into primary and secondary data developed from the practical use of the archives of cultural inheritance, we developed this concept further by adding a part for analytical data. Although the Firfov archives contain data with analytical features, such data is generally placed in the secondary data group. Besides archiving, however, a theoretical interest in the processing of primary data persists. This indicated a need to create a third category of data, known as tertiary data (Kostic 2002:64-5).
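
 The three-way classification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names DataCategory, SECTION_CATEGORY and category_of are hypothetical, and the "analysis" key is an assumed label for the analytical part, not a name taken from the Firfov Collection base.

```python
from enum import Enum


class DataCategory(Enum):
    PRIMARY = "primary"      # the sound itself
    SECONDARY = "secondary"  # metadata describing the primary data
    TERTIARY = "tertiary"    # analytical data derived from the primary data


# Hypothetical mapping of base sections to data categories.
# "audio", "text" and "graphic" follow the sections named in the essay;
# "analysis" is an assumed label for the analytical part.
SECTION_CATEGORY = {
    "audio": DataCategory.PRIMARY,
    "text": DataCategory.SECONDARY,
    "graphic": DataCategory.SECONDARY,
    "analysis": DataCategory.TERTIARY,
}


def category_of(section: str) -> DataCategory:
    """Return the data category assigned to a section of the base."""
    return SECTION_CATEGORY[section]
```

 Keeping the category as an explicit value makes it possible to filter the material by its role (sound, description, analysis) regardless of the medium in which it is stored.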

 Bearing in mind the size of our methodological task, our interest has so far been focused solely on defining the fields of the textual section of the database. The next part of this essay presents the most important findings and conclusions reached during the process of entering the textual data in the Firfov Collection.
 

The categories of textual data

 The fields of the textual bases were defined out of the need to organise them in a way that allows wide searching and identification of the entered material.

 Because the textual data is basically a part of the secondary data, it is also a type of metadata. In digital archiving, metadata means data about data, i.e. a detailed and specific extension of cataloguing practice (IASA 2001; Buzarovski 2002). According to IASA, metadata plays a vital part in the use and control of digital collections. Therefore, its preservation should be a key component in the handling of any digital collection.*

 When defining the fields of the textual bases, one must keep in mind their compatibility and the possibility of converting them into other formats, for example for network use (such as the Internet). We therefore decided to start from the existing systems for the global standardisation of metadata. We selected the Dublin Core as the most widely used and accepted system.
The Dublin Core system was developed by the Dublin Core Metadata Initiative (DCMI). Its central task is to develop a modus operandi which will ease the searching of data in the systems of artificial intelligence (www.purl.oclc.org/metadata/dublin_core). The elements of the Dublin Core comply with the standards for vertical (domain-specific) semantic information on Web-based resources.
Thus the definition for the Dublin Core:
“Metadata used to supplement existing methods for searching and indexing WEB-based metadata, regardless of whether the corresponding resource is an electronic document or a ‘real’ physical object”.
 For this purpose, a set of 15 elements was produced, the Dublin Core Metadata Element Set (DCMES), which are basically descriptive semantic definitions (www.purl.org/metadata/dublin_core_elements). The elements are mainly general and are designed to suit a wide range of fields and purposes. This makes it easy to convert bases with a larger number of elements into the Dublin Core. In that case, one field of the Dublin Core incorporates several fields from the other base.
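
 The idea that one Dublin Core field can absorb several fields of a more detailed base can be illustrated with a small crosswalk. The sketch below is hypothetical: the 15 element names are those of the Dublin Core Metadata Element Set, but the local field names and the CROSSWALK mapping are assumptions made for illustration, not the mapping used in the Firfov Collection.

```python
from collections import defaultdict

# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Hypothetical crosswalk: local field name -> Dublin Core element.
# Several local fields may point to the same element.
CROSSWALK = {
    "audio_file_id": "identifier",
    "author": "creator",
    "arrangement": "contributor",
    "performer": "contributor",
    "language": "language",
    "copyright": "rights",
}


def to_dublin_core(record: dict) -> dict:
    """Collect the values of a local record under Dublin Core elements.

    Fields without a crosswalk entry and empty values are skipped.
    """
    converted = defaultdict(list)
    for field_name, value in record.items():
        element = CROSSWALK.get(field_name)
        if element is not None and value:
            converted[element].append(value)
    return dict(converted)
```

 Because the conversion only merges fields, it loses detail: an arranger and a performer both end up under "contributor". For that reason the more detailed local structure would remain the master copy, with the Dublin Core view derived from it.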

 With this as our starting point, we proceeded to define the structure of the textual base of the Firfov Collection, leaving aside the question of its software format for the time being.

 Initially, the entire textual base of secondary data was conceived as a single file. Experience proved the need to divide it into several parts. In this way we avoided the unnecessary multiple entry of the same data. Thanks to the computer format, these parts can be linked to one another, i.e. the data in one part can be read from another if the need arises.

 Hence our structural division of the textual section of the multimedia base:

 - a secondary data file referring to the audio-files;
 - a secondary data file for the persons whose names appear in the base;
 - a tertiary analytical data file for the archived works.


 When defining the names of the fields, we had in mind that English is unquestionably the lingua franca of today's world. We therefore decided to use it exclusively in creating the base.

 In accordance with what we have already said about the tertiary data, the development of that file was postponed, whereas the structure of the three types of files described below was fully worked out.

The secondary data file for digitised recordings

 The part which contains the data on the audio files comprises 18 categories. The fields cover different types of data and are found in different formats: numbers, dates, names, titles, etc. In order to economise on space, some data was coded.

 The first field defines the identification number of the corresponding audio-file. Three fields follow which determine its time components: start and end time, duration, and markers. Markers are of particular importance because they allow specific works, their parts, etc. to be found and identified quickly.
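
 How the time fields relate to one another can be sketched as follows. This is an illustration under assumptions: the "HH:MM:SS" timecode format, the class name AudioFileTiming and the marker labels are hypothetical and are not taken from the Firfov Collection base.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, Optional


def to_seconds(timecode: str) -> int:
    """Convert an 'HH:MM:SS' timecode to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class AudioFileTiming:
    audio_file_id: str
    start: str
    end: str
    # Marker label -> timecode at which the marked passage begins.
    markers: Dict[str, str] = field(default_factory=dict)

    @property
    def duration_seconds(self) -> int:
        """Duration derived from the start and end times."""
        return to_seconds(self.end) - to_seconds(self.start)

    def marker_at(self, timecode: str) -> Optional[str]:
        """Return the label of the last marker at or before the timecode."""
        ordered = sorted((to_seconds(t), label) for label, t in self.markers.items())
        times = [seconds for seconds, _ in ordered]
        index = bisect_right(times, to_seconds(timecode))
        return ordered[index - 1][1] if index else None


timing = AudioFileTiming(
    "A0001", "00:01:30", "00:04:45",
    {"verse 1": "00:01:30", "verse 2": "00:03:00"},
)
```

 In this sketch the duration is derived rather than stored, which avoids inconsistencies between the three values; a base that stores the duration as its own field, as described above, could use the same calculation as a consistency check.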

 The following two fields, textual and melodic beginning, also relate to identification, since they allow a specific piece to be defined precisely and distinguished from other pieces or variants.

 The eighth field covers data about the language in which the vocal and vocal-instrumental pieces are performed.

 The following three categories (from 8 to 11), Author, Arrangement and Performer, concern the so-called creative subject, i.e. the subjects (individuals) who took part in the creation of the pieces. As we already mentioned, this type of file does not process the data about the individuals who are in some way connected to the archived material (such as authors, recorders, researchers, etc.). Its main purpose is to present the basic data on the basis of which links with the files containing data about the individuals are created. Nevertheless, the need to define the more important data more accurately resulted in the development of several subcategories. For instance, the category of age was introduced in order to record the age of the performer at the time of the performance.

The twelfth field, Original recording, together with its subcategories, provides information about the details of the original recordings, i.e. the digitised and archived materials. As far as the methods of recording are concerned, we drew on the experience of Dr. Dietrich Schueller of the Phonogram Archive in Vienna.

 According to Schueller, the gathering of primary data, i.e. the process of recording itself, may be conducted in an explorative or documented manner (Schueller 1993:77-8). The explorative method denotes outdoor or studio recording in which the performers perform a specific task, such as a piece of work from a different genre. The documented method refers to the recording of real events (customs, festivals, etc.). Later on, in the Phonogram Archive in Vienna, these two categories developed into three: explorative, actual and simulative recording. Actual recording refers to the direct recording of events (festivals, concerts, fairs, customs, etc.), whereas simulative recording refers to the performers simulating a specific event (Buzarovski 2002:9-10).

 The next category of data, titled Score, refers to the score of the piece. Its subcategories contain the information about the author of the score, the editor, the publisher, its date of printing and re-printing, etc.

 The following group yields complete information about the process of the digitisation of folklore recordings. This category is similar to the one which unites the data related to the original recordings, but has modifications caused by the characteristics of digital technology.

The fifteenth category in this file of the multimedia base covers the details of copyrights.

 The sixteenth category, Additional Materials, gives information about the existence of extra materials in the archived collection which are not of an auditive nature.

 The seventeenth category, Notes, provides space for the entry of data important for future research that has not been included in the previous fields.

 The last category of this file is reserved for the names of the individuals who completed the entry of data.

The file for individuals

This section of the base covers all the subjects who in one way or another contributed to the creation, performance, recording, processing or analysis of the material in the base. It is composed of 19 categories, containing data about: the name and age of the interpreter, his/her nickname, the ensemble in which he/she takes part or which he/she directs, sex, place of birth, ethnicity, religion, native and other languages in which the individual creates, level of education, profession, place of birth, upbringing, and current place of residence of the performer, his/her parents or ancestral heritage, additional comments, special notes and data for the archiver.

Data for the digital copies

 The third model covers the digital copies. It consists of seven fields, most of which are also covered in the first file. Here we find fields in which data about the authors of the pieces, the duration of the songs, and their location in the collection and in the audio file is entered. In addition, fields describing the digital copy are included (such as the type and number of the carrier of the audio file and the number of the audio file on the specific carrier).

Microsoft Excel software was used to implement the layout of the textual section of the multimedia base. In designing the models, we ensured that they were concise and clear for potential users.

 The categorisation of ethnomusicological data in multimedia bases should be perceived as an open process, which demands constant redefinition and adjustment. Bearing in mind that the Firfov Collection textual data base is the first of its kind here in the field of ethnomusicology, further work on its development can be expected.
 
 

References: