This study is part of a larger longitudinal research project on mother-infant interaction during the first two years of life. Twenty-nine mothers were recruited during childbirth information sessions for pregnant women at two academic hospitals (XXX and XXX). Interested candidates were contacted after childbirth and asked whether they still agreed to participate. After signing an informed consent form, mothers were invited to the university for audio and video recordings of free play sessions when their infants were 3, 6, 9, 12 and 18 months old. The recordings took place when the infant was in an awake and alert state. For the current study, we randomly selected the recordings of 15 mother-infant pairs made when the infants were 3 months old. All infants were healthy, first-born children; they were born at full term and had a 1-minute Apgar score of 7 or higher. At the time of the first recordings, the 15 mothers were between 25 and 33 years old (M = 29.53 years, SD = 3.48). All mothers were native Dutch speakers. Except for one, all of them had received at least three years of higher education; the mean total number of years of education was 15.60 (SD = 2.29).
For the audio and video recordings of the mother-infant play sessions, a dedicated observation laboratory was designed at XXX (Jiang, Geerinck, Patsis, Kog, Loots, Verhelst & Sahli, 2007). Four pan-tilt-zoom dome cameras (Panasonic WV-CS950) and two clip-on directional microphones (Shure Microflex MX184) were used; the camera feeds were displayed on a PC in a four-way split frame. The audio analyses were made by two professional musicians with the help of two frequency-analysis programs, Cool Edit Pro version 2.1 (Syntrillium Software) and Melodyne (Celemony Software GmbH).
Observation sessions at the laboratory were scheduled no later than 10 days after the infants turned 3 months old. A session consisted of a semi-structured interview about pregnancy and the first three months of motherhood, an administration of the Bayley Scales of Infant Development, Dutch version (BSID-II-NL; van der Meulen, Ruiter, lutje Spelberg & Smrkovsky, 2000), and, at 12 and 18 months, the MacArthur Communicative Development Inventories, Dutch version (N-CDI; Zink & Lejaegere, 2002). After testing, mother and infant were observed in free play sessions in another part of the room. For the recordings at 3 months, the mother sat on a chair and the infant was placed in an infant seat on a table. Play sessions lasted 15 minutes: 5 minutes without a toy, 5 minutes with a standard toy (appropriate to the infant's age) and 5 minutes with a toy the mother had brought from home. For this study we only used the recordings of the condition without a toy. Every dyad was given one minute of interaction playtime before the cameras started recording, to allow habituation to the lab situation. During the play sessions the researcher sat in an adjoining room to follow the video and audio recordings; there were no interruptions except to give instructions for the next 5-minute condition. After the play session, mother and infant were invited to the control room to watch and comment on the recordings.
Every vocal sound of the infant (i.e., when the vocal cords vibrated, producing a definable sound that could be reliably translated to a pitch) was selected as meaningful for analysis. Sounds produced by the infant, such as breathing, coughing, sighing and tongue clicking, that could not be translated into a pitch and did not require activity of the vocal cords were excluded from analysis. Some vegetative sounds that were not meant as vocalisations but to which a pitch could easily be attributed (as was often the case when the infant hiccupped) were included when the sound was followed by a response of the mother adapting her vocalisation to the pitch of the infant's involuntary sound.
The vocalisations of the infant were expanded with the preceding and/or succeeding vocalisations of the mother to define interaction moments of mutual vocalisation. We began by analysing the vocalisations of the mother before and after the selected utterances of the infant. In these mutual vocalisations we searched for a tonal context in terms of harmonic or pentatonic series. When we found a tonal context preceding the utterance of the infant, we analysed backwards and took the first tone of the tonal context as the onset of the interaction moment. When we found a tonal context succeeding the utterance of the infant, we analysed forwards until this tonal context ended and took its last tone as the offset of the interaction moment. These 'Tonal Interaction Moments' (TIMs) were temporally marked out. It is worth noting that these TIMs did not always coincide with the grammatical structure of a sentence: the grammatical beginning or ending of a sentence did not automatically correspond with the onset or offset of a TIM.
A general speech pattern is relatively easy to transcribe into discrete notes because of its natural segmentation into syllables. However, ID-speech and some utterances of the infants often show a continuous course (i.e., a slide, glissando, portamento or fluctuation) that lends itself less well to segmentation for a reliable and discrete pitch transcription. Professional musicians therefore listened to these utterances in order to deal with less clear segmentations. Two musicians trained themselves with the help of two frequency-analysis programs, Cool Edit Pro and Melodyne, and reached an inter-rater reliability of 91.01 (Cohen's κ). Each program offers its own merits. Cool Edit Pro yields a visual representation of the acoustic signal in a spectrogram obtained by Fourier analysis; it provides a detailed time course of frequency with precise temporal information on the onset and offset times of interaction moments. This makes it possible to define precisely the duration of every tonal interaction moment and of the pauses between them. Melodyne has different capabilities: it offers a visual display of the fundamental frequency of an acoustic signal over time, together with energy-related information such as a mean pitch calculated from volume course and duration. The visual display gives a clear view of the pitch at the onset and offset of an acoustic signal and of the most important possible discrete 'pitch stations' in a continuous course pattern. We want to emphasise that every sound produced by the human voice, even by a professional singer, always contains a natural fluctuation (Seashore, 1967). In our study this fluctuation often appeared as a very fast glissando initiating or expanding the salient pitch of an acoustic signal. When using the Melodyne software, we included this salient pitch for analysis (inter-rater reliability of 91.01, Cohen's κ).
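As a miniature illustration of what a spectrogram-based tool such as Cool Edit Pro provides at scale, the sketch below picks the dominant frequency of a synthetic tone with a naive discrete Fourier transform. The sample rate, window length and test tone are assumptions for the example; this is not the study's analysis pipeline.

```python
import math

# Illustrative only: naive DFT peak-picking on a synthetic 220 Hz sine,
# a toy version of spectrogram-based frequency analysis. Parameters are
# assumptions for this sketch.

SR = 8000                 # sample rate in Hz (assumed)
N = 400                   # window length -> frequency resolution SR/N = 20 Hz
signal = [math.sin(2 * math.pi * 220 * n / SR) for n in range(N)]

def peak_frequency(x, sr):
    """Return the frequency of the DFT bin with the largest magnitude."""
    best_k, best_mag = 0, 0.0
    for k in range(1, len(x) // 2):          # skip DC, positive bins only
        re = sum(x[n] * math.cos(2 * math.pi * k * n / len(x)) for n in range(len(x)))
        im = sum(x[n] * math.sin(2 * math.pi * k * n / len(x)) for n in range(len(x)))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * sr / len(x)

print(peak_frequency(signal, SR))  # 220.0
```

A real tool refines this with windowing, overlapping frames and much finer resolution, which is what makes the precise onset/offset timing and duration measurements described above possible.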
However, we often observed slow glissandos that could not be considered a fluctuation around one salient pitch. Whereas Melodyne calculates the mean pitch of a whole glissando, we interpreted such a slow glissando as consisting of two salient pitches, namely the onset and offset pitches of the glissando (inter-rater reliability of 91.01, Cohen's κ).
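The difference between the two readings of a slow glissando can be made concrete. The sketch below uses a hypothetical frequency trajectory and a simple log-domain mean; it is not Melodyne's actual computation, only an illustration of why a mean pitch and a two-salient-pitch interpretation diverge.

```python
import math

# Hypothetical slow glissando sliding from 330 Hz (E4) up to 440 Hz (A4),
# sampled at five points along its course (invented values).
glissando = [330.0, 352.0, 375.0, 405.0, 440.0]

def cents_from_a4(freq, a4=440.0):
    """Pitch height in cents relative to A4."""
    return 1200 * math.log2(freq / a4)

# Mean-pitch reading: average in the log (cents) domain, i.e. a single
# intermediate pitch somewhere inside the glide.
mean_cents = sum(cents_from_a4(f) for f in glissando) / len(glissando)
mean_freq = 440.0 * 2 ** (mean_cents / 1200)

# Two-salient-pitch reading: keep only the onset and offset pitches.
onset, offset = glissando[0], glissando[-1]

print(round(mean_freq, 1))   # one pitch between E4 and A4
print(onset, offset)         # 330.0 440.0
```

The mean collapses the glide to one pitch that was never structurally salient, whereas the onset/offset reading preserves the interval the glide spans.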
In assigning note names we made use of the equal-tempered paradigm tuned to a standard pitch of 440 Hz for A. This equal-tempered system does not correspond perfectly with the frequencies of a harmonic series, so including harmonics in the equal-tempered system requires an adjustment to the closest note name. Just as with Cool Edit Pro, for assigning a note name we accepted a deviation of up to 50 cents above or below a note. It is important to mention that the difference in cents between the exact value in the twelve-tone equal-tempered system and the just intonation of a harmonic never exceeds this norm of 50 cents (see Table 1). Strictly speaking, the A = 440 Hz standard should not be necessary in this study, because we are looking for relations and ratios between pitches in terms of pentatonic scales and harmonics, which implies a relative, and not an absolute, way of listening. Thus, in some cases, the relative distance between two notes perceived by ear did not correspond with the computer programs' translation from pitch to note names. For instance, when a frequency analyser registers a very low G# (-48 cents) followed by a very high C# (+48 cents), the program will note a fourth (G#-C#) instead of the fifth (G-D) that a musician would hear by ear. All interaction moments that showed this problem were selected and presented to 12 professional musicians who were uninformed about the goal of the research. They could listen to the fragments as many times as they wanted and answered questions such as: "Which distance do you hear: a second, third, fourth, fifth…?"; "Do you perceive the following tones as the same?"; "What kind of chord do you hear?". In other words, they were asked to listen in a relative and not an absolute manner. Twenty of the 31 interaction moments, those on which at least 9 of the 12 musicians agreed, were included for further analysis.
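The nearest-note mapping with the 50-cent tolerance, and the boundary problem it creates, can be sketched as follows. This is an illustrative Python sketch, not the software used in the study, and the example frequencies are hypothetical.

```python
import math

# Sketch of the note-name assignment described above: map a frequency to
# the nearest note of twelve-tone equal temperament (A4 = 440 Hz),
# accepting deviations of up to 50 cents. Illustrative only.

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(freq, a4=440.0):
    """Nearest equal-tempered note name and its deviation in cents (-50..+50)."""
    semitones = 12 * math.log2(freq / a4)      # signed distance from A4
    nearest = round(semitones)
    cents = round(100 * (semitones - nearest))
    return NOTE_NAMES[(9 + nearest) % 12], cents   # A sits at index 9

def cents_between(f1, f2):
    """Exact melodic distance between two frequencies, in cents."""
    return 1200 * math.log2(f2 / f1)

# Low harmonics of A (110 Hz) stay inside the 50-cent window: the just
# fifth is +2 cents, the just major third -14 cents, and the seventh
# harmonic -31 cents from equal temperament.
print(freq_to_note(3 * 110.0))   # ('E', 2)
print(freq_to_note(5 * 110.0))   # ('C#', -14)
print(freq_to_note(7 * 110.0))   # ('G', -31)

# The boundary problem: a G# 48 cents flat followed by a C# 48 cents
# sharp is labelled as the fourth G#-C#, even though the sounding
# distance is about 596 cents.
g_sharp_low = 415.305 * 2 ** (-48 / 1200)   # G#4 minus 48 cents
c_sharp_high = 554.365 * 2 ** (48 / 1200)   # C#5 plus 48 cents
print(freq_to_note(g_sharp_low))            # ('G#', -48)
print(freq_to_note(c_sharp_high))           # ('C#', 48)
print(round(cents_between(g_sharp_low, c_sharp_high)))  # 596
```

The last lines show why note labels alone can misrepresent a heard interval near the 50-cent boundary, which is exactly the situation the panel of 12 musicians was asked to adjudicate by ear.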
The remaining eleven interaction moments, which did not reach this consensus, were excluded from further analysis.
Tonal Interaction Moments (TIM)
Of the 558 interaction moments studied, 470 (84.23%) contained clear tonal aspects. We refer to these as 'Tonal Interaction Moments' (TIMs). Within these TIMs, we distinguished two categories: (a) TIMs based on the harmonic series (334, or 71.06%) (see examples in Figure 3) and (b) TIMs based on a pentatonic series (136, or 28.94%) (see examples in Figure 4). These TIMs often contained absolute and relative pitch/interval imitations. We discuss them in more detail below.
Other Interaction Moments
Of the 558 interaction moments studied, 88 (15.77%) did not occur in a tonal context based on harmonic or pentatonic series. Fifty-four (9.68%) of these interaction moments contained absolute and relative pitch/interval imitations (see examples in Figures 4, 5 and 6); the other 34 (6.09%) did not.
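The reported counts can be recomputed as a quick arithmetic check; the sketch below is illustrative only and simply derives the percentages from the counts given above.

```python
# Recompute the percentages from the reported counts (total 558 moments).
total = 558
tims, harmonic, pentatonic = 470, 334, 136
other, with_imitation, without = 88, 54, 34

# Internal consistency of the counts.
assert harmonic + pentatonic == tims
assert tims + other == total
assert with_imitation + without == other

def pct(n, d):
    """Percentage of n in d, rounded to two decimals."""
    return round(100 * n / d, 2)

print(pct(tims, total))            # 84.23  (TIMs among all moments)
print(pct(harmonic, tims))         # 71.06  (harmonic TIMs among TIMs)
print(pct(pentatonic, tims))       # 28.94  (pentatonic TIMs among TIMs)
print(pct(other, total))           # 15.77  (non-tonal moments)
print(pct(with_imitation, total))  # 9.68   (with imitations, of total)
print(pct(without, total))         # 6.09   (without imitations, of total)
```

Note that the two TIM subcategories are reported as percentages of the 470 TIMs, whereas the non-tonal subcategories are reported as percentages of the full 558 moments.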