Corpus
Methods
The APriL corpus
The corpus contains three sets of speech data:
- Words with different prosodic structures
- Simple sentences in interactive play
- Controlled read sentences (adults only).
The data files are organized according to speaker and are presented in wav format, a format compatible with Praat and with most speech analysis applications.
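Because the files are grouped by speaker and stored as wav, they can be inventoried with standard tools before any analysis in Praat. The following is a minimal sketch in Python, assuming one sub-folder per speaker; that layout and the names `durations_by_speaker` and `wav_duration_seconds` are hypothetical and not taken from the corpus documentation.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM wav file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def durations_by_speaker(corpus_root):
    """Map each speaker folder name to {file name: duration in seconds}.

    Assumes a layout of corpus_root/<speaker>/<file>.wav, which is a
    hypothetical convention used only for this illustration.
    """
    durations = {}
    for wav_path in sorted(Path(corpus_root).glob("*/*.wav")):
        speaker = wav_path.parent.name
        durations.setdefault(speaker, {})[wav_path.name] = wav_duration_seconds(wav_path)
    return durations
```

Python's built-in `wave` module reads uncompressed PCM files only, so files in another encoding would need a different reader.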
Participants
We recorded 36 carer-child pairs: 12 English, 12 Catalan, and 12 Spanish. The children were aged 2, 4, and 6 years, and their mothers ranged in age from their mid-twenties to their early forties. The ages of the children were chosen so that they fell into clearly differentiated developmental stages. The table below specifies the exact age of the children at the time of the recordings.
Corpus 1: Words with different prosodic structures
The first corpus contains data elicited by means of a picture naming task, in which the prosodic properties of a range of lexical words were systematically varied.
Experimental materials and procedure
We controlled the prosodic composition of the target words, using words with different stress patterns, from S to SWSW (S = strong syllable, W = weak syllable), selected so that they were imageable and familiar to young children. We used words such as “train” and “tren” (monosyllabic), “balloon”, “camión”, “camió” (WS); “baby”, “mono”, “mono” (SW), etc. (18 targets in all). The words were selected to be easy to pronounce and as comparable as possible across languages. Finally, V, CV, CVC, and CCVC syllables were balanced across languages. The goal was to elicit utterances that were comparable across the three languages and, to the extent that this is possible, to the adult target.
The data were elicited with a naming game based on short animated clips shown on PowerPoint slides on a laptop screen. Mothers were given written instructions. They had to read a short story about a little fairy called Melanie who was looking for some objects and animals. The animations showed scenes, some with animals and some with everyday objects, that included the target word. The mother asked her child to name the target by asking “What is Melanie looking for?” or “What is this?”, then praised the child for getting it right and repeated what the child had said. If the child said a different word, for instance “ball” instead of the target word “balloon”, the mother had to encourage the child to try again until the child used the target word. The dialogue was modeled for the mother on each slide, with the target word highlighted in a different colour. A typical dialogue went thus:
- [mother] What is Melanie looking for?
- [child] The balloon
- [mother] Good! She is looking for the balloon.
- [mother] Can you find it? There! Well done!
The recordings were made in a quiet room in the participants’ homes.
Corpus 2: Simple sentences in interactive play
The second corpus consists of simple SVO sentences produced by children and their mothers in an interactive game. To elicit Adult-Directed Speech (ADS), the mothers were also recorded performing the same task, in the same role, with the experimenter.
Experimental materials and procedure
The data consist of short question-and-answer dialogues containing 23 short target utterances of around 10 to 14 syllables. The utterances describe simple, everyday actions that could easily be described in words highly familiar to the children. They were elicited in a structured game by means of computer-animated scenes showing a child manipulating an object or playing with another child. The scenes were presented as animated clips on PowerPoint slides on a laptop screen. For example, one scene showed a little girl blowing soap bubbles, and another showed a little boy playing with building blocks.
The mother was instructed to ask her child to describe what was happening in each clip, then praise the child for getting it right, and repeat what the child had said. A typical dialogue went thus:
- Mother: “What’s happening here? What’s the little girl doing?”
- Child: “(She’s) blowing bubbles!”
- Mother: “That’s right! She’s blowing bubbles!”
The recordings were made in a quiet room in the participants’ homes.
Corpus 3: Controlled read sentences (adults only)
This data set was collected to examine the potential effects of syllable structure on the rhythmic differences between three languages that are reported to belong to different rhythmic classes (English: ‘stress-timed’, Spanish: ‘syllable-timed’, Catalan: ‘intermediate’).
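Rhythm-class studies such as Ramus et al. (1999) quantify these differences with interval measures, for example %V (the percentage of utterance duration that is vocalic) and ΔC (the standard deviation of consonantal interval durations). This page does not say which measures were applied to the present data, so the sketch below is only an illustration of that general approach. The function name `rhythm_metrics`, the input format, and the use of the population standard deviation are all assumptions.

```python
from statistics import pstdev


def rhythm_metrics(intervals):
    """Compute %V and deltaC from (label, duration_in_seconds) pairs.

    label is "V" for a vocalic interval and "C" for a consonantal one.
    The population standard deviation is used for deltaC; this choice
    is an assumption of the sketch.
    """
    vocalic = [duration for label, duration in intervals if label == "V"]
    consonantal = [duration for label, duration in intervals if label == "C"]
    total = sum(vocalic) + sum(consonantal)
    percent_v = 100.0 * sum(vocalic) / total
    delta_c = pstdev(consonantal)
    return percent_v, delta_c


# Hypothetical durations for illustration only, not measurements from the corpus.
example = [("C", 0.08), ("V", 0.11), ("C", 0.15), ("V", 0.09), ("C", 0.06)]
percent_v, delta_c = rhythm_metrics(example)
```

In practice the interval labels and durations would come from a hand-corrected segmentation of each recording, for instance a Praat annotation tier.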
Experimental materials
The experimental materials fall into three categories, drawn from two sets. The first is a set of controlled materials, consisting of 10 utterances per language matched for utterance length and syllabic structure. Half of these were composed predominantly of open (CV-type) syllables and the other half predominantly of closed syllables (CVC and occasionally CVCC). All of these utterances were fairly well matched for number of syllables (from 13 to 19) and for segmental and prosodic composition (namely, number of stresses and pitch accents, and number of intended prosodic phrases). The second is a set of mixed materials that are representative of the target language; for these, we used the same sentences as Ramus et al. (1999).
One example per language is given here for each category. The number of syllables appears in parentheses.
- Predominantly open syllables
Cat: La mare de la Jana és de Badalona. (13)
Eng: The mother of Susana is from Badalona. (13)
Span: La madre de Susana es de Badalona. (13)
- Predominantly closed syllables
Cat: Els donuts d’Amsterdam són realment internacionals. (15)
Eng: These doughnuts from Amsterdam taste almost exceptional. (14)
Span: Los donuts de Ámsterdam son realmente internacionales. (15)
- Mixed
Cat: Ell mai va tenir la possibilitat d’expressar-se. (15)
Eng: A hurricane was announced this afternoon on the TV. (16)
Span: Se enteraron de la noticia en este diario. (14)
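The syllable counts and the open/closed balance of such sentences can be checked mechanically once an utterance has been syllabified by hand. The sketch below is one rough way to do so and is not the procedure used to build the materials. The dot-and-space syllabification of the Spanish example is supplied here as an assumption, and the test for an open syllable (final letter is a vowel) is an orthographic approximation that is reasonable for Spanish and Catalan but not for English.

```python
# Vowel letters for Spanish and Catalan orthography (an approximation).
VOWEL_LETTERS = set("aeiouáéíóúàèòïü")


def split_syllables(syllabified):
    """Split a hand-syllabified utterance on spaces and dots."""
    return syllabified.lower().replace(".", " ").split()


def is_open(syllable):
    """Treat a syllable as open if its last letter is a vowel."""
    return syllable[-1] in VOWEL_LETTERS


def open_syllable_profile(syllabified):
    """Return (total syllables, open syllables) for one utterance."""
    syllables = split_syllables(syllabified)
    n_open = sum(is_open(syllable) for syllable in syllables)
    return len(syllables), n_open


# Hand syllabification of the Spanish open-syllable example (an assumption).
print(open_syllable_profile("La ma.dre de Su.sa.na es de Ba.da.lo.na"))  # (13, 12)
```

The total of 13 agrees with the count given for that sentence, and only “es” is closed, which is consistent with its label as predominantly open.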
Subjects and recording procedure
A total of 24 speakers read the 30 target utterances at a normal speech rate: 8 Southern English speakers, 8 Central Peninsular Spanish speakers from the Madrid area, and 8 Central Catalan speakers from the Barcelona area. All participants in this study were women between the ages of 28 and 40.
Recordings were made in a quiet room in the participants’ homes in Cambridge, Madrid and Barcelona, respectively, using a Marantz PMD660 recorder and Shure PG81 microphones for the Spanish and Catalan recordings, and a Tascam HD-P2 recorder with AKG C3000B microphones for the English recordings. Subjects were given time prior to the recordings to read the sentences. When errors or hesitations occurred during the readings, subjects were asked to repeat the tokens at the end of the session.





