Week 7: Annotation II
1 A somewhat quicker way to annotate Sound files
From the exercises in previous weeks, it has become clear to us that annotation can be a slow, painful process. There are, however, a couple of steps that you can take to make it easier:
- You can write a for loop that creates a boundary in your TextGrid every x (milli)seconds. You may space these boundaries evenly, as we learned last week, or you could put all of them very close to each other at the beginning or end of the TextGrid.
- You can make .txt files with transcriptions using a simple programme such as Notepad. Once you have one, you can ask Praat to read it as a Strings list. Last week we saw that we can use a Strings list to feed our TextGrids with labels; today we will see a small variation of that trick. If you are already wondering what the use of this is, see below.
- A Strings list with tokens in it can supply the interval texts for your TextGrids. This way, you can tell Praat not only to create the boundaries of your TextGrid, but also to fill in the intervals with the desired text.
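The two ways of placing boundaries in the first bullet can be sketched as a small stand-alone Praat script. It assumes a hypothetical 3-second TextGrid with two interval tiers, here called "even" and "bunched", and a hypothetical step of 0.5 s; neither comes from the samples folder. Tier 1 gets evenly spaced boundaries, tier 2 gets the same number of boundaries squeezed together near the start, 10 ms apart.

```praat
# Hypothetical values for illustration: a 3-second TextGrid, one boundary every 0.5 s
totalDur = 3
step = 0.5
tg = Create TextGrid: 0, totalDur, "even bunched", ""
# Boundaries must fall strictly inside the TextGrid, so stop one step short of the end
nBoundaries = round (totalDur / step) - 1
for i to nBoundaries
    # Tier 1: evenly spaced boundaries at 0.5, 1.0, ..., 2.5 s
    Insert boundary: 1, i * step
    # Tier 2: the same number of boundaries bunched together, 10 ms apart, near the start
    Insert boundary: 2, i * 0.01
endfor
nEven = Get number of intervals: 1
nBunched = Get number of intervals: 2
writeInfoLine: "Intervals on tier 1: ", nEven, ", on tier 2: ", nBunched
```

Either way you end up with the same number of intervals (five boundaries give six intervals); the difference is only in how far you have to drag each boundary afterwards.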
2 Preliminaries: Feeding a .txt file to Praat as a Strings list
Download the .zip file called samples from ILIAS, unzip it and put it on your Desktop or in a folder of your choice. Inside you'll find six .wav files (six different people reading the same text aloud: "the tiger tried to swallow all of the cheese at once, but it got stuck in his throat") and a file called annotation.txt. If you open this file, you will find the following text:
The tiger tried to swallow all of the cheese at once, but it got stuck in his throat.
In the file, however, the text is written vertically, that is, one word per line, and without any punctuation marks such as commas and full stops.
Let's create a Strings list where the strings will be the words in the file annotation.txt. We type:
dir$ = "/home/fernanda/Desktop/samples/"
myann = Read Strings from raw text file: dir$ + "annotation.txt"
Remember that the first line contains the path on my own computer; replace it with the path to the samples folder on yours. If you are still unsure about how to get the right path to your files, try opening one of the .wav files in the samples folder the usual way (i.e. via the Open menu in the Objects window). Then go back to your script and click Edit > Paste History. The command you just used, including the full path to the file, will be pasted into your script. If you had already been clicking around before, check the last line; if not, it will be the only line you see.
Now, let's go back to the Objects window. There will be a Strings object called annotation. Open it: it will be a list of strings with the words in the annotation.txt file, which we will use to annotate the Sound files.
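Besides opening the Strings object, you can check its contents from the script itself by printing every word with its number in the Info window. So that the sketch runs without the samples folder, it builds a stand-in Strings object with Create Strings from tokens instead of reading annotation.txt; in your own script, myann comes from Read Strings from raw text file as above.

```praat
# Stand-in for annotation.txt: in the real script myann comes from
# Read Strings from raw text file: dir$ + "annotation.txt"
myann = Create Strings from tokens: "annotation", "The tiger tried to swallow all of the cheese at once but it got stuck in his throat", " "
numWords = Get number of strings
writeInfoLine: "The Strings object contains ", numWords, " words:"
for iword to numWords
    word$ = Get string: iword
    # Print the word's position in the list and the word itself
    appendInfoLine: iword, tab$, word$
endfor
```

The numbers printed here are the ones the loops in section 3 use as iword, so this is also a quick way to find out which interval a given word will end up in.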
3 TextGrid generation
3.1 Create a TextGrid for each Sound file
Now, right below the previous lines, we add the following bit:
# List all .wav files in the samples folder (dir$ already ends in "/")
wav_files = Create Strings as file list: "wavlist", dir$ + "*.wav"
nowav_files = Get number of strings
for ifile to nowav_files
    selectObject: wav_files
    wavfilename$ = Get string: ifile
    # Read the Sound and create an empty TextGrid with three interval tiers
    wav_file = Read from file: dir$ + wavfilename$
    mytg = To TextGrid: "word phon vot", ""
endfor
If you go back to the Objects window, you will notice that each of the six Sound files now has its corresponding TextGrid. Each TextGrid has three tiers (word, phon and vot), but they are all empty.
3.2 Add boundaries to the TextGrid
We will fill the first tier with boundaries. If a sentence has x words, we need x-1 boundaries to separate them. The following chunk first gets the total duration dur of the TextGrid, then counts the words in the Strings object myann and stores that number in the numeric variable numWords (here 18). It then loops over all the words (iword from 1 to numWords), storing each word in string$, but it only inserts a boundary on tier 1 of the TextGrid while iword is smaller than numWords, i.e. for iword = 1 to 17. The boundaries are spaced evenly: boundary number iword is placed at (dur/numWords) * iword seconds. It is important that you add the following chunk RIGHT BEFORE the endfor line of the script above:
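# Duration of the TextGrid just created (the same as that of the Sound)
dur = Get total duration
selectObject: myann
numWords = Get number of strings
for iword to numWords
    string$ = Get string: iword
    # x words need only x-1 boundaries, so no boundary after the last word
    if iword < numWords
        selectObject: mytg
        Insert boundary: 1, (dur/numWords) * iword
    endif
    selectObject: myann
endfor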
Now, check the TextGrids. They should have 17 boundaries, which you can later move and adjust to match the Sound.
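To see where those 17 boundaries land, here is the arithmetic of the chunk above as a small stand-alone sketch. It assumes a hypothetical recording of 3.6 s; in the real script, dur comes from Get total duration and differs from file to file.

```praat
# Hypothetical duration; in the real script this is the value of Get total duration
dur = 3.6
numWords = 18
numBoundaries = numWords - 1
# Every word gets an interval of the same width
step = dur / numWords
writeInfoLine: "Interval width: ", fixed$ (step, 3), " s"
for iword to numBoundaries
    appendInfoLine: "Boundary ", iword, " at ", fixed$ (step * iword, 3), " s"
endfor
lastBoundary = step * numBoundaries
```

With 3.6 s and 18 words each interval is 0.2 s wide, so the boundaries sit at 0.2, 0.4, ..., 3.4 s, and the last interval runs from 3.4 s to the end of the file.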
3.3 Add the text to the TextGrid
Now we are ready to put the text inside each interval of the TextGrid. Your script should now end with two endfor lines, one right after the other. Add the following lines immediately BEFORE these two endfor lines, i.e. inside the word loop:
# Label interval number iword on tier 1 with the current word
selectObject: mytg
Set interval text: 1, iword, string$
selectObject: myann
Now look at the TextGrids. They should have the corresponding text inside each interval.
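If you are unsure where the lines from sections 3.2 and 3.3 end up, the sketch below shows the word loop with both additions in place. To run without the samples folder, it uses a silent stand-in Sound of a hypothetical 3.6 s and a Strings object built with Create Strings from tokens instead of annotation.txt; it also takes the duration from the Sound directly, which gives the same value as the TextGrid. The variable names match the script above.

```praat
# Stand-ins for annotation.txt and for one .wav file (hypothetical duration)
myann = Create Strings from tokens: "annotation", "The tiger tried to swallow all of the cheese at once but it got stuck in his throat", " "
wav_file = Create Sound from formula: "silence", 1, 0, 3.6, 16000, "0"
dur = Get total duration
mytg = To TextGrid: "word phon vot", ""
selectObject: myann
numWords = Get number of strings
for iword to numWords
    string$ = Get string: iword
    # Section 3.2: boundaries between the words
    if iword < numWords
        selectObject: mytg
        Insert boundary: 1, (dur/numWords) * iword
    endif
    selectObject: myann
    # Section 3.3: label interval iword with the current word
    selectObject: mytg
    Set interval text: 1, iword, string$
    selectObject: myann
endfor
selectObject: mytg
numIntervals = Get number of intervals: 1
```

The labelling works inside the same loop because, by the time word iword is handled, the boundary that closes its interval has just been inserted (and the last word simply takes the final interval).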
4 Boundary adjustment: Add a dialogue box
We can leave the script as it is and adjust the boundaries later, or we can add a pause to our script so that it gives us time to adjust the boundaries after each iteration. We add these lines BETWEEN the two endfor lines at the end of the script:
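# Open the Sound and its TextGrid together in an editor window
selectObject: mytg
plusObject: wav_file
View & Edit
# Wait until Continue is clicked before moving on to the next file
pauseScript: "This is a pause in the script so you can adjust the boundaries. Click Continue when you’re finished."
minusObject: wav_file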
5 Homework
Modify the script so that:
- The word "stuck" is extracted from all 6 Sound files.
- Each instance of the word "stuck" gets its own TextGrid with one tier called segment.
- Last week we created a for loop using the command Create Strings from tokens. Create a Strings list like this, with the phonetic symbols for each sound in the word stuck (hint: the [ʌ] symbol is rendered as \vt). Add the line of code that creates this object at the beginning of your script.
- HINT: The chunk of script that you will be creating goes between the last two endfor lines.
- Feed each one of the TextGrids with the phonetic annotation contained in your Strings tokens object.
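As one possible starting point for the homework (not a full solution), the sketch below creates the token list and finds the interval labelled "stuck" on tier 1, together with its start and end times. The object name "tokens" is a hypothetical choice, and the TextGrid with its timings is a stand-in; in your script you would search mytg once the labels are in place and the boundaries have been adjusted. Extracting the word and building the segment TextGrid is left to you.

```praat
# Token list for the homework; \vt is Praat's backslash code for [ʌ]
phones = Create Strings from tokens: "tokens", "s t \vt k", " "
numPhones = Get number of strings

# Stand-in TextGrid with hypothetical timings; in the real script, search mytg
tg = Create TextGrid: 0, 1, "word", ""
Insert boundary: 1, 0.4
Insert boundary: 1, 0.6
Set interval text: 1, 2, "stuck"

# Find the interval labelled "stuck" and note where it starts and ends
stuckStart = undefined
stuckEnd = undefined
numIntervals = Get number of intervals: 1
for iint to numIntervals
    label$ = Get label of interval: 1, iint
    if label$ = "stuck"
        stuckStart = Get start time of interval: 1, iint
        stuckEnd = Get end time of interval: 1, iint
    endif
endfor
writeInfoLine: "stuck runs from ", fixed$ (stuckStart, 3), " to ", fixed$ (stuckEnd, 3), " s"
```

Because the loop looks at the labels rather than at a fixed interval number, it keeps working after you have moved the boundaries by hand in the editor.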