Oldest known version of this page was edited on 2008-07-28 05:57:35 by UdenSherpa
Collection Of Dzongkha Sentence Corpus
About 40,000 sentences have been collected.
Phoneme Set Design
initial consonants: p, ph, b, t, th, d, k, kh, g, @, m, ny, ng, s, z, sh, zh, h, hh, w, r, l, y, ts, tsh, dz, c, ch, j
vowels: a, i, u, e, o, ue, oe, aa, ii, uu
clusters: dr, tr, thr, lhh
diphthongs: ai, ui, oi, au, ae, eu, ou, iu, ei, eo
final consonants: g, ng, n, b, m, r, l, p
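The inventory above can be held as plain lists and used to check whether a romanized syllable fits the phoneme set. The following is only a minimal sketch: the onset + nucleus + coda layout, the longest-match rule, and the function names are assumptions made for illustration, not part of the design recorded on this page.

```python
# Phoneme inventory copied from the lists above.
INITIALS = "p ph b t th d k kh g @ m ny ng s z sh zh h hh w r l y ts tsh dz c ch j".split()
CLUSTERS = "dr tr thr lhh".split()
VOWELS = "a i u e o ue oe aa ii uu".split()
DIPHTHONGS = "ai ui oi au ae eu ou iu ei eo".split()
FINALS = "g ng n b m r l p".split()


def longest_prefix(text, symbols):
    """Return the longest symbol that text starts with, or '' if none does."""
    matches = [s for s in symbols if text.startswith(s)]
    return max(matches, key=len) if matches else ""


def split_syllable(syllable):
    """Split a romanized syllable into (onset, nucleus, coda).

    Assumed layout (hypothetical): optional initial or cluster,
    then a vowel or diphthong, then an optional final consonant.
    Returns None when the syllable does not fit that layout.
    """
    onset = longest_prefix(syllable, INITIALS + CLUSTERS)
    rest = syllable[len(onset):]
    nucleus = longest_prefix(rest, VOWELS + DIPHTHONGS)
    if not nucleus:
        return None
    coda = rest[len(nucleus):]
    if coda and coda not in FINALS:
        return None
    return onset, nucleus, coda
```

With illustrative input strings, split_syllable("thrang") gives ("thr", "a", "ng"), and a string outside the inventory such as "xyz" gives None.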
Text Processing
Segment the text into sentences, and then into syllables. Sort the syllables of the 40,000 sentences and keep only the unique ones. This gives around 4,000 unique syllables.
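The segmentation step could be sketched as follows. The page does not say which delimiters were used, so this sketch assumes the usual Tibetan-script marks: the shad (U+0F0D) as the sentence boundary and the tsheg (U+0F0B) as the syllable separator. The function names are hypothetical.

```python
SHAD = "\u0f0d"   # Tibetan mark shad, assumed here to end a sentence
TSHEG = "\u0f0b"  # Tibetan mark tsheg, assumed here to separate syllables


def split_sentences(text):
    """Split raw text on the shad and drop empty pieces."""
    return [s.strip() for s in text.split(SHAD) if s.strip()]


def split_syllables(sentence):
    """Split one sentence on the tsheg and drop empty pieces."""
    return [p.strip() for p in sentence.split(TSHEG) if p.strip()]


def unique_syllables(sentences):
    """Return the sorted list of distinct syllables in all sentences."""
    return sorted({syl for sent in sentences for syl in split_syllables(sent)})
```

Dropping empty pieces means that a doubled shad or a trailing tsheg does not produce empty sentences or syllables.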
Create a pronunciation dictionary of the 4000 syllables.
Map the pronunciation dictionary back onto the 40,000 sentences. This gives around 23,000 sentences with a transcription.
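A minimal sketch of the mapping step is given below. It assumes that the dictionary is a plain mapping from syllable to phoneme string, and that a sentence is kept only when every one of its syllables has an entry. That rule is one plausible reading of why fewer sentences come out than go in; the page does not state the actual criterion. All names are hypothetical.

```python
def transcribe(syllables, lexicon):
    """Return the phoneme strings for one sentence, or None if a syllable is missing."""
    if any(s not in lexicon for s in syllables):
        return None
    return [lexicon[s] for s in syllables]


def transcribe_corpus(corpus, lexicon):
    """Keep only the sentences that can be transcribed in full.

    corpus is a list of sentences, each a list of syllables.
    Returns a list of (syllables, phonemes) pairs.
    """
    result = []
    for syllables in corpus:
        phones = transcribe(syllables, lexicon)
        if phones is not None:
            result.append((syllables, phones))
    return result
```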
Selecting only the sentences essential for recording, so that the possible diphones are taken into account, gives 509 sentences.
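The page does not record how the 509 sentences were chosen. One common way to pick a small recording set that covers the diphones is a greedy set cover, sketched here purely as an assumption. The "#" silence marker at sentence edges and the function names are also hypothetical.

```python
def diphones(phones):
    """Return the set of adjacent phoneme pairs, with '#' marking silence at both ends."""
    padded = ["#"] + list(phones) + ["#"]
    return set(zip(padded, padded[1:]))


def select_sentences(transcriptions):
    """Greedy set cover over diphones.

    transcriptions is a list of phoneme lists, one per sentence.
    Repeatedly picks the sentence that adds the most diphones not yet
    covered (lowest index wins ties) and stops when nothing new is added.
    Returns the indices of the chosen sentences in the order picked.
    """
    remaining = {i: diphones(p) for i, p in enumerate(transcriptions)}
    covered, chosen = set(), []
    while remaining:
        best = max(remaining, key=lambda i: (len(remaining[i] - covered), -i))
        gain = remaining.pop(best) - covered
        if not gain:
            break
        covered |= gain
        chosen.append(best)
    return chosen
```

A greedy pass does not guarantee the smallest possible set, but it is simple and usually close enough for a first recording script.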