PAN L10n Wiki : Data

Oldest known version of this page was edited on 2008-07-28 05:57:35 by UdenSherpa
Collection Of Dzongkha Sentence Corpus
About 40,000 sentences were collected.

Phoneme Set Design
initial consonants: p, ph, b, t, th, d, k, kh, g, @, m, ny, ng, s, z, sh, zh, h, hh, w, r, l, y, ts, tsh, dz, c, ch, j
vowels: a, i, u, e, o, ue, oe, aa, ii, uu
clusters: dr, tr, thr, lhh
diphthongs: ai, ui, oi, au, ae, eu, ou, iu, ei, eo
final consonants: g, ng, n, b, m, r, l, p
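
The phoneme set above can be held as plain lists and used to check a transcription. The following is only a minimal sketch: it assumes a syllable transcription is written as one romanized string made of an onset (initial consonant or cluster), a nucleus (vowel or diphthong) and an optional final consonant, which this page does not state. The function name split_syllable is hypothetical.

```python
# Phoneme inventory copied from the lists above.
INITIALS = ("p ph b t th d k kh g @ m ny ng s z sh zh h hh "
            "w r l y ts tsh dz c ch j").split()
VOWELS = "a i u e o ue oe aa ii uu".split()
CLUSTERS = "dr tr thr lhh".split()
DIPHTHONGS = "ai ui oi au ae eu ou iu ei eo".split()
FINALS = "g ng n b m r l p".split()

# Try longer symbols first so that "tsh" wins over "ts" and "ue" over "u".
ONSETS = sorted(INITIALS + CLUSTERS, key=len, reverse=True)
NUCLEI = sorted(VOWELS + DIPHTHONGS, key=len, reverse=True)


def split_syllable(s):
    """Split a romanized syllable into (onset, nucleus, final).

    Assumed structure: onset + nucleus + optional final consonant.
    Returns None if the string does not fit that structure.
    """
    for onset in ONSETS:
        if not s.startswith(onset):
            continue
        rest = s[len(onset):]
        for nucleus in NUCLEI:
            if not rest.startswith(nucleus):
                continue
            final = rest[len(nucleus):]
            if final == "" or final in FINALS:
                return onset, nucleus, final
    return None
```

For example, split_syllable("thrung") gives ("thr", "u", "ng"), while a string that uses symbols outside the set gives None.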


Text Processing
Segment the text into sentences, and then into syllables. Reduce the text of the 40,000 sentences to its unique syllables. This gives around 4,000 unique syllables.
Create a pronunciation dictionary for the 4,000 syllables.
Map the pronunciation dictionary back onto the 40,000 sentences to get around 23,000 sentences with transcription.
From these, select the essential sentences for recording, taking into account the possible diphones. This gives 509 sentences.
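
The steps above can be sketched roughly as follows. This is an illustration under several assumptions that this page does not confirm: sentences end with the shad (U+0F0D), syllables are separated by the tsheg (U+0F0B), the dictionary maps each syllable to a list of phonemes, a sentence is kept only if every syllable is in the dictionary, and the recording sentences are picked by a greedy diphone coverage. All function names are hypothetical.

```python
SHAD = "\u0f0d"   # assumed sentence delimiter
TSHEG = "\u0f0b"  # assumed syllable delimiter


def split_sentences(text):
    """Segment raw text into sentences at each shad."""
    return [s.strip() for s in text.split(SHAD) if s.strip()]


def split_syllables(sentence):
    """Segment one sentence into syllables at each tsheg."""
    return [syl.strip() for syl in sentence.split(TSHEG) if syl.strip()]


def unique_syllables(sentences):
    """Sorted list of the distinct syllables in the corpus."""
    return sorted({syl for s in sentences for syl in split_syllables(s)})


def transcribe(sentences, lexicon):
    """Map the dictionary back onto the sentences.

    Assumption: a sentence is dropped if any syllable has no entry.
    Returns {sentence: [phoneme list per syllable]}.
    """
    result = {}
    for s in sentences:
        syls = split_syllables(s)
        if syls and all(syl in lexicon for syl in syls):
            result[s] = [lexicon[syl] for syl in syls]
    return result


def diphones(phones):
    """Set of adjacent phoneme pairs in a phoneme sequence."""
    return set(zip(phones, phones[1:]))


def select_sentences(transcribed):
    """Greedy selection: keep picking the sentence that adds the most
    diphones not covered yet, and stop when nothing new is gained."""
    remaining = {
        s: diphones([p for syl in prons for p in syl])
        for s, prons in transcribed.items()
    }
    covered, chosen = set(), []
    while True:
        best = max(remaining, key=lambda s: len(remaining[s] - covered),
                   default=None)
        if best is None or not (remaining[best] - covered):
            break
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen
```

A greedy pick is used here only because it is simple. It does not guarantee the smallest possible set of sentences, and it may not be the method that produced the 509 sentences.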