PAN L10n Wiki : Data

Collection Of Dzongkha Sentence Corpus
About 40,000 sentences were collected.

Phoneme Set Design
initial consonants: p, ph, b, t, th, d, k, kh, g, @, m, ny, ng, s, z, sh, zh, h, hh, w, r, l, y, ts, tsh, dz, c, ch, j
vowels: a, i, u, e, o, ue, oe, aa, ii, uu
clusters: dr, tr, thr, lhh
diphthongs: ai, ui, oi, au, ae, eu, ou, iu, ei, eo
final consonants: g, ng, n, b, m, r, l, p
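
The inventory above can be written down as plain data, which makes it easy to check dictionary entries against it. The following is a minimal sketch, not the project's own tooling: the symbols are copied verbatim from the lists above, while the function name is_valid_syllable and the onset + nucleus + optional final syllable shape are assumptions made for illustration.

```python
# Phoneme inventory copied verbatim from the lists above.
INITIALS = {
    "p", "ph", "b", "t", "th", "d", "k", "kh", "g", "@", "m", "ny", "ng",
    "s", "z", "sh", "zh", "h", "hh", "w", "r", "l", "y", "ts", "tsh",
    "dz", "c", "ch", "j",
}
VOWELS = {"a", "i", "u", "e", "o", "ue", "oe", "aa", "ii", "uu"}
CLUSTERS = {"dr", "tr", "thr", "lhh"}
DIPHTHONGS = {"ai", "ui", "oi", "au", "ae", "eu", "ou", "iu", "ei", "eo"}
FINALS = {"g", "ng", "n", "b", "m", "r", "l", "p"}


def is_valid_syllable(onset, nucleus, coda=""):
    """Check one transcribed syllable against the inventory.

    ASSUMPTION: a syllable is (initial consonant or cluster) +
    (vowel or diphthong) + optional final consonant. The page lists
    these classes but does not state the syllable template.
    """
    onset_ok = onset in INITIALS or onset in CLUSTERS
    nucleus_ok = nucleus in VOWELS or nucleus in DIPHTHONGS
    coda_ok = coda == "" or coda in FINALS
    return onset_ok and nucleus_ok and coda_ok
```

A check like this could flag pronunciation dictionary entries that use a symbol outside the designed set, for example a final consonant that is not in the final-consonant list.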


Text Processing
Segment the text into sentences, and then into syllables. Reducing the 40,000 sentences to their distinct syllables yields about 4,000 unique syllables.
Create a pronunciation dictionary for the 4,000 syllables.
Map the pronunciation dictionary back onto the 40,000 sentences to obtain about 23,000 sentences with transcription.
Selecting the sentences essential for recording, taking the possible diphones into account, gives 509 sentences.
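
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure actually used: it assumes the text is in Tibetan script with the tsheg (U+0F0B) between syllables and the shad (U+0F0D) closing sentences, that a sentence counts as transcribed only when every one of its syllables has a dictionary entry, and that recording sentences are picked by a greedy diphone-coverage heuristic. The page does not say how the 509 sentences were chosen, and all function names and the "sil" boundary symbol are hypothetical.

```python
import re
from collections import Counter

TSHEG = "\u0f0b"  # Tibetan-script syllable delimiter (assumed segmentation cue)
SHAD = "\u0f0d"   # Tibetan-script sentence/clause delimiter (assumed cue)
_SYLLABLE_BREAK = re.compile("[" + TSHEG + r"\s]+")


def split_sentences(text):
    """Split raw text into sentences at every shad."""
    return [s.strip() for s in text.split(SHAD) if s.strip()]


def split_syllables(sentence):
    """Split one sentence into syllables at tsheg or whitespace."""
    return [s for s in _SYLLABLE_BREAK.split(sentence) if s]


def count_syllables(sentences):
    """Count every distinct syllable; the keys are the unique syllables."""
    return Counter(syl for sent in sentences for syl in split_syllables(sent))


def transcribe(sentence, lexicon):
    """Return the phoneme list for a sentence, or None if any syllable
    is missing from the pronunciation dictionary (hypothetical rule)."""
    phonemes = []
    for syl in split_syllables(sentence):
        if syl not in lexicon:
            return None
        phonemes.extend(lexicon[syl])
    return phonemes


def diphones(phonemes):
    """Set of adjacent phoneme pairs, padded with a silence symbol."""
    padded = ["sil"] + list(phonemes) + ["sil"]
    return set(zip(padded, padded[1:]))


def select_sentences(transcribed):
    """Greedy cover: repeatedly take the sentence that adds the most
    diphones not yet covered; stop when no sentence adds a new one.

    transcribed maps sentence -> phoneme list.
    """
    remaining = {s: diphones(p) for s, p in transcribed.items()}
    covered, chosen = set(), []
    while remaining:
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
    return chosen
```

With a lexicon mapping each syllable to its phonemes, calling transcribe on every sentence and keeping the non-None results gives the transcribed subset, and select_sentences then reduces that subset to a short recording script. A greedy cover does not guarantee the smallest possible script, but it is simple and usually gets close.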
