PAN L10n Wiki : Training

Speech Database
The 509 essential sentences are recorded at 44 kHz with 16-bit samples.

Auto Labelling files
We create two files: mono.dic and word.mlf.

Mono.dic is created from the syllable dictionary and has the following format:

!ENTER $0
!EXIT $0
(དང) d0 a0 ng0
(ལུ) l0 u0
(མི) m0 i0
...
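
The dictionary layout above can be sketched in a few lines of Python. This is only an illustration of the file format, not the script that was actually used. The names `format_dict_entry`, `build_mono_dic` and `sample_syllables` are hypothetical, and the three sample entries are the ones shown above.

```python
def format_dict_entry(syllable, phones):
    """Return one mono.dic line: the syllable in parentheses, then its phones."""
    return f"({syllable}) " + " ".join(phones)


def build_mono_dic(syllable_phones):
    """Build mono.dic text from a syllable -> phone-list mapping (assumed input shape)."""
    # The two sentence-boundary entries come first, as in the listing above.
    lines = ["!ENTER $0", "!EXIT $0"]
    for syllable, phones in syllable_phones.items():
        lines.append(format_dict_entry(syllable, phones))
    return "\n".join(lines) + "\n"


# Hypothetical stand-in for the full syllable dictionary.
sample_syllables = {
    "དང": ["d0", "a0", "ng0"],
    "ལུ": ["l0", "u0"],
    "མི": ["m0", "i0"],
}
mono_dic_text = build_mono_dic(sample_syllables)
```

Writing `mono_dic_text` to a UTF-8 file would give a small mono.dic covering three syllables.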

Word.mlf lists the syllables of the 509 sentences and has the following format:

#!MLF!#
"*/001.lab"
!ENTER
(མི)
(འཕེལ)
(འཕེལཝ)
(ནོར)
(སེམས)
(ཅན)
(འཕེལ)
(འཕེལཝ)
!EXIT

Because mono.dic is generated from the syllable dictionary, it contains all 4000 syllables together with their transcription and tone.
Word.mlf, generated from the 509 sentences, lists every sentence under its label filename, with each syllable of the sentence enclosed in parentheses.
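
A minimal sketch of how a file in the word.mlf layout could be assembled, assuming each sentence is already split into syllables. The function name `build_word_mlf` and the zero-padded numbering of label names are assumptions taken from the `"*/001.lab"` example. Only the lines visible in the listing above are written, so anything else the real file contains is not reproduced here.

```python
def build_word_mlf(sentences):
    """Build word.mlf text from a list of sentences, each a list of syllables."""
    lines = ["#!MLF!#"]
    for index, syllables in enumerate(sentences, start=1):
        # Label name pattern follows the "*/001.lab" example (assumed 3-digit padding).
        lines.append(f'"*/{index:03d}.lab"')
        lines.append("!ENTER")
        lines.extend(f"({syllable})" for syllable in syllables)
        lines.append("!EXIT")
    return "\n".join(lines) + "\n"


# Two short hypothetical sentences built from syllables shown on this page.
word_mlf_text = build_word_mlf([["མི", "འཕེལ"], ["ནོར", "སེམས", "ཅན"]])
```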

HLEd, an HTK tool, was used to generate mono.mlf from these two files (mono.dic and word.mlf).
Mono.mlf has the following format:
#!MLF!#
"*/1.lab"
$
m0
i0
x0
ph0
e0
l0
ph0
e0
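
The core idea of this step is that each syllable in word.mlf is replaced by its phones from mono.dic. The sketch below shows that lookup in plain Python. It is not HLEd itself. The real run is driven by an HLEd edit script that is not shown on this page, so the actual output can differ in detail (for example in how the boundary symbol is written). `parse_dictionary` and `expand_to_phones` are hypothetical names.

```python
def parse_dictionary(text):
    """Map each dictionary head word, e.g. "(མི)", to its list of phones."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            entries[fields[0]] = fields[1:]
    return entries


def expand_to_phones(words, dictionary):
    """Replace every word-level label by its phone sequence, in order."""
    phones = []
    for word in words:
        phones.extend(dictionary[word])  # raises KeyError for unknown syllables
    return phones


dictionary = parse_dictionary("!ENTER $0\n!EXIT $0\n(མི) m0 i0\n(ལུ) l0 u0\n")
phone_labels = expand_to_phones(["!ENTER", "(མི)", "(ལུ)", "!EXIT"], dictionary)
```

Failing loudly on an unknown syllable is a deliberate choice in this sketch: a missing dictionary entry would otherwise silently shorten the phone sequence.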

These three files (mono.dic, word.mlf and mono.mlf) are used together with the 509 recorded wav files for auto labelling, which generates mono and full label files. Mono label files contain phones with their durations; full label files contain phones with their durations and context.
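
As an illustration of what "phones with their duration" means in practice, the sketch below reads one line of a mono label file. It assumes the usual HTK label layout of `start end name`, with times in units of 100 ns. The page does not show a label file, so this layout is an assumption, and `parse_label_line` is a hypothetical helper.

```python
# HTK label times are expressed in units of 100 ns, i.e. 10,000,000 per second.
HTK_TIME_UNITS_PER_SECOND = 10_000_000


def parse_label_line(line):
    """Return (phone, duration_in_seconds) for a "start end phone" label line."""
    start, end, phone = line.split()
    duration = (int(end) - int(start)) / HTK_TIME_UNITS_PER_SECOND
    return phone, duration


# Hypothetical label line: phone m0 lasting a quarter of a second.
phone, seconds = parse_label_line("0 2500000 m0")
```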

Parameter Generation
The mel-cepstral coefficients (mcep) and the log fundamental frequency (lf0) are generated from the raw audio files during the training process.
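
The lf0 stream is simply the natural logarithm of the fundamental frequency, frame by frame. The sketch below assumes that F0 values in Hz are already available and that unvoiced frames are given as 0. Marking unvoiced frames with -1.0e10 follows a common HTS convention and is an assumption here, not something stated on this page. Mel-cepstral extraction is more involved and is not sketched.

```python
import math

# Assumed marker for unvoiced frames (common HTS convention, not from this page).
UNVOICED_LF0 = -1.0e10


def f0_to_lf0(f0_values):
    """Convert per-frame F0 in Hz to log-F0; frames with F0 <= 0 count as unvoiced."""
    return [math.log(f0) if f0 > 0.0 else UNVOICED_LF0 for f0 in f0_values]


# Hypothetical F0 track: one unvoiced frame followed by two voiced frames.
lf0_track = f0_to_lf0([0.0, 100.0, 120.0])
```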

Question Tree
The question files list phones together with their contexts; they are used to cluster similar phones with a decision tree.
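
To make this concrete, the sketch below generates one line in the `QS "name" {patterns}` form that HTK's HHEd uses for decision-tree questions. The question name, the choice of phones in the class, and the `*-phone+*` context pattern are all assumptions made for illustration. The actual question files for this voice are not shown on this page.

```python
def make_question(name, phones):
    """Return one QS line asking whether the current phone is in the given class."""
    # "*-p+*" matches any label whose central phone is p (assumed label layout).
    patterns = ",".join(f"*-{phone}+*" for phone in phones)
    return f'QS "{name}" {{{patterns}}}'


# Hypothetical phone class built from phones that appear on this page.
vowel_question = make_question("C-Vowel", ["a0", "i0", "u0", "e0"])
```

During clustering, a question like this splits the models into those whose label matches one of the patterns and those that do not.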

The speech database is now in place and labelling is done. During training, MCEP and LF0 parameters are extracted from this data and HMMs are trained on them.
Training therefore ends with a collection of trained HMMs, which are used to generate speech parameters (MCEP, LF0 and duration). These parameters are passed to the MLSA filter to generate synthesized speech.

