丁炜炜

Overview
Works: 2 works in 3 publications in 2 languages and 6 library holdings
Roles: Author
Classifications: KNQ2720
Most widely held works by 丁炜炜
Xing zheng qiang zhi fa yan jiu (Book)
2 editions published in 2003 in Chinese and held by 4 libraries worldwide
Weakly supervised part-of-speech tagging for Chinese using label propagation by Weiwei Ding (Computer File)
1 edition published in 2011 in English and held by 1 library worldwide
Part-of-speech (POS) tagging is one of the most fundamental and crucial tasks in Natural Language Processing. Chinese POS tagging is challenging because it also involves word segmentation. This report focuses on improving unsupervised POS tagging with Hidden Markov Models trained by Expectation Maximization (EM-HMM). The traditional EM-HMM system uses a tag dictionary to constrain the possible tag sequences and to initialize the model parameters. This is a very crude initialization: the emission parameters are set uniformly in accordance with the tag dictionary. Word alignments, the word-level translation pairs derived from parallel text in two languages, are one way to improve it. This report uses Chinese-English word alignments. They were expected to improve performance because the two resources are complementary: the dictionary provides information on word types, while word alignments provide information on word tokens. In practice, however, they proved to be of limited benefit. The report therefore proposes another method. To improve dictionary coverage and obtain better POS distributions, it uses Modified Adsorption, a label propagation algorithm. We construct a graph connecting word tokens to feature types (such as word unigrams and bigrams) and connecting those tokens to information from knowledge sources, such as a small tag dictionary, Wiktionary, and word alignments. The core idea is to use a small amount of supervision, in the form of a tag dictionary, to acquire POS distributions for each word (both known and unknown) and to provide these as an improved initialization for EM learning of the HMM. We find that this strategy works very well, especially when the tag dictionary is small. Label propagation provides a better initialization for the EM-HMM method because it greatly increases the coverage of the dictionary. In addition, label propagation is flexible enough to incorporate many kinds of knowledge. However, the results also show that some resources, such as word alignments, are not easily exploited with label propagation.
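The contrast the abstract draws between a uniform, dictionary-based initialization and one obtained by label propagation can be illustrated with a toy sketch. This is not the report's code and it does not implement Modified Adsorption: it uses a much simpler neighbour-averaging propagation with clamped seeds, purely as an assumption for illustration. The tag set, the romanized tokens, the feature names, and the functions uniform_init and propagate are all hypothetical.

```python
from collections import defaultdict

TAGS = ["NN", "VV"]  # hypothetical two-tag set, for illustration only


def uniform_init(word, tag_dict):
    """Baseline: uniform over the tags the dictionary allows (all tags if the word is unknown)."""
    allowed = tag_dict.get(word, TAGS)
    return {t: (1.0 / len(allowed) if t in allowed else 0.0) for t in TAGS}


def propagate(edges, seeds, iterations=10):
    """Simplified label propagation (NOT Modified Adsorption): every non-seed node
    takes the normalized sum of its neighbours' tag scores; seed nodes stay clamped."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    dist = {n: dict(seeds.get(n, {t: 0.0 for t in TAGS})) for n in neighbours}
    for _ in range(iterations):
        new = {}
        for node, nbrs in neighbours.items():
            if node in seeds:
                new[node] = dict(seeds[node])
                continue
            totals = {t: sum(dist[m][t] for m in nbrs) for t in TAGS}
            z = sum(totals.values())
            new[node] = {t: (totals[t] / z if z > 0 else 0.0) for t in TAGS}
        dist = new
    return dist


# Toy tag dictionary: "shu" (book) is a noun, "du" (read) is a verb; "bao" is unknown.
tag_dict = {"shu": ["NN"], "du": ["VV"]}

# Word nodes are linked to hypothetical context-feature nodes they occur with.
edges = [
    ("w:shu", "f:prev=yiben"),
    ("w:bao", "f:prev=yiben"),  # the unknown word shares a context with a known noun
    ("w:du", "f:prev=bu"),
]

seeds = {"w:" + w: uniform_init(w, tag_dict) for w in tag_dict}
dist = propagate(edges, seeds)

print("uniform init for 'bao':   ", uniform_init("bao", tag_dict))
print("propagated init for 'bao':", dist["w:bao"])
```

In this toy graph the unknown word inherits a noun-leaning distribution through the shared context feature, whereas the dictionary alone can only assign it a uniform distribution over all tags. Either distribution could then serve as the starting emission estimate for EM training.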
 
Alternative Names
丁煒煒
丁維偉
Languages
Chinese (2)
English (1)