Details
| Material type: | Document, Thesis/dissertation, Government publication, State or province government publication, Internet resource |
|---|---|
| Document type: | Internet Resource, Computer File |
| All authors / contributors: | Dennis V Pereira |
| OCLC number: | 56569925 |
| Notes: | Title from electronic submission form. Vita. Abstract. |
| Description: | System requirements: PC, World Wide Web browser and PDF reader.; Available electronically via Internet. |
| Responsibility: | Dennis V. Pereira. |
Abstract:
With the growing number of textual resources available, the ability to understand them becomes critical. An essential first step in understanding these sources is the ability to identify the parts-of-speech in each sentence. The goal of this research is to propose, improve, and implement an algorithm capable of finding terms (words in a corpus) that are used in similar ways - a term categorizer. Such a term categorizer can be used to find a particular part-of-speech, e.g., nouns in a corpus, and generate a lexicon. The proposed work is not dependent on any external sources of information, such as dictionaries, and it shows a significant improvement (30%) over an existing method of categorization. More importantly, the proposed algorithm can be applied as a component of an unsupervised part-of-speech tagger, making it truly unsupervised, requiring only unannotated text. The algorithm is discussed in detail, along with its background and its performance. Experimentation shows that the proposed algorithm performs within 3% of the baseline, the Penn-TreeBank Lexicon.
Title: Automatic lexicon generation for unsupervised part-of-speech tagging using only unannotated text
