Automatic lexicon generation for unsupervised part-of-speech tagging using only unannotated text

Author: Dennis V Pereira
Publisher: [Blacksburg, Va. : University Libraries, Virginia Polytechnic Institute and State University, 2004]
Edition/Format: eBook : Document : Thesis/dissertation : State or province government publication : English

Details

Material Type: Document, Thesis/dissertation, Government publication, State or province government publication, Internet resource
Document Type: Internet Resource, Computer File
All Authors / Contributors: Dennis V Pereira
OCLC Number: 56569925
Notes: Title from electronic submission form. Vita. Abstract.
Details: System requirements: PC, World Wide Web browser, and PDF reader. Available electronically via the Internet.
Responsibility: Dennis V. Pereira.

Abstract:

With the growing number of textual resources available, the ability to understand them becomes critical. An essential first step in understanding these sources is the ability to identify the parts of speech in each sentence. The goal of this research is to propose, improve, and implement an algorithm capable of finding terms (words in a corpus) that are used in similar ways: a term categorizer. Such a term categorizer can be used to find a particular part of speech, e.g. nouns, in a corpus and to generate a lexicon. The proposed work does not depend on any external sources of information, such as dictionaries, and it shows a significant improvement (30%) over an existing method of categorization. More importantly, the proposed algorithm can be applied as a component of an unsupervised part-of-speech tagger, making the tagger truly unsupervised and requiring only unannotated text. The algorithm is discussed in detail, along with its background and its performance. Experimentation shows that the proposed algorithm performs within 3% of the baseline, the Penn Treebank Lexicon.
