A green form-based information extraction system for historical documents

Author: Tae Woo Kim
Publisher: 2017.
Dissertation: Master of Science Brigham Young University. Department of Computer Science 2017
Edition/Format: Thesis/dissertation : Manuscript : Archival Material : English

Details

Genre/Form: Electronic dissertations
Academic theses
Material Type: Thesis/dissertation, Manuscript, Internet resource
Document Type: Book, Archival Material, Internet Resource
All Authors / Contributors: Tae Woo Kim
OCLC Number: 990416368
Description: 1 online resource (xii, 44 pages) : illustrations (chiefly color)
Responsibility: Tae Woo Kim.

Abstract:

Many historical documents are rich in genealogical facts. Extracting these facts by hand is tedious and nearly impossible given the hundreds of thousands of genealogically rich family-history books currently scanned and online. As one approach to making the extraction feasible, we propose GreenFIE, a "Green" Form-based Information-Extraction tool that is "green" in the sense that it improves with use, toward the goal of minimizing the cost of human labor while maintaining high extraction accuracy. Given a page in a historical document, the user's task is to fill out the given forms with all the facts on the page that the forms call for (e.g., to collect the birth and death information, marriage information, and parent-child relationships for each person on the page). GreenFIE has a repository of extraction patterns that it applies to fill in forms. A user checks the correctness of GreenFIE's form filling, adds any missed facts, and fixes any mistakes. GreenFIE learns from this user feedback, adding new extraction rules to its repository. Ideally, GreenFIE improves as it proceeds so that it does most of the work, leaving little for the user to do other than confirm that its extraction is correct. We evaluate how well GreenFIE performs on family-history books in terms of "greenness": how much human labor diminishes during form filling while high accuracy is maintained.
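
The feedback loop described in the abstract (apply stored patterns, let the user correct the form, add a new rule) can be illustrated with a toy sketch. This is not the thesis's implementation: the record does not describe GreenFIE's internals, so the names `RuleRepository`, `generalize`, and `learn_from_correction`, the left-context heuristic, and the sample page text are all assumptions made purely for illustration.

```python
import re
from typing import Dict, List


def generalize(value: str) -> str:
    # Hypothetical heuristic: replace each digit or letter of the
    # user-supplied value with a character class, keep punctuation literal.
    parts = []
    for ch in value:
        if ch.isdigit():
            parts.append(r"\d")
        elif ch.isalpha():
            parts.append("[A-Za-z]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


class RuleRepository:
    """Toy rule store: form field name -> list of regex patterns."""

    def __init__(self) -> None:
        self.rules: Dict[str, List[str]] = {}

    def add_rule(self, field: str, pattern: str) -> None:
        patterns = self.rules.setdefault(field, [])
        if pattern not in patterns:
            patterns.append(pattern)

    def fill_form(self, page_text: str) -> Dict[str, List[str]]:
        # Apply every stored pattern and collect the captured values.
        form: Dict[str, List[str]] = {}
        for field, patterns in self.rules.items():
            values: List[str] = []
            for pattern in patterns:
                for match in re.finditer(pattern, page_text):
                    if match.group(1) not in values:
                        values.append(match.group(1))
            form[field] = values
        return form


def learn_from_correction(repo: RuleRepository, field: str, page_text: str,
                          value: str, context_chars: int = 3) -> bool:
    # When the user adds a missed fact, build a rule from the literal text
    # just before the value plus a generalized form of the value itself.
    start = page_text.find(value)
    if start < 0:
        return False
    left = page_text[max(0, start - context_chars):start]
    repo.add_rule(field, re.escape(left) + "(" + generalize(value) + ")")
    return True


# Invented sample pages, for illustration only.
repo = RuleRepository()
page_1 = "John Smith, b. 1820, d. 1890."
page_2 = "Mary Jones, b. 1845, d. 1901."

before = repo.fill_form(page_1)  # empty repository: nothing is extracted
learn_from_correction(repo, "birth_year", page_1, "1820")  # user adds a fact
after = repo.fill_form(page_2)   # the learned rule applies to a new page
print(before, after)
```

In this sketch the user supplies one birth year on the first page, and the rule derived from it fills the same field on the second page without further input, which is the kind of diminishing human effort the abstract calls "greenness".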

Linked Data


Primary Entity

<http://www.worldcat.org/oclc/990416368> # A green form-based information extraction system for historical documents
    a schema:CreativeWork, schema:IndividualProduct, schema:Book, pto:Manuscript, bgn:Thesis ;
    bgn:inSupportOf "" ;
    library:oclcnum "990416368" ;
    schema:about <http://experiment.worldcat.org/entity/work/data/4617714525#Thing/green_systems_self_improving_systems_data_extraction_regular_expression_generation> ; # green systems, self-improving systems, data extraction, regular-expression generation
    schema:author <http://experiment.worldcat.org/entity/work/data/4617714525#Person/kim_tae_woo_1981> ; # Tae Woo Kim
    schema:datePublished "2017" ;
    schema:description "Many historical documents are rich in genealogical facts. Extracting these facts by hand is tedious and almost impossible considering the hundreds of thousands of genealogically rich family-history books currently scanned and online. As one approach for helping to make the extraction feasible, we propose GreenFIE--a \"Green\" Form-based Information-Extraction tool which is \"green\" in the sense that it improves with use toward the goal of minimizing the cost of human labor while maintaining high extraction accuracy. Given a page in a historical document, the user's task is to fill out given forms with all facts on a page in a document called for by the forms (e.g. to collect the birth and death information, marriage information, and parent-child relationships for each person on the page). GreenFIE has a repository of extraction patterns that it applies to fill in forms. A user checks the correctness of GreenFIE's form filling, adds any missed facts, and fixes any mistakes. GreenFIE learns based on user feedback, adding new extraction rules to its repository. Ideally, GreenFIE improves as it proceeds so that it does most of the work, leaving little for the user to do other than confirm that its extraction is correct. We evaluate how well GreenFIE performs on family history books in terms of \"greenness\"--How much human labor diminishes during form filling, while simultaneously maintaining high accuracy."@en ;
    schema:exampleOfWork <http://worldcat.org/entity/work/id/4617714525> ;
    schema:genre "Academic theses"@en ;
    schema:genre "Electronic dissertations"@en ;
    schema:inLanguage "en" ;
    schema:name "A green form-based information extraction system for historical documents"@en ;
    schema:productID "990416368" ;
    schema:url <http://hdl.lib.byu.edu/1877/etd9266> ;
    wdrs:describedby <http://www.worldcat.org/title/-/oclc/990416368> ;
    .


Related Entities

<http://experiment.worldcat.org/entity/work/data/4617714525#Person/kim_tae_woo_1981> # Tae Woo Kim
    a schema:Person ;
    schema:birthDate "1981" ;
    schema:familyName "Kim" ;
    schema:givenName "Tae Woo" ;
    schema:name "Tae Woo Kim" ;
    .

<http://experiment.worldcat.org/entity/work/data/4617714525#Thing/green_systems_self_improving_systems_data_extraction_regular_expression_generation> # green systems, self-improving systems, data extraction, regular-expression generation
    a schema:Thing ;
    schema:name "green systems, self-improving systems, data extraction, regular-expression generation" ;
    .