A number of federal agencies, universities, laboratories, and companies are placing their collections online and making them searchable through metadata fields such as author, title, and publishing organization. Manually creating metadata for a large collection is extremely time-consuming, yet the task is difficult to automate, particularly for collections whose documents vary widely in layout and structure. Unfortunately, several federal organizations, including DTIC, GPO, and NASA, manage exactly such heterogeneous collections, and existing approaches to automated metadata extraction do not work well on them. In this project, we are developing an automated metadata extraction process for large, diverse, and evolving document collections.
Old Dominion University Digital Library Group. extract@cs.odu.edu