Accession Number ADA566559
Title Creating, Using and Updating Thesauri Files for AutoMap and ORA.
Publication Date Jul 2012
Media Count 60p
Personal Author A. Sangal; K. M. Carley; M. K. Martin; N. Altman
Abstract AutoMap (1) is text analysis software that performs Network Text Analysis by running an automated process on a corpus of raw text data to generate one or more meta-networks, which include the nodes and links representing relations among the entities described. AutoMap uses thesaurus files (1) when creating meta-networks. These thesaurus files are lists that associate words or phrases found in texts with abstract concepts and/or node classes used in the extracted meta-networks. Over time, a large number of thesauri have been created, and many of the extant thesauri contain entries that are relevant to new text analysis projects. However, thesaurus re-use is difficult because of the sheer number of thesauri. In this report, we describe one approach to making thesaurus re-use easier: combining and reconciling multiple thesauri into one under user control. With this approach, the individual thesaurus files can be merged into a single, large Universal or Master Thesaurus containing all the general abstract concepts, along with several different domain-specific thesauri. As a result, creating a meta-network from a raw text corpus is more efficient, and the user can perform a more accurate analysis of the meta-network. We first discuss the differences between a Universal thesaurus and domain- or project-specific thesauri. We then discuss the evolution of the thesaurus formats used by AutoMap, followed by a discussion of the standard Dynamic Network Analysis (DNA) meta-ontology (1). Next, we detail the process used to create a single universal/master thesaurus and several different domain thesauri. This process combines two major routines, which we refer to as the Split routine and the Merge routine.
We discuss the algorithms of the Split and Merge routines, along with the process used to create a single thesaurus file by combining a large number of thesaurus files.
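The abstract describes a thesaurus as a mapping from words or phrases to abstract concepts and node classes, and describes merging as combining and reconciling several thesauri into one under user control. The Python sketch below illustrates only that general idea under stated assumptions: the in-memory dictionary format, the `normalize` and `merge_thesauri` names, the `resolve` conflict callback, and the sample entries are all hypothetical. It does not reproduce AutoMap's thesaurus file format or the report's actual Merge routine, and it does not attempt the Split routine, whose details are not given here.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# HYPOTHETICAL format: phrase -> (concept, node_class).
# This is an illustration, not AutoMap's real thesaurus file layout.
Entry = Tuple[str, str]
Thesaurus = Dict[str, Entry]


def normalize(phrase: str) -> str:
    """Case-fold and collapse runs of whitespace so near-duplicate phrases match."""
    return " ".join(phrase.lower().split())


def merge_thesauri(
    thesauri: Iterable[Thesaurus],
    resolve: Callable[[str, List[Entry]], Entry],
) -> Thesaurus:
    """Merge several thesauri into one master thesaurus (illustrative sketch).

    Identical entries for the same normalized phrase are collapsed. When a
    phrase maps to conflicting (concept, node_class) pairs, the user-supplied
    `resolve` callback chooses which entry to keep ("under user control").
    """
    candidates: Dict[str, List[Entry]] = {}
    for thesaurus in thesauri:
        for phrase, entry in thesaurus.items():
            bucket = candidates.setdefault(normalize(phrase), [])
            if entry not in bucket:
                bucket.append(entry)

    merged: Thesaurus = {}
    for phrase, entries in candidates.items():
        # No conflict: keep the single entry; otherwise defer to the user.
        merged[phrase] = entries[0] if len(entries) == 1 else resolve(phrase, entries)
    return merged


def prefer_first(phrase: str, entries: List[Entry]) -> Entry:
    """Example resolution policy: keep the entry from the earliest thesaurus."""
    return entries[0]


# Hypothetical sample thesauri (general, project-specific, and another project).
general = {"United States": ("united_states", "location")}
project = {
    "united  states": ("united_states", "location"),
    "Carnegie Mellon": ("cmu", "organization"),
}
other = {"Carnegie Mellon": ("carnegie_mellon", "organization")}

master = merge_thesauri([general, project, other], prefer_first)
print(master)
```

Here the two spellings of "United States" collapse into one entry, while the conflicting concepts for "Carnegie Mellon" are settled by `prefer_first`; an interactive tool could instead prompt the user inside `resolve`.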
Keywords Algorithms
Domain thesaurus
Network analysis (Management)
NTA (Network text analysis)
Software tools
Universal thesaurus

Source Agency Non Paid ADAS
NTIS Subject Category 62B - Computer Software
Corporate Author Carnegie-Mellon Univ., Pittsburgh, PA. Inst. of Software Research Internat.
Document Type Technical report
Title Note Technical rept.
NTIS Issue Number 1307
Contract Number N00014-08-1-1223; N00014-09-1-0667
