AutoMap: Version History

Version	Date	Description of Changes (new features, bug fixes, etc.)
3.0.10.18	August 29, 2012	Improvements to POSProcessing and the D2M-thesauri process.
3.0.10.1	May 16, 2012	Details are forthcoming.
3.0.10	December 16, 2011	An ability to extract files from an SVN repository, allowing version-controlled files to be updated automatically during processing. Users are able to use a version repository to keep track of changes. Routine to flatten directories and put all text files into a common folder. Operations on the text in the main window in addition to filtering the words: hiding known entries already added to a thesaurus, and showing tags based on ontology or part of speech. Capability to remove semantic networks from the meta-network. Ability to specify the temporary workspace location in the GUI. This allows the use of a RAM disk to significantly improve performance. Greatly enhanced operations to suggest thesauri using machine learning. Operations on LexisNexis documents include extracting meta-information and renaming files based on date for over-time analysis. Extract potential acronyms and potential n-grams. Infer kinship.
3.0.8	June 15, 2011	Extract text from HTML or RTF documents. Merge hyphenated words that occur at the end of lines, typically as a result of OCR conversion. Produce a list of approximate names, to be reviewed to determine which are aliases. Ability to derole names, to remove titles, and to resolve names within a name thesaurus. Split compound thesauri entries. Apply rules to determine ontological categories (such as a road being a location). An improvement to the thesauri merge program, to be able to merge in the master format, and to improve performance by reducing the number of intermediate steps.
3.0.7	March 9, 2011	Two preprocessing macros: one to clean files, which preserves syntax for part-of-speech extraction, and one to prepare files, which performs additional standard preprocessing operations that interfere with part-of-speech extraction. Additional preprocessing operations: replace HTML symbols, fix common typos, convert British to American spelling, expand common contractions and abbreviations, remove noise words. Conversion of common compound concepts into n-grams. The n-grams are all definition-altering compound concepts such as "first aid" and "civil war." Extraction of nouns and verbs. Suggested thesauri routines based on machine learning and based on part of speech. Context-sensitive stemming, to depluralize common nouns and detense common verbs. An attribute editor.
3.0.6D	February 7, 2011	Bug fix update.
3.0.6	January 3, 2011	Ability to split/recombine text files. This allows very large text files to be processed with more parallelism to improve performance. A CSV splitter, to split a CSV file into its structured and unstructured components. The unstructured data can be run through AutoMap and then ORA, with its structured part being imported into ORA. A LexisNexis splitter, to split files downloaded from LexisNexis, which contain several documents, into their individual documents. The tools within AutoMap have been adapted for a new master format for thesauri and delete lists. The master format puts all generalization, metanetwork, and delete files into one single file to reduce the misalignment of entries.
3.0.5	September 1, 2010	Ability to change the default GUI color. Routine to compare text files, which augments our ability to deduplicate text files, including inexact deduplication. A storage preference to allow users to default all output to a particular destination, without having to specify a location for each output generated. Routine to extract actions (verbs) from text into a list. This is often used to find the "task" concepts. A concept list trimmer routine to cull low-frequency concepts from the concept list. A location distillation procedure to extract and produce a location-based thesaurus from a gazetteer database.
3.0.3	June 11, 2010	AutoMap is now available as a 32-bit and a 64-bit executable. Capability to change heap size preferences for larger files. Ability to generate a concept network as well as a concept list. The concept network contains all nodes and no links. Links can be manually added via ORA. In addition, it is no longer necessary to generate an entire semantic network for the hot topics report. Revamped script runner program, which uses the new drag-and-drop editor to edit the batch workflow script used for processing a large number of files. Greater performance with the use of superscript, a multi-threaded extension to the script processing that allows the user to specify the number of cores in a machine to achieve maximum performance.
3.0.2	January 5, 2010	Extracting text content from all Microsoft Office 2003 documents including Word, Excel, and PowerPoint. Ability to compose a text file from within AutoMap and include it within the corpus of text being analyzed. Expanded ability to check and transform text documents in the UTF-8 encoding. Greater control for advanced users to be able to turn off pop-up hints. Ability to change the font size as well as the font for the main text display window. Preprocessing routine to remove individual symbols specified by the user. Ability to undo and redo preprocessing operations, as well as to insert an operation within the stack of operations already performed. Routine to strip file headers, either by number of lines or by a key phrase. Thesauri routines to check for missing entry parts, duplicate entries, and circular logic. Ability to apply a delete list to a thesaurus.
3.0.1D	August 18, 2009	Bug fix update.
3.0.1	June 18, 2009	Ability to deduplicate text files by content. Copy files based on filename criteria so as to include the subset of files you care about. Extract web pages from a single source, extracting the text, and putting all the files into a single directory for processing. Extracting text content from Microsoft Word 2003 documents, and Adobe PDF files. Ability to change the text font to any font installed on the computer, to be able to view any foreign language character set. Supplemental tools have been developed to aid the AutoMap user. Tools include: Delete list editor. Able to interactively add or remove terms, to add terms from a previously generated concept list, to identify possible misspelled words in the delete list, to add stemmed versions of terms on the delete list. Thesauri editor. Able to interactively add or remove terms, to identify misspelled words, to add stemmed versions of terms in the thesauri, sort based on the number of terms, to merge multiple thesauri together. Concept list viewer. Able to sort the concept list by frequency or relative frequency, to compare to previously generated concept lists, to save concepts to a delete list, to select terms by a minimum or maximum frequency, to save a subset of the terms into a file.
3.0.0	June 7, 2009	Redesigned user interface. Included is a quick-launch area for commonly used preprocessing commands which are available to the user quickly. The message window provides user feedback and reminders--such as where you just stored that file. All of the commands have been moved to the menu bar to keep users from having to hunt for what is available. Performance has been significantly improved.
2.7.67	September 3, 2008	Bug fixes.
2.7.66	August 1, 2008	Fixed Unicode handling. Improved n-gram extraction.
2.7.65	July 25, 2008
2.7.64	July 18, 2008
2.7.63	June 23, 2008
2.7.60	April 24, 2008
2.7.50	January 30, 2008
2.7.40	September 5, 2007
2.7.30	July 30, 2007	Preserve umlauts during auto-cleaning of data. Option: Parts of Speech tags as node attributes in XML outputs. Output list with Parts of Speech per word. Anaphora resolution.
2.7.20	July 13, 2007
2.7.10	July 5, 2007
2.7.00	June 25, 2007
2.6.70	May 17, 2007	Computation of semantic entropy. Integrated installer for GUI and batch mode.
2.6.60	March 20, 2007	The generalization thesaurus interface is back to a list style. Updated visualizer and matrix editor. Corpus editing: under Tools, there is a new tool that allows for removing text after certain words or phrases.
2.6.50	February 15, 2007	Changes in union concept list and concept list to facilitate adding to delete list. Refined table functions in union concept list and concept list for sorting.
2.6.40	February 1, 2007	CASOS Email Parser (CEMAP): The CASOS Email Parser (CEMAP), which is launched through AutoMap, enables the extraction of different types of network information from emails. Parts of Speech Tagging: This routine associates every word, after the highest level of pre-processing applied so far, with its Parts of Speech. Changes in DyNetML outputs for XML files per Text: The DyNetML outputs per Map (Semantic Network Analysis) and per Text (Sub-Matrix Analysis) have been changed to reflect: the new DyNetML specification (http://casos.isri.cmu.edu/dynetml/index.html); isolated nodes; frequency of nodes; a set of attributes on the data and statistical results.
2.6.30	December 1, 2006	AutoMap to Database Connector: Purpose: Data management. Store thesauri and AutoMap analysis outputs (XML files, DyNetML files) in an SQL-based relational database of your choice (Microsoft Access, MySQL database, Microsoft SQL Server, PostgreSQL, Oracle). If multiple thesauri or outputs are written to the database, no previous data gets overwritten. This is because the database connector assigns each data file a unique serial ID. Feature Selection: Purpose: Find the most important terms in a dataset, e.g. in order to build a generalization thesaurus or to get an idea of what is going on in the data. Routine that identifies the most important terms in the text set (language independent, dataset dependent). This functionality is based on the TF-IDF algorithm. Output stored as .csv list. Standard format for pre-processing material changed from .txt to .csv: Purpose: Easier to work with these files in Excel or other CSV-based editors. All pre-processing material (delete list, thesauri, sub-matrix selection file, etc.) that used to be in .txt format is now in .csv format. Thus, the comma as delimiter was replaced with a forward slash. If you would like to use previous thesauri, please replace all , with //. If you have any trouble doing this, please let me know. Thesauri are displayed in tables that can be sorted by each column in the graphical user interface of AutoMap: Purpose: Easy sorting per column, easy copy and paste from or to Excel. Automated removal of noise symbols and/or numbers: Purpose: Fast cleaning. More occurrences per term. Remove symbols (this is basically any non-significant noise, such as ^, # etc.) while maintaining end-of-the-sentence identifiers (. ? !). Remove symbols and numbers (this is basically any non-significant noise and all numbers, such as ^, # etc.). Text files after applying either routine can be stored to a directory of your choice. CompareMap: Panels displayed as tables that can be sorted by each column. Inputs other than DyNetML files changed from .txt to .csv file format.
2.6.20	June 19, 2006	Move from Java 1.4.2 to Java 5.0. New tool suite for loading a DyNetML file, seeing only the attributes and what meta-matrix categories they are associated with, editing these associations, and writing the changes back to the DyNetML file (only applies to GUI version).
2.6.10	May 12, 2006	New features mainly designed for speeding up batch mode processing as well as coding of large data sets with any version: Option for automated edge attribute extraction in GUI version, single batch mode version, and multi batch. Option for storing map files and stat files for GUI version, single batch mode version, and multi batch. Other features: Language-dependent stemmers for Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, and Swedish in GUI version. Feature for users to customize stemming for all foreign languages and English. Move concepts from Union Concept List interface to Delete List (only applies to GUI version). Designed and supported as an iterative process.
2.6.00	April 2006	Completely new multi batch mode processing tool: use one script for processing multiple input folders. Logical inference engine: Fully flexible rule set. This enables users to choose from pre-defined rules and/or to build their own rule sets. Post Processor: Attribute-value thesauri functionality. New tool for merging BiGram file. New tool for merging Named Entity Recognition (with a threshold that users can define). BiGram generation: Faster now.
2.5.19	March 27, 2006	Routine for creating ego networks from existing DyNetML or XML files. The ego node can be an existing or new node in an existing or new meta-matrix category.
2.5.18	March 15, 2006	New batch mode version for script-based back-end processing. Network Reasoner Tool (rule-based inference of nodes and edges given information existing in a graph). Refined Krovetz Stemmer.
2.5.17	February 20, 2006	New batch mode version for script-based back-end processing. Output quantitative information on data reduction achieved via deletion and generalization. Show and track coverage of input concepts in delete list and thesauri. New routine in Map Post Processor that enables the association of different attributes with different meta-matrix categories.
2.5.16	January 2006	Trace window that tracks all user actions. More default thesauri for generalization and meta-matrix coding.
2.5.15 script version	December 30, 2005	Enhanced adding and editing of node attributes in the Post Processor.
2.5.14 script version	December 28, 2005	Batch mode version of AutoMap. Runs under Windows, Linux and on Macs. Uses an XML script for specification of coding settings.
2.5.14	December 27, 2005	New editors for Node Attributes and Edge Attributes in Post Processor (add or modify attributes and related ontological categories). Enhanced functionalities for DyNetML editing for all Thesauri in the Post Processor.
2.5.13	November 15, 2005	Add Date Extractor. Add general time coding to interpretation thesaurus (Map Post Processor). Updated Matrix Editor. Updated Social Insight Visualizer.
2.5.12	November 8, 2005	1. Automated attribute interpretation: Enumerated attributes (property name, typically attribute 1, attribute 2, etc.) in the property specification section of edges and/or nodes in a DyNetML file will be interpreted. Interpretation here means that the property name will be translated into a meaningful category. If the data from the resulting DyNetML file is stored in a relational database, the meaningful category could serve as the key. The type and value of an attribute remain unchanged. The value could serve as the result when searching relational data that is stored in a database. Attributes that were not covered by the translation thesaurus will be renamed to "nyi" (not yet interpreted). 2. Add predefined interpretation thesaurus for Map Post Processor.
2.5.11	November 2, 2005	New Tool (under Tools): Post Processor for Relational Data. Purpose: Refine or edit attributes of nodes and/or edges represented in XML/DyNetML files that were generated with AutoMap or any other software. Five different sub-components that support different levels of detail for editing different portions of XML/DyNetML files.
2.5.10	October 19, 2005	Add Output Storage Manager: Outputs from pre-processing and different types of analysis can now be stored with user-defined file names at user-defined directories (optional). Remembers user-defined output file names and directories when AutoMap is exited and used again later on. Pre-Processing and Statement Formation routines are now separated into two tabs: pre-processing and analysis. Add some more pre-defined material for helping to create thesauri: List of common punctuation as a possible delete list (can be customized). Generalization Thesaurus for converting common contractions in English into non-contracted form (can be customized). Generalization Thesaurus for converting common abbreviations in English into non-abbreviated form (can be customized). Generalization Thesaurus for known countries (can be customized). Meta-Matrix Thesaurus for known countries (can be customized). For meta-matrix coding, the entity "tasks-events" is split up into 2 separate entities: task and event. DyNetML files generated in AutoMap will reflect this change. For counting unique and total concepts for the union list per text and per corpus, all words are treated as case insensitive.
2.5.04	October 6, 2005	Add limited Parts of Speech Extraction functionality for English. Add Extractor for Numericals. Add functionality to use output baseline for delete list and/or meta matrix thesaurus. Language independent feature. Add functionality for Bi-Gram Detector to use output as baseline for generalization thesaurus. Add functionality for Named Entity Extraction to use output as baseline for generalization thesaurus. Add functionality to generalization thesaurus panel to create a positive list thesaurus. Add information about workflow, language dependency, automation, customization and appropriateness of repetition of procedures for all pre-processing techniques to User's Guide. Mark pre-processing utilities in the interface as being language independent or not. User's Guide also contains FAQ section now. This contains information on how to output Excel files from AutoMap, how to use Excel files as AutoMap input, and how to view and print the meta-matrix thesaurus by entity class.
2.5.03	September 8, 2005	Add Text File Splitter. Add customization features for Krovetz Stemmer.
2.5.02	July 24, 2005	Upgraded Java (1.4.2_08). Batch file for starting script version of AutoMap. Batch file for starting GUI version of AutoMap.
2.5.01	July 12, 2005	Output option: XML (DyNetML) outputs of mental models (result of Map Analysis). This enables the creation of meta-matrix data when no meta-matrix coding has been performed. All nodes will be of type knowledge. This enables the automated extraction of knowledge networks. Updated Visualizer, Matrix Editor, Network Data Format Converter, DyNetML file Merger. CompareMap: DL outputs from map consolidation.
2.5.00	June 18, 2005	Ontology handling: use any ontology the user provides; generate XML (DyNetML) files that represent user-defined ontologies; user-defined exclusion of any (multiple) entities given in the ontology from node representation and thus edge formation, but considering them as attributes of entities (e.g. attributes, roles). Give users the option to decide whether they measure textual networks or social networks that are represented in texts. Knowledge grouping (cluster identification) in CompareMap. Speed up KStem.
2.0.13	May 11, 2005	Provides KStem, an inflection- and derivation-based stemmer invented by Bob Krovetz.
2.0.12	May 5, 2005	Add CASOS tools for: merging XML files, converting network data, visualizing networks, editing network data. Enhance creation of binary maps.
2.0.11	February 14, 2005	Enhancement of Thesaurus Application. Installer for Linux and Unix.
2.0.10	November 17, 2004	Utilities: Bi-Gram Identification. Visualization: CASOS visualizer integrated. Network Conversion: Updated Network Converter. DyNetML: Inference of Edge Attributes. CompareMap: .csv output files for Hamming Distances.
2.0.04	October 25, 2004	CompareMap: Computation of Hamming Distances: for Map Comparison, the Hamming Distance between each given pair of maps is computed; for Map Consolidation, the Hamming Distances between the consolidated map or central graph and each given map are computed.
2.0.03	September 10, 2004	Symmetrize DL files by default. Automatically store outputs from CompareMap in .map format. Convert outputs from CompareMap to DyNetML upon user's request.
2.0.02	August 9, 2004	Dynamic generation of full meta matrix for sub matrix selection from meta matrix model as specified by the user (including user-defined categories).
2.0.01	August 2, 2004	How-to on Collocation Identification (Bigrams) with AutoMap. Network Converter tool added that converts data from and to: CSV, DL, UCINET, DyNetML, VNA. VNA files can be visualized in NetDraw.