|Version||Date||Description of Changes (new features, bug fixes, etc.)
|| August 29, 2012
- Improvements to POSProcessing and the D2M-thesauri process.
|| May 16, 2012
|| December 16, 2011
- An ability to extract files from an SVN repository, allowing for version controlled files to be automatically updated when doing processing. Users are able to use a version repository to keep track of changes.
- Routine to flatten directories and put all text files into a common folder.
- Additional operations on the text in the main window: besides filtering words, users can hide known entries already added to a thesaurus and show tags based on ontology or part of speech.
- Capability to remove semantic networks from the meta-network.
- Ability to specify the temporary workspace location in the GUI. This allows the use of a RAM disk to significantly improve performance.
- Greatly enhanced operations to suggest thesauri using machine learning.
- Operations on LexisNexis documents, including extracting meta-information and renaming files based on date for over-time analysis.
- Extract potential acronyms, and potential n-grams.
- Infer kinship.
|| June 15, 2011
- Extract text from HTML or RTF documents.
- Merge hyphenated words that occur at the end of lines, typically as a result of OCR conversion.
- Produce a list of approximate names, to be reviewed to determine which are aliases.
- Ability to derole names, to remove titles, and to resolve names within a name thesauri.
- Split compound thesauri entries.
- Apply rules to determine ontological categories (such as a road being a location).
- An improvement to the thesauri merge program, to be able to merge in the master format, and to improve performance by reducing the number of intermediate steps.
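The hyphen-merging operation listed in this release can be illustrated with a minimal sketch. It is not AutoMap's implementation: the function name `merge_hyphenated_line_breaks` and the regular expression are assumptions chosen for illustration, and the sketch joins every word split by a hyphen at a line break, so a genuinely hyphenated compound that happens to break at the hyphen would also be merged.

```python
import re

# Hypothetical pattern (an assumption, not AutoMap's rule): a word character,
# a hyphen, optional trailing blanks, a line break, optional leading blanks,
# and then another word character.
_HYPHEN_BREAK = re.compile(r"(\w)-[ \t]*\r?\n[ \t]*(\w)")


def merge_hyphenated_line_breaks(text: str) -> str:
    """Join words that were split by a hyphen at the end of a line."""
    return _HYPHEN_BREAK.sub(r"\1\2", text)


sample = "The infor-\nmation was extracted by OCR.\nA well-known case stays intact."
print(merge_hyphenated_line_breaks(sample))
```

Hyphens inside a line, such as "well-known", are left untouched because the pattern only matches a hyphen followed by a line break.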
|| March 9, 2011
- Macros for preprocessing: one to clean files, which preserves syntax for part-of-speech extraction, and one to prepare files, which performs additional standard preprocessing operations that interfere with part-of-speech tagging of the sentences.
- Additional preprocessing operations: replace HTML symbols, fix common typos, convert British to American spelling, expand common contractions and abbreviations, remove noise words.
- Conversion of common compound concepts into n-grams. The n-grams are all definition-altering compound concepts such as "first aid" and "civil war."
- Extraction of nouns and verbs.
- Suggested thesauri routines based on machine learning and based on part of speech.
- Context-sensitive stemming, to depluralize common nouns and detense common verbs.
- An attribute editor.
|| February 7, 2011
|| January 3, 2011
- Ability to split/recombine text files. This is for very large text files to be able to increase the amount of parallelism to improve performance.
- A CSV splitter, to split a CSV file into its structured and unstructured components. The unstructured data can be run through AutoMap and then ORA, with its structured part being imported into ORA.
- A LexisNexis splitter, to split files downloaded from LexisNexis, which contain several documents, into their individual documents.
- The tools within AutoMap have been adapted for a new master format for thesauri and delete lists. The master format puts all generalization, metanetwork, and delete files into one single file to reduce the misalignment of entries.
|| September 1, 2010
- Ability to change the default GUI color.
- Routine to compare text files, which augments our ability to deduplicate text files, including inexact deduplication.
- A storage preference to allow users to default all output to a particular destination, rather than having to specify a location for each output generated.
- Routine to extract actions (verbs) from text into a list. This is often used to find the "task" concepts.
- A concept list trimmer routine to cull low-frequency concepts from the concept list.
- A location distillation procedure to extract and produce a location-based thesaurus from a gazetteer database.
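As a rough illustration of the concept list trimmer in this release, the sketch below drops concepts below a frequency threshold. The function name `trim_concept_list`, the dictionary representation of a concept list, and the `min_frequency` parameter are all assumptions made for the example, not AutoMap's actual interface.

```python
from typing import Dict


def trim_concept_list(frequencies: Dict[str, int], min_frequency: int) -> Dict[str, int]:
    """Keep only the concepts that occur at least min_frequency times."""
    return {
        concept: count
        for concept, count in frequencies.items()
        if count >= min_frequency
    }


# Hypothetical concept list: concept -> frequency in the corpus.
concepts = {"network": 12, "analysis": 7, "zebra": 1, "node": 3}
trimmed = trim_concept_list(concepts, min_frequency=3)
print(trimmed)
```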
|| June 11, 2010
- AutoMap is now available as a 32-bit and a 64-bit executable.
- Capability to change heap size preferences for larger files.
- Ability to generate a concept network as well as a concept list. The concept network contains all nodes and no links. Links can be manually added via ORA. In addition, it is no longer necessary to generate an entire semantic network for the hot topics report.
- Revamped script runner program, to be able to use the new drag-and-drop editor to edit the batch workflow script, which is used for processing a large number of files.
- Greater performance with the use of superscript, which is a multi-threaded extension to the script processing allowing the user to specify the number of cores in a machine to achieve maximum performance.
|| January 5, 2010
- Extracting text content from all Microsoft Office 2003 documents including Word, Excel, and PowerPoint.
- Ability to compose a text file from within AutoMap and include it within the corpus of text being analyzed.
- Expanded ability to check and transform text documents in the UTF-8 encoding.
- Greater control for advanced users to be able to turn off pop-up hints.
- Ability to change the font size as well as the font for the main text display window.
- Preprocessing routine to be able to remove individual symbols specified by the user.
- Ability to un-do and re-do preprocessing operations, as well as the ability to insert an operation within the stack of operations already performed.
- Routine to be able to strip file headers, either by number of lines or by a key phrase.
- Thesauri routines to check for missing entry parts, duplicate entries, circular logic.
- Ability to apply a delete list to a thesauri.
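The header-stripping routine in this release works either by number of lines or by a key phrase. A minimal sketch of both modes follows. The function `strip_header` and its parameters are hypothetical, and treating the line that contains the key phrase as part of the header is an assumption, since the entry does not say how that line is handled.

```python
from typing import List, Optional


def strip_header(
    lines: List[str],
    num_lines: Optional[int] = None,
    key_phrase: Optional[str] = None,
) -> List[str]:
    """Drop a header by line count, or up to the first line containing key_phrase."""
    if num_lines is not None:
        return lines[num_lines:]
    if key_phrase is not None:
        for index, line in enumerate(lines):
            if key_phrase in line:
                # Assumption: the key-phrase line itself belongs to the header.
                return lines[index + 1:]
    # No rule given, or the phrase was not found: leave the text unchanged.
    return lines


doc = ["From: a", "Date: b", "BODY:", "Hello", "World"]
print(strip_header(doc, num_lines=2))
print(strip_header(doc, key_phrase="BODY:"))
```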
||August 18, 2009
||June 18, 2009
- Ability to deduplicate text files by content.
- Copy files based on filename criteria so as to include the subset of files you care about.
- Extract web pages from a single source, extracting the text, and putting all the files into a single directory for processing.
- Extracting text content from Microsoft Word 2003 documents, and Adobe PDF files.
- Ability to change the text font to any font installed on the computer, to be able to view any foreign language character set.
- Supplemental tools have been developed to aid the AutoMap user. Tools include:
- Delete list editor. Able to interactively add or remove terms, to add terms from a previously generated concept list, to identify possible misspelled words in the delete list, to add stemmed versions of terms on the delete list.
- Thesauri editor. Able to interactively add or remove terms, to identify misspelled words, to add stemmed versions of terms in the thesauri, sort based on the number of terms, to merge multiple thesauri together.
- Concept list viewer. Able to sort the concept list by frequency or relative frequency, to compare to previously generated concept lists, to save concepts to a delete list, to select terms by a minimum or maximum frequency, to save a subset of the terms into a file.
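Deduplicating text files by content, as introduced in this release, can be sketched by hashing each file's text and grouping identical digests. This is an illustration under stated assumptions, not AutoMap's method: the function `find_duplicates` is hypothetical, the corpus is passed as an in-memory mapping of file name to text, and only exact duplicates are detected.

```python
import hashlib
from typing import Dict, List


def find_duplicates(texts: Dict[str, str]) -> Dict[str, List[str]]:
    """Map the first file of each duplicate group to the later identical files."""
    by_digest: Dict[str, List[str]] = {}
    for name, content in texts.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        by_digest.setdefault(digest, []).append(name)
    # Keep only groups that actually contain more than one file.
    return {names[0]: names[1:] for names in by_digest.values() if len(names) > 1}


# Hypothetical corpus: file name -> file content.
corpus = {"a.txt": "same text", "b.txt": "other text", "c.txt": "same text"}
print(find_duplicates(corpus))
```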
|| June 7, 2009
- Redesigned user interface.
- Includes a quick-launch area that gives the user fast access to commonly used preprocessing commands.
- The message window provides user feedback and reminders--such as where you just stored that file.
- All of the commands have been moved to the menu bar to keep users from having to hunt for what is available.
- Performance has been significantly improved.
|| September 3, 2008
|| August 1, 2008
- Fixed Unicode handling
- Improved n-gram extraction
|| July 25, 2008
|| July 18, 2008
|| June 23, 2008
|| April 24, 2008
|| January 30, 2008
|| September 5, 2007
|| July 30, 2007
- Preserve umlauts during auto-cleaning of data
- Option: Parts of Speech tags as node attributes in XML outputs
- Output List with parts of Speech per Word
- Anaphora Resolution
|| July 13, 2007
|| July 5, 2007
|| June 25, 2007
|| May 17, 2007
- computation of semantic entropy
- integrated installer for GUI and batch mode
|| March 20, 2007
- the generalization thesaurus interface is back to a list style
- updated visualizer and matrix editor
- corpus editing: under tools, there is a new tool that allows for removing text after certain words or phrases
|| February 15, 2007
- changes in union concept list and concept list to facilitate adding to delete list
- refine table functions in union concept list and concept list for sorting
|| February 1, 2007
CASOS Email Parser (CEMAP)
- The CASOS Email Parser (CEMAP) that is launched through AutoMap enables the extraction of different types of network information from emails.
Parts of Speech Tagging
- This routine tags every word, as it stands after the highest level of pre-processing applied so far, with its part of speech.
Changes in DyNetML outputs for XML files per Text
- The DyNetML outputs per Map (Semantic Network Analysis) and per Text (Sub-Matrix Analysis) have been changed to reflect:
- The new DyNetML specification (http://casos.isri.cmu.edu/dynetml/index.html)
- Isolated nodes
- Frequency of nodes
- A set of attributes on the data and statistical results
|December 1, 2006
AutoMap to Database Connector:
- Purpose: Data management.
- Store thesauri and AutoMap analysis outputs (XML files, DyNetML files) in an SQL-based relational database of your choice (Microsoft Access, MySQL, Microsoft SQL Server, PostgreSQL, Oracle).
- If multiple thesauri or outputs are written to the database, no previous data gets overwritten, because the database connector assigns each data file a unique serial ID.
- Purpose: Find the most important terms in a dataset, e.g. in order to build a generalization thesaurus or to get an idea of what is going on in the data.
- Routine that identifies the most important terms in the text set (language independent, dataset dependent). This functionality is based on the TF*IDF algorithm.
- Output stored as .csv list.
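The TF*IDF weighting named above can be sketched as follows. This is a textbook variant (term frequency normalized by document length, multiplied by the natural log of the inverse document frequency). The exact variant AutoMap uses is not stated in these notes, so the formula, the function name `tf_idf`, and the token-list input format are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: score} dictionary per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq: Counter = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical two-document corpus, already tokenized.
docs = [["network", "analysis", "network"], ["text", "analysis"]]
scores = tf_idf(docs)
print(sorted(scores[0].items(), key=lambda item: item[1], reverse=True))
```

A term that appears in every document, such as "analysis" here, receives a score of zero, which is why the measure depends on the dataset rather than on the language.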
Standard format for pre-processing material changed from .txt to .csv:
- Purpose: Easier to work with these files in Excel or other csv based editors.
- All pre-processing material (delete list, thesauri, sub-matrix selection file, etc.) that used to be in .txt format is now in .csv format. Thus, the comma as delimiter was replaced with a forward slash. If you would like to use previous thesauri, please replace all , with //. If you have any trouble doing this, please let me know.
Thesauri are displayed in tables that can be sorted by each column in the graphical user interface of AutoMap:
- Purpose: Easy sorting per column, easy copy and paste from or to Excel.
Automated removal of noise symbols and/or numbers:
- Purpose: Fast cleaning. More occurrences per term.
- Remove symbols (this is basically any non-significant noise, such as ^, # etc.) while maintaining end-of-the-sentence identifiers (. ? !)
- Remove symbols and numbers (this is basically any non-significant noise and all numbers, such as ^, # etc.)
- Text files after applying either routine can be stored to a directory of your choice.
- Panels displayed as tables that can be sorted by each column.
- Inputs other than DyNetML files changed from .txt to .csv file format.
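The two symbol-removal routines described in this release can be approximated with regular expressions. The sketch below is illustrative only: the function `remove_symbols` is hypothetical, "symbol" is assumed to mean any character that is not a letter, digit, whitespace, or one of the sentence-end marks (. ? !), and the numbers variant is assumed to keep the sentence-end marks as well, which the notes do not state.

```python
import re

# Assumption: everything except word characters, whitespace, and . ? ! is noise.
# The underscore counts as a word character in \w, so it is listed separately.
_SYMBOLS = re.compile(r"[^\w\s.?!]|_")
_SYMBOLS_AND_NUMBERS = re.compile(r"[^\w\s.?!]|_|\d")


def remove_symbols(text: str, remove_numbers: bool = False) -> str:
    """Strip noise symbols (and optionally digits) but keep . ? ! in place."""
    pattern = _SYMBOLS_AND_NUMBERS if remove_numbers else _SYMBOLS
    return pattern.sub("", text)


sample = "Cost: #42 ^up! Really?"
print(remove_symbols(sample))
print(remove_symbols(sample, remove_numbers=True))
```

Removed characters are deleted rather than replaced with a space, so a removed number can leave two adjacent spaces behind. Collapsing whitespace afterwards would be a separate step.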
|June 19, 2006
- Move from Java 1.4.2 to Java 5.0
- New tool suite for loading a DyNetML file, viewing only the attributes
and the meta-matrix categories they are associated with, editing
these associations, and writing the changes back to the DyNetML file
(only applies to GUI version).
|May 12, 2006
New features mainly designed for speeding up batch mode
processing as well as coding of large data sets with any version:
- Option for automated edge attribute
extraction in GUI version, single
batch mode version and multi
- Option for storing map files and stat files for
GUI version, single batch mode version and multi
- Language dependent stemmers for Danish, Dutch, English,
Finnish, French, German, Italian, Norwegian, Portuguese,
Russian, Spanish and Swedish in GUI version.
- Feature for users to customize stemming for all foreign
languages and English.
- Move concepts from Union Concept List interface to Delete
List (only applies to GUI version). Designed and supported
as an iterative process.
- Completely new multi batch mode processing tool – use
one script for processing multiple input folders
- Logical inference engine: Fully flexible rule set.
This enables the user to choose from pre-defined rules and/
or to build their own rule sets.
- Post Processor: Attribute-value thesauri functionality.
- New tool for merging BiGram files
- New tool for merging Named Entity Recognition (with threshold
- BiGram generation: Faster now.
|March 27, 2006
- Routine for creating ego networks from existent DyNetML or
XML files. The
ego node can be an existent or new node in an existent or new
|March 15, 2006
- New batch mode version for script based back end processing
- Network Reasoner Tool (Rule-based inference of
nodes and edges given information existing in a graph)
- Refined Krovetz Stemmer
|February 20, 2006
- New batch mode version for script based back end processing
- Output quantitative information on data reduction achieved
via deletion and generalization
- Show and track coverage of input concepts in delete list and
- New routine in Map Post Processor that enables the association
of different attributes with different meta-matrix categories
- Trace window that tracks all user's actions
default thesauri for generalization and meta-matrix coding
|December 30, 2005
- Enhanced adding and editing of node attributes in the Post
|December 28, 2005
- Batch mode version of AutoMap.
- Runs under Windows, Linux and
- Uses an XML script for specification of coding settings.
||December 27, 2005
- New editors for Node attributes and Edge Attributes in Post
Processor (add or modify attributes and related ontological categories)
- Enhanced functionalities for DyNetML editing for all Thesauri
in the Post processor
||November 15, 2005
- Add Date Extractor
- Add general time coding to interpretation thesaurus (Map Post
- Updated Matrix Editor
- Updated Social Insight Visualizer
||November 8, 2005
1. Automated attribute interpretation:
Enumerated attributes (property name, typically
attribute 1, attribute 2, etc.) in the property specification
edges and/ or nodes in a DyNetML will be interpreted. Interpretation here
means that the property name will be translated into a meaningful
If the data from the resulting DyNetML file is stored in a relational
database the meaningful category could serve as the key. The type and value
of an attribute remain unchanged. The value could serve as the result when
searching relational data that is stored in a database. Attributes that were
not covered with the translation thesaurus will be renamed into "nyi" (not
2. Add predefined interpretation thesaurus
for Map Post Processor.
||November 2, 2005
- New Tool (under Tools): Post Processor for Relational
- Purpose: Refine or edit attributes of nodes and/
or edges represented in XML/ DyNetML files that were
generated with AutoMap or any other software.
- Five different sub-components that support
different levels of details for editing different
portions of XML/ DyNetML files.
||October 19, 2005
- Add Output Storage Manager
- Outputs from pre-processing and different types
of analysis can now be stored
with user-defined file names at user-defined directories
- Remembers user-defined output file names and directories
when AutoMap is exited and used again later on
and Statement Formation routines are now located are
separated into two tabs - pre-processing and analysis
some more pre-defined material for helping to create
- List of common punctuation as a possible delete list (can
- Generalization Thesaurus for converting common contractions
in English into non-contracted form (can
- Generalization Thesaurus for converting common abbreviations
in English into non-abbreviated form (can be customized)
- Generalization Thesaurus for
known countries (can be customized)
- Meta-Matrix Thesaurus for known countries (can be customized)
- For meta-matrix coding, the entity "tasks-events"
is split up into 2 separate entities: task and event. DyNetML
files generated in AutoMap will reflect this change.
- For counting unique
and total concepts for the union list per text and per corpus
all words are treated as case insensitive
||October 6, 2005
- Add limited Parts of Speech Extraction functionality for English.
- Add Extractor for Numericals. Add functionality to use output
baseline for delete list and/ or meta matrix thesaurus.
Language independent feature.
- Add functionality for Bi-Gram Detector to use output as baseline
for generalization thesaurus.
- Add functionality for Named Entity Extraction to use output
as baseline for generalization thesaurus.
- Add functionality to generalization thesaurus panel to create
a positive list thesaurus.
- Add information about
workflow, language dependency, automation, customization and
appropriateness of repetition of procedures for all pre-processing
techniques to User's Guide.
- Mark pre-processing utilities in the interface as being language
independent or not.
- User's Guide also contains FAQ section now. This contains information
on how to output Excel files from AutoMap, how to use Excel
files as AutoMap input, how to view and print the meta-matrix
thesaurus by entity class
||September 8, 2005
- Add Text File Splitter
- Add customization features for Krovetz Stemmer
||July 24, 2005
- upgraded Java (1.4.2_08)
- batch file for starting script version of AutoMap
- batch file for starting GUI version of AutoMap
||July 12, 2005
- Output option: XML (DyNetML) outputs of mental models (result
Analysis). This enables the creation of meta-matrix data when no meta-matrix
coding has been performed. All nodes will be of type knowledge. This enables
automated extraction of knowledge networks.
- Updated Visualizer, Matrix Editor, Network Data Format Converter,
- CompareMap: DL outputs from map consolidation
||June 18, 2005
- Ontology handling:
- use any ontology the user provides
- generate XML (DyNetML) files that represent user-defined
- user-defined exclusion of any (multiple) entities given
in the ontology from node representation and thus edge formation,
but considering them as attributes of entities (e.g. attributes,
- Give user option to decide whether they measure textual networks
or social networks that are represented in texts
- Knowledge grouping (cluster identification) in CompareMap
- Speed up KStem
||May 11, 2005
- Provides KStem, an inflection- and derivation-based stemmer invented
by Bob Krovetz
||May 5, 2005
- Add CASOS tools for:
- Merging XML files
- Converting network data
- Visualizing networks
- Editing network data
- Enhance creation of binary maps
||February 14, 2005
- Enhancement of Thesaurus Application
- Installer for Linux and Unix
||November 17, 2004
- Utilities: Bi-Gram Identification
- Visualization: CASOS visualizer integrated
- Network Conversion: Updated Network Converter
- DyNetML: Inference of Edge Attributes
- CompareMap: .csv output files for Hamming Distances
||October 25, 2004
- CompareMap: Computation of Hamming Distances:
- for Map Comparison the Hamming Distance between each given pair of maps is computed
- for Map Consolidation the Hamming Distance between the consolidated map or central graph and each given map is computed
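A minimal sketch of the Hamming distance computation for map comparison follows. It assumes each map is a binary network over a shared set of concepts, represented as a set of directed edges, so that the distance is the number of edges present in one map but not the other. The names `hamming_distance` and `pairwise_distances` and the edge-set representation are assumptions, not CompareMap's actual data format.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def hamming_distance(map_a: Set[Edge], map_b: Set[Edge]) -> int:
    """Count the edges on which two binary maps disagree."""
    return len(map_a ^ map_b)  # symmetric difference


def pairwise_distances(maps: Dict[str, Set[Edge]]) -> Dict[Tuple[str, str], int]:
    """Hamming distance for every unordered pair of named maps."""
    return {
        (a, b): hamming_distance(maps[a], maps[b])
        for a, b in combinations(maps, 2)
    }


# Hypothetical maps: name -> set of (source concept, target concept) edges.
maps = {
    "m1": {("a", "b"), ("b", "c")},
    "m2": {("a", "b"), ("c", "d")},
    "m3": {("a", "b"), ("b", "c"), ("c", "d")},
}
print(pairwise_distances(maps))
```

For the consolidation case, the same `hamming_distance` function could be applied between the consolidated map and each input map in turn.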
||September 10, 2004
- Symmetrize DL files by default
- Automatically store outputs from CompareMap in .map format
- Convert outputs from CompareMap to DyNetML upon user's request
||August 9, 2004
- Dynamic generation of full meta matrix for sub matrix selection from meta matrix model as specified by the user (including user-defined categories)
||August 2, 2004