CRF for Text Analysis


Previous research suggests that the analysis of socio-technical systems is one field with a strong yet unmet need for the automated extraction of instances of various entity classes from text data. Domain-specific entity classes and the relations between them are often specified in ontologies or taxonomies. We work on a Conditional Random Field (CRF)-based approach to distilling a non-canonical set of entities, which is defined in an ontology that originates from organization science. This supervised sequential machine learning technique facilitates the derivation of relational data from corpora by locating and classifying instances of various entity classes. The classified entities can then be used as nodes for the construction of socio-technical networks. We envision researchers using the presented methodology as one crucial step in the process of advanced modeling and analysis of complex and dynamic real-world organizations or networks. We find the outcome, with respect to accuracy measures and the resulting model, sufficiently successful to be applied in the described problem domain in the future.
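To make the step from classified entities to network nodes more concrete, the following is a minimal sketch and not the project's actual implementation. It assumes the sequence labeler emits BIO-style tags per token, and it links entities that co-occur in the same sentence. The class names (agent, organization, resource), the function names, and the co-occurrence rule are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_entities(tagged):
    """Collapse BIO-tagged tokens into (text, entity_class) spans."""
    entities, tokens, current = [], [], None
    for token, label in tagged:
        # An I- tag that matches the open span extends it.
        if label.startswith("I-") and current == label[2:]:
            tokens.append(token)
            continue
        # Anything else closes the open span, if there is one.
        if tokens:
            entities.append((" ".join(tokens), current))
        if label == "O":
            tokens, current = [], None
        else:
            # A B- tag (or an I- tag with no matching open span) starts a new span.
            tokens, current = [token], label[2:]
    if tokens:
        entities.append((" ".join(tokens), current))
    return entities


def build_network(sentences):
    """Use entities as nodes; count sentence-level co-occurrence as edge weight."""
    nodes, edges = set(), Counter()
    for tagged in sentences:
        ents = sorted(set(extract_entities(tagged)))
        nodes.update(ents)
        for a, b in combinations(ents, 2):
            edges[(a, b)] += 1
    return nodes, edges


# Hypothetical labeler output; the entity classes are illustrative only.
sentences = [
    [("Alice", "B-agent"), ("Smith", "I-agent"), ("joined", "O"),
     ("Acme", "B-organization"), ("Corp", "I-organization"), (".", "O")],
    [("Acme", "B-organization"), ("Corp", "I-organization"), ("funds", "O"),
     ("the", "O"), ("lab", "B-resource"), (".", "O")],
]

nodes, edges = build_network(sentences)
for (a, b), weight in sorted(edges.items()):
    print(a, "--", b, "weight", weight)
```

Sentence-level co-occurrence is only one possible linking rule; relations specified in an ontology could replace it without changing the entity-extraction step.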