Features and Specification

The Leipzig Corpus Miner (LCM) is an analysis tool for performing text mining tasks without explicit guidance from NLP experts. It integrates several procedures for retrieving, annotating and mining textual data using lexicometrics and machine learning. Because these tools can be combined flexibly, the LCM supports analysis interests ranging from quantitative corpus linguistics to qualitative methodologies.

Information retrieval

Given a large document collection, e.g. complete volumes of a daily newspaper spanning several decades, a common need is to identify the documents relevant to a particular research question.

Lexicometrics

The LCM implements the computation and visualization of basic corpus-linguistic measures on stored collections. It supports frequency analysis, co-occurrence analysis and automatic extraction of key terms.
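
To make the two basic measures concrete, the following is a minimal sketch of frequency and co-occurrence counting. It is not the LCM's implementation: the function names, the whitespace tokenization, the choice of the whole document as the co-occurrence window, and the sample documents are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def term_frequencies(docs):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for doc in docs:
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        counts.update(doc.lower().split())
    return counts


def cooccurrences(docs):
    """Count unordered term pairs that appear together in the same document."""
    pairs = Counter()
    for doc in docs:
        # Sorting makes every pair come out as (smaller, larger) alphabetically.
        terms = sorted(set(doc.lower().split()))
        pairs.update(combinations(terms, 2))
    return pairs


# Hypothetical toy collection.
docs = [
    "parliament debates tax reform",
    "tax reform passes parliament",
    "football season opens",
]
freq = term_frequencies(docs)
cooc = cooccurrences(docs)
```

Here `freq["tax"]` counts occurrences of a term, while `cooc[("reform", "tax")]` counts the documents in which both terms appear. A real analysis would typically apply a significance measure to the raw pair counts rather than use them directly.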

Topic models

For the analysis of topical structures in large text collections, topic models have proven useful in recent studies. Topic models are statistical models that infer probability distributions over latent variables, assumed to represent topics, both for a text collection as a whole and for single documents.
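
As an illustration of how such distributions fit together, the sketch below assumes the mixture formulation used by common topic models such as LDA, in which the probability of a word in a document is the sum over topics of p(topic | document) * p(word | topic). The text above does not state which model the LCM uses, and the distributions here are hand-written, hypothetical values. A real topic model would infer them from the data.

```python
# Hypothetical toy distributions; a real topic model would infer these from a corpus.
topic_word = {
    "politics": {"tax": 0.5, "parliament": 0.4, "goal": 0.1},
    "sports": {"tax": 0.1, "parliament": 0.1, "goal": 0.8},
}
# Topic proportions of a single (hypothetical) document.
doc_topic = {"politics": 0.75, "sports": 0.25}


def word_probability(word, doc_topic, topic_word):
    """p(word | doc) = sum over topics of p(topic | doc) * p(word | topic)."""
    return sum(
        p_topic * topic_word[topic].get(word, 0.0)
        for topic, p_topic in doc_topic.items()
    )


p_goal = word_probability("goal", doc_topic, topic_word)
p_total = sum(
    word_probability(w, doc_topic, topic_word)
    for w in ("tax", "parliament", "goal")
)
```

Because each per-topic word distribution and the document's topic proportions sum to one, the resulting word probabilities for the document also sum to one.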

Classification

Supervised learning from annotated text, used to assist the coding of documents or parts of documents, promises to be a major innovation for content analysis applications. The LCM allows manual annotation of complete documents or document snippets with category labels. The analyst may develop a hierarchical category system up front and/or refine it during annotation. The annotated text parts then serve as training examples for automatic classification processes, which output category labels for unseen units of analysis (e.g. sentences, paragraphs or documents).
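
The annotate-then-classify workflow can be sketched as follows. The text above does not name the classifier the LCM uses, so this example substitutes a small multinomial Naive Bayes model with add-one smoothing purely for illustration. The function names, the category labels and the annotated examples are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Collect per-label token counts from (text, label) training pairs."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        label_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, token_counts, vocab


def classify(text, model):
    """Return the label with the highest add-one-smoothed log-probability."""
    label_counts, token_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior of the label
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical manually annotated snippets serving as training examples.
annotated = [
    ("the parliament passed the tax reform", "politics"),
    ("the minister defended the budget", "politics"),
    ("the team won the final match", "sports"),
    ("the striker scored a late goal", "sports"),
]
model = train(annotated)
label = classify("parliament debates the budget", model)
```

The unseen snippet receives the label whose training examples share the most vocabulary with it. In practice, far more annotated examples per category would be needed for reliable output.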

Leipzig Corpus Miner

A Text Mining Infrastructure for Qualitative Data Analysis

For further information on the LCM, please contact the NLP Group at the University of Leipzig.