Text Mining with ‘tm’

It is possible to identify top-level categories in qualitative data analysis by using text mining methods. One can count the frequency of terms or words in a text or texts. Words which occur frequently may point to top-level classifications or themes.

Text mining involves creating a corpus, or collection of texts, for analysis. The corpus is then preprocessed to remove punctuation, capitalisation and numbers, as well as common words which are very frequent in any text and so say little about its themes. A document term matrix is then created in which the documents in the corpus are represented by rows and the words by columns. Analysis could then include identifying frequent terms and a ‘frequency of frequencies’, i.e. how many words occur in the corpus at each frequency.
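The steps above can be sketched in a few lines of code. This is a minimal illustration in plain Python, not the tm package itself; the two toy documents, the stop word list and all variable names are assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical toy corpus and stop word list (illustrative assumptions only).
corpus = [
    "The interview data shows 3 themes: trust, trust and access.",
    "Access to care was a theme; trust in staff was another.",
]
STOP_WORDS = {"the", "a", "and", "to", "in", "was", "of", "another"}


def preprocess(text):
    """Lower-case the text, drop punctuation and numbers, remove stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


# One Counter of word counts per document.
doc_counts = [Counter(preprocess(doc)) for doc in corpus]

# Column labels: every distinct term in the corpus, in alphabetical order.
vocabulary = sorted({term for counts in doc_counts for term in counts})

# Document term matrix: one row per document, one column per term.
dtm = [[counts[term] for term in vocabulary] for counts in doc_counts]

# Total frequency of each term across the whole corpus.
term_freq = {term: sum(row[j] for row in dtm) for j, term in enumerate(vocabulary)}

# Frequency of frequencies: how many terms occur exactly n times.
freq_of_freq = Counter(term_freq.values())

for term, n in sorted(term_freq.items(), key=lambda item: -item[1])[:3]:
    print(term, n)
print(dict(freq_of_freq))
```

In this toy corpus ‘trust’ is the most frequent term, which is the kind of signal that might suggest a candidate theme. A real analysis would use a fuller stop word list and would probably also merge word forms such as ‘theme’ and ‘themes’.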

For further detail, check out Kailash Awati’s Gentle Introduction to Text Mining with R here and an RStudio resource here which describe how to text mine with R’s tm package. The RStudio link also includes additional links to books on text and data mining as well as material on ‘clustering’ methods.

Both tutorials assume that R is already installed. If this is not the case, go to The R Project for Statistical Computing here and follow the instructions for your system.

R binaries are available for Windows, Mac and Linux distributions.


The R package RQDA may be one alternative for qualitative researchers who do not have access to, or do not wish to use, proprietary CAQDAS software. RQDA allows the user to import text files, create codes and file categories and to visualise file categories with sociograms.

It’s also possible to run the package from the command line and to export RQDA data to LaTeX.

Further information is available from:

  • the RQDA site;
  • the RQDA User Manual;
  • and Metin Caliskan’s excellent YouTube tutorials.



Research as activism

JustPublics@365 have produced some interesting skills guides for scholars who wish to build an audience for their work beyond academia. Example guides include a Social Media Toolkit, a report (Engaging Academics and Reimagining Scholarly Communication for the Public Good) and thought-provoking material on altmetrics.

The site is particularly interesting because of the collaborations which it encourages between scholars, activists and journalists in the pursuit of social justice.

So why not take a look at their resources here?