Text Mining with ‘tm’

Text mining methods can help to identify top-level categories in qualitative data analysis. Counting how often each term or word occurs in a text, or across a set of texts, highlights the frequent words, which may point to top-level classifications or themes.

Text mining begins with the creation of a corpus, or collection of texts, for analysis. Some initial preprocessing of the corpus follows: punctuation, capitalisation and numbers are removed, as are common words (often called stop words) which are very frequent in any text and so say little about its themes. A document term matrix is then created, in which the documents in the corpus are represented by rows and the words by columns. Analysis could then include identification of frequent terms and a ‘frequency of frequencies’, i.e. how many words occur in the corpus at each specific frequency.
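The steps above can be sketched in a few lines of code. The example below is only an illustration of the general workflow, written in plain Python rather than with R’s tm package, and it makes several assumptions: the three-sentence corpus is invented, the stop-word list is a tiny hypothetical one, and the function names `preprocess` and `document_term_matrix` are made up for this sketch. The tm package provides its own tools for each of these steps.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "it", "about"}


def preprocess(text):
    """Lower-case the text, strip punctuation and numbers, drop stop words."""
    text = text.lower()
    # Replace anything that is not an ASCII letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [word for word in text.split() if word not in STOP_WORDS]


def document_term_matrix(corpus):
    """Return (terms, matrix): one row per document, one column per term."""
    tokenised = [preprocess(doc) for doc in corpus]
    terms = sorted({word for doc in tokenised for word in doc})
    matrix = []
    for doc in tokenised:
        counts = Counter(doc)
        matrix.append([counts[term] for term in terms])
    return terms, matrix


# Invented example corpus: each string stands in for one document.
corpus = [
    "The interview was about housing, and housing costs in 2019.",
    "Housing and transport costs: the transport system is poor!",
    "It was an interview about community.",
]

terms, dtm = document_term_matrix(corpus)

# Total occurrences of each term across the corpus (column sums).
term_totals = {term: sum(row[i] for row in dtm) for i, term in enumerate(terms)}

# Terms occurring at least twice: candidate themes.
frequent_terms = sorted(t for t, n in term_totals.items() if n >= 2)

# 'Frequency of frequencies': how many terms occur once, twice, and so on.
freq_of_freqs = Counter(term_totals.values())

print(terms)
print(dtm)
print(frequent_terms)
print(dict(freq_of_freqs))
```

With a real corpus the matrix is large and mostly zeros, so the frequent terms and the frequency of frequencies are usually the first summaries worth inspecting.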

For further detail, check out Kailash Awati’s Gentle Introduction to Text Mining with R here and an RStudio resource here, both of which describe how to text mine with R’s tm package. The RStudio link also includes additional links to books on text and data mining, as well as material on ‘clustering’ methods.

Both tutorials assume that R is already installed. If it is not, go to The R Project for Statistical Computing here and follow the instructions for your system.

R binaries are available for Windows, Mac and Linux distributions.

CAQDAS with RQDA

The R package RQDA is one alternative for qualitative researchers who do not have access to, or do not wish to use, proprietary CAQDAS software. RQDA allows the user to import text files, create codes and file categories, and visualise file categories with sociograms.

It’s also possible to run the package from the command line and to export RQDA data to LaTeX.

Further information is available from:

  • the RQDA site;
  • the RQDA User Manual;
  • and Metin Caliskan’s excellent YouTube tutorials.

Ethnography

In Watching Closely: A Guide to Ethnographic Observation, Christena Nippert-Eng presents a new guide to undertaking ethnographic observation, providing both exercises and advice for researchers. This book will be of use to scholars regardless of their level of experience [… and combines] solid instruction in the technicalities of ethnographic research methodologies with an engaging, inspiring and insightful approach.

If this sounds interesting, why not check out my review at the LSE US Centre here?