Practical Guide: Logistic Regression

Hilbe, J. M. (2015) Practical Guide to Logistic Regression

This short book shows the reader how to model a binary response variable using basic logistic regression models. Despite the book’s modest size, Joseph M. Hilbe manages to introduce the reader to logistic models with single or multiple predictors, as well as to grouped and Bayesian logistic regression.

Hilbe suggests that the book would be appropriate for someone who has completed a basic course in statistics which includes linear regression. I would agree with this assessment, though I would also recommend working through an introductory tutorial on R. That said, the book is written in an exceptionally clear style, so the reader can expect a treatment of the subject which is concise but comprehensible.

An additional selling point of this text is that it introduces new R functions which can be applied in one’s own work, as well as equivalent SAS and Stata code. The provision of complete code in the book and on a dedicated website will also be of benefit to readers who wish to spend more time learning about logistic regression models than hacking code.

Indeed, the emphasis on understanding logistic regression modelling rather than on the mechanistic application of techniques is one of the great strengths of the book. Anyone who reads this book will therefore feel that they have a good understanding of this subject which can be consolidated both by analysis of their own data and by further reading.

Review originally published in Reviews. Significance, 13:2 45. doi: 10.1111/j.1740-9713.2016.00885.x

SSD for R and Single-Subject Data

Auerbach, C., Zeitlin, W. (2014) SSD for R: An R Package for Analyzing Single-Subject Data

Despite its brevity, Charles Auerbach and Wendy Zeitlin’s book describes how to analyse single-subject data using their own package, SSD for R. They introduce its functions and provide advice on how to analyse baseline and intervention phase data.

I thought that their discussion of serial dependency was particularly well done, as was their emphasis on how to use SSD for R to visualise data. Other chapters provide introductions to statistical testing and to the analysis of group data.

Readers should note that the book does not deal with single-subject methodology in any depth, so additional resources will be needed in order to make best use of the package. Fortunately, the authors include useful references for those who need information on specific research designs.

R newbies may need to read an introductory R text as the book’s scope is understandably restricted to providing information about the package. But Auerbach and Zeitlin write well and the content does not demand much in the way of prior statistical knowledge or IT skills.

Statisticians may not need to avail themselves of this book, but practitioners who are working in applied disciplines such as social work, psychology and medicine will find it very appealing.

Review originally published in Reviews. Significance, 12:4 45. doi: 10.1111/j.1740-9713.2015.00846.x

Text Mining with ‘tm’

It is possible to identify top-level categories in qualitative data analysis by using text mining methods. One can count the frequency of terms or words in a text or texts. Words which occur frequently may be top-level classifications or themes.

Text mining begins with the creation of a corpus, or collection of texts, for analysis. The corpus is then preprocessed so that punctuation, capitalisation and numbers are removed, along with common words which are, ipso facto, very frequent in any text. A document-term matrix is then created in which the documents in the corpus are represented by rows and the words by columns. Analysis could then include the identification of frequent terms and a ‘frequency of frequencies’, i.e. a count of how many words occur in the corpus at each specific frequency.
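The steps described above can be sketched in a few lines of code. This is a minimal illustration in plain Python, not the tm package's own API: the toy corpus, the tiny stop-word list and the function names `preprocess` and `frequent_terms` are all hypothetical and invented for this example. It is meant only to make the document-term matrix and the ‘frequency of frequencies’ concrete.

```python
import re
from collections import Counter

# Hypothetical toy corpus and stop-word list, invented for illustration only.
corpus = [
    "The cat sat on the mat in 2015.",
    "The dog chased the cat!",
    "A dog and a cat can share the mat.",
]
stop_words = {"the", "a", "and", "on", "in", "can"}


def preprocess(text):
    """Lower-case the text, strip punctuation and digits, drop stop words."""
    text = text.lower()
    # Simplification: anything that is not an ASCII letter or whitespace
    # (punctuation, digits, accented characters) becomes a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [word for word in text.split() if word not in stop_words]


tokens_per_doc = [preprocess(doc) for doc in corpus]

# Column labels of the matrix: every distinct term, in alphabetical order.
terms = sorted({word for tokens in tokens_per_doc for word in tokens})

# Document-term matrix: one row per document, one column per term.
dtm = [[tokens.count(term) for term in terms] for tokens in tokens_per_doc]

# Total count of each term across the whole corpus (column sums).
term_totals = {term: sum(row[j] for row in dtm) for j, term in enumerate(terms)}


def frequent_terms(totals, min_count):
    """Return the terms that occur at least min_count times in the corpus."""
    return sorted(term for term, n in totals.items() if n >= min_count)


# 'Frequency of frequencies': how many terms occur exactly n times.
freq_of_freqs = Counter(term_totals.values())

print(terms)
print(dtm)
print(frequent_terms(term_totals, 2))
print(sorted(freq_of_freqs.items()))
```

With a real corpus the stop-word list would be far longer and the matrix far sparser, which is one reason to use a dedicated package rather than hand-rolled code like this.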

For further detail, see Kailash Awati’s Gentle Introduction to Text Mining with R and an RStudio resource, both of which describe how to text mine with R’s tm package. The RStudio resource also includes additional links to books on text and data mining as well as material on ‘clustering’ methods.

Both tutorials assume that R is already installed. If this is not the case, go to The R Project for Statistical Computing site and follow the instructions for your system.

R binaries are available for Windows, Mac and Linux distributions.


The R package RQDA may be one alternative for qualitative researchers who do not have access to, or do not wish to use, proprietary CAQDAS software. RQDA allows the user to import text files, create codes and file categories, and visualise file categories with sociograms.

It’s also possible to run the package from the command line and to export RQDA data to LaTeX.

Further information is available from:

  • the RQDA site;
  • the RQDA User Manual;
  • and Metin Caliskan’s excellent YouTube tutorials.