Discovering Statistics Using R

Field, A., Miles, J., Field, Z. (2012) Discovering Statistics Using R

This book teaches statistics by using R, the free statistical environment and programming language. It will be of use to undergraduate and postgraduate students and to professional researchers across the social sciences, and its material ranges from the introductory to the advanced. Divided into four levels of difficulty, with ‘Level 1’ representing introductory material and ‘Level 4’ the most advanced, the book may be read from beginning to end or consulted for particular techniques, although understanding the advanced material may require familiarity with earlier chapters. There is a comprehensive glossary of specialised terms and a selection of statistical tables in the appendix. There is also material on the publisher’s companion website and on the principal author’s own web pages.

The main strength of this book is that it presents a lot of information in an accessible, engaging and irreverent way. The style is informal, with interesting excursions into the history of statistics and psychology, and there are entertaining references to research papers which illustrate the methods explained. The authors manage to pull off the Herculean task of teaching statistics through the medium of R, which is an achievement when one considers that R can be difficult to use for researchers who have never manipulated data from the command line. Another plus point is that the authors describe how to ‘extend’ R’s capabilities with ‘packages’. This is a massive time saver for any researcher who does not know which package is required to extend R’s base system to conduct a particular test. Field et al. also succeed in placing many of the statistical procedures to which they allude within the framework of the ‘general linear model’, giving the book a sense of theoretical coherence.
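
For readers who have never extended R in this way, the mechanism amounts to two function calls. The sketch below is not drawn from the book; the package named (‘car’) is simply a common example of a CRAN add-on.

    # A minimal sketch of extending R with an add-on package
    # ("car" is just an example of a package available on CRAN)
    install.packages("car")   # download and install the package from CRAN
    library(car)              # load the package so its functions can be used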

But I think that the book would have benefited from an explanation of how R fits into the wider ‘tool chain’ of public domain programs which can be used to produce a publication-ready paper. Moreover, some of the R code examples may not work or may illustrate deprecated techniques, although the principal author maintains an errata file on his own website. Nevertheless, I would recommend this book to students, academics and applied researchers. Although the book is heavily weighted towards the interests of psychological researchers, it would not be too difficult to transfer the techniques to a different area of expertise. All in all, an invaluable resource.

Review originally published in Research Matters, December 2013

Practical Guide: Logistic Regression

Hilbe, J. M. (2015) Practical Guide to Logistic Regression

This book shows the reader how to model a binary response variable using basic logistic regression and, despite its modest size, Joseph M. Hilbe manages to cover models with single or multiple predictors as well as grouped and Bayesian logistic regression.
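
As a point of reference, and not code taken from the book, a basic logistic regression of this kind can be fitted in base R with glm(); the data frame and variable names below are invented for illustration.

    # A minimal sketch (not the book's code): basic logistic regression with glm()
    # The data frame and variable names here are hypothetical.
    set.seed(1)
    dat <- data.frame(
      outcome = rbinom(100, 1, 0.4),   # binary response (0/1)
      x1      = rnorm(100),            # continuous predictor
      x2      = rnorm(100)             # second predictor
    )
    fit <- glm(outcome ~ x1 + x2, data = dat, family = binomial(link = "logit"))
    summary(fit)      # coefficients on the log-odds scale
    exp(coef(fit))    # exponentiate to obtain odds ratios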

Hilbe suggests that the book would be appropriate for someone who has completed a basic course in statistics which includes linear regression. I would agree with this assessment, though I would also recommend working through an introductory tutorial on R. That said, the book is written in an exceptionally clear style, so the reader can expect a treatment of the subject which is concise but comprehensible.

An additional selling point of this text is that it introduces new R functions which can be applied in one’s own work, as well as equivalent SAS and Stata code. The provision of complete code in the book and on a dedicated website will also be of benefit to readers who wish to spend more time learning about logistic regression models than hacking code.

Indeed, the emphasis on understanding logistic regression modelling rather than on the mechanistic application of techniques is one of the great strengths of the book. Anyone who reads this book will therefore feel that they have a good understanding of this subject which can be consolidated both by analysis of their own data and by further reading.

Review originally published in Significance, 13(2), 45. doi: 10.1111/j.1740-9713.2016.00885.x

SSD for R and Single-Subject Data

Auerbach, C., Zeitlin, W. (2014) SSD for R: An R Package for Analyzing Single-Subject Data

This work is short, but Charles Auerbach and Wendy Zeitlin manage to describe how to analyse single-subject data using their own package, SSD for R. They introduce its functions and provide advice on how to analyse baseline and intervention phase data.

I thought that their discussion of serial dependency was particularly well done, as was their emphasis on how to use SSD for R to visualise data. Other chapters provide introductions to statistical testing and to the analysis of group data.

Readers should note that the book does not deal with single-subject methodology in any depth, so additional resources will be needed in order to make best use of the package. Fortunately, the authors include useful references for those who need information on specific research designs.

R newbies may need to read an introductory R text as the book’s scope is understandably restricted to providing information about the package. But Auerbach and Zeitlin write well and the content does not demand much in the way of prior statistical knowledge or IT skills.

Statisticians may not need to avail themselves of this book, but practitioners who are working in applied disciplines such as social work, psychology and medicine will find it very appealing.

Review originally published in Significance, 12(4), 45. doi: 10.1111/j.1740-9713.2015.00846.x

Text Mining with ‘tm’

It is possible to identify top-level categories in qualitative data analysis by using text mining methods: one can count the frequency of terms or words in a text or texts, and words which occur frequently may correspond to top-level classifications or themes.
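
As a trivial illustration of the idea, word frequencies for a single text can be counted in base R before any dedicated text mining package is involved; the sentence below is invented for the example.

    # Counting word frequencies in a single text with base R (toy example)
    text  <- "the cat sat on the mat and the dog sat by the cat"
    words <- unlist(strsplit(tolower(text), "\\s+"))
    sort(table(words), decreasing = TRUE)   # most frequent words first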

Text mining involves the creation of a corpus, or collection of texts, for analysis, followed by some initial work to preprocess the corpus so that punctuation, capitalisation and numbers are removed, along with common ‘stop’ words which are, by their nature, very frequent in any text. A document-term matrix is then created in which the documents in the corpus are represented by rows and the words by columns. Analysis could then include the identification of frequent terms and of a ‘frequency of frequencies’, i.e. how many words occur in the corpus at each specific frequency.
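
The workflow just described can be sketched with the tm package along the following lines; the three short documents are invented for illustration and the preprocessing steps mirror those listed above.

    # A minimal sketch of the tm workflow described above
    library(tm)

    docs <- c("Text mining counts words in documents.",
              "Frequent words may point to top level themes.",
              "Mining text with the tm package in R.")

    corpus <- VCorpus(VectorSource(docs))

    # Preprocessing: lower case, remove punctuation, numbers and stop words
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeNumbers)
    corpus <- tm_map(corpus, removeWords, stopwords("english"))
    corpus <- tm_map(corpus, stripWhitespace)

    # Document-term matrix: one row per document, one column per term
    dtm <- DocumentTermMatrix(corpus)

    findFreqTerms(dtm, lowfreq = 2)         # terms occurring at least twice
    term_freq <- colSums(as.matrix(dtm))    # frequency of each term
    table(term_freq)                        # 'frequency of frequencies'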

For further detail, check out Kailash Awati’s Gentle Introduction to Text Mining with R and an RStudio resource, both of which describe how to text mine with R’s tm package. The RStudio resource also includes additional links to books on text and data mining as well as material on ‘clustering’ methods.

Both tutorials assume that R is already installed. If this is not the case, go to The R Project for Statistical Computing website and follow the instructions for your system.

R binaries are available for Windows, Mac and Linux distributions.

CAQDAS with RQDA

The R package RQDA may be one alternative for qualitative researchers who do not have access to, or do not wish to use, proprietary CAQDAS software. RQDA allows the user to import text files, create codes and file categories, and visualise those categories with sociograms.

It’s also possible to run the package from the command line and to export RQDA data to LaTeX.
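
As a rough sketch of working from the console, and assuming the RQDA package and its GTK dependencies are already installed, a session can be started as follows; the project file name below is hypothetical.

    # A rough sketch: driving RQDA from the R console
    # (assumes the RQDA package and its GTK dependencies are installed)
    library(RQDA)
    RQDA()                            # launch the RQDA graphical interface
    # openProject("my_project.rqda")  # open an existing project file
    #                                 # (hypothetical file name; see the manual)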

Further information is available from:

  • the RQDA site;
  • the RQDA User Manual;
  • and Metin Caliskan’s excellent YouTube tutorials.