Social Network Analysis

Borgatti, S.P., Everett, M.G. and Johnson, J.C. (2013) Analyzing social networks

This book takes the reader on a tour of key theoretical concepts in social network analysis. It is divided into four sections: an introduction, research methods, core concepts and measures, and a final section containing what the authors describe as ‘three cross-cutting chapters’ on ‘affiliation type data’, ‘large networks’ and ‘ego network data’. Although primarily theoretical, the book refers to interesting empirical work across the social sciences and health care to illustrate core concepts. It introduces readers to two programs, UCINET and NetDraw, which they can use to analyse and visualise network data, but directs readers who require a software tutorial to a dedicated website.

There is much to commend in this book. The authors provide a clear introduction to graph theory and matrix algebra for non-mathematicians. There is also an interesting introduction to core concepts like ‘centrality’, ‘sub-group’ and ‘equivalence’ and a fascinating discussion of how hypothesis testing is possible with network data when the assumptions of standard inferential tests are violated. The authors also provide invaluable advice on how best to lay out network diagrams in order to make interpretation easier.

However, I think that the way information is presented may need to be reviewed. The authors assume that readers are familiar with research terminology and do not always define their terms. Although this is a reasonable assumption if the book is aimed at established researchers, beginners may need to refer to an introductory research methods textbook to take full advantage of the material. Borgatti et al. also state that the chapters need not be read in sequence, although this suggestion may not suit readers who expect a book to begin with straightforward material before moving on to advanced topics. A glossary would be useful.

This is an informative book for established social researchers with some prior exposure to social network analysis. Aspirant social network analysts may find the book a little too advanced.

Review originally published in Research Matters, March 2014

Discovering statistics using R

Field, A., Miles, J. and Field, Z. (2012) Discovering statistics using R

This book teaches statistics using R, the free statistical environment and programming language. It will be of use to undergraduate and postgraduate students and professional researchers across the social sciences, and its material ranges from the introductory to the advanced. The content is divided into four levels of difficulty, with ‘Level 1’ representing introductory material and ‘Level 4’ the most advanced. The book may be read from beginning to end or consulted for particular techniques, although understanding the advanced material may require familiarity with earlier chapters. There is a comprehensive glossary of specialised terms and a selection of statistical tables in the appendix. There is also material on the publisher’s companion website and on the principal author’s own web pages.

The main strength of this book is that it presents a lot of information in an accessible, engaging and irreverent way. The style is informal, with interesting excursions into the history of statistics and psychology, and there are entertaining references to research papers which illustrate the methods explained. The authors manage to pull off the Herculean task of teaching statistics through the medium of R. This is an achievement when one considers that R can be difficult to use for researchers who have never manipulated data from the command line. Another plus point is that the authors describe how to ‘extend’ R’s capabilities with ‘packages’. This is a massive time saver for any researcher who does not know which package is required to extend R’s base system to conduct a particular test. Field et al. also succeed in placing many of the statistical procedures they discuss within the framework of the ‘general linear model’, giving the book a sense of theoretical coherence.

But I think that the book would have benefited from an explanation of how R fits into the wider ‘tool chain’ of public domain programs which can be used to produce a publication-ready paper. Moreover, some of the R code examples may not work or may illustrate deprecated techniques, although the principal author maintains an errata file on his own website. Nevertheless, I would recommend this book to students, academics and applied researchers. Although it is heavily weighted towards the interests of psychological researchers, it would not be too difficult to transfer the techniques to a different area of expertise. All in all, an invaluable resource.

Review originally published in Research Matters, December 2013

Hard-to-Survey Populations

Tourangeau, R., Edwards, B., Johnson, T.P., Wolter, K.M. and Bates, N. (Eds.) Hard-to-Survey Populations

This is an excellent book that fills a gap in the methodological literature. With contributions from some of the most notable practitioners of survey methodology in the world, this collection is exceptionally comprehensive. The book contains discussions of how to survey groups as diverse as people with intellectual disabilities, the homeless, political extremists and stigmatised groups, as well as a fascinating chapter on the challenges of surveying linguistically diverse populations. One should not therefore assume that this is a dry statistical tome; there is much here for the student, applied researcher and clinician who need a jargon-free introduction to this topic.

There are also discussions of sampling methods for the more methodologically inclined, including explanations of location sampling, which has been used to sample the homeless, nomads and immigrants. Some of the explanations of sampling strategies may, however, be difficult for readers who are not comfortable with mathematics. Part IV, on sampling strategies, is particularly challenging in this regard.

Each chapter is, however, self-contained, with useful references for the reader who wishes to investigate any topic in more depth, so a chapter-by-chapter reading is not necessary. The book may profitably be read either as a comprehensive introduction to hard-to-survey populations or as a reference text for those who are thinking about surveying a particular group.

In short, an indispensable resource for any psychologist – irrespective of specialism or level of expertise – who wishes to collect robust data about the lives of people who aren’t always given a voice.

Review originally published in The Psychologist, March 2015

Social Physics: A New Science

Pentland, A. (2014) Social Physics: How Good Ideas Spread – the Lessons from a New Science

Alex Pentland’s book is a hugely readable introduction to “social physics”, which the author defines as “a quantitative social science that describes reliable, mathematical connections between information and idea flow on the one hand and people’s behaviour on the other”. In contrast to what the author calls conventional “individual-centric economic and policy thinking”, Pentland suggests that the primary drivers of cultural evolution in our wired world are “social learning” and “social pressure”.

Pentland entertainingly describes a range of interesting and counterintuitive studies which he and his colleagues have conducted. He shows, for example, how equal “conversational turn-taking” is the most important factor in predicting “group intelligence”. Other studies focus on trading and the determinants of political opinion. Indeed, there seems to be nothing outside the purview of social physics.

But Pentland’s enthusiasm for his subject carries an overtone of hubris. For Pentland, constructs like “market”, “class” and “capital” should be replaced by the concepts he outlines in the book. Moreover, he gives a very partial interpretation of history since the Enlightenment, which is puzzling because he simultaneously extols the virtues of Adam Smith and John Locke while suggesting that conventional economic concepts are redundant.

In order to gain a more nuanced view of what drives cultural, social and economic evolution, my advice would be to imagine Pentland in a dialogue with economists, historians, sociologists and philosophers and then to form your own view of the truth of the claims made in this book.

Review originally published in Reviews. Significance, 12:6 45. doi: 10.1111/j.1740-9713.2015.00871.x

SSD for R and Single-Subject Data

Auerbach, C. and Zeitlin, W. (2014) SSD for R: An R Package for Analyzing Single-Subject Data

This is a short book, but in spite of its brevity Charles Auerbach and Wendy Zeitlin describe how to analyse single-subject data using their own package, SSD for R. They introduce its functions and provide advice on how to analyse baseline and intervention phase data.

I thought that their discussion of serial dependency was particularly well done, as was their emphasis on how to use SSD for R to visualise data. Other chapters provide introductions to statistical testing and to the analysis of group data.

Readers should note that the book does not deal with single-subject methodology in any depth, so additional resources will be needed in order to make best use of the package. Fortunately, the authors include useful references for those who need information on specific research designs.

R newbies may need to read an introductory R text as the book’s scope is understandably restricted to providing information about the package. But Auerbach and Zeitlin write well and the content does not demand much in the way of prior statistical knowledge or IT skills.

Statisticians may not need to avail themselves of this book, but practitioners who are working in applied disciplines such as social work, psychology and medicine will find it very appealing.

Review originally published in Reviews. Significance, 12:4 45. doi: 10.1111/j.1740-9713.2015.00846.x

Using R for Introductory Statistics

Verzani, J. (2013) Using R for Introductory Statistics (Second Edition)

This book has a laudable aim: to introduce R and topics from an introductory statistics curriculum to students “outside of a classroom environment”. Now in its second edition, the book introduces the reader to exploratory data analysis and manipulation, statistical inference and statistical models. Particular attention is given to learning base R thoroughly before extending R’s capabilities with packages.

Author John Verzani includes information on computationally intensive approaches and manages to explain these topics with interesting, topical and challenging examples. The text includes a plethora of exercises which encourage the reader to test their understanding of the material as well as a useful appendix on R programming and a valuable bibliography.

Although informative, I don’t think this text will be useful for readers without any previous exposure to either statistical computing or statistics. The text does begin simply enough, but my impression is that the reader will need to refer to additional resources. I’m therefore not convinced by claims that the book may be used without a teacher. Indeed, the fact that the solutions to exercises are only available to those who adopt the book as a course text suggests that the book is intended for use by university teachers rather than autodidacts.

In short, a stimulating read for the classroom-based student, but too challenging for a neophyte learner studying at home.

Review originally published in Reviews. Significance, 12:2 44–45. doi: 10.1111/j.1740-9713.2015.00818.x

Text Mining with ‘tm’

It is possible to identify top-level categories in qualitative data analysis by using text mining methods. One can count the frequency of terms or words in a text or texts. Words which occur frequently may indicate top-level classifications or themes.
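The basic counting step can be sketched in a few lines. This is a minimal illustration in Python, not R's tm package. It assumes a simple tokeniser that treats runs of letters as words, and the three example texts are invented purely for illustration.

```python
import re
from collections import Counter

def term_frequencies(texts):
    """Count how often each lower-cased word occurs across all texts."""
    counts = Counter()
    for text in texts:
        # Assumption: a "word" is a run of letters a-z; punctuation and digits are dropped.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

# Invented example texts, standing in for interview transcripts or free-text answers.
texts = [
    "Support from family helped me cope.",
    "My family and friends gave support.",
    "Work was stressful but family support helped.",
]
counts = term_frequencies(texts)
print(counts.most_common(3))
```

Here "family" and "support" each occur three times, so they would be candidate themes. Without stopword removal, common words such as "and" are also counted, which is why the preprocessing step matters.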

Text mining involves creating a corpus, or collection of texts, for analysis. The corpus is then preprocessed: text is converted to lower case, punctuation and numbers are removed, and so are common words (stopwords), which are, ipso facto, very frequent in any text. Next, a document term matrix is created, in which the documents in the corpus are represented by rows and the words (terms) by columns. Analysis could then include identifying frequent terms and a ‘frequency of frequencies’, i.e. how many words occur in the corpus at each specific frequency.
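The pipeline just described can be sketched in Python. This is a language-neutral illustration under stated assumptions, not the tm package itself. In R, tm provides functions such as `DocumentTermMatrix()` and `findFreqTerms()` for the corresponding steps. The short stopword list, the regex tokeniser, the helper names and the example corpus are all hypothetical choices made for this sketch.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list; real toolkits ship much longer ones.
STOPWORDS = {"a", "and", "the", "of", "to", "was", "my", "me", "but", "from", "it", "is", "in"}

def preprocess(text):
    """Lower-case the text, keep only alphabetic tokens (dropping punctuation and numbers), remove stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def document_term_matrix(corpus):
    """Return (terms, matrix): one row per document, one column per term."""
    docs = [Counter(preprocess(text)) for text in corpus]
    terms = sorted(set().union(*docs))
    matrix = [[doc[term] for term in terms] for doc in docs]
    return terms, matrix

def column_totals(matrix):
    """Total count of each term (column) across all documents."""
    return [sum(col) for col in zip(*matrix)]

def frequent_terms(terms, matrix, low_freq):
    """Terms whose corpus-wide count is at least low_freq."""
    return [t for t, n in zip(terms, column_totals(matrix)) if n >= low_freq]

def frequency_of_frequencies(matrix):
    """Map each corpus-wide frequency to the number of terms occurring that often."""
    return dict(sorted(Counter(column_totals(matrix)).items()))

# Invented example corpus for illustration only.
corpus = [
    "Support from family helped me cope.",
    "My family and friends gave support.",
    "Work was stressful but family support helped.",
]
terms, matrix = document_term_matrix(corpus)
print(terms)
print(frequent_terms(terms, matrix, low_freq=2))
print(frequency_of_frequencies(matrix))
```

For this toy corpus, the frequent terms at a threshold of two are "family", "helped" and "support". The frequency of frequencies shows that most terms occur only once, which is typical of real text as well.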

For further detail, see Kailash Awati’s Gentle Introduction to Text Mining with R and an RStudio resource, both of which describe how to text mine with R’s tm package. The RStudio resource also includes links to books on text and data mining, as well as material on ‘clustering’ methods.

Both tutorials assume that R is already installed. If it is not, go to the website of The R Project for Statistical Computing and follow the instructions for your system.

R binaries are available for Windows, Mac and Linux distributions.