My R projects, plus some comments on programming and statistical software.
epmr, currently under development, is a package for educational and psychological methods in R. It's used throughout my introductory measurement book. The package supports a variety of basic statistical analyses used in measurement and psychometrics. The development version of epmr is available on GitHub at https://github.com/talbano/epmr.
See the book for detailed demonstrations of the different functions. Examples cover:
A chapter on dimensionality is in the works. I'll also be adding validity analyses at some point.
Most of the functionality of the package is accessed via the different "study" functions:
equate is an R package for observed-score linking and equating of test scores. Here's the official description of the package, from the CRAN site:
The equate package contains methods for observed-score linking and equating under the single-group, equivalent-groups, and nonequivalent-groups with anchor test(s) designs. Equating types include identity, mean, linear, general linear, equipercentile, circle-arc, and composites of these. Equating methods include synthetic, nominal weights, Tucker, Levine observed score, Levine true score, Braun/Holland, frequency estimation, and chained equating. Plotting and summary methods, and methods for multivariate presmoothing and bootstrap error estimation are also provided.
In the last couple of versions I've added some new features: some analytic standard errors, bootstrap error estimation, plotting capabilities, a general linear method, and equating with multiple anchor tests and covariates. In version 2.0-3 I also changed the way frequency tables are created and manipulated: they're now based on the table class rather than data.frame.
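To give a feel for the difference between the two structures, here's a minimal base-R sketch. It uses table() and as.data.frame() directly, not the package's own freqtab() function, and the scores are made up for illustration.

```r
# Hypothetical total and anchor scores for seven examinees
total <- c(3, 4, 4, 5, 5, 5, 6)
anchor <- c(1, 1, 2, 2, 2, 3, 3)

# Cross-tabulate over fixed score scales, so that unobserved
# score points still appear with counts of zero
tab <- table(total = factor(total, levels = 0:6),
  anchor = factor(anchor, levels = 0:3))
class(tab)  # "table"
dim(tab)    # 7 4

# Marginal distribution of total scores
margin.table(tab, 1)

# The same information in long data.frame form,
# one row per combination of total and anchor score
df <- as.data.frame(tab, responseName = "count")
head(df)
```

With the table form, marginal and conditional distributions come from indexing and margin functions, whereas the data.frame form requires aggregating over rows.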
The development version of the package is available on the equate GitHub repository.
For any equating run you can request empirical or parametric bootstrap standard error, bias, and RMSE. The empirical estimates come from bootstrap resampling of the raw score distributions. Parametric estimates come from smoothed score distributions. The example below is based on the documentation for the package.
This will load the package and prep the data, which is referred to as KBneat.
# Load equate package
library(equate)

# KBneat data
neat.x <- freqtab(KBneat$x, scales = list(0:36, 0:12))
neat.y <- freqtab(KBneat$y, scales = list(0:36, 0:12))

# Smoothed population distributions
neat.xp <- presmoothing(neat.x, "loglinear", degrees = 2)
neat.yp <- presmoothing(neat.y, "loglinear", degrees = 2)
Then, a bit of code runs the equating, e.g., the Tucker mean method with bootstrap standard errors.
# Tucker mean equating
neat.m.t <- equate(neat.x, neat.y, type = "mean",
  method = "tucker", boot = TRUE,
  xp = neat.xp, yp = neat.yp)
Finally, this will run an entire bootstrapping study, comparing identity, linear, equipercentile, and circle-arc equating with samples of 100 at each replication.
# Set seed and create the criterion
set.seed(131031)
crit <- equate(neat.xp, neat.yp, "e", "c")$conc$yx

# Create equating arguments and run bootstrapping
neat.args <- list(i = list(type = "i"),
  lt = list(type = "lin", method = "t"),
  ef = list(type = "equip", method = "f",
    smooth = "log", degrees = 2),
  ec = list(type = "equip", method = "c",
    smooth = "log", degrees = 2),
  cc = list(type = "circ", method = "c",
    chainmidp = "lin"))
bootout <- bootstrap(x = neat.xp, y = neat.yp,
  xn = 100, yn = 100, reps = 100, crit = crit,
  args = neat.args)
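To make the standard error, bias, and RMSE summaries more concrete, here's a rough base-R sketch of an empirical bootstrap. It's a simplification under my own assumptions: simulated scores, an equivalent-groups design, and plain mean equating via a hypothetical helper called mean_equate(). It isn't the package's internal code.

```r
# Simulated raw scores on two 0 to 36 point forms (hypothetical data)
set.seed(42)
x <- rbinom(2000, size = 36, prob = 0.55)
y <- rbinom(2000, size = 36, prob = 0.60)
scale_x <- 0:36

# Mean equating: shift X scores by the difference in form means
mean_equate <- function(x, y, scale) scale - mean(x) + mean(y)

# Criterion equating function, based on the full samples
crit <- mean_equate(x, y, scale_x)

# Resample 100 scores per form at each of 200 replications;
# each column of boot_mat is one bootstrapped equating function
reps <- 200
boot_mat <- replicate(reps, {
  xb <- sample(x, 100, replace = TRUE)
  yb <- sample(y, 100, replace = TRUE)
  mean_equate(xb, yb, scale_x)
})

# Error summaries at each X scale point (rows of boot_mat)
se <- apply(boot_mat, 1, sd)
bias <- rowMeans(boot_mat) - crit
rmse <- sqrt(bias^2 + se^2)

round(head(cbind(scale_x, se, bias, rmse)), 3)
```

Because mean equating only shifts the scale, the error here is the same at every score point. With equipercentile or circle-arc methods it would vary across the scale, which is what makes the comparison in a full bootstrapping study interesting.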
Equating and bootstrapping objects both have corresponding plot methods for visualizing results. The first plot below compares the identity, linear, equipercentile, and circle-arc equating functions, and the second compares their bootstrap standard errors.
# Plotting equating output
plot(equate(neat.x, neat.y, "identity"),
  equate(neat.x, neat.y, "linear", "tucker"),
  equate(neat.x, neat.y, "equip", "frequency",
    smooth = "log", degrees = 2),
  equate(neat.x, neat.y, "equip", "chain",
    smooth = "log", degrees = 2),
  equate(neat.x, neat.y, "circle", "chain"),
  addident = FALSE)

# Plotting bootstrap standard errors
plot(bootout, out = "se", addident = FALSE,
  legendplace = "top")
There are also plot methods for visualizing smoothing results. The plots below compare loglinear smoothing results for a univariate and bivariate distribution. The first shows smoothed curves for maximum polynomials of 2, 3, and 4; the second shows smoothed curves for the same univariate polynomials, and also the first bivariate polynomial.
# Univariate smoothing
act.x <- as.freqtab(ACTmath[, 1:2])
plot(act.x, loglinear(act.x, stepup = TRUE)[, -1])

# Bivariate smoothing
plot(neat.x, loglinear(neat.x, stepup = TRUE)[, -c(1, 5)])
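Loglinear smoothing of this kind amounts to fitting a Poisson regression of the score counts on polynomial terms of the score scale. Here's a minimal base-R sketch of the univariate case, using simulated scores and glm() rather than the package's loglinear() function. The variable names are mine, for illustration only.

```r
# Simulated scores on a 0 to 36 point scale (hypothetical data)
set.seed(1)
scores <- rbinom(1000, size = 36, prob = 0.5)
pts <- 0:36

# Observed counts at every scale point, including zeros
counts <- as.vector(table(factor(scores, levels = pts)))

# Fit loglinear (Poisson) models with polynomial degrees 2, 3, and 4
fits <- lapply(2:4, function(d)
  glm(counts ~ poly(pts, d), family = poisson))

# Smoothed (fitted) counts, one column per model
smoothed <- sapply(fits, fitted)
colnames(smoothed) <- paste0("degree", 2:4)

# Each model reproduces the observed total and mean score
colSums(smoothed)
colSums(smoothed * pts) / colSums(smoothed)
mean(scores)
```

A polynomial of degree d preserves the first d moments of the observed distribution, so higher degrees track the raw counts more closely at the cost of less smoothing.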
Students in the QQPM program at UNL are expected to develop proficiency in at least one software package or programming language. Examples include MAXQDA or NVivo for qualitative applications, and R or SAS for quantitative ones.
I tend to advocate for R, so I'll provide some more background on it. I also encourage students to learn LaTeX, especially if they're planning to go into academia or a research position.
During my first semester of grad school, one of my instructors encouraged me to attend a presentation on R. Given that my background in computer programming equalled null, I mostly had no idea what the presenter was talking about. His fingers flew across the keyboard and he talked excitedly about "objects" and "methods" as lines of meaningless code piled up on the screen. I was pretty confused. But I knew something cool was happening and I wanted to learn more.
I learned R mostly on my own, by trial and error. This worked fine for me, but I do recommend taking an introductory course or workshop if possible, as it will save you some time. At the very least, find someone to mentor you through the process. There are also lots of free resources online, including some decent free books. For starters, check the contributed documentation on the CRAN site.
R is an interactive statistical environment. You interact with it primarily via syntax or commands. This is frustrating for the point-and-clicker for two reasons: first, your data aren't present in spreadsheet form, so you have to use your imagination and convince yourself that they still exist; second, the procedures you want aren't listed in any drop-down menu, so every analysis starts with a Google search. The RStudio IDE may help smooth out your learning curve.
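As a rough sketch of what this looks like in practice, here are a few commands for checking on your data and hunting for functions. I'm using the mtcars data set that ships with R, purely for illustration.

```r
# Built-in example data, so no file is needed
data(mtcars)

# Confirm the data exist without a spreadsheet view
dim(mtcars)          # number of rows and columns
head(mtcars, 3)      # first three rows
str(mtcars)          # variable names and types
summary(mtcars$mpg)  # quick descriptives for one variable

# Search loaded packages for functions by name, in place of a menu
apropos("median")
```

Running help(median) or ?median at the console then opens the documentation for a function once you've found it.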
R has a number of strong points. First of all, it's free and open-source, with a large and dedicated user-base. Second, the interactivity of it makes certain procedures, like data management and manipulation, very quick. Third, the graphics capabilities are excellent.
My PhD students. Also, I think most quantitative methods students will be glad they did. It is slow-going at first, but once you're proficient you'll be able to restructure and analyze data in a fraction of the time it would take in other software.
With so many contributors, there's now an R package for just about anything, from multilevel modeling, to spatial analysis, to web analytics, to music informatics (I haven't tried these last two - I just noticed them online). Note that commercial software can be faster and more thorough, especially for advanced applications like IRT and SEM. In these situations, I recommend learning both the R way and the other way.
LaTeX is a markup language and typesetting system for creating manuscripts, especially ones laden with equations. See the Wikipedia entry and the LaTeX project site for more information. I write most of my papers with it, whether they're heavy or light on equations, because I like to focus on content over formatting, which LaTeX handles well, and because I manage my references with BibTeX and biblatex.
If you plan on typing a lot of math, LaTeX is worth a look. It integrates well with R, via Sweave and knitr, so you can include code, output, and plots within your documents. See the Rnw file used to create my equate documentation for an example. The Rnw file will open in any text editor.
I think this depends on two main questions. First, would you like to avoid most of the formatting hassle with your dissertation? Second, do you see yourself writing papers with formulas in them in the future? If you answered yes to either of these, LaTeX can help. Otherwise, it's probably not worth the trouble.
The main downsides to using LaTeX are: