

A brief bio and my main research interests, plus a list of current and past projects.

As an Assistant Professor of Educational Psychology at the University of Nebraska-Lincoln (UNL), I teach and conduct research on quantitative methods in the social sciences, primarily psychometric methods. I advise doctoral students in our methods program, supervise instruction of our undergraduate introductory statistics courses, and consult on state and federal projects in educational testing.

Prior to UNL, my work experiences included cleaning up popcorn at a movie theater, running low-voltage wiring in new homes, coordinating after-school activities for kids with physical and learning disabilities, residential and commercial roofing, making soup and sandwiches at a grocery store, researching relational aggression and victimization in preschoolers as an undergraduate at Brigham Young University, and interning in research and evaluation at an urban school district and in the psychometrics departments of a licensure company and an educational testing company. I finished my PhD in 2012 in the Educational Psychology, Quantitative Methods program at the University of Minnesota, working with Dr. Michael Rodriguez.

At UNL, I've worked on a variety of projects that have helped define my areas of interest, described next. Current and past research projects are listed after the following section.

Psychology, the study of mind and behavior, is unique among the sciences. While biologists are intervening in the evolutionary process via artificial selection at the genetic level, and astrophysicists are mapping out the origins of the universe and explaining heavenly bodies light-years away, we psychologists are just scratching the surface of human cognition, struggling to describe what happens inside our own heads.

The subject matter is especially elusive in psychology and education because it is unobservable. The variables that matter most to us, like knowledge, aptitude, attitudes, and beliefs, can’t be seen or touched. They can only be observed indirectly, as they manifest in our responses and behaviors. As a result, psychological research involves inferences within the data collection process itself. Before we can say how X influences Y, we must first ask, how well does X actually capture X? Is X what we think it is? Or does it represent other underlying attributes or traits, like W and Z? Questions like these define the field of educational and psychological measurement.

My research interests are clustered in three main areas within the field of educational and psychological measurement: scaling, multilevel modeling, and assessment development.


Scaling, linking, and equating are statistical methods used to build and connect measurement scales. By measurement scales, I mean quantitative variables that describe individuals on some kind of continuum, for example, a personality trait or cognitive ability.

Right now, I'm focusing on methods for linking multiple test forms to a common scale. These are usually referred to as linking or equating methods. I have an R package (more here) that performs observed-score linking and equating, and I'm interested in finding linking methods that work well with small samples, unreliable measures, external anchor tests, or in other less-than-ideal testing situations (e.g., Albano, 2015).
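
As a rough illustration of what observed-score linking involves, here is a minimal Python sketch of linear equating under a random groups design, where form X scores are mapped to the form Y scale by matching means and standard deviations. This is not the R package mentioned above. The function name `linear_equate` and the score lists are hypothetical, made up only for this example.

```python
from statistics import mean, pstdev


def linear_equate(x_scores, y_scores):
    """Return a function that maps form X scores onto the form Y scale.

    Linear equating sets standardized scores equal across forms:
        y = mu_Y + (sd_Y / sd_X) * (x - mu_X)
    Assumes the two samples are randomly equivalent groups.
    """
    mu_x, mu_y = mean(x_scores), mean(y_scores)
    sd_x, sd_y = pstdev(x_scores), pstdev(y_scores)
    if sd_x == 0:
        raise ValueError("form X scores have zero variance")
    slope = sd_y / sd_x
    intercept = mu_y - slope * mu_x
    return lambda x: intercept + slope * x


# Hypothetical total scores on two forms (illustration only).
form_x = [10, 12, 14, 16, 18]
form_y = [20, 23, 26, 29, 32]

to_y_scale = linear_equate(form_x, form_y)
print(to_y_scale(14))  # form X mean maps to the form Y mean
```

With small samples the means and standard deviations are themselves noisy, which is one reason simpler or smoothed linking functions can be attractive in less-than-ideal testing situations.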

Multilevel modeling

I use multilevel modeling in the context of item analysis, where items are the fundamental building blocks of educational and psychological assessments. The main goal here is to identify other variables that impact item-level performance, so as to inform and improve the assessment development process. Traditional measurement models assume that the biasing influence of extraneous variables is null or negligible. However, research shows that certain sources of bias can be problematic. Multilevel item-response models let us examine item-level bias.

Examples of variables that impact item performance are: time, causing what is called item parameter drift (Babcock, Albano, & Raymond, 2012); person grouping, e.g., special ed compared to general ed (Wyse & Albano, 2015), or women compared to men (Albano & Rodriguez, 2013), causing what is called differential item functioning; and other variables like item position, item type, response latency, motivation, test anxiety, and opportunity to learn (Albano, 2013).
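
To make differential item functioning more concrete, here is a minimal Python sketch of the Mantel-Haenszel common odds ratio for a single item. This is a classical alternative, not the multilevel item-response approach described above. The function name, the record layout, and the data are all hypothetical. Test takers are matched on a stratum (for example, total score), and a ratio far from 1 suggests the item behaves differently for the reference and focal groups after matching.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Mantel-Haenszel common odds ratio for one item.

    records: iterable of (group, stratum, correct), where group is
    "ref" or "focal", stratum is a matching variable such as total
    score, and correct is 1 or 0.
    """
    # Per stratum: [n_correct, n_incorrect] for each group.
    tables = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, stratum, correct in records:
        tables[stratum][group][0 if correct else 1] += 1

    num = den = 0.0
    for t in tables.values():
        a, b = t["ref"]    # reference correct, incorrect
        c, d = t["focal"]  # focal correct, incorrect
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined for these data")
    return num / den


# Hypothetical responses to one item in two score strata.
records = (
    [("ref", 1, 1)] * 3 + [("ref", 1, 0)] * 1
    + [("focal", 1, 1)] * 2 + [("focal", 1, 0)] * 2
    + [("ref", 2, 1)] * 4 + [("ref", 2, 0)] * 2
    + [("focal", 2, 1)] * 2 + [("focal", 2, 0)] * 2
)
mh_ratio = mantel_haenszel_odds_ratio(records)
print(round(mh_ratio, 3))
```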

Assessment development

My interest in the assessment development process itself has led to a few related subtopics. I'm working on ways to improve assessment literacy, the knowledge and skills required to utilize assessment effectively in the classroom, specifically by supporting teachers in the item-writing process. This was the main motivation for the Proola web app (proola.org), which facilitates collaborative assessment development.

Proola is currently being used to develop and share assessments within the context of higher education. In the summer of 2017, I worked with a group of 15 instructors to build openly licensed item banks for introductory college courses in macroeconomics, medical biology, and sociology (Miller & Albano, 2017). The banks are available in QTI format here. Analysis of the results is underway. I'm examining item quality and alignment to learning objectives using natural language processing tools.

In a recent book (Rodriguez & Albano, 2017; Routledge link; Amazon link) a colleague and I outline the assessment process, from development to implementation, in the higher ed classroom.


As I mentioned above, my work in assessment development, specifically with Proola, has led me to natural language processing (NLP) as a set of tools for evaluating and improving assessment content (NALPA, natural language processing in assessment). NLP can assist in identifying item content, structure, and style that deviate from established guidelines. NLP can also help quantify domain coverage and outcome alignment. I'm working mostly in R and Python, and hope to share results soon via GitHub.


The Center for Science, Mathematics and Computer Education (CSMCE) supports research on teaching and learning in STEM. I'm helping with some of the many math education projects. Right now, we're using multilevel models to examine how teacher participation in a math program impacts student math performance over time. We presented findings in an AERA symposium (Kutaka et al., 2014) and a recent article (Kutaka et al., 2017).


I've worked with the Michigan Department of Education (MDE) on a variety of equating and test development analyses, primarily in the area of alternate assessment (the MI-Access and MEAP-Access programs). A technical report came out in 2012 describing our findings (Wyse & Albano, 2012). Our first paper examines the feasibility of mixed-item computer adaptive tests for students with and without disabilities (Wyse & Albano, 2015).


The American Registry of Radiologic Technologists (ARRT) develops certification tests in the area of medical imaging. I've collaborated with them on a few different projects in the areas of equating (Babcock & Albano, 2012) and item analysis (Babcock, Albano, & Raymond, 2012), and we're continuing to explore optimal methods for equating with small samples. One paper in preparation examines composite equating functions that combine linear and nonlinear methods.


The Teacher Education and Development Study in Mathematics (TEDS-M) is an international study of primary and secondary teacher education programs, first conducted in 2008. I worked on instrument development and then performed the scaling for some of the non-cognitive measures, including the Opportunity to Learn and Beliefs scales. This work led to a paper on differential math performance by gender and opportunity to learn (Albano & Rodriguez, 2013). We presented some recent findings from the study in an AERA symposium (Albano et al., 2015; Kutaka et al., 2015). Currently, a related study of international math programs is in the early stages of development.


Previously, I worked in the Center for Response to Intervention in Early Childhood (CRTIEC). I was mainly responsible for item analysis and scale development for versions 1.0 and 2.0 of the Individual Growth and Development Indicators (Bradfield et al., 2013). We are currently in year two of an IES-funded project examining the development of early literacy measures for three-year-olds.


Over the past few years I have helped model and equate the oral reading fluency measures within the Formative Assessment System for Teachers (FAST), now part of FastBridge Learning. This work led to a paper on the challenges associated with equating general outcome measures of reading (Albano & Rodriguez, 2012). Two other papers, one in preparation and another presented at NCME (Albano, 2014), address different issues that arise in the linking of oral reading fluency scales. Multilevel linear models that incorporate multiple anchor tests are showing some promise.