Abedi, J. (2004). The No Child Left Behind Act and English language learners: assessment and accountability issues. Educational Researcher, 33, 4–14.

AERA, APA, & NCME. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561–571.

Bennett, R. E. (2011). Formative assessment: a critical review. Assessment in Education: Principles, Policy & Practice, 18(1), 5–25.

Black, P. & Wiliam, D. (1998). Inside the black box: raising standards through classroom assessment. Phi Delta Kappan, 80, 139–148.

Bloom, B. S. & Krathwohl, D. R. (1956). Taxonomy of educational objectives: the classification of educational goals. Handbook I: cognitive domain.

Bond, L. A. (1996). Norm- and criterion-referenced testing. Practical Assessment, Research & Evaluation, 5(2).

Bradfield, T. A., Besner, A. C., Wackerle-Hollman, A. K., Albano, A. D., Rodriguez, M. C., & McConnell, S. R. (2014). Redefining individual growth and development indicators: oral language. Assessment for Effective Intervention, 39, 233–244.

Brennan, R. L. (2013). Commentary on “Validating the Interpretations and Uses of Test Scores”. Journal of Educational Measurement, 50, 74–83.

Briggs, D. C. (2009). Preparation for college admission exams. Arlington, VA: National Association for College Admission Counseling.

Carter, S. D. (2002). Matching training methods and factors of cognitive ability: a means to improve training outcomes. Human Resource Development Quarterly, 13, 71–88.

Cizek, G. J. (2010). An introduction to formative assessment. In H. L. Andrade & G. J. Cizek (Eds.), Handbook of formative assessment (pp. 3–17). New York, NY: Routledge.

College Board. (2012). The SAT report on college and career readiness: 2012. New York, NY: College Board.

Cronbach, L. J. & Shavelson, R. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64, 391–418.

Deno, S. L. (1985). Curriculum-based measurement: the emerging alternative. Exceptional Children, 52, 219–232.

Deno, S. L., Fuchs, L. S., Marston, D., & Shin, J. (2001). Using curriculum-based measurement to establish growth standards for students with learning disabilities. School Psychology Review, 30, 507–524.

Ebel, R. (1961). Must all tests be valid? American Psychologist, 16, 640–647.

Ferketich, S. (1991). Focus on psychometrics: aspects of item analysis. Research in Nursing & Health, 14, 165–168.

Fuchs, L. S. & Fuchs, D. (1999). Monitoring student progress toward the development of reading competence: a review of three forms of classroom-based assessment. School Psychology Review, 28, 659–671.

Gardner, W. L. & Martinko, M. J. (1996). Using the Myers-Briggs Type Indicator to study managers: a literature review and research agenda. Journal of Management, 22, 45–83.

Goodwin, L. D. (2001). Interrater agreement and reliability. Measurement in Physical Education and Exercise Science, 5, 13–34.

Haladyna, T. M. & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2, 37–50.

Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15, 309–334.

Haladyna, T. M. & Rodriguez, M. C. (2013). Developing and validating test items. New York, NY: Routledge.

Hambleton, R. K. & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12, 38–47.

Harvey, R. J. & Hammer, A. L. (1999). Item response theory. The Counseling Psychologist, 27, 353–383.

Haynes, S. N., Richard, D. C. S., & Kubany, E. S. (1995). Content validity in psychological assessment: a functional approach to concepts and methods. Psychological Assessment, 7, 238–247.

Hockenberry, M. J. & Wilson, D. (2012). Wong’s essentials of pediatric nursing. St. Louis, MO: Mosby.

Hursh, D. (2005). The growth of high-stakes testing in the USA: accountability, markets, and the decline in educational equality. British Educational Research Journal, 31, 605–622.

Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1–73.

Kelley, T. L. (1927). Interpretation of educational measurements. Yonkers, NY: World Book Co.

Kline, P. (1986). A handbook of test construction: introduction to psychometric design. New York, NY: Methuen.

Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22, 5–55.

Linn, R. L., Baker, E. L., & Betebenner, D. W. (2002). Accountability systems: implications of requirements of the No Child Left Behind Act of 2001. Educational Researcher, 31, 3–16.

Lord, F. M. (1952). A theory of test scores. Psychometric Monographs, No. 7.

Mathews, S. & Herzog, H. A. (1997). Personality and attitudes toward the treatment of animals. Society and Animals, 5, 169–175.

Mehrens, W. A. (1992). Using performance assessment for accountability purposes. Educational Measurement: Issues and Practice, 11, 3–9.

Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35, 1012–1027.

Meyer, A. N. D. & Logan, J. M. (2013). Taking the testing effect beyond the college freshman: benefits for lifelong learning. Psychology and Aging, 28, 142–147.

Militello, M., Schweid, J., & Sireci, S. G. (2010). Formative assessment systems: evaluating the fit between school districts’ needs and assessment systems’ characteristics. Educational Assessment, Evaluation and Accountability, 22(1), 29–52.

Miller, C. & Stassun, K. (2014). A test that fails. Nature, 510, 303–304.

Myers, I. B., McCaulley, M. H., Quenk, N. L., & Hammer, A. L. (1998). Manual: a guide to the development and use of the Myers-Briggs Type Indicator. Palo Alto, CA: Consulting Psychologists Press.

Nelson, D. A., Robinson, C. C., Hart, C. H., Albano, A. D., & Marshall, S. J. (2010). Italian preschoolers’ peer-status linkages with sociability and subtypes of aggression and victimization. Social Development, 19, 698–720.

Nelson, H. (2013). Testing more, teaching less: what America’s obsession with student testing costs in money and lost instructional time. American Federation of Teachers.

Nunnally, J. C. & Bernstein, I. H. (1994). Psychometric theory. New York, NY: McGraw-Hill.

Pittenger, D. J. (2005). Cautionary comments regarding the Myers-Briggs Type Indicator. Consulting Psychology Journal: Practice and Research, 57, 210–221.

Pope, K. S., Butcher, J. N., & Seelen, J. (2006). The MMPI, MMPI-2, & MMPI-A in court: a practical guide for expert witnesses and attorneys (3rd ed.). Washington, DC: American Psychological Association.

Popham, W. J. & Husek, T. R. (1969). Implications of criterion-referenced measurement. Journal of Educational Measurement, 6, 1–9.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago, IL: University of Chicago Press.

Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items: a meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, 24, 3–13.

Roediger, H. L., Agarwal, P. K., McDaniel, M. A., & McDermott, K. B. (2011). Test-enhanced learning in the classroom: long-term improvements from quizzing. Journal of Experimental Psychology: Applied, 17, 382–395.

Santelices, M. V. & Wilson, M. (2010). Unfair treatment? The case of Freedle, the SAT, and the standardization approach to differential item functioning. Harvard Educational Review, 80, 106–134.

Spector, P. E. (1992). Summated rating scale construction: an introduction. SAGE Publications.

Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677–680.

Stiggins, R. J. (1987). The design and development of performance assessments. Educational Measurement: Issues and Practice, 6, 33–42.

Stiggins, R. J. (1991). Assessment literacy. Phi Delta Kappan, 72, 534–539.

Torrance, E. P. (1981a). Empirical validation of criterion-referenced indicators of creative ability through a longitudinal study. Creative Child and Adult Quarterly, 6, 136–140.

Torrance, E. P. (1981b). Predicting the creativity of elementary school children. Gifted Child Quarterly, 25, 55–62.

US Department of Education. (2002). A new era: revitalizing special education for children and their families. Washington, DC: US Department of Education.

Webb, N. L. (2002). Depth-of-knowledge levels for four content areas. Language Arts.

Wiliam, D. & Black, P. (1996). Meanings and consequences: a basis for distinguishing formative and summative functions of assessment? British Educational Research Journal, 22, 537–548.