Reference publications

There are a considerable number of resources on educational assessment and evaluation and on conducting large-scale surveys in higher education. These resources provide a wealth of information on theoretical and operational aspects of survey work. Many are available from institutional libraries.

Note that none of these resources is part of AGS official methodology or considered required reading for survey managers. The following list is, instead, provided as further reading for those with a keen interest in methodology and analysis. It can also be downloaded as a PDF:
Sample list of resources on evaluation and survey work

American Educational Research Association, American Psychological Association and the National Council on Measurement in Education (AERA, APA and NCME) (2005). Standards for Educational and Psychological Testing. Washington: AERA.

Anastasi, A. (1976). Psychological Testing. New York: Macmillan.

Andrich, D. L. (1978). Application of a psychometric model to ordered categories which are scored with successive integers. Applied Psychological Measurement, 2, 581-594.

Andrich, D. L. (1988). Rasch Models for Measurement. Newbury Park: Sage Publications.

Andrich, D. L. (1993). A hyperbolic cosine latent trait model for unfolding dichotomous single-simultaneous responses. Applied Psychological Measurement, 17(3), 253-276.

Astin, A. W. (1979). Four Critical Years: Effects of college on beliefs, attitudes and knowledge. San Francisco: Jossey Bass.

Astin, A. W. (1985). Achieving Educational Excellence: A critical analysis of priorities and practices in higher education. San Francisco: Jossey Bass.

Astin, A. W. (1990). Assessment for Excellence: The philosophy and practice of assessment and evaluation in higher education. New York: Maxwell Macmillan International.

Astin, A. W. (1993a). An empirical typology of college students. Journal of College Student Development, 34(1), 36-46.

Astin, A. W. (1993b). What Matters in College: Four critical years revisited. San Francisco: Jossey Bass.

Astin, A. W., & Lee, J. J. (2003). How risky are one-shot cross-sectional assessments of undergraduate students? Research in Higher Education, 44(6), 657-672.

Barnett, J. J. (1999). Likert response alternative direction: SA to SD or SD to SA: Does it make a difference? Washington: ERIC.

Beaton, A. E. (1994). Missing scores in survey research. In T. N. Postlethwaite (Ed.), The International Encyclopaedia of Education. Oxford: Elsevier Science.

Berdie, D. R., Anderson, J. F., & Niebuhr, M. A. (1986). Questionnaires: Design and use. London: The Scarecrow Press.

Chan, J. C. (1991). Response order effects in Likert-type scales. Educational and Psychological Measurement, 51(3), 532-540.

Cohen, J. (1988). Statistical Power Analysis for the Behavioural Sciences. Hillsdale: Lawrence Erlbaum Associates.

Cohen, R. J., Swerdlik, M. E., & Smith, D. K. (1992). Psychological Testing and Assessment: An Introduction to tests and measurement. California: Mayfield Publishing Company.

Converse, J. M. & Presser, S. (1989). Survey Questions: Handcrafting the standardised questionnaire. CA: Sage.

Costin, F., Greenough, W. T., & Menges, R. J. (1971). Student ratings of college teaching: Reliability, validity and usefulness. Review of Educational Research, 41(5), 511-535.

Couper, M. P. (2000). Web surveys: A review of issues and approaches. Public Opinion Quarterly, 64(4), 464-494.

Creswell, J. W. (1995). Research Design: Qualitative and quantitative approaches. Thousand Oaks: Sage Publications.

Crocker, L. & Algina, J. (1986). An Introduction to Classical and Modern Test Theory. New York: Harcourt Brace.

Cronbach, L. J. (1946). Response sets and test validity. Educational and Psychological Measurement, 6, 475-494.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

Cronbach, L. J. & Shavelson, R. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64(3), 391-418.

Davis, T. M. & Murrell, P. H. (1993). Turning Teaching into Learning: The role of student responsibility in the collegiate experience. Washington: ERIC Clearinghouse on Higher Education.

Divgi, D. R. (1986). Does the Rasch model really work for multiple choice items? Not if you look closely. Journal of Educational Measurement, 23(4), 283-298.

Dubois, B. & Burns, J. A. (1975). An analysis of the meaning of the Question Mark Response Category in Attitude Scales. Educational and Psychological Measurement, 35, 869-884.

Engelhard, G. (1992). Historical views of invariance: Evidence from the measurement theories of Thorndike, Thurstone and Rasch. Educational and Psychological Measurement, 52(2), 275-291.

Entwistle, N. J. (1987). A model of the teaching-learning process. In J. T. E. Richardson, M. W. Eysenck & D. W. Piper (Eds.), Student Learning: Research in education and cognitive psychology. London: Open University Press.

Ewell, P. T. & Jones, D. P. (1993). Actions matter: the case for indirect measures in assessing higher education’s progress on the national education goals. The Journal of General Education, 42(2), 123-148.

Ewell, P. T. & Jones, D. P. (1996). Indicators of “Good Practice” in Undergraduate Education: A handbook for development and implementation. Colorado: National Centre for Higher Education Management Systems.

Fan, X. & Thompson, B. (2001). Confidence intervals about score reliability coefficients, please: An EPM guidelines editorial. Educational and Psychological Measurement, 61(1), 517-531.

Fan, X., Wang, L., & Thompson, B. (1996). The effects of sample size, estimation methods, and model specification on SEM fit indices. Paper presented at the American Educational Research Association, New York.

Fink, A. (1998). How to Conduct Surveys: A step by step guide. California: Sage Publications.

Foreman, E. K. (1991). Survey Sampling Principles. New York: Marcel Dekker.

Goldstein, H. (1980). Dimensionality, bias, independence and measurement scale problems in latent trait test score models. British Journal of Mathematical and Statistical Psychology, 33, 234-246.

Goldstein, H. (1999). Multilevel Statistical Models. London: Edward Arnold.

Goldstein, H. & Healy, M. J. R. (1995). The graphical presentation of a collection of means. Journal of the Royal Statistical Society, 158(1), 175-177.

Guthrie, B., & Johnson, T. (1997). Study of Nonresponse to the 1996 Graduate Destination Survey. Canberra: Australian Government Publishing Service.

Guttman, L. (1944). A basis for scaling quantitative data. American Sociological Review, 9, 139-150.

Hair, J. F., Anderson, R. E., & Tatham, R. L. (1995). Multivariate Data Analysis with Readings. New York: Prentice Hall.

Hambleton, R. K. (1989). Principles and selected applications of item response theory. In R.L. Linn (Ed.), Educational Measurement. London: Collier Macmillan Publishers.

Hambleton, R. K. & Rovinelli, R. J. (1986). Assessing the dimensionality of a set of test items. Applied Psychological Measurement, 10, 287-302.

Hambleton, R. K., Swaminathan, H. & Rogers, H. J. (1991). Fundamentals of Item Response Theory. California: Sage.

Hanushek, E. A. (1979). Conceptual and empirical issues in the estimation of education production functions. The Journal of Human Resources, 14(3), 351-388.

Hattie, J. A. (1983). The tendency to omit items: another deviant response characteristic. Educational and Psychological Measurement, 43, 1041-1045.

Hattie, J. A. (1985). Methodology review: assessing unidimensionality of tests and items. Applied Psychological Measurement, 9(2), 139-164.

Hox, J. J. (1995). Applied Multilevel Analysis. Amsterdam: TT-Publikaties.

Jaeger, R. M. (1984). Sampling in Education and the Social Sciences. New York: Longman.

Joreskog, K. G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109-133.

Keeves, J. P. (Ed.), Educational Research Methodology and Measurement: An international handbook. Oxford: Pergamon.

Keeves, J. P. & Masters, G. N. (1999). (Eds.) Advances in Measurement in Educational Research and Assessment. New York: Pergamon.

Kish, L. (1965). Survey Sampling. New York: Wiley Classics Library.

Linke, R. D. (1991). Report of the Research Group on Performance Indicators in Higher Education. Canberra: DETYA.

Little, R. J. A. & Rubin, D. B. (1983). On jointly estimating parameters and missing data. American Statistician, 37, 218-220.

Little, R. J. A. & Rubin, D. B. (1987). Statistical analysis with missing data. New York: John Wiley & Sons.

Lord, F. M. & Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Boston: Addison-Wesley.

Marsh, H. W. (1982). Validity of students’ evaluations of college teaching: A multitrait-multimethod analysis. Journal of Educational Psychology, 74, 264-279.

Marsh, H. W. (1987). Students’ evaluations of university teaching: Research findings, methodological issues and directions for future research. International Journal of Educational Research, 11(3), 253-388.

Marsh, H. W. (1990). Multitrait-multimethod analysis. In J. P. Keeves (Ed.), Educational Research, Methodology and Measurement: An international handbook (pp. 4000-4007). Oxford: Pergamon.

Marsh, H. W. (1994). Confirmatory factor analysis models of factorial invariance: A multifaceted approach. Structural Equation Modeling, 1(1), 5-34.

Marsh, H. W. & Grayson, D. (1994). Multitrait-multimethod analysis. In T. Husen & N. Postlethwaite (Eds.), The International Encyclopaedia of Education (pp. 4000-4007). London: Pergamon.

Marsh, H. W. & Hocevar, D. (1984). The factorial invariance of student evaluations of college teaching. American Educational Research Journal, 21(2), 341-366.

Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness of fit in confirmatory factor analysis: The effect of sample size. Psychological Bulletin, 103, 391-410.

Marsh, H. W. & Grayson, D. (1994). Longitudinal stability of latent means and individual differences: a unified approach. Structural Equation Modeling, 1(2), 317-359.

Marton, F. (1981). Phenomenography - describing conceptions of the world around us. Instructional Science, 10, 177-200.

Marton, F. (1994). Phenomenography. In T. Husen & T. N. Postlethwaite (Eds.), International Encyclopaedia of Education. Oxford: Pergamon.

Marton, F. & Saljo, R. (1976a). On qualitative differences in learning-I: Outcome and process. British Journal of Educational Psychology, 46, 4-11.

Marton, F. & Saljo, R. (1976b). On qualitative differences in learning-II: Outcome as a function of the learner’s conception of the task. British Journal of Educational Psychology, 46, 115-127.

Marton, F. & Saljo, R. (1997). Approaches to learning. In F. Marton, D. Hounsell & N. Entwistle (Eds.), The Experience of Learning: Implications for teaching and studying in higher education. Edinburgh: Scottish Academic Press.

Mason, J. (2002). Qualitative Researching. London: Sage.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149-174.

McDonald, R. P. (1980). The dimensionality of tests and items. British Journal of Mathematical and Statistical Psychology, 33, 205-233.

McDonald, R. P. (1985). Factor Analysis and Related Methods. London: Lawrence Erlbaum and Associates.

McInnis, C., Griffin, P., James, R. H., & Coates, H. B. (2001). Development of the Course Experience Questionnaire. Canberra: Department of Employment, Training and Youth Affairs.

Messick, S. (1984). The psychology of educational measurement. Journal of Educational Measurement, 21(3), 215-237.

Messick, S. (1989). Validity. In R.L. Linn (Ed.), Educational Measurement. New York: Macmillan Publishing Company.

National Survey of Student Engagement (NSSE). (2001). Improving the College Experience: National benchmarks of effective educational practice. National Survey of Student Engagement 2001. Bloomington: Indiana University.

National Survey of Student Engagement (NSSE). (2002). From Promise to Progress: How colleges and universities are using student engagement results to improve collegiate quality. 2002 Annual report. Bloomington: Indiana University.

National Survey of Student Engagement (NSSE). (2003). Converting Data into Action: Expanding the boundaries of institutional improvement: National Survey of Student Engagement 2003 Annual report. Bloomington: Indiana University.

National Survey of Student Engagement (NSSE). (2004). Promoting Student Success: Using student engagement data to improve educational practice. Bloomington: Indiana University.

Pace, C. R. (1979). Measuring Outcomes of College: Fifty years of findings and recommendations for the future. San Francisco: Jossey Bass.

Pace, C. R. (1988). Measuring the Quality of College Student Experiences: An account of the development and use of the college student experiences questionnaire. Los Angeles: Higher Education Research Institute, University of California.

Pace, C. R. (1990a). College Student Experiences Questionnaire: Norms for the third edition. Los Angeles: University of California, Center for the Study of Evaluation.

Pace, C. R. (1990b). The Undergraduates: A report of their activities and progress in college in the 1980’s. Los Angeles: University of California, Center for the Study of Evaluation.

Pace, C. R. (1995). From good practices to good products: Relating good practices in undergraduate education to student achievement. Paper presented at the Association for Institutional Research, Boston.

Pace, C. R., & Kuh, G. D. (1998). College Student Experiences Questionnaire. Bloomington: Indiana University.

Pascarella, E. T. (1985). College environmental influences on learning and cognitive development: A critical review and synthesis. In J. C. Smart (Ed.), Higher Education: Handbook of theory and research. New York: Agathon Press.

Pascarella, E. T. (1991). The impact of college on students: The nature of the evidence. The Review of Higher Education, 14(4), 453-466.

Pascarella, E. T. (2001). Identifying excellence in undergraduate education: Are we even close? Change, 33(3), 18-23.

Pascarella, E. T. & Terenzini, P. T. (1976). Informal interaction with faculty and freshman ratings of academic and non-academic experiences of college. Journal of Educational Research, 70, 35-41.

Pascarella, E. T. & Terenzini, P. T. (1979). Student-faculty informal contact and college persistence: A further investigation. Journal of Educational Research, 72, 214-218.

Pascarella, E. T. & Terenzini, P. T. (1991). How College Affects Students: Findings and insights from twenty years of research. San Francisco: Jossey Bass.

Ramsden, P. (1991). A performance indicator of teaching quality in higher education: the Course Experience Questionnaire. Studies in Higher Education, 16(2), 129-150.

Ramsden, P. (1992). Learning to Teach in Higher Education. London: Routledge.

Ross, K. N. (1988a). Sampling. In J. P. Keeves (Ed.), Educational Research, Methodology, and Measurement: An international handbook. Oxford: Pergamon Press.

Ross, K. N. (1988b). Sampling errors. In J. P. Keeves (Ed.), Educational Research Methodology, and Measurement: An international handbook. Oxford: Pergamon Press.

Ross, K. N. (1992). Sampling design for international studies of educational achievement, Prospects, 22(3), 305-316.

Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581-592.

Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.

Terenzini, P. T. (1999). Research and practice in undergraduate education: And never the twain shall meet? Higher Education, 38, 33-48.

Terenzini, P. T. & Pascarella, E. T. (1980). Student-faculty relationships and freshman year educational outcomes: A further investigation. Journal of College Student Personnel, 21, 521-528.

Terenzini, P. T., Pascarella, E. T., & Blimling, G. S. (1996). Students’ out-of-class experiences and their influence on learning and cognitive development: A literature review. Journal of College Student Development, 37(2), 149-162.

Texas State University. (2005). National Survey of Student Engagement at Texas State: NSSE Annotated Bibliography. San Marcos: Texas State University.

Thompson, B. (1994). Guidelines for authors. Educational and Psychological Measurement, 54(4), 837-847.

Thurstone, L. L. (1931). Measurement of social attitudes. Journal of Abnormal and Social Psychology, 26, 249-269.

Thurstone, L. L. (1959). The Measurement of Values. Chicago: University of Chicago Press.

Thurstone, L. L. (1959). Psychophysical analysis. In L.L. Thurstone (Ed.), The Measurement of Values. Chicago: University of Chicago Press.

Tinto, V. (1993). Leaving College: Rethinking the causes and cures of student attrition. Chicago: University of Chicago Press.

Tinto, V. (1997). Classrooms as communities: Exploring the educational character of student persistence. Journal of Higher Education, 68(6), 599-623.

Tinto, V. (1998). Colleges as communities: Taking research on student persistence seriously. The Review of Higher Education, 21(2), 167-177.

Vacha-Haase, T. (1998). Reliability generalisation: Exploring variance in measurement error affecting score reliability across studies. Educational and Psychological Measurement, 58, 6-20.

Valenta, A., Therriault, D., Dieter, M., & Mrtek, R. (2001). Identifying student attitudes and learning styles in distance education. Journal of Asynchronous Learning Networks, 5(2), 111-127.

Wright, B. D. & Masters, G. N. (1982). Rating Scale Analysis. Chicago: MESA Press.

Wright, B. D. & Stone, M. H. (1979). Best Test Design. Chicago: MESA Press.