Using Keystroke Analytics to Improve Pass-Fail Classifiers


Kevin Casey, Maynooth University



Keywords: learning analytics, keystroke analytics, data mining, virtual learning environments, student behaviour, early intervention


Learning analytics offers insights into student behaviour and the potential to detect poor performers before they fail exams. When the learning activity takes place primarily online (for example, computer programming), a wealth of low-level data becomes available, allowing unprecedented accuracy in predicting which students will pass or fail. In this paper, we present a classification system for the early detection of poor performers based on student effort data, such as the complexity of the programs they write, and show how it can be improved by the use of low-level keystroke analytics.

Author Biography

Kevin Casey, Maynooth University

Kevin is a lecturer at Maynooth University. Before his current position, he served as a lecturer at Dublin City University for 3 years and at Griffith College Dublin for 9 years. Having completed a BSc and MSc in Computer Science at University College Dublin, he began lecturing at GCD in 1994. He left in 2002 to take up a position as a researcher in the School of Computer Science and Statistics at Trinity College Dublin. He continues his involvement with his research group at DCU in the area of data analytics (including Learning Analytics) and IoT. Kevin holds a PhD in Computer Science from Trinity College Dublin.



How to Cite

Casey, K. (2017). Using Keystroke Analytics to Improve Pass-Fail Classifiers. Journal of Learning Analytics, 4(2), 189–211.