Using Keystroke Analytics to Improve Pass-Fail Classifiers
Keywords: Learning analytics, keystroke analytics, data mining, virtual learning environments, student behaviour, early intervention
Learning analytics offers insights into student behaviour and the potential to detect poor performers before they fail exams. If the activity is primarily online (for example computer programming), a wealth of low-level data can be made available that allows unprecedented accuracy in predicting which students will pass or fail. In this paper, we present a classification system for early detection of poor performers based on student effort data, such as the complexity of the programs they write, and show how it can be improved by the use of low-level keystroke analytics.
Copyright (c) 2017 Journal of Learning Analytics
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons License, Attribution - NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0) license that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).