A Sequence Data Model for Analyzing Temporal Patterns of Student Data
Keywords: Sequence data model, educational data mining, learning analytics, predictive modelling, knowledge discovery.
Data models built for analyzing student data often obscure temporal relationships, either for simplicity or to aid generalization. We present a data model that preserves the temporal relationships among heterogeneous data and uses them as the basis for building predictive models. We show how within-semester and between-semester temporal patterns can provide insight into the student experience. For example, in a within-semester model, the final course grade can be predicted from weekly activities and submissions recorded in the learning management system (LMS). In a between-semester model, success or failure in a degree program can be predicted from sequence patterns of grades and activities across multiple semesters. The benefits of our sequence data model include temporal structure, segmentation, contextualization, and storytelling. To demonstrate these benefits, we collected and analyzed 10 years of student data from the College of Computing at UNC Charlotte in a between-semester sequence model, and used data from an introductory computer science course to build a within-semester sequence model. Our results for the two sequence models show that analytics based on the sequence data model can achieve higher predictive accuracy than non-temporal models built on the same data.
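To make the within-semester idea concrete, the following is a minimal sketch of how timestamped LMS records might be segmented into an ordered sequence of weekly feature sets, next to the flattened totals a non-temporal model would see. It is an illustration under stated assumptions, not the authors' implementation: the `Event` schema, the event kinds, and the function names `weekly_sequence` and `flat_totals` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One timestamped record for a student (hypothetical schema)."""
    student_id: str
    day: date
    kind: str     # assumed event kinds, e.g. "login" or "submission"
    value: float  # assumed measure, e.g. a count


def weekly_sequence(events, student_id, semester_start, n_weeks):
    """Segment one student's events into per-week feature dicts, in order."""
    weeks = [defaultdict(float) for _ in range(n_weeks)]
    for ev in events:
        if ev.student_id != student_id:
            continue
        week = (ev.day - semester_start).days // 7
        if 0 <= week < n_weeks:  # ignore events outside the semester
            weeks[week][ev.kind] += ev.value
    return [dict(w) for w in weeks]


def flat_totals(sequence):
    """Collapse the weekly sequence into semester totals (no temporal order)."""
    totals = defaultdict(float)
    for week in sequence:
        for kind, value in week.items():
            totals[kind] += value
    return dict(totals)


# Illustrative records only; not data from the study.
events = [
    Event("s1", date(2018, 1, 9), "login", 1),
    Event("s1", date(2018, 1, 10), "login", 1),
    Event("s1", date(2018, 1, 18), "submission", 1),
    Event("s1", date(2018, 1, 30), "login", 1),
]
sequence = weekly_sequence(events, "s1", date(2018, 1, 8), n_weeks=4)
totals = flat_totals(sequence)
print(sequence)
print(totals)
```

In this sketch the weekly sequence keeps the empty third week visible as a gap in activity, whereas the flattened totals report only three logins and one submission. A between-semester variant could use the same structure with one segment per semester instead of per week.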
Copyright (c) 2018 Journal of Learning Analytics
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons License, Attribution - NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0) license that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).