Modelability as a Strategy for Improving the Generalizability and Scalability of Predictive Models

DOI:

https://doi.org/10.18608/jla.2026.9099

Keywords:

modelability, generalizability, scalability, cross-institutional research, predictive learning analytics

Abstract

Learning analytics has the potential to enhance education through data-informed decision-making, but persistent challenges around generalizability and scalability continue to limit its real-world impact. In this paper, we introduce the concept of a modelable world: a learning ecosystem purposefully designed to support the development of predictive models that generalize across diverse contexts. We outline three core design principles of modelability: (1) valid and interpretable measurements, (2) scalable and stable implementation, and (3) a collaborative research–practice–technology ecosystem. We then illustrate how these principles can be operationalized through a case study of CourseKata, a platform offering a fully instrumented online textbook adopted across a wide range of institutions and disciplines. Using CourseKata data, we developed early prediction models of students’ final course grades from behavioral measures and tested model generalizability across institutions, a step rarely taken in the predictive modeling literature. Results show that a system designed with modelability in mind can produce predictive models that generalize effectively across diverse educational contexts.
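The cross-institutional test described above amounts to leave-one-group-out evaluation: train a model on every institution but one, then measure its accuracy on the held-out institution. The sketch below illustrates that idea only; the data, the single behavioral feature, and the threshold rule are synthetic stand-ins, not the authors' actual CourseKata features or model.

```python
# Hypothetical sketch of leave-one-institution-out evaluation.
# All data and the "model" are synthetic illustrations.
import random

random.seed(0)

# Synthetic stand-in: students from 3 institutions, one early behavioral
# feature (say, pages read in the first two weeks) and a pass/fail label.
students = []
for inst in ["A", "B", "C"]:
    for _ in range(100):
        pages = random.gauss(50, 15)
        passed = pages + random.gauss(0, 10) > 50  # feature predicts label
        students.append({"inst": inst, "pages": pages, "passed": passed})

def evaluate_leave_one_institution_out(data):
    """Train a threshold rule on all institutions but one; test on the held-out one."""
    results = {}
    for held_out in {s["inst"] for s in data}:
        train = [s for s in data if s["inst"] != held_out]
        test = [s for s in data if s["inst"] == held_out]
        # "Model": predict pass when pages read exceeds the training mean.
        threshold = sum(s["pages"] for s in train) / len(train)
        correct = sum((s["pages"] > threshold) == s["passed"] for s in test)
        results[held_out] = correct / len(test)
    return results

for inst, acc in sorted(evaluate_leave_one_institution_out(students).items()):
    print(f"held-out institution {inst}: accuracy = {acc:.2f}")
```

In practice one would substitute a real classifier and richer behavioral features, but the evaluation loop is the same: generalizability is measured on an institution the model never saw during training, rather than on a random held-out slice of pooled data.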

Published

2026-03-14

How to Cite

Xu, A., Zhang, Y., Blake, A., & Stigler, J. (2026). Modelability as a strategy for improving the generalizability and scalability of predictive models. Journal of Learning Analytics, 13(1), 89–109. https://doi.org/10.18608/jla.2026.9099