De-identification is Insufficient to Protect Student Privacy, or – What Can a Field Trip Reveal?




Keywords: learning analytics, privacy, re-identification


Learning analytics have the potential to improve teaching and learning in K–12 education, but as student data is increasingly collected and transferred for analysis, it is important to take measures that protect student privacy. A common approach is de-identification: removing personal details that can reveal a student's identity. However, as we demonstrate, de-identification alone is not a complete solution. Using unsupervised machine learning techniques, we show how sensitive information about students can be discovered by linking de-identified datasets with publicly available school data. This underlines that de-identification by itself is insufficient if we wish to advance learning analytics in K–12 without compromising student privacy.
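
To make the linkage idea concrete, the following is a minimal toy sketch and not the authors' method. It assumes a hypothetical de-identified activity log (pseudonymous student IDs mapped to active days) and a hypothetical public calendar of field-trip dates per class. Grouping students by a shared day of inactivity stands in here for the unsupervised clustering mentioned above. All names, dates, and data are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical school days covered by the de-identified log.
SCHOOL_DAYS = [date(2021, 3, d) for d in (1, 2, 3, 4)]

# Hypothetical de-identified log: pseudonymous ID -> days with platform activity.
activity = {
    "s01": {date(2021, 3, 1), date(2021, 3, 2), date(2021, 3, 4)},
    "s02": {date(2021, 3, 1), date(2021, 3, 2), date(2021, 3, 4)},
    "s03": {date(2021, 3, 1), date(2021, 3, 3), date(2021, 3, 4)},
    "s04": {date(2021, 3, 1), date(2021, 3, 3), date(2021, 3, 4)},
    "s05": set(SCHOOL_DAYS),  # active every day
}

# Hypothetical public information: (school, class) -> field-trip date.
public_trips = {
    ("School A", "class 7-1"): date(2021, 3, 3),
    ("School B", "class 8-2"): date(2021, 3, 2),
}


def group_by_inactivity(log):
    """Group pseudonymous IDs by the exact set of school days they were inactive."""
    groups = defaultdict(list)
    for student_id, active_days in log.items():
        missing = frozenset(d for d in SCHOOL_DAYS if d not in active_days)
        if missing:  # students with no gap carry no signal in this toy setup
            groups[missing].append(student_id)
    return dict(groups)


def link_to_calendar(groups, trips, min_group_size=2):
    """Link a group to a class when its only inactive day equals that class's trip date."""
    links = {}
    for missing, students in groups.items():
        if len(students) < min_group_size:
            continue  # a single absence is more likely an individual event
        for label, trip_day in trips.items():
            if missing == frozenset({trip_day}):
                for student_id in students:
                    links[student_id] = label
    return links


links = link_to_calendar(group_by_inactivity(activity), public_trips)
for student_id, label in sorted(links.items()):
    print(student_id, "->", label)
```

Even though the log contains no names, the shared gap in activity is enough in this toy setting to attach a school and class to four of the five pseudonymous IDs.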






How to Cite

Yacobson, E., Fuhrman, O., Hershkovitz, S., & Alexandron, G. (2021). De-identification is Insufficient to Protect Student Privacy, or – What Can a Field Trip Reveal? Journal of Learning Analytics, 8(2), 83–92.



Special Section: Learning Analytics for Primary and Secondary Schools