De-identification is Insufficient to Protect Student Privacy, or – What Can a Field Trip Reveal?
Keywords: learning analytics, privacy, re-identification
Learning analytics have the potential to improve teaching and learning in K–12 education, but as student data is increasingly collected and transferred for analysis, it is important to take measures that protect student privacy. A common approach is de-identification of the data, meaning the removal of personal details that can reveal a student's identity. However, as we demonstrate, de-identification alone is not a complete solution. We show how sensitive information about students can be discovered by linking de-identified datasets with publicly available school data using unsupervised machine learning techniques. This underlines that de-identification alone is insufficient if we wish to advance learning analytics in K–12 without compromising student privacy.
Copyright (c) 2021 Journal of Learning Analytics
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0) License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges as well as earlier and greater citation of published work (see The Effect of Open Access).