Evaluating 21st-Century Competencies in Postsecondary Curricula with Large Language Models: Performance Benchmarking and Reasoning-Based Prompting Strategies

Zhen Xu; Xin Guan; Chenxi Shi; Qinhao Chen; Renzhe Yu

doi:10.18608/jla.2026.9127

Authors

Zhen Xu Columbia University https://orcid.org/0009-0004-3131-910X
Xin Guan Columbia University https://orcid.org/0009-0001-6055-1555
Chenxi Shi Columbia University https://orcid.org/0009-0009-1630-7078
Qinhao Chen Columbia University https://orcid.org/0009-0001-9105-6899
Renzhe Yu Columbia University https://orcid.org/0000-0002-2375-3537

DOI:

https://doi.org/10.18608/jla.2026.9127

Keywords:

curricular analytics, 21st century competencies, large language models (LLMs), prompt engineering, chain of thought (CoT), research paper

Abstract

The growing emphasis on 21st-century competencies in postsecondary education, intensified by the transformative impact of generative artificial intelligence (GenAI) on the economy and society, underscores the urgent need to evaluate how these competencies are embedded in curricula and how effectively academic programs align with evolving workforce and societal demands. Curricular analytics, particularly recent advancements powered by GenAI, offer a promising data-driven approach to this challenge. However, the analysis of 21st-century competencies requires pedagogical reasoning beyond surface-level information retrieval, and the capabilities of large language models (LLMs) in this context remain underexplored. In this study, we extend prior research on curricular analytics of 21st-century competencies across a broader range of curriculum documents, competency frameworks, and models. Using 7,600 manually annotated curriculum-competency alignment scores (38 competencies and 200 courses across five curriculum document types), we evaluate the informativeness of different curriculum document sources, benchmark the performance of general-purpose LLMs on mapping curricula to competencies, and analyze error patterns. We further introduce a reasoning-based prompting strategy, curricular chain-of-thought (CoT), to strengthen LLMs’ pedagogical reasoning. Our results show that detailed instructional activity descriptions are the most informative type of curriculum document for competency analytics. Open-weight LLMs achieve accuracy comparable to proprietary models on coarse-grained tasks, demonstrating their scalability and cost-effectiveness for institutional use. However, no model reaches human-level precision in fine-grained pedagogical reasoning. Our proposed curricular CoT yields modest improvements by reducing bias in instructional keyword inference and improving the detection of nuanced pedagogical evidence in long text. Together, these findings highlight the untapped potential of institutional curriculum documents and provide an empirical foundation for advancing AI-driven curricular analytics.
