AlignED Reports
Report 1: AlignED — Benchmarking AI Models for Educational Practice
Five evaluations across 32 models test neuromyth identification, diagnostic reasoning, teacher certification knowledge, and student work judgement. No single model ranks first on all tasks.
March 2026
Report 2: Can LLMs Assess Complex Student Competencies?
Two Gemini models scored 18 student essays against rubrics. A contamination test reveals a gap of roughly 25 percentage points in score recall between likely-contaminated and less-contaminated corpora.
March 2026
Report 3: AI and Education — What 152,000 Conversations Reveal
Descriptive analysis of 152,088 education-related Claude.ai conversations from the Anthropic Economic Index V4. Students are the primary users (59.5% coursework), directive and iterative patterns dominate, and feedback loops are nearly absent.
March 2026
Report 4: Do LLMs Fade Worked Examples?
A pilot study testing whether frontier LLMs apply the worked example fading effect from cognitive load theory. Both models tested show knowledge of fading in their reasoning traces, but neither applies it unless specifically instructed.
March 2026
Report 4.1: Do LLMs Fade Worked Examples? (Extended)
Expands the Report 4 pilot from 2 to 6 models (3 closed-source, 3 open-weight). The knowledge-application gap holds across all six: every model retrieves CLT knowledge when prompted but none applies fading without specific instruction.
About
The AlignED project benchmarks AI model performance on tasks relevant to professional teaching knowledge. Each report is published as a standalone site hosted on GitHub Pages. Reports are frozen at publication and not updated after release.
Source code, datasets, and evaluation configurations are available on GitHub. Data is hosted on the Open Science Framework under a CC BY 4.0 licence.