Most faculty I have encountered will tell me, without hesitation, that they "use Backward Design." They've read Wiggins and McTighe, or at least the summary their teaching center handed out. They can name the three stages. They've filled in an Understanding by Design template at some point, probably for a course redesign or an accreditation review. Then I ask to see the syllabus, and the outcomes listed in Stage 1 don't show up in the Stage 2 assessments. The midterm doesn't measure what the outcomes promised. The final project assesses an artifact, not the cognitive work the outcome claimed students would do.

This is not a small problem, and it is not rare. It is the modal case. The framework has been adopted; the work has not been done.

This essay argues that AI didn't break our assessments. It revealed that Stage 1 was already broken.

What Stage 1 Actually Requires

Wiggins and McTighe (2005) are explicit about what Stage 1 of Backward Design demands. It is not "write some learning outcomes." It is a structured identification of desired results across three nested categories: established goals, transfer goals, and the understandings and essential questions that constitute the conceptual core of the course (pp. 17–34). These are not synonyms. They are different units of design work, and each constrains what is possible in Stages 2 and 3.

Established goals are the externally given constraints: program-level outcomes, accreditation criteria, professional competencies, disciplinary standards. For a course in research methods, this might be an AACSB criterion or a graduate program's expectation that students can evaluate published research. These are typically the easiest part of Stage 1 because someone else wrote them. Faculty inherit them.

Transfer goals are the load-bearing element of Stage 1, and they are also the element most often skipped. A transfer goal answers the question: what will students be able to do, on their own, in a new context, after this course is over? Not "what will they know about" — what will they be able to do when nobody is watching, when the prompt isn't structured for them, when the situation doesn't look like a classroom. Wiggins and McTighe describe transfer as "the long-term goal of education" (2005, p. 7), and they mean it. The point of a course is not the artifacts produced inside it. The point is what students can do six months later when the artifact-producing context is gone.

Understandings and essential questions are the conceptual scaffolding. Enduring understandings are the propositional claims a student should still hold a year later — the "big ideas" that organize the discipline (Wiggins & McTighe, 2005, pp. 128–135). Essential questions are the open, recurring inquiries the discipline keeps returning to. Together they answer: what is this course actually about, beneath the topics?

Most courses do the first category. They paste in established goals from the program. They gesture at the third — every syllabus has a line about "students will develop a deeper understanding of X." Almost none do the second. Transfer goals are skipped because they're hard, and because nothing in the standard outcomes-writing workshop teaches faculty how to write them.

This is the first place courses quietly break. If Stage 1 has established goals but no transfer goals, Stage 2 has no way to know whether students can do anything beyond the course. The assessments end up measuring whether students completed the course — not whether the course mattered.

The Bloom's Taxonomy Trap

Here is the pattern I see most often when I read a syllabus. The outcomes use Bloom's verbs (Anderson & Krathwohl, 2001) to signal cognitive level: "students will analyze," "students will evaluate," "students will synthesize." (The last is a holdover from the original taxonomy; the 2001 revision recast Synthesis as Create, but the verb persists on syllabi.) The verbs do real work in the document. They tell the reader, and the accreditor, that this course operates at the higher levels of the taxonomy. They look rigorous.

Then you look at the assessment. The assessment for "students will analyze rhetorical strategies in primary sources" is a five-page essay. The assessment for "students will evaluate competing economic models" is a research paper. The assessment for "students will synthesize the literature on organizational change" is a literature review.

Notice what has happened. The outcome named a cognitive operation. The assessment names an artifact. These are not the same thing. An essay can be produced through analysis, but it can also be produced through summary, paraphrase, or the assembly of pre-existing analyses written by someone else. The artifact does not prove the cognitive operation occurred. It only proves an artifact exists.

This was already a problem before generative AI. Students have always been able to produce essays that look like analysis without actually performing analysis — patchwriting, sophisticated summary, the strategic deployment of disciplinary vocabulary. Faculty have always graded the artifact and inferred the cognitive process. The inference was sometimes wrong.

The Bloom's verb in the outcome was descriptive, not prescriptive. It described what the faculty member hoped would happen. It did not prescribe what the assessment had to verify. Constructive alignment (Biggs, 1996) requires the verb in the outcome to govern the design of the assessment — the assessment must be the thing that can only be completed if the cognitive operation actually occurred. Most outcomes do not pass this test. They name a level. They do not constrain the evidence.

What AI Exposes About Stage 1 Weakness

Generative AI is the canary in the coal mine for Stage 1. It did not change the framework. It changed the cost of avoiding the framework.

Walk through a concrete case. The outcome reads: "Students will analyze and critique research methodologies in published organizational behavior studies." The assessment is a critique paper, six to eight pages, due at the end of the term. The rubric grades clarity of argument, depth of analysis, use of evidence, and writing mechanics. This is a textbook setup. It would have passed any program review in 2018.

Now a student uses a generative model to draft the critique. The model produces a paper that demonstrates clarity of argument, depth of analysis, use of evidence, and acceptable writing mechanics. The student edits lightly. The rubric scores it well. The outcome appears met. The cognitive work — the actual analysis and critique of methodology — did not happen inside the student's head.

The error here was not the AI. The error was that the assessment was designed to measure the artifact rather than the cognitive process, and the outcome was written in a way that did not require the artifact and the process to be coupled. The outcome was descriptive. The assessment was artifact-shaped. The gap between them was always there. AI just made it visible at scale.

This is what I mean when I say AI is a Stage 1 problem, not a Stage 2 problem. Every course that is now panicking about AI-proofing its assessments is, whether it knows it or not, doing remedial Stage 1 work. The assessments were never doing what the outcomes claimed they were doing. The misalignment was tolerated because no widely available tool could exploit it. Now one can, and the misalignment has consequences.

Rewriting Stage 1 Outcomes to Survive AI

The practical move is to write outcomes that specify three things, not one: the cognitive operation, the artifact, and the verification pathway. Most current outcomes specify only the cognitive operation, and only loosely.

Consider two versions of the same outcome.

Weak

Students will analyze rhetorical strategies in primary sources.

This is the version that appears on most syllabi. It signals analysis. It does not specify what the analysis produces, what the constraint is, or how anyone would know analysis occurred rather than summary or paraphrase. The outcome is satisfied by any artifact that looks analytical.

Strong

Students will identify and contrast rhetorical strategies across three primary sources of their choice and argue which they consider most effective in a fifteen-minute oral defense, responding to questioning about their reasoning.

The strong version does four things the weak version does not. It specifies the cognitive operation (identify and contrast). It specifies the artifact (a defended position across three sources). It introduces a constraint (their choice — which prevents the assessment from being reusable as a template). And it specifies the verification pathway (an oral defense with questioning that probes reasoning).

AI can support the work in the strong version. A student can use it to draft, to test arguments, to surface counterexamples. AI cannot complete the verification. The oral defense, conducted live with substantive questioning that follows the student's reasoning rather than running through a script, requires the student to hold the analysis in their own head and respond to challenges in real time. The cognitive work is not optional. The verification pathway requires it.

This is not a return to oral examinations as a universal solution. Oral defenses are one verification pathway. There are others: structured viva-style discussions, in-class problem-solving where the prompt changes between sections, defended portfolio reviews, scaffolded process work where the trajectory of revision is itself the evidence. The point is not the modality. The point is that Stage 1 outcomes have to name the verification, not just the artifact.

A useful design heuristic: if the outcome is satisfied by an artifact that another author could plausibly produce on the student's behalf, the outcome has not specified the verification. Add it.
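
For readers who think in code, the three-part specification can be sketched as a small record with one slot per part. This is an illustration only: the `Outcome` class, its field names, and the `missing_parts` helper are hypothetical and are not drawn from Understanding by Design or from any existing tool. Filling in the slots is still a faculty judgment; the code only reports which ones were left empty.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Outcome:
    """A Stage 1 outcome split into three parts (hypothetical field names)."""
    statement: str
    cognitive_operation: str | None = None   # what the student must do mentally
    artifact: str | None = None              # what the student produces
    verification_pathway: str | None = None  # how we know the work was theirs


def missing_parts(outcome: Outcome) -> list[str]:
    """Return the names of the parts the outcome leaves unspecified."""
    parts = {
        "cognitive_operation": outcome.cognitive_operation,
        "artifact": outcome.artifact,
        "verification_pathway": outcome.verification_pathway,
    }
    return [name for name, value in parts.items() if not value]


weak = Outcome(
    statement="Students will analyze rhetorical strategies in primary sources.",
    cognitive_operation="analyze",
)

strong = Outcome(
    statement=(
        "Students will identify and contrast rhetorical strategies "
        "across three primary sources of their choice."
    ),
    cognitive_operation="identify and contrast",
    artifact="a defended position across three sources",
    verification_pathway="fifteen-minute oral defense with questioning",
)

print(missing_parts(weak))    # ['artifact', 'verification_pathway']
print(missing_parts(strong))  # []
```

The weak outcome fails the heuristic because nothing in it says how anyone would verify that the student did the analysis. The strong one leaves no slot empty.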

The Constructive Alignment Check

Biggs (1996) gave us the language of constructive alignment, and the principle is direct: outcomes, assessments, and learning activities must be aligned, and the verbs in the outcomes must govern the alignment. Constructive alignment is not satisfied by adjacency. The outcome being in the same document as the assessment is not alignment. The outcome appearing in a curriculum map is not alignment. Alignment is when the verb in the outcome is also the verb the assessment requires the student to perform, and is also the verb the in-class activities give students practice performing.

The audit is mechanical and worth doing on any course that has been taught more than twice. Build a three-column table. In column one, list every outcome in the syllabus. In column two, list every assessment, mapped to the outcome it claims to measure. In column three, list every major learning activity — readings, lectures, discussions, exercises — mapped to the outcomes it develops.

Then look for two failure modes. The first: outcomes that do not appear in column two. These are outcomes the course claims to deliver but never assesses. Common in syllabi that inherited program-level outcomes wholesale. The second: outcomes that do not appear in column three. These are outcomes the course assesses but never teaches. Common when assessment design was done before learning experiences were designed — when Stage 2 was completed without Stage 3 catching up.

Either failure mode is a constructive alignment break. Both are common. Both are usually invisible until someone runs the audit.
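
Because the audit is mechanical, it can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed tool: the outcome IDs (`O1` through `O4`), the assessment and activity names, and the `audit_alignment` function are all invented for the example, and the mapping of each assessment or activity to outcomes is assumed to have been entered by hand.

```python
from __future__ import annotations


def audit_alignment(
    outcomes: list[str],
    assessments: dict[str, list[str]],
    activities: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Report the two failure modes of the three-column audit.

    `assessments` and `activities` map each item to the outcome IDs it
    claims to measure or develop (hand-entered, hypothetical data).
    """
    assessed = {o for mapped in assessments.values() for o in mapped}
    taught = {o for mapped in activities.values() for o in mapped}
    return {
        # Failure mode 1: outcome is listed but absent from column two.
        "never_assessed": [o for o in outcomes if o not in assessed],
        # Failure mode 2: outcome is absent from column three.
        "never_taught": [o for o in outcomes if o not in taught],
    }


# Hypothetical course data, for illustration only.
outcomes = ["O1", "O2", "O3", "O4"]
assessments = {
    "midterm": ["O1"],
    "critique paper": ["O1", "O3"],
}
activities = {
    "weekly readings": ["O1", "O2"],
    "methods workshop": ["O1"],
}

report = audit_alignment(outcomes, assessments, activities)
print(report["never_assessed"])  # ['O2', 'O4']
print(report["never_taught"])    # ['O3', 'O4']
```

In this made-up course only `O1` appears in all three columns. The sketch checks presence, not quality: an outcome can appear in every column and still fail the verb test described above, so the output is a starting point for the audit rather than its conclusion.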

In my experience, a typical undergraduate course in a professional program (business, computer science, management, public health, anything with external accreditation) has five to eight listed outcomes. Of these, perhaps two are constructively aligned across all three stages, three are partially aligned, and one or two are not aligned at all. The course works fine. Students complete it. They get grades. The accreditor signs off because the outcomes are listed and the artifacts exist. The misalignment is structural and undisclosed.

This is what "courses quietly break" means. They do not break loudly. They break in the gap between what Stage 1 promised and what Stages 2 and 3 actually do. The students absorb the gap as confusion about what the course is for. The faculty absorb the gap as the suspicion that grades don't quite track learning. The institution absorbs the gap as graduates who completed coursework but cannot transfer the work to new contexts. Everyone keeps moving.

What Changes When AI Is Part of Stage 1 Design

Used carefully, generative AI is a serious Stage 1 design partner. Used carelessly, it accelerates exactly the failure modes above by generating plausible-sounding outcomes that constrain nothing.

What AI can usefully do at Stage 1:

What AI cannot do at Stage 1:

The pattern that works is: AI generates, faculty filters; AI drafts, faculty decides; AI surfaces gaps, faculty closes them. This is what I'm building TeachingsByDesign around: tooling that does the generative and audit work so faculty can spend their time on the judgment work Stage 1 actually requires. It is not a replacement for the framework. It is a way to lower the cost of doing the framework properly, so that more courses actually do it.

Fink (2003) made a related argument from a different angle: significant learning experiences require integrated course design where the foundational decisions about what matters are made before everything else. AI does not change this. It changes how fast the downstream consequences of skipping the foundational decisions become visible.

Closing

Stage 1 of Backward Design is where courses quietly break because Stage 1 is where the constraining decisions get made. If those decisions get skipped, deferred, or made loosely, every downstream stage has more freedom than it should have. The assessments drift toward what is easy to grade. The instruction drifts toward what is easy to deliver. The outcomes sit on the syllabus as descriptions of an aspiration the rest of the course is not actually engineered to produce.

This was tolerable for a long time. The tools available to students could not consistently exploit the gap. The artifacts they turned in mostly reflected the cognitive work the outcomes named, because there was no efficient way to produce the artifacts otherwise. We graded the artifact. We inferred the work. The inference was usually close enough.

It is not close enough now. The cost of writing outcomes that do not constrain assessments has gone up. The cost of running courses without transfer goals has gone up. The cost of confusing the verb in the outcome with the verb the assessment requires has gone up. AI did not change the framework Wiggins and McTighe published in 1998 and expanded in 2005. It changed the price of ignoring the framework.

The work of Stage 1 has always been the same: identify established goals, specify transfer goals, articulate enduring understandings and essential questions, and write outcomes that name the cognitive operation, the artifact, and the verification pathway tightly enough that Stage 2 and Stage 3 have no room to drift. What has changed is that the downstream stages can no longer hide a Stage 1 we shortcut.

The honest read is that AI didn't break our courses. It revealed which ones were already broken at Stage 1. That is uncomfortable, and it is also useful. The fix is not new tooling for Stage 2. The fix is the work Backward Design has always asked us to do, done properly this time, because the assessments downstream can no longer carry the weight of a Stage 1 we never finished.

References

Anderson, L. W., & Krathwohl, D. R. (Eds.). (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. Longman.

Biggs, J. (1996). Enhancing teaching through constructive alignment. Higher Education, 32(3), 347–364.

Fink, L. D. (2003). Creating significant learning experiences: An integrated approach to designing college courses. Jossey-Bass.

Wiggins, G., & McTighe, J. (2005). Understanding by Design (Expanded 2nd ed.). ASCD.

About the author

Thomas R. Christian is the founder of TeachingsByDesign, an AI-native academic platform built around the coherence engine thesis — that alignment from outcomes downward, not feature-by-feature LMS plumbing, is where higher education actually breaks and where AI can actually help.

He holds a Master's in Adult and Continuing Education from Rutgers University and has spent twenty years designing instruction, training, and curriculum across enterprise CX, healthcare, and financial services. He writes about course design, AI in higher education, and the discipline of getting Stage 1 right.