Assessing Thinking, Not Just Artifacts: Rethinking Classroom Assessment in the Age of AI

3 Key Factors When Designing Assessments That Account for AI Use

Teachers confront a practical choice: ban AI tools outright or teach students to use them critically. The latter demands changes in how we assess learning. When comparing assessment approaches, three factors matter most.

1. The evidentiary power of the student’s process

Assessment should privilege evidence that shows how a student arrived at an answer. Research notes, draft versions, annotated bibliographies, and revision rationales reveal reasoning, judgment, and problem solving. These artifacts make it possible to judge whether a student understands material or merely produced a polished final product, possibly with outside assistance.

2. Fairness and access

Not every student has consistent access to the same devices, subscriptions, or quiet spaces. A blanket yes-or-no policy on AI can create unequal outcomes. Assessments need to be designed so that students with limited access can still demonstrate their thinking without being unfairly penalized.

3. Scalability for teachers

Collecting process artifacts increases grading time. Sustainable assessment models balance depth of evidence with teacher workload: targeted checkpoints, sampling strategies, and clear rubrics keep the system manageable while improving validity.

In contrast to rules that focus only on final output, assessments built around these factors produce richer and more defensible measures of student understanding.

Traditional Assessment: High-Stakes Final Artifact Focus and Its Limits

The most common approach remains grading a single final artifact: an essay, a lab report, a problem set. It is efficient and familiar. Teachers can set clear expectations, apply a rubric, and grade a stack of similar products quickly.

Pros of this approach:

    Efficiency: streamlined grading and straightforward expectations.
    Standardization: easier to compare students when they produce the same deliverable.
    Preparation for summative testing: mirrors the structure of many formal exams and college assignments.

Cons that have become more pressing with AI tools:


    Opacity: the final artifact conceals the cognitive steps that led to it. A well-written essay could represent original thinking or AI-assisted assembly.
    False signals: grades may reward surface features like polish and citation formatting rather than depth of analysis.
    Integrity gaps: students can use AI to draft answers, shortening the time needed to produce the artifact while bypassing learning.

On the other hand, defenders of traditional artifacts argue that policing tool use is impractical and that focusing on outcomes is what matters in many real-world tasks. Their point is that professionals often produce final products with help and that workplace performance is judged by results. That view has merit, but it assumes students already possess the judgment to use tools responsibly. We cannot assume that without teaching and assessment aligned to that aim.

Assessing Process: Research Notes, Drafts, and Revision Rationale in an AI Era

Shifting assessment to include process artifacts makes the student's thinking visible. Here are practical methods and how they compare to the traditional model.


Key process artifacts to collect

    Research logs or annotated bibliographies that show sources consulted and why they were chosen.
    Sequential drafts with teacher or peer comments preserved, showing development over time.
    Revision memos in which students explain what they changed and why.
    Short reflective write-ups that describe decision points, uncertainties, and trade-offs.

Benefits compared to final-artifact-only grading:

    Authenticity: teachers can see the intellectual moves students make, not just the final arrangement of ideas.
    Actionable feedback: instructors can target instruction to gaps in research habits, argument structure, or evidence use.
    Better support for academic honesty: when revision histories and logs align with final work, it’s easier to confirm ownership.

Practical strategies for implementation

    Require short checkpoints rather than long process reports. For example, a 200-word research plan in week one, a draft in week two, and a revision rationale with the final submission.
    Use simple digital tools that track version history automatically: document platforms, submission timestamps, or learning management systems (see the sketch after this list).
    Create a rubric that assigns explicit points to process artifacts, and make those points meaningful so students commit to the work.
    Model transparent use of AI tools. Ask students to annotate any AI-generated material and explain how they edited or verified it.
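For teachers comfortable with a little scripting, the timestamp idea above can be automated. Below is a minimal sketch in Python, assuming a hypothetical CSV export from a learning management system with columns student, artifact, and submitted_at; the file layout, checkpoint names, and deadlines are illustrative, not a standard LMS format.

```python
# Minimal sketch: flag missing or late process checkpoints from a
# hypothetical LMS export (CSV columns: student, artifact, submitted_at).
# File name, column names, and deadlines are illustrative assumptions.
import csv
from datetime import datetime

DEADLINES = {  # checkpoint name -> due date (assumed for illustration)
    "research_plan": datetime(2025, 9, 8),
    "draft": datetime(2025, 9, 15),
    "revision_memo": datetime(2025, 9, 22),
}

def checkpoint_report(path: str) -> dict[str, list[str]]:
    """Return {student: [issues]} for missing or late checkpoints."""
    submitted: dict[str, dict[str, datetime]] = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            when = datetime.fromisoformat(row["submitted_at"])
            submitted.setdefault(row["student"], {})[row["artifact"]] = when

    report: dict[str, list[str]] = {}
    # Note: only students who appear in the export are checked here.
    for student, artifacts in submitted.items():
        issues = []
        for name, due in DEADLINES.items():
            if name not in artifacts:
                issues.append(f"missing {name}")
            elif artifacts[name] > due:
                issues.append(f"late {name}")
        if issues:
            report[student] = issues
    return report
```

A report like this is a screening aid, not a grade: it surfaces which students to follow up with before the final artifact arrives.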

In contrast to blanket bans, this approach teaches students how to use tools responsibly while producing stronger evidence of learning. It anticipates use rather than pretending tools do not exist. That said, there are trade-offs: teachers will need to set clear expectations and enforce them consistently, which takes upfront effort.

Sample rubric elements that emphasize thinking

Dimension | What to collect | Why it matters
Research quality | Annotated bibliography, source selection rationale | Shows ability to evaluate sources and build an evidence base
Draft development | Two or more drafts with comments | Reveals revision decisions and growth in argument
Argument clarity | Final artifact plus revision memo | Links final claims to reasons and evidence
Tool transparency | Declaration of AI use and edits made | Assesses judgment and verification practices

Some critics point out that students can fake process artifacts. They might paste AI-generated text into drafts or fabricate research logs. This is a real risk. To counter it, combine process artifacts with short in-class checkpoints, oral questioning, or quick in-person reflections. The combination raises the cost of fabrication and makes process evidence more reliable.

Oral Defenses, Portfolios, and Authentic Tasks: Alternative Assessment Models

Beyond process artifacts, there are several other approaches that can be mixed into a coherent assessment strategy. Each has different affordances and constraints.

Oral defenses and interviews

Short, structured oral exams let teachers probe understanding quickly. A five- to ten-minute defense requires students to explain key choices, respond to a teacher’s question, and show command of their work.

    Pros: Harder to fake, reveals deeper comprehension, provides immediate feedback.
    Cons: Time-intensive, anxiety-inducing for some students, scheduling overhead in large classes.

Portfolios

A curated collection of work across a term offers a longitudinal view of student growth. Portfolios pair well with reflective pieces where students articulate learning goals and evidence.

    Pros: Shows development, supports diverse learning evidence, encourages metacognition.
    Cons: Workload for students and teachers, requires clear guidelines to be meaningful.

Project-based and authentic tasks

Tasks that mirror real-world problems ask students to produce things that matter outside the classroom. In those contexts, use of external tools is normal. Assessment focuses on how students frame problems, justify solutions, and communicate results.

    Pros: High relevance, connects skills to practice, accepts tools as part of modern workflows.
    Cons: Requires careful scaffolding, rubrics must value process as much as product.

In contrast to grading a single essay, these alternatives ask students to show what they can do across contexts. They can integrate AI use as a legitimate resource (see https://blogs.ubc.ca/technut/from-media-ecology-to-digital-pedagogy-re-thinking-classroom-practices-in-the-age-of-ai/) while still demanding that students articulate and defend their choices.

Choosing an Assessment Strategy That Measures Thinking, Not Just Output

Deciding which approach to use depends on course goals, class size, available technology, and teacher capacity. Below are practical steps teachers can take to move from blanket bans to assessment practices that actually measure thinking.

Step 1: Define the learning outcome precisely

Ask what specific thinking skill you want to assess. Is it the ability to evaluate sources? To construct a logical argument? To design and test an experiment? When outcomes are concrete, you can choose artifacts that display relevant evidence.

Step 2: Decide what artifacts will best show those skills

Choose from drafts, annotated sources, revision memos, brief oral defenses, and timed in-class tasks. Combine two or three artifacts so students have multiple ways to demonstrate mastery. For example, pair a final report with a 300-word revision rationale and a five-minute oral check.

Step 3: Be explicit about acceptable tool use

Create a short policy that explains how students should document use of AI tools: what must be declared, what counts as acceptable editing, and how to verify sources that AI suggested. Teach students how to check AI output for accuracy and bias. This turns tool use into a skill to be assessed rather than a rule to be broken.

Step 4: Use rubrics that weight process and reasoning

Allocate a meaningful portion of the grade to process. For instance, 50 percent for content quality and argument, 30 percent for process artifacts, and 20 percent for clarity and mechanics. The exact split will depend on course goals, but the point is to make process count.
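To make that arithmetic concrete, here is a minimal sketch of how such a split might be computed. The component names, the 50/30/20 weights, and the scores are illustrative examples, not a prescribed scheme.

```python
# Minimal sketch of the example weighting above (50/30/20).
# Component names and scores are hypothetical; scores are on a 0-100 scale.
WEIGHTS = {"content": 0.50, "process": 0.30, "clarity": 0.20}

def weighted_grade(scores: dict[str, float]) -> float:
    """Combine component scores (0-100) into a final grade."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)

# Example: strong process evidence lifts a middling final product.
print(weighted_grade({"content": 78, "process": 95, "clarity": 85}))  # 84.5
```

Notice how the weighting rewards a student whose process evidence is strong even when the final product is middling; adjusting the weights shifts that balance toward or away from the product itself.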

Step 5: Keep teacher workload sustainable

    Sample process artifacts instead of grading every single one in depth. Randomly select a subset for detailed feedback and grade the rest with a simpler checklist (see the sketch after this list).
    Use peer review to add formative feedback and reduce grading time for teachers.
    Automate version tracking with digital tools so submission timestamps and draft histories are easy to retrieve.
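The sampling strategy above can be as simple as a seeded random draw, as in this illustrative sketch. The roster and sample size are hypothetical, and the fixed seed only makes the selection reproducible if anyone asks how the sample was chosen.

```python
# Minimal sketch of the sampling strategy: randomly pick a subset of
# submissions for detailed feedback and checklist-grade the rest.
# Names and the sample size are illustrative assumptions.
import random

def split_for_grading(students: list[str], deep_n: int, seed: int = 0):
    """Return (deep_feedback, checklist_only) groups."""
    rng = random.Random(seed)  # fixed seed keeps the draw auditable
    deep = set(rng.sample(students, min(deep_n, len(students))))
    return sorted(deep), [s for s in students if s not in deep]

deep, checklist = split_for_grading(["Ana", "Ben", "Chi", "Dev", "Eli"], deep_n=2)
print("detailed feedback:", deep)
print("checklist grading:", checklist)
```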

On the other hand, if a teacher’s class has 150 students and limited support, it may not be feasible to run elaborate oral defenses. In such cases, prioritize a small set of high-impact interventions: require a research plan, a short revision memo, and an in-class timed quiz on key concepts. These create multiple checkpoints without overwhelming staff.

Addressing contrasting viewpoints

Some educators argue that workplace expectations prioritize results, so schooling should too. They worry that process-heavy assessment reduces emphasis on final performance. That concern is valid when assessments reward only process at the expense of product quality. The better path is balanced assessment: demand a high-quality final product while also requiring transparent evidence of thinking. This prepares students for workplaces where both results and professional judgment matter.

Others fear that teaching students to use AI will encourage dependency. The answer is not prohibition but structured practice: assign tasks that force students to evaluate AI outputs, compare alternative answers, and note their changes. Over time, students build better judgment because they must own their decisions publicly in classroom artifacts.

Final recommendations: Practical next steps for teachers

    Start small. Add one process artifact to your next assignment and make it worth 10-20 percent of the grade.
    Teach a mini-lesson on checking AI-generated claims and on source evaluation.
    Require a short AI-use declaration for any assignment where tools might be used.
    Mix checkpoint types: one in-class timed task, one submission with draft history, and one brief oral or peer-review exchange.
    Share expectations with students early and model the reasoning steps you want to see.

In contrast to a ban, these steps acknowledge reality and aim to build student competence. They make assessment more meaningful and more honest about what grades represent: not just a polished final product but the thinking behind it.

Closing thought

Banning AI tools buys simplicity at the cost of opportunity. If schools instead teach how to use tools responsibly and redesign assessments to capture process, students practice the judgment they will need in college and work. That shift requires careful planning and some added effort, but it restores the central purpose of assessment: to measure and promote learning, not to police artifacts.