
Towards transparent AI grading: Entropy as a signal for human-AI disagreement

By Karrtik Iyer, Manikandan Ravikiran, Prasanna Pendse and Shayan Mohanty

Automated grading systems can quickly score short-answer questions, but they do not always signal when a grading decision is uncertain or contentious. This work proposes a novel method, “semantic entropy”, that measures how much multiple GPT-4-generated explanations for the same student response differ, especially in cases where human graders disagree. Rather than looking only at the final scores, it groups semantically similar explanations and calculates how diverse these groups are. Three research questions are addressed:

 

  1. Does semantic entropy align with human grader disagreement? 

  2. Does it generalize across academic subjects? 

  3. Is it sensitive to structural task features such as source dependency?

 

Experiments on the ASAP-SAS dataset show that semantic entropy correlates with rater disagreement, varies meaningfully across subjects, and increases in tasks requiring interpretive reasoning. These results underscore semantic entropy’s potential as a domain- and task-sensitive signal for triaging ambiguous or contentious grading cases in educational settings.
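The core computation described above, grouping explanations by meaning and measuring how spread out those groups are, can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical judge `same_meaning(a, b)` that decides whether two explanations say the same thing. The abstract does not say how clustering is done; an LLM or entailment model would be typical choices. Explanations are grouped greedily, and the score is the Shannon entropy H = -Σ p_k log p_k, where p_k = |C_k| / N is the fraction of the N explanations that fall into cluster C_k. The function `toy_same_meaning` is a crude, hypothetical stand-in used only so the example runs.

```python
import math
from typing import Callable, List, Sequence

STOP_WORDS = {"the", "a", "an", "is", "it", "and", "of"}


def cluster_by_meaning(
    explanations: Sequence[str],
    same_meaning: Callable[[str, str], bool],
) -> List[List[str]]:
    """Greedily group explanations: each one joins the first cluster whose
    representative (its first member) the judge deems equivalent, otherwise
    it starts a new cluster."""
    clusters: List[List[str]] = []
    for text in explanations:
        for cluster in clusters:
            if same_meaning(cluster[0], text):
                cluster.append(text)
                break
        else:
            clusters.append([text])
    return clusters


def semantic_entropy(
    explanations: Sequence[str],
    same_meaning: Callable[[str, str], bool],
) -> float:
    """Shannon entropy (in nats) of how explanations spread over meaning clusters.
    0.0 means every explanation landed in one cluster; higher means more disagreement."""
    if not explanations:
        raise ValueError("need at least one explanation")
    clusters = cluster_by_meaning(explanations, same_meaning)
    n = len(explanations)
    probs = [len(c) / n for c in clusters]
    # p * log(1/p) equals -p * log(p), written this way so a single cluster gives 0.0, not -0.0
    return sum(p * math.log(1.0 / p) for p in probs)


def _key_terms(text: str) -> set:
    """Lowercased words with trailing punctuation and a few stop words removed."""
    return {w.strip(".,").lower() for w in text.split()} - STOP_WORDS


def toy_same_meaning(a: str, b: str) -> bool:
    """HYPOTHETICAL equivalence judge for illustration only: two explanations
    'mean the same' if they share exactly the same key terms."""
    return _key_terms(a) == _key_terms(b)


if __name__ == "__main__":
    consistent = ["Mentions osmosis and water.", "mentions water and osmosis"] * 2
    split = [
        "Mentions osmosis.",
        "Mentions diffusion.",
        "Mentions osmosis.",
        "Omits the key concept.",
    ]
    print(semantic_entropy(consistent, toy_same_meaning))            # 0.0
    print(round(semantic_entropy(split, toy_same_meaning), 4))       # 1.0397
```

Comparing each explanation only against a cluster's first member is the simplest grouping strategy. It produces consistent clusters only when the judge behaves like an equivalence relation. A response whose explanations spread across several similarly sized clusters scores high, which is the kind of case the abstract proposes triaging for closer review.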

 

 (Research submission here.)