AI-Enhanced Analysis of Feedback

Overview

This study was the first to examine the potential for large language models (LLMs) to derive insight from education feedback surveys. The study demonstrated, validated, and evaluated a wide range of qualitative survey analysis tasks, including complex, compound tasks such as inductive thematic analysis, and compared LLM output to human annotation. By applying effective prompting practices, we achieved human-level performance on multiple tasks.

We also showed that inspecting LLMs’ chain-of-thought (CoT) reasoning can provide transparency that may foster confidence in practice. Finally, we developed a set of survey response tags/labels suitable for use by other educators teaching in-person or online courses.

The process described can save educators dozens of hours of manual labor, depending on the number of survey responses, enabling course improvements, teaching evaluations, and other high-impact applications. This study also has broader significance by demonstrating the feasibility of using LLMs for deriving insight from unstructured data, thereby expanding and streamlining educational quality improvement.

Technical notes

We initially performed this work shortly after OpenAI released GPT-4 API access in 2023. GPT-3.5 (model: gpt-3.5-turbo-0301) and GPT-4 (model: gpt-4-0314) were used for the multi-label classification task; all other tasks described used GPT-3.5 (model: gpt-3.5-turbo-0613) and GPT-4 (model: gpt-4-0613). For comparison with the GPT-based approach, we also used SetFit (Tunstall et al., 2022), a SentenceTransformers fine-tuning approach based on Sentence-BERT, for multi-label classification, and a RoBERTa-based model trained on 124M tweets for sentiment analysis. The project involved extensive evaluations, using standard metrics where they existed and LLM-as-a-judge for tasks without accepted metrics. I later adapted the work for use with Claude 3.5 Sonnet and with Gemini 1.5 Pro, and also added a basic agent loop.
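The basic agent loop mentioned above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `call_llm` stub, the `tag_response` tool, and all message/tool names are hypothetical stand-ins for whatever model API and survey-tagging tools are actually used.

```python
def call_llm(messages):
    """Stub LLM: returns a canned tool call, then a final answer.
    In practice, replace with a real chat-model API call."""
    if any(m["role"] == "tool" for m in messages):
        return {"type": "final", "content": "Done: survey responses tagged."}
    return {"type": "tool_call", "tool": "tag_response",
            "args": {"text": "Great course, but the pacing was fast."}}

TOOLS = {
    # Hypothetical tool: assigns coarse tags to one survey response.
    "tag_response": lambda text: ["positive", "pacing"],
}

def run_agent(task, max_steps=5):
    """Loop: ask the model, execute any requested tool,
    feed the result back, and stop on a final answer."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        if reply["type"] == "final":
            return reply["content"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": str(result)})
    return "Stopped: step limit reached."

print(run_agent("Tag each survey response."))
```

The step limit guards against the model looping indefinitely; a production loop would also validate tool names and arguments before executing them.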

Also see my post on some of the behind-the-scenes aspects of this paper.