Thinking Students' AI-Augmented Thinking

This project aims to understand students' thinking processes when programming with or without the support of generative AI, and to examine:

What factors will impact AI-augmented thinking?

An AI-augmented thinking framework constructed from 20 hours of screen recordings and discourse with ChatGPT


We use an eye tracker to examine what participants pay more attention to (e.g., code, program outcomes, generative AI output) during the programming process.

Eye gaze heatmap of a participant programming with AI

Eye gaze heatmap of a participant programming without AI
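As a concrete illustration, attention shares over areas of interest (AOIs) can be derived from fixation logs roughly as follows; the AOI names and durations below are illustrative, not data from the study.

```python
from collections import defaultdict

def aoi_attention_shares(fixations):
    """Aggregate fixation durations (ms) per area of interest (AOI)
    and return each AOI's share of total gaze time."""
    totals = defaultdict(float)
    for aoi, duration_ms in fixations:
        totals[aoi] += duration_ms
    grand_total = sum(totals.values())
    return {aoi: t / grand_total for aoi, t in totals.items()}

# Illustrative fixation log: (AOI, fixation duration in ms)
log = [("code", 400), ("ai_output", 300), ("program_output", 200),
       ("code", 600), ("ai_output", 500)]
shares = aoi_attention_shares(log)  # e.g., code -> 0.5 of total gaze time
```

Comparing such shares between the with-AI and without-AI conditions is one way to quantify the difference the heatmaps show qualitatively.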


Challenges of Chinese Automated Essay Scoring
  • Lack of High-Quality Public Datasets.
  • BERT-based methods struggle with text truncation in long essays.
  • Lack of research on zero-shot and few-shot learning with LLMs.
Our work
  • Developed a novel Chinese essay dataset
  • Pioneered the exploration of advanced LLMs for Chinese AES via prompt engineering and fine-tuning
  • Comprehensive performance analysis: grading tendencies, performance of different LLMs, etc.
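A minimal sketch of what a zero-shot grading prompt could look like; the rubric wording, grade scale, and sample essay are hypothetical, not the study's actual prompt.

```python
# Hypothetical zero-shot prompt template for Chinese AES.
PROMPT_TEMPLATE = (
    "You are an experienced Chinese writing teacher. "
    "Grade the following essay on an A-D scale according to this rubric:\n"
    "{rubric}\n\n"
    "Essay:\n{essay}\n\n"
    "Respond with a single grade letter (A, B, C, or D)."
)

def build_prompt(rubric: str, essay: str) -> str:
    """Fill the template with a rubric and an essay."""
    return PROMPT_TEMPLATE.format(rubric=rubric, essay=essay)

prompt = build_prompt("Content, organization, language use.", "这是一篇示例作文。")
```

Fine-tuning, by contrast, trains the model on (essay, grade) pairs so the rubric no longer needs to be spelled out at inference time.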

Model Performance Comparison: GPT-3.5, Qwen, and BERT exhibit similar QWK values but differ in grading tendencies.

Grading Tendencies:
  • BERT: specializes in assigning grade B.
  • GPT: excels at scoring grades A and C and shows a lenient grading style.
  • Qwen: despite lower performance on grade B, scores the other categories with high accuracy, particularly excelling at grade D.

Discrepancies:
  • GPT-3.5 and BERT: show grading discrepancies of more than one grade level, especially in categories A and D.
Confusion matrices of the fine-tuned models (normalized by human rating)
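For reference, both quantities reported above (QWK and a confusion matrix normalized by human rating) can be computed as follows; this is a generic pure-Python sketch, not the study's code.

```python
def quadratic_weighted_kappa(human, model, n_grades):
    """Quadratic weighted kappa between two integer rating lists in [0, n_grades)."""
    n = len(human)
    # Observed confusion counts: rows = human rating, cols = model rating
    O = [[0] * n_grades for _ in range(n_grades)]
    for h, m in zip(human, model):
        O[h][m] += 1
    # Marginal histograms of each rater
    hist_h = [sum(O[i]) for i in range(n_grades)]
    hist_m = [sum(O[i][j] for i in range(n_grades)) for j in range(n_grades)]
    num = den = 0.0
    for i in range(n_grades):
        for j in range(n_grades):
            w = (i - j) ** 2 / (n_grades - 1) ** 2  # quadratic disagreement weight
            num += w * O[i][j]                       # observed weighted disagreement
            den += w * hist_h[i] * hist_m[j] / n     # chance-expected disagreement
    return 1.0 - num / den

def row_normalize(matrix):
    """Normalize each confusion-matrix row by its total (the human-rating counts)."""
    out = []
    for row in matrix:
        s = sum(row)
        out.append([c / s if s else 0.0 for c in row])
    return out
```

Row normalization is what makes each row of the figure read as "given this human grade, how does the model distribute its predictions."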


The project will examine the effects of different collaboration styles between students and generative AI on students' creativity, decision-making, and agency. It will then use structural equation modelling (SEM) to examine how these factors collectively affect individual learning performance, controlling for gender and disciplinary background.

Maze design and solving


Hypothesized results


ChatGPT's ability to engage in human-like conversation, together with its extensive knowledge across disciplines, holds promise for enriching students with disciplinary knowledge they lack. This exploratory study examined how ChatGPT influences the quality of students' interdisciplinary learning.


A significant difference was observed regarding disciplinary grounding: the posts written in the ChatGPT condition exhibited a higher level of disciplinary grounding than those in the ChatGPT Persona and non-ChatGPT conditions. 


This research explored the effectiveness of prompt engineering and fine-tuning approaches with GPT for deductive coding of context-dependent dimensions (requiring contextual understanding: Theorizing, Integration, Reflection) and context-independent dimensions (based on the current text itself: Appraisal, Questioning, Social, Curiosity, Surprise) in social annotation.

The fine-tuned models demonstrated substantial agreement with ground truth in context-independent dimensions and elevated the inter-rater reliability of context-dependent categories to moderate levels.

Cohen’s Kappa for context-independent dimensions 

Cohen’s Kappa for context-dependent dimensions
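Both figures report Cohen's kappa, which corrects raw rater agreement for agreement expected by chance; a minimal pure-Python sketch (the rater labels below are illustrative, not data from the study):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical labels."""
    n = len(rater_a)
    # Observed proportion of exact agreement
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's label frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Illustrative labels from a human coder and a GPT coder
human = ["Appraisal", "Questioning", "Appraisal", "Social"]
gpt = ["Appraisal", "Questioning", "Social", "Social"]
kappa = cohens_kappa(human, gpt)
```

By common benchmarks, kappa above 0.6 indicates substantial agreement and 0.4 to 0.6 moderate agreement, which is the scale behind the "substantial" and "moderate" characterizations above.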