Students' AI-Augmented Thinking
This project aims to understand students' thinking processes when programming with and without the support of generative AI, and to examine what factors impact AI-augmented thinking.
An AI-augmented thinking framework constructed from 20 hours of screen recordings and discourse with ChatGPT
We use an eye tracker to examine what participants pay more attention to (e.g., code, outcomes, output from generative AI) during the programming process.
Eye-gaze heatmap of a participant programming with AI
Eye-gaze heatmap of a participant programming without AI
Challenges of Chinese Automated Essay Scoring
- Lack of high-quality public datasets.
- BERT-based methods struggle with text truncation in long essays.
- Little research on zero-shot and few-shot learning with LLMs.
Our work
- Developed a novel Chinese essay dataset.
- Pioneered the exploration of advanced LLMs for Chinese AES through prompt engineering and fine-tuning.
- Conducted a comprehensive performance analysis (grading tendencies, performance of different LLMs, etc.).
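A zero-shot prompting pipeline for essay scoring can be sketched as below. The prompt wording, the A–D grade scale, and the helper names are illustrative assumptions for this sketch, not the actual prompts or scale used in the study:

```python
import re

def build_prompt(essay: str) -> str:
    """Assemble a hypothetical zero-shot scoring prompt to send to an LLM."""
    return (
        "You are an experienced teacher of Chinese writing.\n"
        "Grade the following essay on a scale of A (best) to D (worst), "
        "considering content, organization, and language use.\n"
        "Reply with the grade letter only.\n\n"
        f"Essay:\n{essay}"
    )

def parse_grade(reply: str):
    """Extract the first standalone grade letter (A-D) from the model's reply."""
    match = re.search(r"\b([A-D])\b", reply.upper())
    return match.group(1) if match else None
```

For example, `parse_grade("Grade: B")` returns `"B"`; fine-tuning would instead adapt the model's weights on graded essays, but the same parsing step applies to its outputs.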
Model performance comparison: GPT-3.5, Qwen, and BERT exhibit similar QWK values but differ in grading tendencies.
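QWK (quadratic weighted kappa) penalizes disagreements by the squared distance between grades, so an off-by-one grade costs far less than an off-by-three grade. A minimal pure-Python sketch with toy grades (A=0 … D=3; the data are illustrative, not from this study):

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_classes):
    """QWK between two lists of integer grades in 0..n_classes-1.

    Returns 1.0 for perfect agreement; assumes the two raters are
    not both constant (that would zero the denominator).
    """
    n = len(rater_a)
    # Observed co-occurrence counts of (grade_a, grade_b) pairs
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    # Marginal histograms give the chance-expected counts
    hist_a = [rater_a.count(k) for k in range(n_classes)]
    hist_b = [rater_b.count(k) for k in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            num += weight * observed[i][j]
            den += weight * hist_a[i] * hist_b[j] / n
    return 1.0 - num / den

# Toy human vs. model grades (illustrative only)
human = [0, 1, 1, 2, 3, 2, 1, 0]
model = [0, 1, 2, 2, 3, 1, 1, 1]
qwk = quadratic_weighted_kappa(human, model, 4)
```

Because similar QWK values can hide very different per-grade behavior, the grading-tendency analysis below is needed alongside the single summary number.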
Grading tendencies:
- BERT: specializes in assigning grade B.
- GPT: excels at scoring A and C and shows a lenient grading style.
- Qwen: despite lower performance on grade B, scores the other categories with high accuracy, particularly excelling at grade D.

Discrepancies:
- GPT-3.5 and BERT show grading discrepancies of more than one grade level, especially in categories A and D.
Confusion matrices of the fine-tuned models (normalized by human ratings)
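"Normalized by human rating" here means each row of the confusion matrix (one row per human-assigned grade) is scaled to sum to 1, so each cell reads as the share of essays with that human grade that the model placed in each grade. A minimal sketch with made-up grades (function name and data are illustrative):

```python
def confusion_matrix_normalized(human, model, n_classes):
    """Row-normalized confusion matrix: rows are human grades,
    columns are model grades, and each row sums to 1 (a row stays
    all-zero if the human never assigned that grade)."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for h, m in zip(human, model):
        counts[h][m] += 1
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized

# Toy example with two grades, A=0 and B=1
matrix = confusion_matrix_normalized([0, 0, 1, 1], [0, 1, 1, 1], 2)
```

Row normalization makes the matrices comparable across models even when the grade distribution is imbalanced.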
This project examines the effects of different collaboration styles between students and generative AI on students' creativity, decision-making, and agency. It uses structural equation modelling (SEM) to examine how these factors jointly affect individual learning performance while controlling for gender and disciplinary background.
Maze design and solving
Hypothesized results
ChatGPT's ability to engage in human-like conversation, together with its broad knowledge across disciplines, holds promise for enriching students with the disciplinary knowledge they lack. This exploratory study examined how ChatGPT influences students' interdisciplinary learning quality.
A significant difference was observed regarding disciplinary grounding: the posts written in the ChatGPT condition exhibited a higher level of disciplinary grounding than those in the ChatGPT Persona and non-ChatGPT conditions.
This research explored the effectiveness of prompt engineering and fine-tuning approaches with GPT for deductive coding in social annotation, covering context-dependent dimensions (requiring contextual understanding: Theorizing, Integration, Reflection) and context-independent dimensions (based on the text itself: Appraisal, Questioning, Social, Curiosity, Surprise).
The fine-tuned models demonstrated substantial agreement with ground truth in context-independent dimensions and elevated the inter-rater reliability of context-dependent categories to moderate levels.
Cohen's kappa for context-independent dimensions (left) and context-dependent dimensions (right)
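Cohen's kappa corrects raw inter-rater agreement for the agreement expected by chance from each coder's label frequencies. A minimal pure-Python sketch; the labels are hypothetical annotation codes, not the study's data:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Unweighted Cohen's kappa between two coders' label lists.

    Assumes chance-expected agreement is below 1 (two constant,
    identical coders would cause division by zero).
    """
    n = len(coder_a)
    # Proportion of items the two coders labeled identically
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from the marginal label frequencies
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two coders labeling four annotations with hypothetical codes
kappa = cohens_kappa(
    ["Questioning", "Questioning", "Social", "Social"],
    ["Questioning", "Social", "Social", "Social"],
)
```

By common rules of thumb, values around 0.4–0.6 indicate moderate agreement and values above roughly 0.6 substantial agreement, which is the scale behind the "substantial" and "moderate" labels above.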