SCALE@NTU Invited Talk: Tailoring Pretrained Language Models for Data Preparation
This research seminar is organized by Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU). Please find below the registration information:
To attend the seminar physically (limited seats available), click here
To attend the seminar on Teams, click here
Abstract: Data preparation, the process of turning big data into good data, is a crucial step in data science and machine learning. A well-known statistic is that data scientists spend at least 80% of their time on data preparation. Recently, there have been many studies on data preparation using pretrained language models (PLMs). The key idea of these studies is to leverage PLMs to generate rich data representations and to enable customization for data preparation tasks. PLMs have proven very promising: they can outperform many existing data preparation solutions with significantly less labeling effort. In this talk, I will present our recent work on tailoring PLMs for fundamental data preparation tasks, such as entity resolution and text-to-SQL. I will also discuss research challenges and future directions.
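For readers unfamiliar with the recipe, the sketch below illustrates the key idea from the abstract: serialize structured records into text, encode them with a PLM, and compare the resulting representations to decide whether two records refer to the same entity. This is a minimal illustration of the general approach, not the speaker's specific systems; the sentence-transformers library, the all-MiniLM-L6-v2 checkpoint, the COL/VAL serialization scheme, and the 0.8 threshold are all assumed choices for this example.

    # Minimal sketch: PLM embeddings for entity resolution.
    # Assumptions: sentence-transformers is installed; the model,
    # serialization format, and threshold are illustrative, not canonical.
    from sentence_transformers import SentenceTransformer, util

    def serialize(record: dict) -> str:
        """Flatten a structured record into text a PLM can encode."""
        return " ".join(f"COL {k} VAL {v}" for k, v in record.items())

    a = {"name": "Apple iPhone 13 128GB", "price": "699"}
    b = {"name": "iPhone 13 (128 GB)", "price": "699.00"}

    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb = model.encode([serialize(a), serialize(b)], convert_to_tensor=True)

    # Cosine similarity of the two record embeddings; a tuned threshold
    # (here 0.8, arbitrary) decides match vs. non-match.
    score = util.cos_sim(emb[0], emb[1]).item()
    print(f"similarity = {score:.3f}", "-> match" if score > 0.8 else "-> non-match")

In practice, such representations are typically fine-tuned with a small number of labeled pairs rather than used zero-shot, which is one reason PLM-based approaches can reduce labeling effort.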
Speaker: Ju FAN is a professor at Renmin University of China. He received his Ph.D. from Tsinghua University and worked as a research fellow at the National University of Singapore. His research interests are in the general area of data management, and his current research focuses on building next-generation data preparation systems that allow anyone to access good data for data science and machine learning applications. He has published more than 50 papers at top conferences and journals, including SIGMOD, VLDB, ICDE, and TKDE. He served as a publication chair for VLDB 2023 and as a PC member for SIGMOD, VLDB, ICDE, and KDD. He is also a recipient of the ACM China Rising Star Award.