About me

Hi, I am Xiaolong Li, a research assistant advised by Professor Reynold C.K. Cheng at the University of Hong Kong. My research interests are Text-to-SQL, data science code generation, and LLM agents.

Education

  • The University of Hong Kong
    Master of Science in Computer Science (General Track)
    Nov 2024
    • Overall GPA: 3.8/4.0
    • Graduated with Distinction
  • The University of British Columbia
    Bachelor of Science in Computer Science
    Mar 2023
    • Overall GPA: 3.8/4.0
    • Dean’s Honour List (Nov 2021)
    • Graduated with Distinction

Research Experience

BIRD-SQL: Data Quality Check, Submission Evaluation, and Metric Analysis & Optimization

May 2024 — Aug 2024

  • Performed comprehensive data quality checks on the BIRD development set to ensure data integrity and consistency.
  • Evaluated more than 30 submissions to the BIRD benchmark from companies and universities, including cutting-edge text-to-SQL parsers from Google Cloud, AT&T, and Distyl AI.
  • Developed two novel text-to-SQL evaluation metrics:
    • Soft-F1 score: A more lenient metric that reduces the impact of column order and missing values in the tables produced by predicted SQL queries, providing a more accurate assessment of parser performance.
    • R-VES (Reward-based VES): A metric for measuring the efficiency of text-to-SQL parsers using discrete functions.

BIRD-SQL Mini-Dev: A Multilingual, Lightweight Version of the BIRD Development Set

May 2024 — Aug 2024

  • Designed the Mini-Dev dataset to facilitate efficient and cost-effective development cycles.
  • Reduced the development set from 1,534 to 500 examples by prioritizing error correction and ensuring a balanced difficulty distribution. The subset covers all relevant keywords outlined in the BIRD benchmark and includes samples from every database.
  • Enhanced the practicality of the BIRD system by supporting multiple SQL dialects, including MySQL and PostgreSQL, to better align with industry applications.
  • Collaborated with Arcwise, an AI data analysis company, to conduct a comprehensive analysis of the BIRD Mini-Dev dataset, further improving the dataset’s robustness and reliability.
  • Evaluated the baseline performance of more than 10 models on Mini-Dev, showing results comparable to those on the full test set and confirming that Mini-Dev is representative of the features and attributes of the BIRD dataset.

Work Experience

Teaching Assistant / Introduction to Relational Databases

Department of Computer Science, The University of British Columbia
July 2022 — Dec 2022

  • Assisted the professor with lecture preparation on topics including database systems, ER models, normalization, formal relational query languages, and SQL; also took part in curriculum discussions and collected student feedback.
  • Graded assignments, mentored group projects, and maintained workflow.