Zhengyuan (Dora) Dong
Ph.D. Student, Data Systems Group, Cheriton School of Computer Science, University of Waterloo
My Research Interests: Data Lakes, Model Lakes, Multi-agent Systems, AI for Science
I like jogging, AI for music, AI for productivity, and AI for the occult, and I have two parrots.
🏃🎹🧑💻🔮🦜🦜
I am open to collaboration and always welcome discussion.
News
- 2025 Dec. Our demo paper LazyVLM was accepted to the ICDE Demo track
- 2025 Dec. We released our paper ModelTables on arXiv
Publications
- [Preprint] ModelTables: A Corpus of Tables about Models (Paper, Hugging Face, GitHub)
- BioMANIA: Simplifying bioinformatics data analysis through conversation
Service
Open Source Projects

ModelTables
Status: Completed ✅ in Jun 2025. Updated in Dec 2025
ModelTables is a benchmark corpus of tables in Model Lakes that captures the structured semantics of performance and configuration tables, which text-only retrieval often overlooks. Built from Hugging Face model cards, GitHub READMEs, and referenced papers, it links tables to their surrounding model and publication context. The corpus covers over 60K models and 90K tables, with multi-source ground truth derived from citation links, model-card inheritance, and shared training datasets. We evaluate table search methods, including Data Lake operators (unionable, joinable, keyword) and IR baselines (dense, sparse, hybrid retrieval), establishing the first large-scale benchmark for structured model knowledge discovery.

LazyVLM
Status: Completed ✅ in Mar 2025. To be released
LazyVLM is a neuro-symbolic video analytics system that combines the flexibility of Vision Language Models (VLMs) with the efficiency of symbolic methods. It allows users to query open-domain video data at scale using a semi-structured text interface, decomposing complex video queries into efficient operations for robust and scalable analytics.

BioMANIA
Status: Completed ✅ in Oct 2023. Updated in Oct 2024
An AI-driven chatbot platform that simplifies bioinformatics data analysis through conversation. Features include front-end and back-end components, extensive data setup, model fine-tuning, and deployment options via Docker, Railway, and a terminal CLI.

DocLocal
Status: Completed ✅ in Jun 2023
A GUI application that downloads and manages GitHub repository README files locally while offering integrated web search functionality through popular search engines. The tool streamlines documentation access by automatically fetching README files from repositories and displaying them in a user-friendly interface for offline browsing.
Teaching
- Mentor, CS 399 Readings in Computer Science (F25)
- Teaching Assistant, CS 348 Introduction to Database Systems (S24, S25, F25)
- Teaching Assistant, CS 136 Elementary Algorithm Design and Data Abstraction (W24, F24, W25)
Honors
- Prov-Doc Entrance Award, University of Waterloo, 2024
- International Doctoral Student Award (IDSA), University of Waterloo, 2024
Talks
- 2025 Dec. An Interactive Tool for SPARQL Query Refinement Using Natural Language Explanations at OnDBD 2025
- 2025 Nov. Raw Table Synthesis through Decomposition at CAN-CWIC
- 2025 Jul. Semantic Table Discovery in Model Lakes: A Benchmark at DSG Talk & Ph.D. Seminar Talk, University of Waterloo
- 2025 Mar. Introduction to Data Lakes and Tabular Data in NLP at R2L Lab
- 2024 Oct. Scientific Discovery Agent at R2L Lab
- 2024 Jul. BioMANIA: Simplifying Bioinformatics Data Analysis (Poster) at ISMB
- 2024 Jan. Language Model Pretraining at CS886 Presentation