Welcome!
My name is Shahriar Golchin, and I’m a final-year PhD candidate in the Department of Computer Science at the University of Arizona, advised by Prof. Mihai Surdeanu. Previously, I was a Machine Learning Scientist Intern at Walmart Global Tech and Harvard Medical School. I received my MSc and BSc in Electrical Engineering (Communication Systems) from Tarbiat Modares University and the University of Zanjan, respectively, both with first-class honors.
My research interests include Natural Language Processing, Large Language Models (LLMs), and Machine/Deep Learning. My PhD research focuses on data contamination (data leakage) in LLMs, and I’m particularly interested in developing methods that can detect data contamination in fully black-box settings. Currently, I’m working on Machine Unlearning techniques to address data contamination.
News
Mar 4, 2024: Awarded Outstanding Graduate Scholarship by the Department of Computer Science, University of Arizona.
Mar 1, 2024: Awarded Galileo Circle Scholarship by the College of Science, University of Arizona.
Jan 15, 2024: Time Travel in LLMs: Tracing Data Contamination in Large Language Models accepted at ICLR 2024 as a Spotlight paper!
Selected Publications
(For a full list of publications, please see my Google Scholar page.)
Time Travel in LLMs: Tracing Data Contamination in Large Language Models
Shahriar Golchin, Mihai Surdeanu
ICLR 2024 — Spotlight Paper (notable top 5%) [Featured in The New Stack, March 2024]
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models
Shahriar Golchin, Mihai Surdeanu
Preprint available on arXiv [Featured in The New Stack, March 2024]
Do not Mask Randomly: Effective Domain-adaptive Pre-training by Masking In-domain Keywords
Shahriar Golchin, Mihai Surdeanu, Nazgol Tavabi, Ata Kiapour
ACL 2023, the 8th Workshop on Representation Learning for NLP (RepL4NLP)
A Natural Language Processing Pipeline to Study Disparities in Cannabis Use and Documentation Among Children and Young Adults: A Survey of 21 Years of Electronic Health Records
Nazgol Tavabi, Marium Raza, Mallika Singh, Shahriar Golchin, Harsev Singh, Grant Hogue, Ata Kiapour
Nature Digital Medicine
Building Large-scale Registries from Unstructured Clinical Notes Using a Low-resource Natural Language Processing Pipeline
Nazgol Tavabi, James Pruneski, Shahriar Golchin, Mallika Singh, Ryan Sanborn, Benton Heyworth, Amir Kimia, Ata Kiapour
Artificial Intelligence in Medicine