
Welcome!

My name is Shahriar Golchin, and I’m a final-year PhD candidate in the Department of Computer Science at the University of Arizona, advised by Prof. Mihai Surdeanu. Previously, I was a Machine Learning Scientist Intern at Walmart Global Tech and at Harvard Medical School. I received my MSc and BSc in Electrical Engineering (Communication Systems) from Tarbiat Modares University and the University of Zanjan, respectively, both with first-class honors.
My research interests include Natural Language Processing, Large Language Models (LLMs), and Machine/Deep Learning. My PhD research focuses on data contamination (data leakage) in LLMs. I’m particularly interested in developing methods that can detect data contamination in fully black-box settings. Currently, I’m working on Machine Unlearning techniques to mitigate data contamination.

News

Mar 4, 2024: Awarded Outstanding Graduate Scholarship by the Department of Computer Science, University of Arizona.
Mar 1, 2024: Awarded Galileo Circle Scholarship by the College of Science, University of Arizona.
Jan 15, 2024: "Time Travel in LLMs: Tracing Data Contamination in Large Language Models" was accepted at ICLR 2024 as a Spotlight paper!

Selected Publications

(For a full list of publications, please see my Google Scholar page.)
Time Travel in LLMs: Tracing Data Contamination in Large Language Models. Shahriar Golchin, Mihai Surdeanu. ICLR 2024, Spotlight Paper (notable top 5%). [Featured in The New Stack, March 2024]
Do not Mask Randomly: Effective Domain-adaptive Pre-training by Masking In-domain Keywords. Shahriar Golchin, Mihai Surdeanu, Nazgol Tavabi, Ata Kiapour. ACL 2023, the 8th Workshop on Representation Learning for NLP (RepL4NLP).
A Natural Language Processing Pipeline to Study Disparities in Cannabis Use and Documentation Among Children and Young Adults: A Survey of 21 Years of Electronic Health Records. Nazgol Tavabi, Marium Raza, Mallika Singh, Shahriar Golchin, Harsev Singh, Grant Hogue, Ata Kiapour. Nature Digital Medicine.
Building Large-scale Registries from Unstructured Clinical Notes Using a Low-resource Natural Language Processing Pipeline. Nazgol Tavabi, James Pruneski, Shahriar Golchin, Mallika Singh, Ryan Sanborn, Benton Heyworth, Amir Kimia, Ata Kiapour. Artificial Intelligence in Medicine.