I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
NeurIPS 2024
Knowledge Acquisition through Continued Pretraining is Difficult: A Case Study on r/AskHistorians
KnowLLM@ACL 2024
Efficient Parallelization Layouts for Large-Scale Distributed Model Training
COLM 2024, previously WANT@NeurIPS 2023 (Best Paper Award)