Knowledge Acquisition through Continued Pretraining is Difficult: A Case Study on r/AskHistorians
KnowLLM@ACL 2024
Efficient Parallelization Layouts for Large-Scale Distributed Model Training
COLM 2024, previously WANT@NeurIPS 2023 (Best Paper Award)