Publications

mAceReason-Math: A Dataset of High-Quality Multilingual Math Problems Ready For RLVR
Multilingual Reasoning Gym: Multilingual Scaling of Procedural Reasoning Environments
CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
Knowledge Acquisition through Continued Pretraining is Difficult: A Case Study on r/AskHistorians
Efficient Parallelization Layouts for Large-Scale Distributed Model Training
Art Creation with Multi-Conditional StyleGANs
Generation of Bots Based on Observed Behavior