Blog Archive
I use this space to write about AI systems, model training, and the practical engineering decisions behind shipping reliable machine learning work.
The writing follows the same pattern as the rest of this portfolio: direct notes, measurable trade-offs, and lessons that came out of real implementation work rather than abstract speculation.
Why OpenAI Sent Me $500 for a Research Project
Apr 2026 · 8 min read
- A complete breakdown of how I built XSA4 + EMA + GPTQ-Int6, the submission that landed at 1.1271 BPB and placed top-5 globally in OpenAI's Parameter Golf challenge.
- Every technique explained from first principles: what bits-per-byte is, how Cross-Sparse Attention works, why EMA produces better weights than a single checkpoint, and what GPTQ actually does differently from naive quantization.
- Parameter Golf
- OpenAI
- Cross-Sparse Attention
- GPTQ
- Model Compression
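The full post covers EMA in depth; as a quick taste, the core idea is just a running average over checkpoints. Below is a minimal, illustrative sketch (the function name, decay value, and toy weights are mine, not from the article): the EMA copy blends each new set of weights into a smoothed average, which is often a better final model than any single checkpoint.

```python
# Minimal sketch of an exponential moving average (EMA) over model
# weights. Scalars stand in for tensors; decay and values are toy numbers.
def ema_update(ema_weights, new_weights, decay=0.999):
    """Blend the latest weights into the running average; returns a new dict."""
    return {
        name: decay * ema_weights[name] + (1.0 - decay) * new_weights[name]
        for name in ema_weights
    }

# Toy usage: the EMA drifts slowly toward recent checkpoints
# instead of jumping to the last one.
ema = {"w": 1.0}
for step_weights in [{"w": 2.0}, {"w": 0.0}, {"w": 2.0}]:
    ema = ema_update(ema, step_weights, decay=0.9)
```

In real training the same update runs over every parameter tensor after each optimizer step, with a decay much closer to 1.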
How I Reached #1 on ARC-AGI-2
Apr 2026 · 7 min read
- How I adapted the parallel agent and budget allocation patterns I built for AIMO3 to reach the top of the ARC Prize 2026 leaderboard, using per-puzzle test-time training, a vocabulary-restricted DFS beam search with KV cache reuse, and augmented re-scoring.
- A breakdown of the full pipeline: what I tried, what finally worked, and the three engineering tricks that made the difference.
- ARC-AGI-2
- Test-Time Training
- Qwen
- Kaggle
- Reasoning
How I Won a Solver Medal at AIMO3
Apr 2026 · 6 min read
- A walkthrough of the agentic system I built for AIMO3, the $2.2M Kaggle competition to make AI solve International Mathematical Olympiad problems, using Gemma-4-31B, parallel sandboxed code execution, weighted voting, and a budget-aware time allocator.
- What I tried, what flopped, and the three engineering tricks that actually moved the needle.
- AIMO3
- Gemma 4
- Math Reasoning
- Agentic LLMs
- Kaggle
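The weighted voting the post mentions boils down to a small aggregation step. Here is a hedged sketch under my own assumptions (the function, the pair format, and the example weights are illustrative, not the article's code): each sampled solution votes for its final answer with a confidence weight, and the answer with the largest total wins.

```python
from collections import defaultdict

# Illustrative weighted vote over candidate answers: sum the weights
# cast for each distinct answer, then return the heaviest one.
def weighted_vote(candidates):
    """candidates: list of (answer, weight) pairs -> winning answer."""
    totals = defaultdict(float)
    for answer, weight in candidates:
        totals[answer] += weight
    return max(totals, key=totals.get)

# Two samples answered 42, two answered 17; 42 wins on total weight.
winner = weighted_vote([(42, 0.9), (17, 0.6), (42, 0.3), (17, 0.5)])
```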
Designing LLM Systems That Stay Fast Under Load
Apr 2026 · 6 min read
- A practical note on how I break a strict latency budget across retrieval, prompt assembly, inference, and post-processing when moving AI systems from demo to production.
- I focus on concrete engineering trade-offs such as caching strategy, parallel tool execution, observability, and the cases where a smaller model beats a larger default on total system throughput.
- LLM Systems
- Latency
- Inference
- Observability
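One way to read "break a strict latency budget across stages" is to make the budget an explicit data structure that each stage draws down. A minimal sketch follows; the stage names mirror the blurb, but every number and helper here is invented for illustration.

```python
# Illustrative per-request latency budget, in milliseconds, split
# across the stages the post discusses. Numbers are made up.
BUDGET_MS = {
    "retrieval": 120,
    "prompt_assembly": 15,
    "inference": 600,
    "post_processing": 40,
}

def remaining_budget(total_ms, spent):
    """Subtract per-stage spend from the total; negative means over budget."""
    return total_ms - sum(spent.values())

# Mid-request: two stages done, check what is left for inference
# and post-processing before the deadline.
spent = {"retrieval": 90, "prompt_assembly": 10}
left = remaining_budget(sum(BUDGET_MS.values()), spent)
```

Tracking the remainder this way lets later stages adapt, e.g. shrinking beam width or skipping a rerank pass when the budget is nearly gone.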
What Fine-Tuning Smaller Models Taught Me About Reasoning
Mar 2026 · 8 min read
- A notes-driven write-up on building compact reasoning models under limited compute, covering dataset curation, PEFT setup, evaluation discipline, and failure analysis.
- The piece argues for tighter experiment loops, stronger benchmark hygiene, and reproducible training pipelines over oversized runs that are expensive to iterate on and difficult to trust.
- Fine-tuning
- Reasoning
- PEFT
- Evaluation
Pragnyan Ramtha · 2026