Posted 2026-06-29 | Updated 2026-07-01 | Study | 8 minutes read (About 1128 words)
2026-06-29-QWenVL
Qwen-VL Paper Notes: Visual Adapter, Cross-Attention, 2D Position Encoding, and RoPE

Posted 2026-06-29 | Updated 2026-07-01 | Study, Experiment | 6 minutes read (About 918 words)
2026-06-29-SlimeSearchR1Example
Training and evaluation of the slime Search-R1-trained Qwen2.5-3B model on NQ and HotpotQA.

Posted 2026-06-10 | Updated 2026-07-01 | Study, Experiment | 16 minutes read (About 2394 words)
PPO vs GRPO — Post-Training Qwen2.5-0.5B-Instruct on GSM8K with veRL
Comparing PPO and GRPO for post-training Qwen2.5-0.5B-Instruct on GSM8K with veRL, focusing on a small-model, verifiable-reward setting where the reward is rule-based correctness rather than a learned reward model.

Posted 2026-06-07 | Updated 2026-06-17 | Study | 18 minutes read (About 2732 words)
2026-06-07-RLClassic
Classic RL methods, including DP, MC, and TD

Posted 2026-06-06 | Updated 2026-06-16 | Study | 14 minutes read (About 2119 words)
2026-06-06-RLPre
RL Preliminary

Posted 2026-06-03 | Updated 2026-07-01 | Study, Experiment | 13 minutes read (About 1982 words)
2026-06-03-SwingUpCartpole
From CartPole-v1 to CartPole Swing-Up

Posted 2026-06-01 | Updated 2026-07-01 | Research, Experiment | 9 minutes read (About 1401 words)
2026-06-01-Slahmr_examples
slahmr examples

Posted 2025-05-03 | Updated 2026-06-01 | Study | 4 minutes read (About 580 words)
2025-05-03-Scan
Kogge-Stone scan and Brent-Kung scan

Posted 2025-05-03 | Updated 2026-06-01 | Study | 5 minutes read (About 696 words)
2025-05-03-Reduction
Reduction trees

Posted 2025-04-27 | Updated 2026-06-01 | Study | 10 minutes read (About 1571 words)
The CUDA programming model
GPU architecture and CUDA programming model