- nanoGPT: rebuilding GPT from scratch in 300 lines of PyTorch · ~12 min read
- Distributed training: DP, DDP, FSDP, pipeline parallel · ~9 min read
- Mixed precision (FP16, BF16) and gradient checkpointing · ~9 min read
- Training loop: forward, backward, optimizer, lr schedule · ~9 min read