Fast LLM Inference From Scratch (using CUDA)