Open
Description
torchrun --nproc_per_node=8 --master_port=29503 test/torch/correctness_test.py --collective allreduce --nelem 10556587 --dtype float
github.com/microsoft/mscclpp/apps/nccl/src/allreduce.hpp:679: void allreduce11(const void *, void *, void *, mscclpp::BaseMemoryChannelDeviceHandle *, mscclpp::SwitchChannelDeviceHandle *, unsigned long, unsigned long, int, int) [with T = float]: block: [31,0,0], thread: [943,0,0] Assertion sizePerRank % alignment == 0 failed.
Is there any plan to provide a fallback version of allreduce for sizes that fail this alignment check?
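Until a fallback exists, one possible caller-side workaround is to pad the buffer so the per-rank size passes the check. The sketch below is a guess at the arithmetic, not the library's logic: it assumes sizePerRank is the total byte size divided by the number of ranks and that alignment is 16 bytes, neither of which is confirmed by the assertion message. The helper name padded_nelem is hypothetical. For the failing case above, 10556587 floats over 8 ranks gives 5278293.5 bytes per rank, which cannot satisfy any alignment.

```python
import math


def padded_nelem(nelem: int, world_size: int, elem_size: int, alignment: int = 16) -> int:
    """Return the smallest element count >= nelem whose per-rank byte size
    is a multiple of `alignment`.

    Assumptions (not taken from the mscclpp source): the per-rank size is
    nelem * elem_size / world_size, and `alignment` is 16 bytes.
    """
    # The total byte size must be a multiple of world_size * alignment.
    chunk_bytes = world_size * alignment
    # Smallest element-count step that keeps nelem * elem_size on that multiple.
    step = chunk_bytes // math.gcd(chunk_bytes, elem_size)
    # Round nelem up to the next multiple of step (ceiling division).
    return -(-nelem // step) * step


if __name__ == "__main__":
    nelem, world_size, elem_size = 10556587, 8, 4  # values from the failing command
    padded = padded_nelem(nelem, world_size, elem_size)
    print(padded, padded - nelem)  # padded element count and number of extra elements
```

With these assumptions, the caller would allocate a zero-filled buffer of the padded length, copy the original data into its first nelem elements, run the allreduce on the padded buffer, and slice the result back to nelem. Zero padding is harmless for a sum reduction; other reduction operations would need a different fill value.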