by zozbot234 a day ago
SMSP = Streaming Multiprocessor Sub-Partition, found in recent NVIDIA architectures. Each Streaming Multiprocessor is effectively partitioned into multiple complete sub-cores with separate register files and program counters, all accessing the same local memory. (AMD architectures have a similar development, with 'dual' compute units.) This creates overhead when running very large warps, since each one only has access to a fraction of the complete SM. But warps under the VectorWare model should be fairly small (running CPU-like code with fairly limited use of lane parallelism), so this shouldn't have much impact from that POV.
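
To put rough numbers on the "fraction of the complete SM" point, here's a back-of-the-envelope sketch in Python. The figures (64K registers and 128 FP32 lanes per SM, 4 sub-partitions) and the round-robin warp placement are assumptions for illustration, not a spec for any particular chip, and all the names are made up:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SM:
    # All three values are illustrative assumptions, not vendor specs.
    registers: int = 65536    # 32-bit registers per SM
    lanes: int = 128          # FP32 lanes per SM
    sub_partitions: int = 4   # SMSPs per SM


def per_smsp(sm: SM) -> tuple[int, int]:
    """Registers and lanes owned by a single sub-partition."""
    return sm.registers // sm.sub_partitions, sm.lanes // sm.sub_partitions


def smsp_for_warp(warp_id: int, sm: SM) -> int:
    """Assumed round-robin placement of warps onto sub-partitions."""
    return warp_id % sm.sub_partitions


def visible_fraction(sm: SM) -> float:
    """Share of the SM's registers and lanes one resident warp can reach."""
    return 1 / sm.sub_partitions


if __name__ == "__main__":
    sm = SM()
    regs, lanes = per_smsp(sm)
    print(f"per SMSP: {regs} registers, {lanes} lanes")
    print(f"warp 5 lands on SMSP {smsp_for_warp(5, sm)}")
    print(f"one warp sees {visible_fraction(sm):.0%} of the SM")
```

Under those assumptions a single warp is confined to a quarter of the SM's registers and lanes, which only hurts if that one warp actually wants more than that.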