			Comments
		- 
				
				Update: Actually I'm kinda wrong. I have a fullscreen workload, and it is fastest with 8*8*1 workgroups. Both 8*4*1 (wave sized) and 16*16*1 are noticeably slower...
 
 My guess: if you're reading from an image per globalInvocationId, cache locality also plays a big role, and having 64 threads close together in terms of cache access outweighs some of the gains from smaller workgroup sizes?
- 
				
				SIMD performance can be really hard to analytically predict.
 
 In the case of compute shaders, it really boils down to invocations needing access to something besides their own vertex/geometry/pixel/whatever.
 
 That forces intrinsic dependencies between them, which, coupled with the caching and threading effects you correctly mentioned, can impact performance unpredictably.
- 
				
				@Lensflare I just came here to say this. No idea what any of it means, but it sounds cool. Also, please write more about topics we don't understand, because it has the same flavor of fun as reading about pseudo-esoteric wizard rituals in third party DnD supplements.
 
 More blood sacrifice please.





PSA: The smaller the compute shader workgroups, the more efficient they are, down to the wave size (32 on NVIDIA). Not exactly sure why, but it looks like if you don't need group shared memory, you should always make your workgroups wave sized.
Just this alone gave me a 30%+ performance increase. And combined with a few other changes, it got me from 50 µs to 10 µs, yay!
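
For anyone who wants to play with the numbers: here is a rough C++ sketch of the bookkeeping behind the workgroup sizes mentioned in the update comment (8*4*1, 8*8*1, 16*16*1). It only does arithmetic and measures nothing. The names LocalSize, DispatchInfo and planDispatch are made up for illustration and are not part of Vulkan, and the wave size of 32 is an assumption that matches NVIDIA only; on other hardware you would query the real subgroup size instead.

```cpp
#include <cstdint>

// Hypothetical helper types for illustration; not part of any Vulkan header.
struct LocalSize {
    uint32_t x, y;  // workgroup size in X and Y (Z is assumed to be 1)
};

struct DispatchInfo {
    uint32_t groupsX, groupsY;  // counts you would pass to vkCmdDispatch(cmd, groupsX, groupsY, 1)
    uint32_t threadsPerGroup;   // local.x * local.y
    uint32_t wavesPerGroup;     // how many waves one workgroup occupies
    uint64_t idleThreads;       // invocations that fall outside the image
};

// Integer division rounded up, so partially covered edges still get a workgroup.
inline uint32_t ceilDiv(uint32_t a, uint32_t b) {
    return (a + b - 1) / b;
}

// waveSize defaults to 32, which is an assumption (NVIDIA), not a universal value.
inline DispatchInfo planDispatch(uint32_t width, uint32_t height,
                                 LocalSize local, uint32_t waveSize = 32) {
    DispatchInfo info{};
    info.groupsX = ceilDiv(width, local.x);
    info.groupsY = ceilDiv(height, local.y);
    info.threadsPerGroup = local.x * local.y;
    info.wavesPerGroup = ceilDiv(info.threadsPerGroup, waveSize);

    // Total launched invocations minus the pixels that actually exist.
    const uint64_t launched =
        uint64_t(info.groupsX) * info.groupsY * info.threadsPerGroup;
    info.idleThreads = launched - uint64_t(width) * height;
    return info;
}
```

For an assumed 1920x1080 target, planDispatch(1920, 1080, {8, 4}) gives 240 x 270 groups of one wave each, {8, 8} gives 240 x 135 groups of two waves, and {16, 16} gives 120 x 68 groups of eight waves with 15360 invocations landing outside the image, so the shader needs a bounds check for that case. None of this predicts which size is fastest; as the comments say, cache behavior can dominate, so profile on your own hardware.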
random
vulkan
psa