This guy doesn't exactly sound like a couch expert, so I disagree with labeling him as one. He's in the trenches working with Vulkan and DX12. Here's a little more of the quote @DrezKill had posted.
"According to Hans-Kristian Arntzen, a prominent open-source developer working on Vkd3d, a DirectX 12 to Vulkan translation layer, Starfield is not interacting properly with graphics card drivers."
And from his page on GitHub:
"NOTE: The meaning of this PR, potential perf impact and the problem it tries to solve is being grossly misrepresented on other sites.
The goal of this refactor is to optimize for cases where games (Starfield in particular) uses advanced ExecuteIndirect in very inefficient ways.
- Indirect count, but indirect count ends up being 0.
- Non-indirect count, but none of the active draws inside the multi-draw indirect actually result in a draw. Multiple back-to-back ExecuteIndirects are called like this, causing big bubbles on the GPU.
In RADV, a special optimization was added which uses indirectCount in DGC as a predicate when possible. This lets us skip over the prepare cmdbuffer as well as the INDIRECT_BUFFER execution which is about 10x faster than spawning a small CS, adding extra sync and then executing a NOP-ed out indirect buffer.
We can take advantage of this by doing our own prologue which scans through the ExecuteIndirect buffer in the scenario where non-indirect count is used. Using indirect count is not slower than direct count at all.
To make this efficient, this PR refactors the command buffer emit system so that instead of having one init command buffer and one "real" command buffer, we have N sequences of this pattern. This allows us to split a command stream when observing an INDIRECT_ARGUMENT barrier, and we can batch up any patch CS easily. For example, given a D3D12 command stream like:
- CS to generate indirect arguments
- ResourceBarrier(UAV -> INDIRECT_ARGUMENT)
- ExecuteIndirect (arg0, command_count = 16)
- ExecuteIndirect (arg1, command_count = 16)
- ExecuteIndirect (arg2, command_count = 16)
- ExecuteIndirect (arg3, command_count = 16)
we can now transform this into:
iteration[0].vk_cmd_buffer:
- CS to generate
- UAV -> INDIRECT_ARGUMENT
- Begin new sequence
iteration[1].vk_init_buffer:
- Scan arg0 (emit count 0 if all empty draws)
- Scan arg1 (emit count 0 if all empty draws)
- Scan arg2 (emit count 0 if all empty draws)
- Scan arg3 (emit count 0 if all empty draws)
- CS -> INDIRECT_ARGUMENT
iteration[1].vk_cmd_buffer:
- ExecuteIndirect(arg0, count0) (over 10x faster throughput if we can predicate out work)
- ExecuteIndirect(arg0, count0)
- ExecuteIndirect(arg0, count0)
- ExecuteIndirect(arg0, count0)
This kind of command reordering can be extended later to whatever use case we have in mind, but reordering indirect work is the main important case I think.
As a heuristic, splitting command buffers isn't ideal, so we only consider a split if a device ever created a fancy execute indirect command signature and existing content outside Halo and Starfield should not observe any difference in behavior here."
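For anyone who wants to see what the "scan" step in that quote boils down to, here is a toy CPU-side sketch in Python. This is only my own illustration of the idea, not code from the PR: the real thing runs on the GPU inside vkd3d-proton, and the names `DrawArgs`, `is_empty_draw` and `scan_effective_count` are made up for the example. I'm also assuming an "empty" draw means zero vertices or zero instances, which the quote doesn't spell out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DrawArgs:
    # Hypothetical stand-in with the same fields as D3D12_DRAW_ARGUMENTS.
    vertex_count_per_instance: int
    instance_count: int
    start_vertex_location: int = 0
    start_instance_location: int = 0


def is_empty_draw(draw: DrawArgs) -> bool:
    # Assumption: a draw with no vertices or no instances produces no work.
    return draw.vertex_count_per_instance == 0 or draw.instance_count == 0


def scan_effective_count(args: List[DrawArgs], command_count: int) -> int:
    # Look only at the draws the ExecuteIndirect call would actually issue.
    active = args[:command_count]
    # "Emit count 0 if all empty draws": the whole call can then be skipped.
    if all(is_empty_draw(d) for d in active):
        return 0
    # Otherwise leave the count alone so every draw is still issued.
    return command_count


if __name__ == "__main__":
    all_empty = [DrawArgs(0, 1) for _ in range(16)]
    one_real = all_empty[:15] + [DrawArgs(3, 1)]
    print(scan_effective_count(all_empty, 16))
    print(scan_effective_count(one_real, 16))
```

The point of the sketch is just that the count handed to each ExecuteIndirect becomes 0 whenever nothing in its argument buffer would draw, which is the case the quote says the driver can skip cheaply.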
To me, this does not look like the work of a couch expert.
And now there are more reports from both the AMD and NV sides of the fence:
AMD's previous-generation (RDNA 2) Radeon RX 6800 XT, when paired with a 12th Gen Intel Core i9-12900K processor, offers up to 46 percent faster performance than the NVIDIA GeForce RTX 3080 for Starfield on similar hardware at ultra settings. This is according to a new, 32-minute analysis of the...
www.thefpsreview.com
Sure, we've seen games in the distant past that greatly favored NV cards over AMD, but nothing recent has shown a gap on the order of 46% when NV-specific hardware (i.e., RT and Tensor cores) isn't in use. In rasterization, recent cards of a similar tier are often within 20% of each other, sometimes within single digits, so a 46% difference is a red flag.
and
Starfield players who want to see all of the game's effects on PC may need to switch to something other than AMD graphics. According to a "Dear AMD Card User" post on the official Starfield subreddit that has been gaining traction, red team's GPUs are failing to render local stars from any...
www.thefpsreview.com
The bottom line is that even as Bethesda puckered up to smooch AMD you-know-where, it needs to get some patches out. The game, for reasons no one can clearly identify, does not run well on NV cards, and now reports are surfacing that it has other issues on AMD cards. Perhaps Todd's crew were too scared to inform him of the facts before release and before he tried to throw PC users under the bus.