Ok, I'll give you that... but it's going to take a while to get out of the cloud and into datacenter deployments. Hell, it's taking a long time to get some companies to even try an EPYC CPU in the datacenter, and you know Intel will be fighting tooth and nail to keep ARM out.
To buy Epyc... you actually have to be able to buy Epyc. Cloud is really where Epyc should be making gains, but corporate datacenters? It's a change of platform, and that carries some predictable and some unpredictable risks. Including supply!
AMD is still trying to figure out how to make enough CPUs. Granted, it's a nice place for them to be relative to their recent past, but the basic reality is that Intel still physically produces an order of magnitude more chips, and TSMC cannot match that output for AMD even if AMD CPUs were all they produced.
Right now I still feel like ARM is more of an ultra-portable or ASIC-type processor than a general-purpose processor capable of real performance. But that can change; I'm open-minded.
x86, as currently and broadly implemented, is a beast at chewing through instructions out of order. ARM is a supremely competent base, but it's never really been pushed in that direction, mostly and quite simply because Intel (and occasionally AMD) have that use case covered.
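To put a number on that "chewing through" claim, here's a toy C microbenchmark of my own (so take the timing methodology with a grain of salt): one loop is a serial dependency chain, the other does the same number of adds spread across four independent accumulators. On a wide out-of-order core, the second version finishes several times faster because the core overlaps the independent adds.

    /* Rough sketch: serial dependency chain vs. independent accumulators.
       An out-of-order core can't speed up the chain (each add waits on the
       last), but it executes the four independent chains in parallel. */
    #include <stdio.h>
    #include <time.h>

    #define N 100000000L

    int main(void)
    {
        volatile double seed = 1e-9;  /* defeat constant folding */
        double s = seed;
        double x = 0.0, a = 0.0, b = 0.0, c = 0.0, d = 0.0;

        clock_t t0 = clock();
        for (long i = 0; i < N; i++)
            x += s;                          /* each add depends on the previous */
        clock_t t1 = clock();
        for (long i = 0; i < N; i += 4) {
            a += s; b += s; c += s; d += s;  /* same total adds, 4 independent chains */
        }
        clock_t t2 = clock();

        printf("dependent:   %.2fs (x=%f)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC, x);
        printf("independent: %.2fs (sum=%f)\n",
               (double)(t2 - t1) / CLOCKS_PER_SEC, a + b + c + d);
        return 0;
    }

(FP addition isn't associative, so the compiler can't legally break up the dependent chain for you; the speedup you see is the hardware's out-of-order machinery at work.)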
Now, Apple's goal is to stop paying Intel, at least on a per-unit basis. Their ARM project is a decade in the making; it's not just the CPUs, but also the dedicated co-processors that handle everything ARM isn't good at, plus all the software, from microcode to drivers to OS stacks to APIs and user-facing applications.
And they're still not trying to beat x86 at what x86 is good at. The thing is, most consumers - most end users - don't need what x86 is good at either. x86 is the lazy way out; the CPU is fast enough that you can just throw stuff at it and it will work.
But that's not efficient, and efficiency matters. What's efficient, now that we have circuits small enough, is to build some general-purpose control logic like ARM, and then to use the rest of your silicon and power budget for application-specific coprocessors. That takes real integration and real know-how, but as we've seen time and again, when it's done right it pays off.
Apple is one example, but Intel's SSE2 (which handles the vast majority of x86 floating-point work), Nvidia's NVENC (the de facto standard for hardware video encoding), and Intel's Quick Sync (which is literally everywhere) are all examples of the same process put into play.
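To make the SSE2 aside concrete: on x86-64, even "scalar" FP math compiles down to SSE2 instructions, and with intrinsics the same unit processes two doubles per operation. A minimal C sketch (function and variable names are mine, purely illustrative):

    /* Sum two arrays of doubles, two lanes per iteration, via SSE2.
       Assumes n is even; unaligned loads/stores keep the example simple. */
    #include <emmintrin.h>  /* SSE2 intrinsics */
    #include <stdio.h>

    static void add_pd(const double *a, const double *b, double *out, int n)
    {
        for (int i = 0; i < n; i += 2) {
            __m128d va = _mm_loadu_pd(&a[i]);           /* load 2 doubles */
            __m128d vb = _mm_loadu_pd(&b[i]);
            _mm_storeu_pd(&out[i], _mm_add_pd(va, vb)); /* packed add, store */
        }
    }

    int main(void)
    {
        double a[4] = {1.0, 2.0, 3.0, 4.0};
        double b[4] = {0.5, 0.5, 0.5, 0.5};
        double c[4];
        add_pd(a, b, c, 4);
        printf("%f %f %f %f\n", c[0], c[1], c[2], c[3]);
        return 0;
    }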
(And yes, as has been stated before, limiting it to 5000-series CPUs is a bit lame. At the very least, make it available on anything PCIe 4.0 or better, since PCIe bandwidth is certainly a limiting factor.)
That should be the only factor, really. So long as the GPU has sixteen direct lanes to the CPU running at PCIe 4.0, rock on.
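For reference, the back-of-the-envelope math (assuming the spec's 16 GT/s per lane and 128b/130b encoding for PCIe 4.0):

    16 GT/s × 16 lanes × 128/130 ÷ 8 bits/byte ≈ 31.5 GB/s per direction

which is exactly double PCIe 3.0's ~15.75 GB/s for an x16 slot (8 GT/s per lane, same encoding).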