AMD Patents Active Bridge Chiplet with Integrated Cache

Peter_Brosdahl

Image: AMD



AMD has filed a patent for an active bridge chiplet with integrated cache that would allow an SoC to be partitioned into smaller groups of chiplets while improving communication among them through high-bandwidth die-to-die interconnects. The chiplets would access the memory cache via a single registry instead of individual channels, and no software-specific implementation would be needed. AMD presently uses an L3 cache, branded Infinity Cache, in its RDNA 2 architecture. This new design could help multiple GPU dies function as one device from the perspective of the CPU and applications...
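To make the idea concrete, here's a toy Python sketch of the concept described in the summary: several GPU chiplets resolving memory requests through one shared bridge cache rather than each keeping its own. The class and variable names are purely illustrative and are not taken from AMD's design.

```python
# Toy model (not AMD's actual design): GPU chiplets resolving memory
# requests through one shared bridge cache instead of per-chiplet caches.
# All names here are hypothetical illustrations of the patent's idea.

class BridgeCache:
    """Single shared last-level cache sitting on the active bridge die."""
    def __init__(self):
        self.lines = {}  # address -> cached data

    def read(self, addr, dram):
        if addr not in self.lines:         # miss: fill from DRAM once,
            self.lines[addr] = dram[addr]  # then every chiplet hits the same line
        return self.lines[addr]

class Chiplet:
    """A GPU chiplet; it has no private cache copy to keep coherent."""
    def __init__(self, name, bridge):
        self.name, self.bridge = name, bridge

    def load(self, addr, dram):
        return self.bridge.read(addr, dram)  # one shared resource, no per-chiplet channel

dram = {0x100: "vertex data"}
bridge = BridgeCache()
chiplets = [Chiplet(f"GCD{i}", bridge) for i in range(3)]
for c in chiplets:
    print(c.name, c.load(0x100, dram))  # all three see the same cached line
```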

I'm curious as to how AMD/Intel/NVIDIA can manage to make all the chiplets appear as one to the OS.
 
I'd imagine that to some extent they won't; they will have to give some data to the OS, or at the very least to the driver - although the difference between the two amounts to splitting hairs: either it's software-controlled to some extent, or it's entirely hardware... and I think it will be software, similar to multi-threaded execution across CPU cores.

I see it kinda like back before multi-core CPUs were a thing. When OSes went pre-emptive multitasking (Win95, kinda... OS/2, Mac OS 8, most every Unix variant), they needed a scheduler to know how to handle all the various threads on the CPU. That scheduler is software-based for the most part, at least for CPU scheduling**.
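For anyone who hasn't looked at what a scheduler actually does, here's a minimal round-robin time-slice sketch in Python. Real OS schedulers are vastly more sophisticated, but the core "software decides who runs next" idea is the same; the task names and slice length are made up.

```python
# Toy illustration of a software scheduler: hand out fixed time slices to
# runnable threads in round-robin order, preempting and requeueing any
# task that still has work left.
from collections import deque

def round_robin(tasks, slice_units=2):
    # tasks: list of (name, remaining_work) pairs
    queue = deque(tasks)
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(slice_units, remaining)
        timeline.append((name, run))               # "run" this thread for one slice
        if remaining - run > 0:
            queue.append((name, remaining - run))  # preempt and put it back in line
    return timeline

print(round_robin([("ui", 3), ("audio", 5), ("disk", 2)]))
```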

Shortly after that, you had (and still have) multi-socket motherboards with discrete CPU packages. The OS definitely knew about that, and the role of the scheduler stayed the same; it just had more resources to distribute threads across... but it needed to be aware: if you had a thread arbitrarily bouncing between different discrete CPUs, you'd lose things like cache locality.
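A concrete way that awareness shows up today is CPU affinity: pinning a process to a set of cores so its threads don't wander off the die and lose their warm caches. A quick sketch, Linux-only since os.sched_setaffinity isn't available everywhere, and the core numbers are just an example:

```python
# Pin the calling process to cores 0-1 so the scheduler keeps it on one
# die/socket and its cache stays warm. Linux-only API; guarded so the
# snippet doesn't crash elsewhere.
import os

pid = 0  # 0 means "the calling process"
if hasattr(os, "sched_setaffinity"):
    print("allowed CPUs before:", os.sched_getaffinity(pid))
    os.sched_setaffinity(pid, {0, 1})  # restrict to cores 0 and 1
    print("allowed CPUs after: ", os.sched_getaffinity(pid))
else:
    print("sched_setaffinity not available on this OS")
```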

Then we went to multi-core CPUs. Same thing: the scheduler just evolved yet again, and got tweaked for things like AMD's clustered-multithreading CPUs (Bulldozer), Infinity Fabric, etc.

So at the heart of it will be some sort of scheduler. I'm certain that already exists for GPUs - there are dozens/hundreds/thousands of shader compute cores in there, and something has to feed them. It has a driver component too, since GPUs already work in multi-GPU configurations to share load across discrete cards. So I think you'll just see another evolution of that scheduler, tweaked to take advantage of specific hardware architectures.
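As a rough sketch of the "something has to feed them" part, here's a toy dispatcher in Python handing work items to whichever compute unit is free. The unit names and the square-a-number stand-in workload are made up; the real thing lives in hardware and the driver.

```python
# Toy work distributor: a shared queue feeds several "compute units"
# (threads standing in for shader cores), each taking the next item as
# soon as it's idle.
from queue import Queue
from threading import Thread

def compute_unit(name, inbox, results):
    while True:
        item = inbox.get()
        if item is None:                     # sentinel: shut this unit down
            break
        results.append((name, item * item))  # stand-in for real shader work

inbox, results = Queue(), []
units = [Thread(target=compute_unit, args=(f"CU{i}", inbox, results)) for i in range(4)]
for u in units:
    u.start()
for work in range(16):
    inbox.put(work)          # the dispatcher just keeps the shared queue full
for _ in units:
    inbox.put(None)
for u in units:
    u.join()
print(sorted(results))
```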

** There is a lower-level hardware scheduler for out-of-order execution, but that's a bit different, and it's more or less transparent to the OS. I don't think that will be the basis for a chiplet implementation, as it is extremely specific to an architecture and not terribly flexible.
 
Multi-GPU came with two big caveats: the first being that the cards had to be treated as separate GPUs through the entire software stack, and the second that cooperation could only really happen, at most, on separate parts of a frame - but usually on separate frames. That always carries some latency penalty even when things are going perfectly, and few developers have succeeded at perfect implementations.
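For reference, here's roughly what the classic alternate-frame-rendering split looks like, and why adding a second GPU never helps the latency of any individual frame. The frame time is an invented number purely for illustration.

```python
# Alternate-frame rendering in miniature: each whole frame goes to one GPU.
# Throughput roughly doubles with two GPUs, but every frame still takes the
# full per-frame time, plus sync/transfer overhead between cards.
def assign_frames(num_frames, num_gpus=2):
    return [(frame, f"GPU{frame % num_gpus}") for frame in range(num_frames)]

FRAME_TIME_MS = 20  # hypothetical time one GPU needs to render a frame
for frame, gpu in assign_frames(6):
    print(f"frame {frame} -> {gpu}, per-frame latency ~{FRAME_TIME_MS} ms (plus sync)")
```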

That's where chiplets can change things. Different cores on different dies can work on different tasks, and can even be different core compositions themselves. Bandwidth can be added to the chiplet cluster through HBM techniques as well as standard DRAM techniques, and in a pinch both - and almost certainly with orders of magnitude less latency than coordinating discrete cards.

The biggest questions, then, are the software stack, for which I'll echo @Brian_B's concerns, and, really the same as with HBM, the yields of the assembled packages. That one has bitten AMD in the butt several times already, and Nvidia has only leveraged the technique for its highest-end parts.


But I don't want to be too much of a Debbie Downer, so let me end on a positive note: chiplets bring the promise of scaling performance up, yes, but also down, and in different chiplet configurations to meet the needs and limitations of different markets.

And a parting shot: perhaps the production of chiplets will help GPU manufacturers better balance mining demand ;)
 
What I mean is that if it's not seen as a single GPU by the OS, then it could be some sort of multi-GPU implementation (CrossFire), and that would need both game and driver support.

I recall Intel had promised that multi-core CPUs could eventually accelerate single-core applications. That never happened, at least not yet.
Also, both AMD and NVIDIA were supposedly working on solutions that would make multiple GPUs work as one. AFAIK NVIDIA can do this with CUDA, but not for games.
 