I'd imagine to some extent they won't; they'll have to give some data to the OS, or at the very least the driver. Although the difference between the two amounts to splitting hairs: either it's software-controlled to some extent, or it's entirely hardware... and I think it will be software, similar to multi-threaded execution on CPU cores.
I see it kinda like back before multi-core CPUs were a thing. When OSes went pre-emptive multi-tasking (Win95, kinda... OS/2, Mac OS 8, most every unix variant), they needed a scheduler to know how to handle all the various threads on the CPU. That scheduler is software-based for the most part, at least for CPU scheduling**.
Shortly after that, you had (and still have) multi-socket motherboards with discrete CPU packages. The OS definitely knew about that, and the role of the scheduler stayed the same; it just had more resources to distribute threads across... but it needed to be aware: if you had a thread arbitrarily bouncing between different discrete CPUs, you'd lose things like cache advantage, since the thread's working set would have to be re-fetched into the other CPU's cold cache.
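That affinity awareness is exposed in OS APIs today. A minimal sketch (Linux-specific `os.sched_setaffinity`; the `pin_to_cpu` helper name is mine, and the guard is there because other platforms don't expose this syscall):

```python
import os

def pin_to_cpu(cpu_id):
    """Pin the calling process to a single CPU so the scheduler
    won't bounce it between cores (and their caches).
    Linux-only: os.sched_setaffinity doesn't exist elsewhere."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {cpu_id})   # 0 = the calling process
        return os.sched_getaffinity(0)      # read back the new mask
    return None  # platform without affinity syscalls

print(pin_to_cpu(0))
```

Real schedulers do a softer version of this automatically (they *prefer* to keep a thread on its last core rather than hard-pin it), but explicit pinning is how latency-sensitive software opts out of the bouncing entirely.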
Then we went to multi-core CPUs. Same thing: the scheduler just evolved yet again, and got tweaked for things like AMD's Clustered Multithreading (Bulldozer), the Infinity Fabric, etc.
So at the heart of it will be some sort of scheduler. I'm certain one already exists for GPUs - there are already dozens/hundreds/thousands of shader/compute cores in there, and something has to feed them. It has a driver component too, since multi-GPU configurations also share load across discrete cards. So I think you'll just see another evolution of that scheduler, tweaked to take advantage of specific hardware architectures.
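The "something has to feed them" part can be sketched in a few lines. This is a toy illustration only (class and method names are mine): naive round-robin distribution of work across N units, which is the simplest possible version of what a real scheduler does before you add locality and load-balancing smarts.

```python
from collections import deque

class ToyScheduler:
    """Toy round-robin work distributor, loosely analogous to a
    scheduler feeding compute units. Real GPU schedulers live in
    hardware plus the driver and weigh locality, occupancy, etc."""

    def __init__(self, n_units):
        self.queues = [deque() for _ in range(n_units)]
        self._next = 0  # index of the unit that gets the next task

    def submit(self, task):
        # Naive: hand tasks out in strict rotation, ignoring load.
        self.queues[self._next].append(task)
        self._next = (self._next + 1) % len(self.queues)

sched = ToyScheduler(4)
for i in range(10):
    sched.submit(i)
print([len(q) for q in sched.queues])  # → [3, 3, 2, 2]
```

A chiplet-aware version of this would just add another tier: prefer to keep related work on the same chiplet, the same way NUMA-aware CPU schedulers prefer to keep a thread near its memory.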
** There is a lower-level hardware scheduler for out-of-order execution, but that's a bit different, and it's more or less transparent to the OS. I don't think that will be the basis for a chiplet implementation, as it's extremely specific to an architecture and not terribly flexible.