I never had any personal experience with Itanium. From what I recall reading at the time, it was a solid design; it was just that getting organizations to switch instruction sets for their software base was like pulling teeth...
At the time it seemed to involve much more than just swapping compilers - with IA64's VLIW design there was a real expectation of hand-tuning code for the architecture.
That seemed to be both the selling point and the failing point of IA64 - it was stupidly fast for its time on hand-optimized workloads, and unremarkable on everything that didn't get that treatment.
I'll say that it also didn't help that their IA32 (that is, x86-32) translation layer was potato-grade. If they'd had the kind of x86 translation Apple ships with its M1 SoCs (Rosetta 2), we might all be running Itanium-based CPUs today.
...yet now we are all talking about moving to ARM, so who knows.
The market is different today than it was in 2001, though. Most software is written in very high-level languages, so the penalty for porting from platform to platform is smaller than it used to be.
The penalty's still there; it just doesn't sting as much, since computing paradigms have mostly stabilized. Servers don't need GUIs, user-facing applications mostly run in web pages, back-end and front-end languages don't need to be the same at all (but can be!), and so on.
What I'm really getting at is that once we run out of headroom for hardware-level performance gains, we're going to need to go back and look at the overhead in every layer of computing.
And with machine learning becoming accessible at scale, the idea behind VLIW starts making more sense: you do as much optimization as possible ahead of run time, at compile time, essentially what you'd do with embedded systems, but for everything.
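For anyone who hasn't touched a VLIW target, here's a rough sketch in plain C of what "the compiler does the scheduling" means. The slot/bundle layout in the comments is purely hypothetical, not actual IA-64 encoding; it's just to show where the parallelism comes from.

```c
#include <stdio.h>

/* Illustrative only: on a VLIW machine the compiler statically groups
 * independent operations into one wide instruction word; there is no
 * out-of-order hardware rediscovering that parallelism at run time. */
int main(void) {
    int a = 1, b = 2, d = 3;
    int f[4] = {10, 20, 30, 40};
    int i = 2;

    /* These three statements have no data dependencies on each other,
     * so a VLIW compiler could pack them into a single bundle: */
    int c = a + b;   /* slot 0: integer add */
    int e = d * 2;   /* slot 1: integer mul */
    int g = f[i];    /* slot 2: memory load */

    /* This depends on all three results, so it has to land in a later
     * bundle - and when the compiler can't find enough independent
     * work, slots simply go empty, which is where the hand-tuning
     * (and the pain) came in. */
    printf("%d %d %d\n", c, e, g);
    return 0;
}
```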