No, "the exception rather than the rule" is an old cliché. Day to day, more and more things are being offloaded to the GPU.
Not really. It's the same as the good old "multithreading" argument.
Most code simply cannot be multithreaded under any circumstances, no matter what breakthrough comes along; it's a fundamental violation of logic. If code depends on shared state at all, and most code does, you wind up with a mess of locks, race conditions, and deadlocks if you try.
It's sort of like a violation of the fundamental laws of physics: it simply cannot be done.
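To make that concrete, here's a minimal C sketch (the array size and constants are made-up numbers, purely for illustration). The first loop has no dependencies between iterations, so it could be farmed out to a GPU; the second carries its state from one iteration to the next, so it has to run serially no matter what hardware you have.

```c
#include <stdio.h>

#define N 1000

int main(void) {
    double x[N];
    for (int i = 0; i < N; i++)
        x[i] = (double)i;

    /* Embarrassingly parallel: every iteration is independent of the
       others, so a GPU could run them all at once. */
    for (int i = 0; i < N; i++)
        x[i] = x[i] * 2.0 + 1.0;

    /* Loop-carried dependency: iteration i cannot start until
       iteration i-1 has produced `state`, so the work is inherently
       serial no matter how many cores you throw at it. */
    double state = 0.0;
    for (int i = 0; i < N; i++)
        state = state * 0.99 + x[i];

    printf("final state: %f\n", state);
    return 0;
}
```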
It's increasingly common to take the little bits and pieces of code that lend themselves to parallelization and offload them to the GPU, but, a few notable cases aside (rendering, encoding, and some scientific workloads), this usually represents a small percentage of the code in most software.
The state issue prevents it from being more than that. Unless you invent time-travelling code, it cannot happen.
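One way to quantify that is Amdahl's law: even if the GPU made the offloaded part infinitely fast, the serial fraction caps the overall speedup. Here's a rough sketch; the fractions and the 1000x figure are made-up illustrative values, not measurements.

```c
#include <stdio.h>

int main(void) {
    /* Fraction of the program's runtime that can be parallelized. */
    double parallel_fraction[] = { 0.10, 0.50, 0.90 };
    double gpu_speedup = 1000.0; /* assume the offloaded part runs 1000x faster */

    for (int i = 0; i < 3; i++) {
        double p = parallel_fraction[i];
        /* Amdahl's law: speedup = 1 / ((1 - p) + p / s) */
        double overall = 1.0 / ((1.0 - p) + p / gpu_speedup);
        printf("parallel fraction %.0f%% -> overall speedup %.2fx\n",
               p * 100.0, overall);
    }
    return 0;
}
```

With only 10% of the work parallelizable, the whole program gets roughly a 1.1x speedup no matter how fast the GPU is; even at 90% you top out around 10x.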
GPUs are fantastic at tackling highly parallelizable tasks, but absolutely awful at tasks which are not, and most tasks are not highly parallelizable. Nothing can change that; it's not a matter of time or innovation. It is an unchangeable constant.
That's not to say that ASICs or FPGAs can't be used to offload the CPU and do a better job for many tasks, but they tend to be highly task-specific, and it is generally infeasible to have an on-board ASIC for every little thing. FPGAs are more flexible, but you'd have to constantly reprogram them for the task at hand, and they'd still be slower than ASICs.
The general-purpose CPU is far from dead. You can get by with a weak one if you want a limited-scope device that only excels at the subset of things for which it has GPU and/or ASIC acceleration, but that is really more of a "consumer device" approach and can never replace the flexibility of a strong general-purpose CPU core.