You all need to chill out.
I think I'll have to clear up a few things. Maybe I'll make a flowchart.
1. It's not a bait-and-switch. http://en.wikipedia.org/wiki/Bait-and-switch
The other options would have been to put only 3GB on it and cripple the bandwidth across the whole memory, to drop colour compression and cripple the bandwidth that way, or not to sell the GTX 970 at all. Clearly a better deal for everyone involved. Technical explanation further down.
2. All those game benchmarks didn't suddenly change. The GTX 970 still performs the same.
3. How many of you would have actually used exactly 3.5-4GB VRAM? The driver actively tries to keep VRAM usage below 3.5GB, so once you manage to go over that you'll probably go over 4GB as well. That means swapping with the RAM and a lot more performance issues.
Technical stuff:
DISCLAIMER: This might not be correct. I haven't confirmed it myself and I won't dissect a cut down GM204 just to please some people on the internet. So take this with a grain of salt, some of it might be wrong. However that's also true for most of the accusations.
What everyone thinks is the issue won't be fixed because they can't fix it.
The issue is real and I hope they'll fix it on the GM200 and maybe later versions of the GM204, but it's more like a minor inconvenience rather than the absolutely game-breaking, performance-destroying and possibly life-threatening bug everyone makes it out to be.
Most of the benchmarks were also run incorrectly; people didn't bother to read the instructions. Some caught on to that and are now trying to blame the author, because a program that was coded in 30 minutes isn't foolproof. If the benchmark is run incorrectly it can show lower bandwidth for 1GB. It will also show lower or infinite bandwidth towards the end of the memory on every single Nvidia card in existence; what it's actually showing there is the RAM/swapping bandwidth. The whole point of the benchmark was to find out whether the last 0.5GB of VRAM is so much slower than the rest that it could cause the issues people have reported when using >3.5GB VRAM. What it shows when run correctly* is that, for some reason, swapping starts with 0.5GB left. It's not the VRAM being slow; it's the DRAM (accessed via PCIe) being used instead of the VRAM, which is incredibly slow.
*headless: the GPU must not be driving a display (use the iGPU instead), or the OS will reserve VRAM and the weird CUDA memory swapping will show up earlier, or even before you actually run out of memory on other cards that don't have this issue at all
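To make the measurement logic concrete, here is a minimal Python sketch of what such a per-chunk bandwidth scan does: allocate memory chunk by chunk and record the transfer speed at each offset. It uses a simulated "device" instead of CUDA so it runs anywhere; the function names, the bandwidth figures and the 7/8 cutoff are illustrative assumptions, not measurements from Nai's actual benchmark.

```python
# Sketch of a per-chunk bandwidth scan. The "device" is simulated:
# full speed for the first 7/8 of memory, PCIe/DRAM-like speed once
# allocations spill past that point. All numbers are made up.

def simulated_chunk_bandwidth(offset_mb, total_mb=4096):
    """Pretend bandwidth at a given offset into the card's memory."""
    FAST_GBPS = 150.0   # on-board VRAM (illustrative)
    SLOW_GBPS = 10.0    # DRAM over PCIe (illustrative)
    return FAST_GBPS if offset_mb < total_mb * 7 // 8 else SLOW_GBPS

def scan(chunk_mb=128, total_mb=4096):
    """Walk through memory chunk by chunk and record the bandwidth."""
    return [(offset, simulated_chunk_bandwidth(offset, total_mb))
            for offset in range(0, total_mb, chunk_mb)]

if __name__ == "__main__":
    for offset, bw in scan():
        flag = "  <-- swapping?" if bw < 50 else ""
        print(f"{offset:5d} MB: {bw:6.1f} GB/s{flag}")
```

The point of the real benchmark is the same shape of output: a flat fast region, then a sudden drop. The drop by itself doesn't tell you whether the slow region is slow VRAM or DRAM being used instead of VRAM, which is exactly the distinction discussed above.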
My guess at what's happening, described as simply as I can. This is highly speculative and might not even come close to the truth:
1. Nvidia "hardwired" the VRAM addresses to the L2 cache addresses for the colour compression. Because of that, intentionally or unintentionally, data in the VRAM can't be moved/swapped. So once you run out of VRAM and data gets put in the DRAM (the normal RAM), it's stuck there. Normally, in case of a page fault (the data is in DRAM instead of VRAM), the data would get swapped so that what you're using ends up in the VRAM. That doesn't work now, so every time you need that data it's going to be sent from the DRAM via PCIe (which is incredibly slow compared to VRAM). Now that would only be a problem when you're using a bit over 4GB (compressed size). In fact it's only a problem under specific circumstances, namely when the total memory consumption is >4GB but the actively used memory for the current application/task is <4GB. Once one application uses >4GB you have to swap anyway. So unless you've got some pretty big stuff in the background/minimized, or Windows is reserving stupid amounts of VRAM, this isn't an issue.
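The condition described above (total allocations exceed 4GB, but the active application's working set would still have fit) can be written down as a toy check. Everything here, the constant, the numbers and the function name, is illustrative and speculative, not taken from any driver.

```python
# Toy model of the speculated failure mode: pages that land in system
# DRAM can't be migrated back into VRAM, so they stay slow. That only
# hurts when total allocations exceed VRAM while the active app's
# working set alone would have fit. Numbers are illustrative.

VRAM_MB = 4096

def stuck_pages_hurt(total_alloc_mb, active_working_set_mb):
    """True when data is stuck in DRAM even though the active app's
    working set could have lived entirely in VRAM (so normal swapping
    would have avoided the slowdown)."""
    return total_alloc_mb > VRAM_MB and active_working_set_mb <= VRAM_MB

# Background apps hold 1GB while a game uses 3.5GB: total 4.5GB > 4GB,
# but the game alone fits, so the stuck pages are pure loss.
print(stuck_pages_hurt(4608, 3584))   # True

# A single app using 5GB would have to swap on any 4GB card anyway,
# so this specific issue isn't what slows it down.
print(stuck_pages_hurt(5120, 5120))   # False
```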
2. Because of the way they cut down the L2 cache (1.75MB instead of 2MB) and crossbars on the GTX 970, they can't access the last 0.5GB of the VRAM via the normal "hardwired" colour-compression path. However, they knew about this and made two "partitions": the first 3.5GB is accessed the "normal" way, like the GTX 980 does for all 4GB, and the last 0.5GB is accessed without colour compression. That way you'd only lose the extra 30% bandwidth from the compression. It's not ideal, but acceptable, and it's the reason the driver tries to keep VRAM usage below 3.5GB. The only question is whether they had to cut down the L2 cache. Iirc they didn't cut it down on the GTX 780, so I'm leaning towards yes.
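The partition sizes fall straight out of the cache arithmetic, under the assumption above that VRAM regions map fixed to L2 slices. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope check: if VRAM is tied proportionally to L2
# slices, a 1.75MB/2MB cache implies a 7/8 split of the 4GB VRAM.
# The mapping itself is the speculative assumption from the text.

FULL_L2_MB = 2.0      # GTX 980
CUT_L2_MB = 1.75      # GTX 970
VRAM_GB = 4.0

fraction = CUT_L2_MB / FULL_L2_MB                 # 7/8 = 0.875
fast_partition_gb = VRAM_GB * fraction            # reachable with compression
slow_partition_gb = VRAM_GB - fast_partition_gb   # reachable without it

print(fast_partition_gb, slow_partition_gb)  # 3.5 0.5
```

Which matches the 3.5GB/0.5GB split the driver behaviour and the benchmark both point at.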
3. Now it's time to get to the actual issue. The whole thing started when people noticed that the GTX 970 wouldn't use more than 3.5GB VRAM unless it was forced to. The behaviour itself is normal, but the limit should have been 4GB like on the GTX 980. Also, a 30% drop in bandwidth shouldn't be able to cause the drastic performance problems that were reported. Nai's benchmark indicates that the GTX 970 either can't access the last 0.5GB at all or starts swapping to DRAM even though it can access them. That's not supposed to happen. Everything else is.
Conclusions:
1. There is an issue, but it's not what people think it is; it's only related.
2. Unless Nvidia sent cards with the full 2MB L2 cache to reviewers and then sold cards with 1.75MB while knowing about the problems it would cause, it's not a bait-and-switch. A reviewer with both press and retail versions of the GTX 970 could confirm this.
3. Benchmarks didn't show it because most benchmarks still only use 3GB.*
4. It's mostly an issue at 4K and/or very high settings that the 970 might not be able to handle anyway. It's a bummer in those cases, and in SLI, because the 970 might not scale as well as expected once it runs out of VRAM. If we're really lucky it's just a driver glitch and fixable without physical changes.
*We've been there before: people claimed you needed Titan Blacks because the 3GB on the 780 Ti isn't enough for 4K, triple 1080p or 1440p once you go to 256xSSAA (yes, that's hyperbole). Turns out neither of those GPUs can reach 60fps with 256xSSAA anyway, and people prefer 60fps with 4xSSAA to 0.2fps with 256xSSAA. Apparently some of the people working on the drivers actually have a clue about what they're doing. In fact, some of them are so good they even get paid for it. They know what happens when a GPU runs out of VRAM, so unless there's absolutely no way to avoid it, it won't happen.