There are unfortunately a lot of serial guarantees that are unavoidable even if compilers could optimize perfectly. For example: 'for(double v: vec) sum+=v'
Floating-point addition is not associative, so adding the values one by one in order is not the same as what SIMD does: keep several partial sums (one per lane, e.g. lane k sums elements k, k+8, k+16, ...), combine those at the end, and then add the leftover elements. So even though this is an obvious optimization, compilers will keep the serial guarantee unless you tell them to relax it (e.g. with GCC/Clang's -ffast-math or the narrower -fassociative-math).
It's a mess and I agree with janwas: use a library (and in particular: use Google Highway) or something like Intel's ISPC when your hot path needs this.
It's crazy how sometimes you write something as well as you possibly can in classic C++, and then someone comes along with a SIMD version that is over 10x faster (though less portable).
I really wish compilers were better at auto-vectorization, and that the language had annotations to locally allow reordering certain operations, ...