Released Clang 12 generates bad code for the original loop in here. While this is a compiler bug, plain and simple, we still have to deal with it. The miscompile is related to the SLP vectorizer, and in particular to the two reverse subtracts in the butterflies for the second half, which were there to avoid unary negates. Use the more regular dataflow that has the unary negates in it (we can at least fold one of them into a constant, namely for A2) and, while I'm at it, introduce a few temporaries that also make alias analysis (and possible block-level vectorization) a whole lot easier. This fixes the codegen issues on Clang 12, which now produces a working decoder, and I expect the single unary negate that we actually gain per iteration of this loop is not a significant perf concern. (There are bigger fish to fry here regardless.) Fixes issue #1152.
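
To illustrate the two dataflows described above, here is a minimal sketch in C. It is not the loop from this commit: the function names, the interleaved `in`/`tw` layout, and the pre-negated table `neg_tw` are all hypothetical, chosen only to show a reverse-subtract butterfly next to the equivalent form with one negate folded into a constant and one explicit unary negate.

```c
#include <stddef.h>

/* Hypothetical butterfly, for illustration only (not the decoder's loop).
   Reverse-subtract form: no unary negate, the operands of each
   subtraction are swapped instead. */
static void butterfly_revsub(float *out, const float *in,
                             const float *tw, size_t n)
{
    for (size_t i = 0; i + 1 < n; i += 2) {
        out[i]     = in[i + 1] * tw[i + 1] - in[i] * tw[i];
        out[i + 1] = in[i + 1] * tw[i]     - in[i] * tw[i + 1];
    }
}

/* Regular form: same results, written as multiply-adds.
   Assumption: neg_tw[i] == -tw[i] was precomputed, so the first negate
   lives in the table. The second one stays as an explicit unary negate. */
static void butterfly_negate(float *out, const float *in,
                             const float *tw, const float *neg_tw, size_t n)
{
    for (size_t i = 0; i + 1 < n; i += 2) {
        /* Load everything into temporaries before storing, so the
           stores to out[] cannot be confused with later loads. */
        float x  = in[i];
        float y  = in[i + 1];
        float c  = tw[i];
        float s  = tw[i + 1];
        float nc = neg_tw[i];   /* negate folded into a constant */
        float nx = -x;          /* the one negate paid per iteration */

        out[i]     = x  * nc + y * s;
        out[i + 1] = nx * s  + y * c;
    }
}
```

Both functions compute `y*s - x*c` and `y*c - x*s` for each pair. The second spends one extra negate per iteration but gives every output the same multiply-add shape.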
parent
0d47d17002
commit
70136cd5f1
1 changed file with 22 additions and 23 deletions