Rating JCER=3090
Timestamp: 1610120144
Affine transform refactoring.
Reordered the weights so that each accumulated sum lands directly in its output element. Weights are grouped in blocks of four elements because four int8 values (the weight type) occupy the same space as one int32 (the output type). No horizontal additions are needed.
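To illustrate the layout idea, here is a minimal scalar sketch. It is not the SIMD code from this commit. The names `blockedIndex`, `reorderWeights` and `affineBlocked` are hypothetical, and the sketch assumes the weights start out row-major (`[output][input]`) and that the input count is a multiple of four. Each group of four int8 weights for one output sits next to the groups for the neighbouring outputs, so a four-element dot product accumulates straight into one int32 output slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: position of weight (output o, input i) in the
// block-of-four layout. Inputs are split into blocks of 4; within a block,
// the 4 weights of output 0 come first, then the 4 weights of output 1, etc.
std::size_t blockedIndex(std::size_t o, std::size_t i, std::size_t numOutputs) {
    return (i / 4) * (numOutputs * 4) + o * 4 + (i % 4);
}

// Converts a row-major weight matrix (assumed starting layout) to the blocked one.
std::vector<std::int8_t> reorderWeights(const std::vector<std::int8_t>& rowMajor,
                                        std::size_t numOutputs,
                                        std::size_t numInputs) {
    assert(numInputs % 4 == 0);
    assert(rowMajor.size() == numOutputs * numInputs);
    std::vector<std::int8_t> blocked(rowMajor.size());
    for (std::size_t o = 0; o < numOutputs; ++o)
        for (std::size_t i = 0; i < numInputs; ++i)
            blocked[blockedIndex(o, i, numOutputs)] = rowMajor[o * numInputs + i];
    return blocked;
}

// Scalar stand-in for the vectorized loop: for every block of 4 inputs, each
// output adds one 4-element dot product to its own int32 accumulator. A SIMD
// version would handle many outputs per instruction, one int32 lane per
// output, so no sum across lanes is required at the end.
std::vector<std::int32_t> affineBlocked(const std::vector<std::int8_t>& blocked,
                                        const std::vector<std::int32_t>& biases,
                                        const std::vector<std::uint8_t>& input) {
    const std::size_t numOutputs = biases.size();
    const std::size_t numInputs = input.size();
    assert(numInputs % 4 == 0);
    assert(blocked.size() == numOutputs * numInputs);
    std::vector<std::int32_t> out(biases);
    for (std::size_t b = 0; b < numInputs / 4; ++b) {
        const std::uint8_t* in4 = &input[b * 4];
        const std::int8_t* w = &blocked[b * numOutputs * 4];
        for (std::size_t o = 0; o < numOutputs; ++o)
            for (std::size_t k = 0; k < 4; ++k)
                out[o] += std::int32_t(in4[k]) * std::int32_t(w[o * 4 + k]);
    }
    return out;
}
```

In a row-major layout, by contrast, one vector register would hold partial sums that all belong to the same output, and those would have to be added together horizontally before being stored.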
Grouped AVX512, AVX2 and SSSE3 implementations.
Repeated code was removed.
An earlier version passed STC:
LLR: 2.97 (-2.94,2.94) {-0.25,1.25}
Total: 15336 W: 1495 L: 1355 D: 12486 Elo +3.17
Ptnml(0-2): 44, 1054, 5350, 1158, 62
https://tests.stockfishchess.org/tests/view/5ff60e106019e097de3eefd5
The speedup depends on the architecture; up to 4% was measured on an NNUE-only bench.
No functional change