Approximately +250 Elo over 1.5 (Ordo ~2800, Gauntlet 4 n=1040). Direct self-play vs 1.5: +222.8 Elo (n=10000, +/- 7.0) at 10+0.1.
Unlike 1.5, whose gain came almost entirely from a single critical bug fix, 1.6's improvement is a deliberate, broad evaluation overhaul. Every tunable evaluation weight was centralized into a single flat vector and optimized with Texel tuning; the piece-square tables were made phase-tapered (separate middlegame and endgame values); and three new evaluation feature groups were added: pawn shelter/storm, a second king-safety layer, and a positional refinement group covering tempo, bishop outpost, and passed-pawn play. On the infrastructure side, the magic numbers are now hardcoded, making engine startup effectively instant.
The development was strict and incremental: features were added one at a time (or in small thematic groups), each validated by self-play before the next was included, and only measurable gainers were kept. Three conceptually sound features were implemented, tuned, measured, and rejected (rook-behind-passer, non-linear mobility, material imbalance).
Evaluation -- Tuning and Tapering
- Weight centralization: every tunable evaluation weight collected into a single flat array, eval_weights[NUM_WEIGHTS] (934 entries), organized into named groups. Previously weights were scattered as named compile-time constants. The array is non-const so the tuner can mutate it; the eval hot path treats it as read-only by convention.
- Texel-tuned weight set: the integrated weights were produced by Texel tuning on a labeled quiet-position dataset. Material was modeled as a separate tunable during tuning and folded back into the PSTs afterward, so PIECE_VALUE stays fixed at 100/320/330/500/900 while the evaluation reproduces the tuned values.
- Tapered piece-square tables: each PST square now holds separate middlegame and endgame values, blended by game_phase(). Lets the tuner express phase-dependent placement (king centralization bad in the middlegame and good in the endgame, advanced pawns worth progressively more as material comes off).
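A minimal sketch of the two mechanisms above, tapering and material folding. It assumes a linear blend on a 0..24 phase scale and uses made-up values; the range actually returned by game_phase(), the names TaperedScore, taper and fold_material, and the tuned value 337 are illustrative assumptions, not taken from the engine.

```c
#include <assert.h>

/* Hypothetical phase scale: MAX_PHASE = full material, 0 = bare endgame. */
enum { MAX_PHASE = 24 };

/* One tapered PST entry: separate middlegame and endgame values. */
typedef struct { int mg, eg; } TaperedScore;

/* Blend the two values linearly by phase (integer division truncates). */
static int taper(TaperedScore s, int phase) {
    assert(phase >= 0 && phase <= MAX_PHASE);
    return (s.mg * phase + s.eg * (MAX_PHASE - phase)) / MAX_PHASE;
}

/* Fold a tuned material value into a PST so the fixed piece value can stay
   unchanged: afterwards fixed_value + pst[sq] equals tuned_value + the
   old pst[sq] for every square. */
static void fold_material(int pst[64], int tuned_value, int fixed_value) {
    for (int sq = 0; sq < 64; sq++)
        pst[sq] += tuned_value - fixed_value;
}
```

With a central king entry of {-30, 40}, taper() returns -30 at full middlegame phase, 40 in a bare endgame, and 5 halfway, which is the "bad in the middlegame, good in the endgame" behaviour described above. The truncating division is one plausible source of small rounding differences between evaluation paths.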
Evaluation -- New Feature Groups
- Pawn shelter / storm: scores the friendly pawn nearest the king on the king's file and the two adjacent files (shelter, bucketed by rank gap), and enemy pawns advancing on those files (storm, bucketed by advancement). Tapered.
- Second king-safety layer: open/semi-open files toward the king (king's file weighted apart from adjacent files) and safe checks (squares an enemy can check from without being captured, per piece type). Complements the attacker-count king safety from earlier versions.
- Tempo (+19 Elo, n=860, LOS 98.5%): a bonus for the side to move, applied from the mover's point of view rather than tied to a fixed colour. The single most valuable feature added in 1.6.
- Bishop outpost (+6.6 Elo, n=3480, LOS 93%): a bishop on relative ranks 4-6 on a square no enemy pawn can challenge, with a larger bonus when pawn-supported. Mirrors the knight outpost.
- Passed-pawn refinement (+10.6 Elo as a group, n=6520, LOS 99.94%): king proximity (own king near is good, enemy king near is bad, endgame-weighted), blockade (enemy piece on the stop square; a minor blockades more effectively than a major), free path to promotion, and pawn protection. Computed in a single pass over passed pawns, no NPS cost.
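Two of the positional terms above can be sketched in a few lines. This is only an illustration under stated assumptions: the weights (20 and 5 centipawns), the a1 = 0 square indexing, the use of Chebyshev distance to the stop square, and the function names are all hypothetical; the notes do not give the engine's actual formulas or tuned values.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { WHITE = 0, BLACK = 1 } Color;

/* Hypothetical weights in centipawns; the tuned values are not in the notes. */
static int tempo_bonus = 20;
static int king_near_weight = 5;

/* Tempo: convert a White-relative score to the mover's point of view,
   then add the bonus, so whoever is to move receives it. */
static int score_for_side_to_move(int white_minus_black, Color stm) {
    assert(stm == WHITE || stm == BLACK);
    int relative = (stm == WHITE) ? white_minus_black : -white_minus_black;
    return relative + tempo_bonus;
}

/* Chebyshev (king-move) distance between two squares, a1 = 0 ... h8 = 63. */
static int chebyshev(int a, int b) {
    int dr = abs(a / 8 - b / 8);
    int df = abs(a % 8 - b % 8);
    return dr > df ? dr : df;
}

/* King proximity for one passed pawn (endgame term): positive when the
   pawn's own king is closer to the stop square than the enemy king. */
static int king_proximity_eg(int stop_sq, int own_king, int enemy_king) {
    return king_near_weight *
           (chebyshev(enemy_king, stop_sq) - chebyshev(own_king, stop_sq));
}
```

For a stop square on e5 with the own king on e4 and the enemy king on h8, the proximity term is 5 * (3 - 1) = 10 in this sketch.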
Evaluation -- Trace and Consistency
- trace_evaluate() and the trace command: trace_evaluate() produces the linear coefficient decomposition of the evaluation (required for Texel tuning); its dot product with the weight vector reproduces evaluate() within the documented tapered-blend rounding tolerance (<=2cp). The new trace UCI debug command prints the per-weight coefficients and the engine-vs-trace fidelity check.
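The idea of a trace, and of checking it against the real evaluation, can be shown on a toy two-weight evaluator. Everything here is an assumption made for illustration: the ToyPos type, the weight values, the 0..24 phase scale and all toy_* names are invented and are not the engine's trace_evaluate() or evaluate().

```c
#include <assert.h>
#include <stdlib.h>

enum { W_PAWN, W_BISHOP_PAIR, NUM_W };  /* toy feature indices (hypothetical) */
enum { MAX_PHASE = 24 };                /* hypothetical phase scale */

/* Non-const so a tuner could mutate them; the values are made up. */
static int w_mg[NUM_W] = { 100, 30 };
static int w_eg[NUM_W] = { 120, 50 };

typedef struct { int pawns[2], bishops[2]; } ToyPos;  /* [0]=White, [1]=Black */

static int blend(int mg, int eg, int phase) {
    assert(phase >= 0 && phase <= MAX_PHASE);
    return (mg * phase + eg * (MAX_PHASE - phase)) / MAX_PHASE;
}

/* Direct evaluation, written the way a hot path might be. */
static int toy_evaluate(const ToyPos *p, int phase) {
    int mg = 0, eg = 0;
    int pawn_diff = p->pawns[0] - p->pawns[1];
    mg += w_mg[W_PAWN] * pawn_diff;
    eg += w_eg[W_PAWN] * pawn_diff;
    if (p->bishops[0] >= 2) { mg += w_mg[W_BISHOP_PAIR]; eg += w_eg[W_BISHOP_PAIR]; }
    if (p->bishops[1] >= 2) { mg -= w_mg[W_BISHOP_PAIR]; eg -= w_eg[W_BISHOP_PAIR]; }
    return blend(mg, eg, phase);
}

/* Trace: record only how often each weight applies (White minus Black). */
static void toy_trace(const ToyPos *p, int coeff[NUM_W]) {
    coeff[W_PAWN] = p->pawns[0] - p->pawns[1];
    coeff[W_BISHOP_PAIR] = (p->bishops[0] >= 2) - (p->bishops[1] >= 2);
}

/* Rebuild the score as a dot product of coefficients and weights. */
static int toy_trace_score(const int coeff[NUM_W], int phase) {
    int mg = 0, eg = 0;
    for (int i = 0; i < NUM_W; i++) {
        mg += coeff[i] * w_mg[i];
        eg += coeff[i] * w_eg[i];
    }
    return blend(mg, eg, phase);
}

/* Fidelity check: both paths must agree within a tolerance in centipawns. */
static int trace_matches(const ToyPos *p, int phase, int tolerance_cp) {
    int coeff[NUM_W];
    toy_trace(p, coeff);
    return abs(toy_evaluate(p, phase) - toy_trace_score(coeff, phase)) <= tolerance_cp;
}
```

Because the coefficients do not depend on the weights, a tuner can change w_mg and w_eg freely and recompute scores from stored traces without re-running the evaluator. In this toy both paths blend once, so they agree exactly; a nonzero tolerance only matters when the two paths round at different points.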
Infrastructure
- Hardcoded magic numbers: the bishop/rook magics, previously found at startup by a fixed-seed randomized search, are now baked in as constants. Startup time dropped from ~261 ms to ~11 ms (~96% faster) on the Ryzen 7 1700; the generated attacks are bit-identical (bench unchanged at 618952, depth 8). The randomized search is preserved under #ifdef FACON_REGENERATE_MAGICS for offline regeneration; not compiled into the normal build.
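For readers unfamiliar with the technique, the following sketch shows what a fixed-seed randomized magic search does for a single rook square, and why hardcoding its result removes the startup cost. It is not the engine's code: the seed, the xorshift generator, the single shared table, the a1 = 0 indexing and every function name are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Slow reference generator: walk each rook ray until the first blocker. */
static uint64_t rook_attacks_slow(int sq, uint64_t occ) {
    static const int dr[4] = {1, -1, 0, 0}, df[4] = {0, 0, 1, -1};
    uint64_t att = 0;
    for (int d = 0; d < 4; d++) {
        int r = sq / 8 + dr[d], f = sq % 8 + df[d];
        while (r >= 0 && r < 8 && f >= 0 && f < 8) {
            uint64_t b = 1ULL << (r * 8 + f);
            att |= b;
            if (occ & b) break;
            r += dr[d]; f += df[d];
        }
    }
    return att;
}

/* Relevant blockers: ray squares, excluding the far edge of each ray. */
static uint64_t rook_mask(int sq) {
    uint64_t m = 0;
    int r0 = sq / 8, f0 = sq % 8;
    for (int r = 1; r <= 6; r++) if (r != r0) m |= 1ULL << (r * 8 + f0);
    for (int f = 1; f <= 6; f++) if (f != f0) m |= 1ULL << (r0 * 8 + f);
    return m;
}

static int popcount64(uint64_t x) {
    int n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

/* Fixed-seed xorshift64: the search is reproducible from run to run. */
static uint64_t rng_state = 0x9E3779B97F4A7C15ULL;  /* arbitrary seed */
static uint64_t next_rand(void) {
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

static uint64_t table[4096];  /* attack table for the one square searched */

/* Try sparse random candidates until one maps every blocker subset to a
   slot with no conflicting attack set. Returns 0 if none was found. */
static uint64_t find_rook_magic(int sq) {
    static uint64_t occs[4096], atts[4096];
    static int stamp[4096];
    uint64_t mask = rook_mask(sq), sub = 0;
    int shift = 64 - popcount64(mask), n = 0;
    do {  /* enumerate all subsets of mask (Carry-Rippler) */
        occs[n] = sub;
        atts[n] = rook_attacks_slow(sq, sub);
        n++;
        sub = (sub - mask) & mask;
    } while (sub);
    for (int attempt = 1; attempt <= 50000000; attempt++) {
        uint64_t magic = next_rand() & next_rand() & next_rand();
        int ok = 1;
        for (int i = 0; i < n && ok; i++) {
            unsigned idx = (unsigned)((occs[i] * magic) >> shift);
            if (stamp[idx] != attempt) { stamp[idx] = attempt; table[idx] = atts[i]; }
            else if (table[idx] != atts[i]) ok = 0;  /* destructive collision */
        }
        if (ok) return magic;
    }
    return 0;
}

/* Fast lookup: mask, multiply, shift, index. */
static uint64_t rook_attacks_magic(int sq, uint64_t occ, uint64_t magic) {
    uint64_t mask = rook_mask(sq);
    return table[((occ & mask) * magic) >> (64 - popcount64(mask))];
}
```

The search loop is the part that costs time at startup. Once a magic is known, it can be pasted into the source as a constant and only the table fill remains, which matches the change described above; the lookup itself is the same either way, so identical attack sets are the expected outcome.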
Build
- Version bumped to 1.6, codename "Temple": the version-string/binary-name logic distinguishes development codenames (suffix carried) from release codenames (no suffix), and the version-selection comments were generalized so they need no editing between development and release.
Features attempted and rejected
- Rook behind passed pawn (Tarrasch rule): -9 Elo (n=3020, LOS 2.9%). Redundant with the rook PST and existing passed-pawn terms.
- Non-linear mobility (per-count tables): -12 Elo (n=5300). Overfit; the linear mobility from 1.4 is more robust at this strength.
- Material imbalance (Kaufman-style, knight/rook/bishop-pair scaled by pawn count): approximately neutral to -5 Elo. The tuning itself signaled redundancy (two of three terms wrong-signed vs theory, tiny magnitudes, error barely moved, optimizer redistributed existing PST/bishop-pair value); self-play confirmed it.
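For reference, the Elo and LOS figures quoted in these notes are conventionally derived as follows. This assumes the standard logistic Elo model and the usual normal approximation for likelihood of superiority; the notes do not state which tool or formula produced the numbers. Here W, D and L are the candidate's wins, draws and losses over n = W + D + L games.

```latex
s = \frac{W + D/2}{n}, \qquad
\Delta\mathrm{Elo} = -400 \,\log_{10}\!\left(\frac{1}{s} - 1\right), \qquad
\mathrm{LOS} = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{W - L}{\sqrt{2\,(W + L)}}\right)\right]
```

Under this reading, an LOS near 50% means the match gives no evidence either way, while a value such as the 2.9% measured for rook-behind-passer indicates the candidate was very likely weaker than the baseline.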
Known limitations
- Drawn pawnless endings evaluated as material (carried from 1.5): K+B vs K, K+N vs K, K+N+N vs K, and K+B+B same-color vs K return the raw material count rather than 0. An override forcing 0 was attempted in 1.5 and reverted after it measured as a regression of ~30 Elo. Proper resolution requires material-signature endgame recognition or tablebase probing, planned for a future version.
