
Stockfish: Depth vs. TC

Newer and older results showing the average depth of games played at fishtest conditions.

[Graph tabs: New | Old]

Elo cost of small Hash

We measure the influence of Hash on playing strength, using games of SF 15.1 at LTC (60+0.6s) and VLTC (240+2.4s) on the UHO book. Hash is varied in powers of two, between 1 and 64 MB at LTC and up to 256 MB at VLTC, leading to an average hashfull between 100 and 950 per thousand. The data suggests that keeping hashfull below 30% is best to maintain strength.

Raw data for the above graph
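
As an illustration, a minimal sketch of monitoring hashfull during analysis, using the python-chess library (the engine path, Hash value, and time budget are assumptions, not part of the measurement above):

import chess
import chess.engine

engine = chess.engine.SimpleEngine.popen_uci("./stockfish")  # engine path assumed
engine.configure({"Hash": 64})  # UCI Hash option, size in MB, powers of two as above
info = engine.analyse(chess.Board(), chess.engine.Limit(time=60))
print(info.get("hashfull"))  # fill level per thousand; keep below ~300 for full strength
engine.quit()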

Elo cost of using MultiPV

MultiPV provides the N best moves and their associated principal variations. This is a great tool for understanding the options available in a given position. However, this information does not come for free: the computational cost of producing it reduces the quality of the best move found, relative to a search that only needs to find a single line.

MultiPV      Elo   Elo-err
1            0.0   0.0
2          -97.2   2.1
3         -156.7   2.8
4         -199.3   2.9
5         -234.5   2.8

Engine: Stockfish 15.1, Time control: 60+0.6s, Book: UHO
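
For illustration, a hedged sketch of requesting multiple PVs through the python-chess library (the engine path and search depth are assumptions):

import chess
import chess.engine

engine = chess.engine.SimpleEngine.popen_uci("./stockfish")  # engine path assumed
infos = engine.analyse(chess.Board(), chess.engine.Limit(depth=18), multipv=3)
for info in infos:  # one entry per line, best first
    print(info["multipv"], info["score"], info["pv"][:5])
engine.quit()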


Elo gain using MultiPV at fixed depth

MultiPV    Elo   Elo-err   Points    Played
1          0.0   -         13496.5   30614
2         45.7   3.1       15388.0   30697
3         53.9   3.5       15732.5   30722
4         59.5   3.2       15862.5   30479
5         63.7   3.6       16078.5   30604

Time control: 580+5.8s, Depth: 18

Elo gain using syzygy

TB6 testing for various versions of SF

Consistent measurement of the Elo gain (syzygy 6-man vs. none) for various SF versions:

TBs are in RAM (fast access), TC is 10+0.1s (STC), book UHO_XXL_+0.90_+1.19.epd, no adjudication. The introduction of NNUE (with SF12) is clearly visible. With SF15, the gain is just 2.7 Elo.

Raw data for the above graph

Testing depending on number of pieces and TC

Tested at 10+0.1, with all syzygy WDL files on tmpfs (i.e. in RAM), using none (0), 4-, 5-, and 6-man TBs in a round-robin tournament (SF10dev).

Rank   Name      Elo   +/-   Games   Score   Draws
1      syzygy6    13     2   82591   51.8%   59.5%
2      syzygy5     2     2   82590   50.3%   59.4%
3      syzygy4    -7     2   82591   49.0%   59.3%
4      syzygy0    -7     2   82592   48.9%   59.4%

Tested at 60+0.6, with all syzygy WDL files on tmpfs (i.e. in RAM), testing none (0) against 6-man TBs:

Score of syzygy6 vs syzygy0: 4084 - 3298 - 18510 [0.515] 25892
Elo difference: 10.55 +/- 2.25

Threading efficiency and Elo gain

Efficiency

Here we look at the threading efficiency of the lazySMP parallelization scheme. To focus on the algorithm, we play games with a given budget of nodes rather than at a given TC. In principle, lazySMP scales nps excellently with the number of cores, but practical measurements are influenced by e.g. frequency adjustments, SMT/hyperthreading, and sometimes hardware limitations.

Equivalent nodestime

In these tests, matches are played at a fixed nodes budget (using the nodestime feature of SF). Equivalence in strength between the serial player and the threaded player (for x threads in the graph below) is found by adjusting the number of nodes given to the threaded player; e.g. with 16 threads, the threaded player might need 200% of the nodes of the serial player to match its strength. This 'equivalent nodestime' is determined for various numbers of threads and various nodes budgets (60+0.6 Mnodes/game is somewhat similar to our usual LTC at 60+0.6s/game, if we assume 1 Mnps).

One observation can be made immediately: this 'equivalent nodestime' grows with the number of threads, but not too steeply, and furthermore it decreases with an increasing nodes budget. The data shows that with 64 threads, the equivalent nodestime is about 200% for a nodes budget of 240+2.4 Mn, i.e. despite such games being much faster than STC (10+0.1s), efficiency is still around 50%.
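
A worked example of the relation between these two quantities (a sketch; the 200% and 50% figures are the ones quoted above):

def efficiency(equivalent_nodestime_pct: float) -> float:
    # Fraction of the threaded player's nodes that convert into strength:
    # needing 200% of the serial nodes means only half are "useful".
    return 100.0 / equivalent_nodestime_pct

print(efficiency(200.0))  # 0.5, i.e. ~50%, as quoted for 64 threads at 240+2.4 Mn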

The curves are sufficiently smooth to be fitted with a model having one parameter that differs between the curves (f(x), parameter a, see caption). A smaller value of a means a higher efficiency.

A fit for the a parameter, and extrapolation to long TCs.

The above parameter a from the model can be fitted as a function of the nodes budget. This allows extrapolating the parameter and arriving at an estimate of the 'equivalent nodestime' at large TCs / nodes budgets:

The fit is again fairly good. Taking a leap of faith, these measurements at up to 240+2.4Mn can be extrapolated to node budgets typical of TCEC or CCC (up to 500Gn). This allows us to predict speedup and/or efficiency.

[Graph tabs: Speedup | Efficiency]

These extrapolations suggest that even at thread counts of >300, at TCEC TCs efficiency could be 80% or higher, provided the nps scales with the number of threads.

Elo results (older)

LTC

Playing 8 threads vs 1 thread at LTC (60+0.6, 8moves_v3.pgn):

Score of t8 vs seq: 476 - 3 - 521  [0.737] 1000
Elo difference: 178.6 +/- 14.0, LOS: 100.0 %, DrawRatio: 52.1 %

Playing 1 thread at 8xLTC (480+4.8) vs (60+0.6) (8moves_v3.pgn):

Score of seq8 vs seq: 561 - 5 - 434  [0.778] 1000
Elo difference: 217.9 +/- 15.8, LOS: 100.0 %, DrawRatio: 43.4 %

That is roughly 82% efficiency (178/218).

STC

Playing 8 threads vs 1 thread at STC (10+0.1):

Score of threads vs serial: 1606 - 15 - 540  [0.868] 2161
Elo difference: 327.36 +/- 14.59

Playing 8 threads @ 10+0.1 vs 1 thread @ 80+0.8:

Score of threads vs time: 348 - 995 - 2104  [0.406] 3447
Elo difference: -66.00 +/- 7.15

So, 1 -> 8 threads has about 83% scaling efficiency (327 / (327 + 66)) using this test.


Elo from speedups

For small speedups (below ~5%), a linear estimate can be used that gives the Elo gain as a function of the speedup percentage x:

Elo_stc(x) = 2.10 x
Elo_ltc(x) = 1.43 x

To have a 50% passing chance at STC with SPRT bounds <-0.5, 1.5>, we need a 0.24% speedup, while at LTC with bounds <0.25, 1.75> we need a 0.70% speedup. A 1% speedup has nearly an 85% passing chance at LTC.
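
As a worked check of these numbers (a sketch; the bounds are the SPRT Elo bounds quoted above):

# Linear model from above: Elo gain per 1% speedup.
ELO_PER_PCT = {"stc": 2.10, "ltc": 1.43}

def required_speedup(lower: float, upper: float, tc: str) -> float:
    # Speedup (%) whose expected Elo gain hits the midpoint of the SPRT
    # bounds, i.e. roughly a 50% passing chance.
    return (lower + upper) / 2 / ELO_PER_PCT[tc]

print(required_speedup(-0.5, 1.5, "stc"))   # ~0.24 (% speedup at STC)
print(required_speedup(0.25, 1.75, "ltc"))  # ~0.70 (% speedup at LTC)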

Raw data:

tc 10+0.1 (speedup %, Elo, Elo-err):
16   32.42   3.06
 8   13.67   3.05
 4    8.99   3.04
 2    3.52   3.05

tc 60+0.6 (speedup %, Elo, Elo-err):
16   20.85   2.59
 8   12.20   2.57
 4    4.67   2.57

Note: numbers will depend on the precise hardware. The model was verified quite accurately on fishtest; see https://github.com/locutus2/Stockfish/commit/82958c97214b6d418e5bc95e3bf1961060cd6113#commitcomment-38646654


Distribution of lengths of games at LTC (60+0.6) on fishtest

In a collection of a few million games, the longest was 902 plies.


Win-Loss-Draw statistics of LTC games on fishtest

The following graph gives information on the Win-Loss-Draw (WLD) statistics, relating them to score and move number. It answers the question: 'What fraction of positions with a given score and move number in fishtest LTC end in a win, a loss, or a draw?'

This model is used when Stockfish provides WLD statistics during analysis with the UCI_ShowWDL option set to True.
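
For illustration, a hedged sketch of reading these WDL statistics through the python-chess library (the engine path and search depth are assumptions; UCI_ShowWDL is the option named above):

import chess
import chess.engine

engine = chess.engine.SimpleEngine.popen_uci("./stockfish")  # engine path assumed
engine.configure({"UCI_ShowWDL": True})
info = engine.analyse(chess.Board(), chess.engine.Limit(depth=20))
print(info.get("wdl"))  # win/draw/loss estimate per mille, from the side to move
engine.quit()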


Elo gain with time odds

See also: https://github.com/official-stockfish/Stockfish/discussions/3402

[Graph tabs: New | Old]

One year of NNUE speed improvements

Presents nodes per second (nps) measurements for all SF versions between the first NNUE commit (SF_NNUE, Aug 2nd 2020) and the end of July 2021, on an AMD Ryzen 9 3950X, compiled with make -j ARCH=x86-64-avx2 profile-build. The last nps reported for a depth 22 search from startpos using NNUE (best over about 20 measurements) is shown in the graph. For reference, the last classical evaluation (SF_classical, July 30th 2020) reaches 2.30 Mnps.
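
A hedged sketch of that measurement loop (the binary path, parsing details, and run count are assumptions):

import subprocess

def measure_nps(binary: str = "./stockfish", depth: int = 22) -> int:
    # One fixed-depth search from startpos; return the last reported nps.
    p = subprocess.Popen([binary], stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, text=True)
    p.stdin.write(f"position startpos\ngo depth {depth}\n")
    p.stdin.flush()
    nps = 0
    for line in p.stdout:
        tokens = line.split()
        if "nps" in tokens:
            nps = int(tokens[tokens.index("nps") + 1])
        if line.startswith("bestmove"):  # search finished
            break
    p.stdin.write("quit\n")
    p.stdin.flush()
    p.wait()
    return nps

print(max(measure_nps() for _ in range(20)))  # best over ~20 runs, as in the text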


Round-robin tournament with SF releases, impact of book and time odds

Measured by playing games at 5+0.05s with SF 7 through 15, using three different books. Each version plays once at the base TC and once with 20% time odds.

Raw data for the above graph

Branching factor of Stockfish

The branching factor (bf) of Stockfish is defined here such that nodes = bf ** rootDepth, or equivalently bf = exp(log(nodes)/rootDepth). Here, it has been measured with a single search from the starting position.

The trend is that the deeper one searches, the lower the branching factor, and newer versions of SF have a lower branching factor. A small difference in branching factor leads to very large differences in the number of nodes searched. For example, SF10 needs about 143x more nodes than SF15.1 to reach depth 49.
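
The definition as a small sketch (the 143x and depth-49 figures are the ones quoted above):

import math

def branching_factor(nodes: int, root_depth: int) -> float:
    # bf defined by nodes = bf ** rootDepth
    return math.exp(math.log(nodes) / root_depth)

# How a small bf difference compounds: a 143x node ratio at depth 49
# corresponds to a per-depth factor of only ~1.11.
print(143 ** (1 / 49))  # ~1.107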


Contempt measurements

Older SF (around SF10) had contempt that worked rather well. This data shows the Elo difference between an SFdev of October 2018 (approx. 40 Elo above SF9) and older versions of Stockfish, as a function of the contempt value. Upper and lower bounds represent the value with maximum error.

Opponent   STC       LTC
SF7        (graph)   (graph)
SF8        (graph)   (graph)
SF9        (graph)   (graph)

Full data with values: https://docs.google.com/spreadsheets/d/1R_eopD8_ujlBbt_Q0ygZMvuMsP1sc4UyO3Md4qL1z5M/edit#gid=1878521689


Elo change with respect to TC

Here are the results of some scaling tests with the 2moves book, 40000 games each (STC = 10+0.1, LTC = 60+0.6):

           SF7 -> SF8     SF8 -> SF9     SF9 -> SF10
Elo STC    95.91 +-2.3    58.28 +-2.3    71.03 +-2.4
Elo LTC   100.40 +-2.1    68.55 +-2.1    65.55 +-2.2

So we see that the common wisdom that increased TC causes Elo compression is not always true.

See https://github.com/official-stockfish/Stockfish/issues/1859#issuecomment-449624976


TC dependence of certain terms in search

Discussed here https://github.com/official-stockfish/Stockfish/pull/2401#issuecomment-552768526


Elo contributions from various evaluation terms

See spreadsheet at: https://github.com/official-stockfish/Stockfish/files/3828738/Stockfish.Feature.s.Estimated.Elo.worth.1.xlsx

Note: the estimated Elo worth of the various features might be outdated, or might become outdated soon.

