Chess Engine: Ceres 0.94
Ceres Release Notes - version 0.94
Ceres 0.94 incorporates three significant new features as well as numerous smaller tweaks that improve the user experience.
The Elo improvement compared to version 0.93 is believed to be very modest (perhaps 5 Elo) at shorter time controls. However, the root swapping enhancement definitely increases search speed by at least 7% with large searches. There are also theoretical reasons (e.g. the greater abundance of transpositions) to believe the other enhancements might be more impactful with larger search trees, but this is not confirmed due to the time and computational resources needed to perform such tests.
Major Features
A "root swapping" technique greatly reduces the time spent at the beginning of each search within a game reorganizing the search tree. Previously this overhead decreased effective search speed by approximately 11% at long time controls; it is now reduced to about 3%. Ceres can now avoid rebuilding the tree in most cases (by swapping in the new root position and performing some associated data structure updates). In addition, the nodes which fall out of the actual search tree sometimes remain memory resident and available as an additional cache (reducing the number of positions actually sent to the neural network by approximately 3%). This technique does increase memory usage, but its aggressiveness is automatically tuned based on the physical RAM available on the computer. The feature can optionally be turned off via the ReducedMemoryMode option in Ceres.json or UCI, or the MaxTreeNodes setting can be used to place a hard limit on the number of nodes allowed to be used by any search.
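The idea of re-rooting rather than rebuilding can be sketched roughly as follows. This is an illustrative Python sketch, not Ceres source: the `Node`/`apply_move` names and the string-based positions are invented placeholders for the real data structures.

```python
# Sketch (not Ceres source): re-root a search tree in place when the game
# advances by one move, instead of rebuilding the tree from scratch.

class Node:
    def __init__(self, position, children=None):
        self.position = position
        self.children = children or {}   # move -> Node


def apply_move(position, move):
    # Placeholder: a real engine would compute the successor position.
    return position + " " + move


def swap_root(old_root, played_move):
    """Return (new_root, cache): the subtree under the played move becomes
    the new root; nodes outside it may stay resident as an eval cache."""
    new_root = old_root.children.get(played_move)
    if new_root is None:
        # The played move was never expanded: fall back to a fresh root.
        return Node(apply_move(old_root.position, played_move)), []
    # Siblings of the new root fall out of the active search tree but can
    # remain memory resident as an additional evaluation cache.
    cache = [c for m, c in old_root.children.items() if m != played_move]
    return new_root, cache
```

The key point is that the common case is a pointer swap plus bookkeeping, not a full tree copy.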
A "virtual subtree" technique is used to reduce memory consumption. Ceres avoids materializing physical nodes in the tree when the positions they represent already exist elsewhere in the tree (transpositions). In prior versions of Ceres this was limited to a single node having its child array virtualized; this has been expanded so that up to 2 nodes and 3 child arrays can be virtualized. In many cases the subtree will not be visited further and there will never be a need to materialize the actual nodes/children, thereby further reducing memory consumption by approximately 10% (for smaller searches) to 25% (for very large searches). The semantics of the prior Ceres MCTS search algorithm are preserved.
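The virtualization idea can be sketched as lazily borrowing the child array of a transposition-equivalent node instead of allocating a duplicate. Again a hedged Python sketch, not Ceres source; the `TNode` class and dict-based transposition table are invented stand-ins.

```python
# Sketch (not Ceres source): defer child-array allocation by borrowing the
# children of a transposition-equivalent node elsewhere in the tree.

class TNode:
    def __init__(self, key):
        self.key = key
        self._children = None   # materialized child array, if any
        self.linked = None      # equivalent node whose children we borrow

    def children(self, transposition_table):
        if self._children is not None:
            return self._children
        # If this position already exists elsewhere, use its children
        # virtually instead of allocating a duplicate array.
        equal = transposition_table.get(self.key)
        if equal is not None and equal is not self and equal._children is not None:
            self.linked = equal
            return equal._children
        # No transposition available: materialize for real.
        self._children = []
        return self._children
```

If the virtualized subtree is never visited again, the duplicate allocation never happens at all, which is where the memory savings come from.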
A "sibling blending" technique sometimes averages into newly visited nodes information from siblings not yet visited (i.e. having lower policy prior probabilities). Specifically, when a child of a node is first visited as a leaf node, the adjacent siblings (within 3 slots and 10% prior probability distance) are scanned to see if an evaluation is already available (via transpositions or tablebase hits). If so, this value (taken from the Q subtree evaluation in the case of transpositions) is blended into the value backed up higher in the tree. The weight used in the blending depends on the minimax consideration of giving more weight if the sibling was better than the node actually being visited (i.e. better than expected), and also on the reliability of the sibling evaluation (e.g. a tablebase hit is definitive, and a transposition subtree with large N is highly reliable). In practice about 10% to 15% of visited nodes are found eligible for this blending. This approach attempts to exploit one of the major advantages that MCTS-based search algorithms possess: retaining a full tree of potentially useful information in memory. Currently the Elo gains from this feature appear very modest (e.g. 5 Elo) but are potentially larger at very long time controls and/or after future tuning optimizations. See MCTSNodeSiblingEval.cs for more detail in the code.
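The eligibility scan and weighted blend can be sketched as below. This is an illustrative Python sketch, not the MCTSNodeSiblingEval.cs logic: the weight formula and the `n / (n + 10)` reliability term are invented placeholders, not Ceres's tuned values; only the 3-slot and 10%-prior-gap eligibility windows come from the text above.

```python
# Sketch (not Ceres source): blend an already-available sibling evaluation
# into the value backed up from a newly visited leaf.

def blend_with_siblings(children, visited_idx, max_slots=3, max_prior_gap=0.10):
    """children: list of dicts with 'prior', 'value' (None if unavailable),
    and 'n' (sibling subtree visit count). Returns the value to back up."""
    visited = children[visited_idx]
    value = visited["value"]
    lo = max(0, visited_idx - max_slots)
    hi = min(len(children), visited_idx + max_slots + 1)
    for i in range(lo, hi):
        sib = children[i]
        if i == visited_idx or sib["value"] is None:
            continue
        if abs(sib["prior"] - visited["prior"]) > max_prior_gap:
            continue
        # More weight if the sibling looks better than the visited node
        # (minimax caution) and if its evaluation is well-supported (large N).
        reliability = sib["n"] / (sib["n"] + 10.0)      # placeholder curve
        better = 1.0 if sib["value"] > value else 0.5   # placeholder factor
        w = 0.5 * reliability * better
        value = (1.0 - w) * value + w * sib["value"]
    return value
```

The design intent is that a "better than expected" sibling pulls the backed-up value toward it, while an unreliable or clearly out-of-window sibling contributes nothing.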
Minor Enhancements
The initialization time to load network files is reduced by 60%, and memory consumption is also significantly reduced (for example, by approximately 400MB with a 30b network).
The new option MaxTreeNodes can be set either in Ceres.json or via the UCI interface and limits the total number of physical nodes actually allocated, terminating searches if they reach the specified value.
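Over UCI, setting the limit would follow the standard setoption syntax; the value below is an arbitrary example, and the option can equivalently be placed in Ceres.json:

```
setoption name MaxTreeNodes value 50000000
```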
A bug with the searchmoves feature (when black was to play) is corrected.
Some minor search performance optimizations were made, focused on concurrency efficiency (eliminating some locks and reducing false sharing), yielding about a 3% speedup for large searches on high-end hardware.