Release notes for Ceres 0.90-rc1
Community help and feedback in testing v0.90 would be welcomed. Primary areas of enhancement include:
Compatibility (Linux support, and support for GPUs with limited memory)
Installation (LC0.DLL no longer required)
Search speed faster by 10% to 25% (due to enhancements to CUDA backend and MCTS engine), especially on more recent NVIDIA hardware and CUDA 11.3+.
Because of the major backend changes (about 10,000 lines of new C# code) it is likely that issues will be identified relating to untested hardware and software configurations. Please feel free to open an issue if you are encountering difficulties, or post in #help of the Discord channel for Leela Chess Zero.
Internal testing suggests that play strength (with T60) is significantly improved, even on relatively modest hardware (Windows laptop with 2070 GPU) and moderate time controls (such as 60 seconds per game). However, independent community assessment is necessary and would be welcomed.
Known minor issues:
loading of networks (at initialization) is a little slow
in multi-GPU configurations the new feature to reduce GPU memory consumption is disabled
Details
The same binaries support both Windows and Linux operating systems. To run on Linux, use: "dotnet ./Ceres.dll"
A native C# backend for CUDA, instead of reliance upon the external library LC0.DLL, yielding simplified installation and improved performance. This backend is largely a transliteration of the Leela Chess Zero CUDA backend (written in C++, largely by Ankan) into C#.
This work leveraged the open-source ManagedCUDA project (by Michael Kunz) to provide object-oriented bindings to the CUDA C API.
The implementation also features several enhancements:
use of CUDA graphs to precompile the entire neural network into a single CUDA operation, yielding speedups of 5% to 20% (greater with smaller networks and smaller batch sizes), especially on more recent hardware and CUDA 11.3 or above
a supplemental CUDA kernel which reduces GPU/host bandwidth requirements (copies policy only for legal moves)
reduced GPU memory consumption and network load times (by about 25%)
diagnostic and introspection features for developers, such as optionally capturing inner layer timings and activations
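The legal-moves-only policy copy listed above can be illustrated with a small CPU-side sketch. This is not the Ceres kernel, which runs on the GPU; the function names, the 1858-entry policy size (the flat policy length used by Leela Chess Zero networks), and the 2-byte value width are assumptions made only for illustration.

```python
from typing import List, Sequence

# Assumption: flat policy vector length, as in Leela Chess Zero networks.
POLICY_SIZE = 1858


def gather_legal_policy(policy: Sequence[float],
                        legal_indices: Sequence[int]) -> List[float]:
    """Keep only the policy entries whose indices correspond to legal moves."""
    return [policy[i] for i in legal_indices]


def bytes_avoided(num_legal: int,
                  policy_size: int = POLICY_SIZE,
                  bytes_per_value: int = 2) -> int:
    """Bytes not copied to the host when only legal-move entries are transferred.

    bytes_per_value=2 assumes half-precision outputs (hypothetical).
    """
    return (policy_size - num_legal) * bytes_per_value


# Toy example: a 6-entry policy in which only moves 4 and 1 are legal.
full_policy = [0.05, 0.40, 0.05, 0.10, 0.30, 0.10]
compact = gather_legal_policy(full_policy, [4, 1])
```

The compact buffer preserves the order of the legal-move list, so a host that keeps the same ordering can map each entry back to its move without the full policy vector.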
integrated native Syzygy tablebase probing (through 7-man) based upon a transliteration of the Fathom library (by Ronald de Man, basil, and Jon Dart) from C++ to C#
search speed optimizations, particularly for long searches on higher-end GPUs. For example, 425k nodes per second is achievable on 2017 vintage CPUs.
some adjustments and tuning to MCTS parameters yielding slightly improved play quality in most situations
two additional command line options (BACKENDBENCH and BENCHMARK) emulating LC0 functionality.
See below for an example of the output from the benchmark command, which also compares against LC0 v0.28rc1, e.g.
"Ceres benchmark opponent=lc0 limit=10sm"
various bug fixes, including those involving use of the MultiPV feature
ongoing work to improve source code clarity and documentation
ongoing work to make the Ceres code base (set of object-oriented classes for general chess operations, MCTS search, and neural network training) more flexible, performant, and well documented to facilitate research in chess programming.