BT4:
Big Transformer 4. A new network architecture that builds on BT3 by adding two types of auxiliary heads: future heads and categorical value heads. The categorical value heads predict a distribution over values of q rather than a WDL outcome distribution, and the future heads predict the moves that will be played over the next two plies. The hope is that these heads will give the net additional information and so improve training speed. We've also fixed half-precision training, so this model will be larger. BT4 training started in mid-October and is expected to take a few months. It has 15 layers with an embedding size of 1024, 32 heads per layer, and a feed-forward (dff) size of 1536, for roughly a doubling in size over BT3.
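To make the difference between the two value representations concrete, here is a minimal sketch of how a categorical value head's output could be collapsed into a single q. It is not the actual BT4 code: the bucket count, the even spacing of buckets over [-1, 1], and the names `softmax`, `bucket_centers`, `expected_q` and `wdl_to_q` are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bucket_centers(num_buckets, lo=-1.0, hi=1.0):
    """Centers of evenly spaced q buckets on [lo, hi] (assumed layout)."""
    width = (hi - lo) / num_buckets
    return [lo + width * (i + 0.5) for i in range(num_buckets)]


def expected_q(logits):
    """Collapse a categorical distribution over q buckets into one q."""
    probs = softmax(logits)
    centers = bucket_centers(len(logits))
    return sum(p * c for p, c in zip(probs, centers))


def wdl_to_q(win, draw, loss):
    """For comparison: a WDL head yields q as P(win) - P(loss)."""
    return win - loss


if __name__ == "__main__":
    # A flat distribution over 32 hypothetical buckets has q close to 0.
    print(expected_q([0.0] * 32))
    # A WDL prediction of 50% win, 30% draw, 20% loss.
    print(wdl_to_q(0.5, 0.3, 0.2))
```

A WDL head only has three outcomes to spread probability over, whereas the categorical head can express the shape of the distribution over q itself, for example a position that is either clearly winning or clearly lost but rarely in between.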
Individual statistics: Lc0 0.30.0
Opponent | Score | +/- | Games |
Stockfish 16 | 5/13 | -3 | 13 Games |
Booot 7.2 | 6/10 | +2 | 10 Games |
Critter 1.6a | 8/9 | +7 | 9 Games |
Dragon 3.2 | 4/8 | +0 | 8 Games |
Stockfish 20230729 | 3.5/8 | -1 | 8 Games |
ShashChess 33.2 | 3.5/8 | -1 | 8 Games |
Fisherov chess monk 1.2 | 3/8 | -2 | 8 Games |
Seer 2.6.0 | 4/7 | +1 | 7 Games |
Raid 2.7i | 3/7 | -1 | 7 Games |
Pingu | 6/6 | +6 | 6 Games |
Luna 1.1.0 | 6/6 | +6 | 6 Games |
GOOB 1.8.9 | 6/6 | +6 | 6 Games |
Coiled 1.2 | 6/6 | +6 | 6 Games |
Texel 1.09 | 6/6 | +6 | 6 Games |
Wasp 6.50 | 5.5/6 | +5 | 6 Games |
Alexandria 4.0 | 4/6 | +2 | 6 Games |
Clover 6.0 | 4/6 | +2 | 6 Games |
SlowChess 2.9 | 3.5/6 | +1 | 6 Games |
Igel 3.5.0 | 3.5/6 | +1 | 6 Games |
Cool Iris 10.40 | 2/6 | -2 | 6 Games |
Lynx 0.14.1 | 5/5 | +5 | 5 Games |
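The third column appears to be the score margin, that is, points scored minus points conceded, which works out to 2 × score − games. The short check below assumes that reading; the function name `margin` is ours, and the rows are copied from the table above.

```python
def margin(score, games):
    """Points scored minus points conceded over `games` games."""
    return score - (games - score)


# (opponent, score, games, margin as listed in the table)
rows = [
    ("Stockfish 16", 5, 13, -3),
    ("Critter 1.6a", 8, 9, 7),
    ("Stockfish 20230729", 3.5, 8, -1),
    ("Wasp 6.50", 5.5, 6, 5),
    ("Cool Iris 10.40", 2, 6, -2),
]

for name, score, games, listed in rows:
    # Each listed margin should match the computed one.
    assert margin(score, games) == listed, name
```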