Our work
Leela’s networks have historically struggled with long-range dependencies, failing to recognize positional and tactical ideas involving squares that are far away from each other, such as overloaded pieces. This was because Leela’s models used a convolution-based architecture, like that of DeepMind’s AlphaZero, the project on which Lc0 is based. In such a network, each square repeatedly gathers information from its adjacent squares and uses it to refine its own representation. The main drawback of this approach is the small receptive field of each convolution filter. For the a1 square to “learn” which piece sits on h8, that information must make at least seven hops from square to square, one per convolutional layer.
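To make that bottleneck concrete, here is a small NumPy sketch (illustrative only, not Lc0’s actual network code) that treats each 3x3 convolution layer as spreading information one square in every direction and counts how many layers it takes for the contents of h8 to become visible from a1:

```python
# Illustrative sketch (not Lc0 code): how far information travels through a
# stack of 3x3 convolutions on an 8x8 board. Each layer lets a square "see"
# only its immediate neighbours, so reaching the opposite corner takes 7 layers.
import numpy as np

def spread_once(board: np.ndarray) -> np.ndarray:
    """One 3x3 'layer': a square becomes reachable if any neighbour was."""
    padded = np.pad(board, 1)
    out = np.zeros_like(board)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            out |= padded[1 + dr : 9 + dr, 1 + dc : 9 + dc]
    return out

reach = np.zeros((8, 8), dtype=bool)
reach[7, 7] = True          # information starts on h8 (row 7, col 7)
layers = 0
while not reach[0, 0]:      # ...and must reach a1 (row 0, col 0)
    reach = spread_once(reach)
    layers += 1
print(layers)               # prints 7: at least seven 3x3 conv layers are needed
```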
Our strongest transformer model, BT4, is nearly 300 Elo stronger in terms of raw policy than our strongest convolution-based model, T78, while using fewer parameters and less computation. We’ve tested dozens of modifications to get our transformer architecture to where it is today.
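By contrast, a transformer lets every square attend to every other square in a single layer. The sketch below is a minimal single-head self-attention over 64 square embeddings, with made-up dimensions and random matrices standing in for trained weights; it is not BT4’s actual architecture, only an illustration of why the receptive-field bottleneck disappears:

```python
# Minimal single-head self-attention over 64 square embeddings (numpy sketch,
# hypothetical dimensions; not BT4's real code). Every square attends to every
# other square in one layer, so a1 can use information from h8 directly,
# without it being relayed through intermediate squares.
import numpy as np

rng = np.random.default_rng(0)
d_model = 32                             # embedding width (illustrative choice)
x = rng.standard_normal((64, d_model))   # one embedding per board square

# Learned projections would be trained; random matrices stand in for them here.
W_q = rng.standard_normal((d_model, d_model))
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

q, k, v = x @ W_q, x @ W_k, x @ W_v
scores = q @ k.T / np.sqrt(d_model)      # (64, 64): every square scores every square
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the 64 squares
out = weights @ v                        # a1's new representation mixes in h8 directly

print(weights.shape)   # (64, 64) -- direct square-to-square attention in one layer
```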