RWKV: Reinventing RNNs for the Transformer Era (Paper Explained)
RWKV, a highly scalable architecture between Transformers and RNNs