My M4 Max MacBook gets 3,756,165 tok/sec in pure C, compared to ~50,000 tok/sec with the FPGA, roughly 75x faster. Try it yourself:
From X

