🧮 LLM Mathematics Benchmark

Evaluate Large Language Models on mathematical reasoning tasks using a diverse dataset of questions.

API Configuration

Configure your API keys for different model providers:

🤖 OpenAI Configuration

🧠 Anthropic Claude Configuration
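
As a minimal sketch of how keys for both providers might be supplied outside the UI, the snippet below reads them from environment variables. The variable names `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are the ones the official OpenAI and Anthropic Python SDKs look for by default. The helpers `load_api_keys` and `mask`, and the `PROVIDER_ENV_VARS` mapping, are hypothetical names introduced here for illustration and are not part of this benchmark.

```python
import os

# Hypothetical mapping from provider name to the environment variable
# that holds its API key (illustrative, not taken from the benchmark).
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def load_api_keys(env=None):
    """Return {provider: key} for every provider whose key is set and non-empty."""
    env = os.environ if env is None else env
    keys = {}
    for provider, var in PROVIDER_ENV_VARS.items():
        value = env.get(var, "").strip()
        if value:
            keys[provider] = value
    return keys


def mask(key, visible=4):
    """Hide all but the last few characters so a key can be displayed safely."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


if __name__ == "__main__":
    found = load_api_keys()
    for provider in PROVIDER_ENV_VARS:
        status = mask(found[provider]) if provider in found else "not configured"
        print(f"{provider}: {status}")
```

Reading keys from the environment keeps them out of source files, and masking them before display avoids leaking a full key into a page or log.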