Groq — User Guide

Very fast inference for open-source models.

Freemium
Strengths
  • Among the fastest inference available, thanks to Groq's custom LPU (Language Processing Unit) chips; 500+ tokens/second on some models
  • Generous free tier, including 30 requests per minute
  • Supports mainstream open-source models such as Llama 3, Mixtral, and Gemma
  • OpenAI-compatible API, so migrating existing code is easy
  • Very low latency, well suited to real-time conversational applications
Best for
  • Real-time AI applications with strict latency requirements
  • Voice AI assistants, where low latency is key
  • Live code completion and suggestions
  • Highly concurrent AI services
  • Free rapid prototyping with open-source models

Quick start

Groq's API is largely compatible with OpenAI's, so you can try its fast inference with just a few lines of code using Groq's official Python SDK (pip install groq).
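Because the API follows the OpenAI chat-completions format, a request can also be sent without any SDK. The following is a minimal sketch using only the Python standard library, assuming the OpenAI-compatible endpoint `https://api.groq.com/openai/v1/chat/completions` and an API key stored in the `GROQ_API_KEY` environment variable; `build_chat_request` is a hypothetical helper name, not part of any Groq library.

```python
import json
import os
import urllib.request

# OpenAI-compatible chat-completions endpoint (assumed; check Groq's API reference)
GROQ_CHAT_URL = "https://api.groq.com/openai/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions POST request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        GROQ_CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Only hit the network when run directly with a key configured
if __name__ == "__main__" and os.environ.get("GROQ_API_KEY"):
    req = build_chat_request(
        os.environ["GROQ_API_KEY"], "llama-3.3-70b-versatile", "Say hello in one sentence."
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # Same response shape as OpenAI: choices[0].message.content
    print(data["choices"][0]["message"]["content"])
```

Relative to an OpenAI request, essentially only the URL and the key change. The official `openai` Python SDK can likewise be pointed at Groq by setting its `base_url` option to `https://api.groq.com/openai/v1`.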

Scenario

Experience extremely fast inference

Code example
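import os
import time

from groq import Groq

# Read the key from the GROQ_API_KEY environment variable instead of hard-coding it
client = Groq(api_key=os.environ["GROQ_API_KEY"])

start = time.perf_counter()

# stream=True returns an iterator of chunks that arrive as tokens are generated
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "user", "content": "Write a 500-word article about artificial intelligence"}
    ],
    stream=True,
)

for chunk in response:
    # Each chunk carries an incremental piece of text; delta.content may be None
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

elapsed = time.perf_counter() - start
print(f"\n\nTime taken: {elapsed:.2f} seconds")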
Output / what to expect
Expect a 500-word article in roughly 2-3 seconds, about 5-10 times faster than OpenAI, although actual speed depends on the model, server load, and network. Streaming output appears in near real time, which gives a very responsive user experience.
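To check the speed claim on your own account, divide the number of generated tokens by the elapsed time. The sketch below assumes the rough English rule of thumb of about 0.75 words per token (an approximation, not a Groq figure) to turn the 500-word target into a token count; in real code, prefer the exact usage.completion_tokens value returned with a non-streaming response. Both function names are hypothetical.

```python
def estimate_tokens(words: int, words_per_token: float = 0.75) -> int:
    """Rough token estimate from a word count (heuristic for English text)."""
    return round(words / words_per_token)


def tokens_per_second(completion_tokens: int, elapsed_seconds: float) -> float:
    """Output throughput: generated tokens divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completion_tokens / elapsed_seconds


tokens = estimate_tokens(500)  # about 667 tokens for a 500-word article
for seconds in (2.0, 3.0):
    print(f"{seconds:.0f} s -> {tokens_per_second(tokens, seconds):.0f} tokens/s")
```

Under this heuristic, 2-3 seconds for a 500-word article works out to roughly 220-330 tokens/second. Wall-clock time also includes network and queueing delay, so this figure understates the raw generation speed.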
Tips

Streaming output (stream=True) lets users see tokens as soon as they are generated instead of waiting for the complete response, which makes the application feel even faster.
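The benefit of streaming shows up as time to first token (TTFT), meaning how long the user waits before any text appears, as opposed to total generation time. Here is a minimal, SDK-agnostic sketch that consumes any iterable of text pieces (for example, the delta.content values from the streaming loop) and records both numbers. timed_stream is a hypothetical helper, and the clock is injectable so the function can be tested without network access.

```python
import time
from typing import Callable, Iterable, Optional


def timed_stream(
    pieces: Iterable[Optional[str]],
    start: Optional[float] = None,
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume streamed text pieces; report time to first token and total time.

    Pass `start` (taken from the same clock) if the request was sent before
    this function is called, so that request latency is included.
    """
    if start is None:
        start = clock()
    first: Optional[float] = None
    parts = []
    for piece in pieces:
        if not piece:
            continue  # streaming deltas can be None or empty
        if first is None:
            first = clock()  # moment the first visible text arrived
        parts.append(piece)
    end = clock()
    return {
        "text": "".join(parts),
        "ttft": None if first is None else first - start,
        "total": end - start,
    }
```

Typical use: take t0 = time.perf_counter() before calling client.chat.completions.create(..., stream=True), then pass (c.choices[0].delta.content for c in response) together with start=t0. A low TTFT is what makes a streamed answer feel instant, even when the total time is unchanged.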

Scenario

Free-tier limits

Prompt example
Groq free-tier limits (as of 2025; these change, so check Groq's current rate-limit documentation before relying on them):

Llama 3.3 70B:
- Per minute: 30 requests, 6,000 tokens
- Daily: 14,400 requests, 500,000 tokens

Llama 3.1 8B:
- Per minute: 30 requests, 20,000 tokens
- Daily: 14,400 requests, 500,000 tokens

Suitable scenarios:
- Personal projects and prototype development
- Learning and testing
- Low-frequency tools
Output / what to expect
The free tier is sufficient for personal projects. Production applications generally need the paid tier, which is billed per token (see Groq's pricing page for current rates).
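The per-minute token limit can bite before the request limit. With the Llama 3.3 70B figures listed above, 6,000 tokens per minute divided by requests of about 1,000 tokens each allows only 6 requests per minute, well short of 30 (assuming prompt and completion tokens both count toward the limit). A client-side throttle can keep a script under both limits before the server starts rejecting calls. This is a minimal sliding-window sketch that assumes limits are counted over a rolling 60-second window and that the caller estimates each request's token count. Groq's own accounting may differ, and RateBudget is a hypothetical name.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class RateBudget:
    """Client-side sliding-window throttle for requests/min and tokens/min."""

    def __init__(self, rpm: int, tpm: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.rpm, self.tpm, self.window, self.clock = rpm, tpm, window, clock
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)

    def _prune(self, now: float) -> None:
        # Drop requests that have left the rolling window
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def wait_time(self, tokens: int) -> float:
        """Seconds to wait before a request of `tokens` fits both limits."""
        now = self.clock()
        self._prune(now)
        used = sum(t for _, t in self.events)
        if len(self.events) < self.rpm and used + tokens <= self.tpm:
            return 0.0
        # Walk forward through expiring requests until both limits would fit
        count, freed = len(self.events), 0
        for ts, t in self.events:
            count -= 1
            freed += t
            if count < self.rpm and used - freed + tokens <= self.tpm:
                return ts + self.window - now
        return float("inf")  # request is larger than the whole token budget

    def record(self, tokens: int) -> None:
        """Register a request that was just sent."""
        self.events.append((self.clock(), tokens))

    def acquire(self, tokens: int) -> None:
        """Block until the request fits, then record it."""
        while True:
            delay = self.wait_time(tokens)
            if delay == 0.0:
                break
            if delay == float("inf"):
                raise ValueError("request exceeds the per-minute token limit")
            time.sleep(delay)
        self.record(tokens)
```

For the Llama 3.3 70B free tier as listed, this would be RateBudget(rpm=30, tpm=6000), with budget.acquire(estimated_tokens) called before each request.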
Tips

The free tier enforces strict per-minute limits, and requests beyond them are rejected with HTTP 429 (Too Many Requests). For production applications, use the paid tier or combine Groq with other platforms as a fallback.
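When a request is rejected for exceeding a limit, the usual pattern is to retry with exponential backoff and, if the limit persists, fall back to another platform. This is a provider-agnostic sketch in which each provider is a plain callable. RateLimited is a hypothetical exception standing in for whatever your client raises on HTTP 429 (with the official groq SDK that is groq.RateLimitError), and the delay values are arbitrary defaults, not Groq recommendations.

```python
import random
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for a client's HTTP 429 error."""


def call_with_fallback(
    providers: Sequence[Callable[[], T]],
    retries_per_provider: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each provider in order, backing off exponentially on rate limits."""
    last_error: Exception = RuntimeError("no providers given")
    for call in providers:
        for attempt in range(retries_per_provider):
            try:
                return call()
            except RateLimited as exc:
                last_error = exc
                if attempt < retries_per_provider - 1:
                    # 1x, 2x, 4x ... the base delay, plus a little random jitter
                    sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
        # This provider stayed rate-limited; move on to the next one
    raise last_error
```

Wire it up as call_with_fallback([lambda: ask_groq(prompt), lambda: ask_other(prompt)]), where ask_groq and ask_other are your own hypothetical wrappers that translate their client's 429 error into RateLimited. The official SDK also retries some failures on its own (configurable through the client's max_retries option), so account for those retries when adding your own.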
