Full access is free during Beta. A paid subscription will be offered after Beta.

LM Studio — User Guide

GUI for local LLMs.

Strengths
  • Graphical interface, no command line required, novice-friendly
  • Supports searching and downloading models directly from Hugging Face
  • Built-in ChatGPT-like conversation interface
  • Provides a local OpenAI compatible API server
  • Supports multiple quantization formats (GGUF) and flexibly adapts to hardware
Best for
  • Getting Started with Local Large Model Running for Newbies
  • Run models like Llama, Qwen, Mistral and more locally
  • Provide AI API services for local applications
  • Test and compare the performance of different models
  • Scenarios with high data privacy requirements

Install and download models

LM Studio provides a complete graphical interface, from downloading the model to starting a conversation in just a few steps.

Scenario

Install and download the first model

Prompt example
Steps:
1. Visit lmstudio.ai to download the installation package
   - Windows: .exe installation package
   - macOS: .dmg installation package
   - Linux: .AppImage

2. After the installation is complete, open LM Studio

3. Click the "Search" icon (magnifying glass) on the left

4. Search models, recommended for novices:
   - "Qwen2.5-7B-Instruct-GGUF" (Excellent Chinese)
   - "Llama-3.2-3B-Instruct-GGUF" (lightweight)

5. Select the quantized version (Q4_K_M is the balance between quality and size)

6. Click Download and wait for completion
Output / what to expect
After the model is downloaded, it will be displayed in the “My Models” list. Q4_K_M quantized 7B model is about 4-5GB, Download time depends on internet speed.
Tips

Q4_K_M Quantize is the most recommended choice, providing the best balance between quality and file size.

Scenario

Choose the right model based on your hardware

Prompt example
Hardware configuration and model selection guide:

8GB memory (without discrete graphics):
- Select 3B-4B parametric model
- Quantization: Q4_K_M or Q5_K_M
- Recommended: Llama-3.2-3B, Phi-3.5-mini

16GB memory (without discrete graphics):
- Can run 7B parameter model
- Quantization: Q4_K_M
- Recommended: Qwen2.5-7B, Llama-3.1-8B

With discrete graphics card (8GB video memory):
- Models can be loaded to the GPU, greatly improving speed
- Fully load the 7B model to the GPU
- Turn on GPU acceleration in LM Studio settings

32GB RAM or 24GB VRAM:
- Can run 13B-14B parametric models
- Recommended: Qwen2.5-14B, Llama-3.1-14B
Output / what to expect
Choose the appropriate model based on your hardware, Avoid running slowly due to too large models, GPU acceleration can speed up generation by 5-10 times.
Tips

On the model download page of LM Studio, the hardware requirements of the model will be displayed, with green indicating recommended and yellow indicating barely usable.

Starter & above

The rest of this guide

Additional scenarios and the full comparison table are included with Starter and above. Sign in with an eligible account to load them.

You're on the Free plan. Upgrade to Starter or higher to unlock the rest of this guide—additional scenarios and the full comparison table.

Loading full guide…