LM Studio — Guide | AI devotee

LM Studio — User Guide

GUI for local LLMs.

Visit website

Free

Strengths

Graphical interface, no command line required, novice-friendly
Supports searching and downloading models directly from Hugging Face
Built-in ChatGPT-like conversation interface
Provides a local OpenAI compatible API server
Supports multiple quantization formats (GGUF) and flexibly adapts to hardware

Best for

Getting Started with Local Large Model Running for Newbies
Run models like Llama, Qwen, Mistral and more locally
Provide AI API services for local applications
Test and compare the performance of different models
Scenarios with high data privacy requirements

Install and download models

LM Studio provides a complete graphical interface, from downloading the model to starting a conversation in just a few steps.

Scenario

Install and download the first model

Prompt example

Steps:
1. Visit lmstudio.ai to download the installation package
   - Windows: .exe installation package
   - macOS: .dmg installation package
   - Linux: .AppImage

2. After the installation is complete, open LM Studio

3. Click the "Search" icon (magnifying glass) on the left

4. Search models, recommended for novices:
   - "Qwen2.5-7B-Instruct-GGUF" (Excellent Chinese)
   - "Llama-3.2-3B-Instruct-GGUF" (lightweight)

5. Select the quantized version (Q4_K_M is the balance between quality and size)

6. Click Download and wait for completion

Output / what to expect

After the model is downloaded, it will be displayed in the “My Models” list. Q4_K_M quantized 7B model is about 4-5GB, Download time depends on internet speed.

Tips

Q4_K_M Quantize is the most recommended choice, providing the best balance between quality and file size.

Scenario

Choose the right model based on your hardware

Prompt example

Hardware configuration and model selection guide:

8GB memory (without discrete graphics):
- Select 3B-4B parametric model
- Quantization: Q4_K_M or Q5_K_M
- Recommended: Llama-3.2-3B, Phi-3.5-mini

16GB memory (without discrete graphics):
- Can run 7B parameter model
- Quantization: Q4_K_M
- Recommended: Qwen2.5-7B, Llama-3.1-8B

With discrete graphics card (8GB video memory):
- Models can be loaded to the GPU, greatly improving speed
- Fully load the 7B model to the GPU
- Turn on GPU acceleration in LM Studio settings

32GB RAM or 24GB VRAM:
- Can run 13B-14B parametric models
- Recommended: Qwen2.5-14B, Llama-3.1-14B

Output / what to expect

Choose the appropriate model based on your hardware, Avoid running slowly due to too large models, GPU acceleration can speed up generation by 5-10 times.

Tips

On the model download page of LM Studio, the hardware requirements of the model will be displayed, with green indicating recommended and yellow indicating barely usable.

Starter & above

The rest of this guide

Additional scenarios and the full comparison table are included with Starter and above. Sign in with an eligible account to load them.

View plans