bestofyou.tube

Local AI & Self-Hosting

Running LLMs on your own hardware — without the API bill or the lock-in.

Architecture, benchmarks, and self-hosted inference stacks. For people who want to understand what's actually running, not just call an endpoint.

Curation rubric (what the LLM is told to look for)

Reward: empirical benchmarks, hardware teardowns, named tradeoffs in model/quantization choices, real deployment lessons, architectural reasoning.
Penalize: AI hype, vague 'this changes everything' framing, sponsor-heavy filler, generic news roundups.

Seed channels
  • @servethehomevideo
  • @JeffGeerling
  • @NetworkChuck

Top picks · 2

92 · Top pick

Benchmarking 4-bit quantization across consumer GPUs

@servethehomevideo · 15:00

Why this is here

Scored 92/100 on substance — among the top 5% indexed for Local AI & Self-Hosting. Heavy use of primary sources and explicit reasoning chains.

Key takeaways
  • Key claim is supported with on-screen evidence (data, citations, or worked examples)
  • Avoids the most common shallow framing of the topic
  • Specifically covers: benchmarking 4-bit quantization across consumer GPUs

88 · Top pick

Self-hosted llama.cpp + Ollama production setup

@JeffGeerling · 19:00

Why this is here

Scored 88/100. Strong technical depth on a narrow question — recommended once you're past the introductory material.

Key takeaways
  • Key claim is supported with on-screen evidence (data, citations, or worked examples)
  • Avoids the most common shallow framing of the topic
  • Specifically covers: self-hosted llama.cpp + Ollama production setup

Also strong · 3

84 · Strong

Why your local model is slower than the API (memory bandwidth)

@NetworkChuck · 23:00

Why this is here

Scored 84/100. Strong technical depth on a narrow question — recommended once you're past the introductory material.

Key takeaways
  • Key claim is supported with on-screen evidence (data, citations, or worked examples)
  • Avoids the most common shallow framing of the topic
  • Specifically covers: why your local model is slower than the API (memory bandwidth)

79 · Strong

Building a $1,500 inference server from used parts

@servethehomevideo · 27:00

Why this is here

Scored 79/100. Useful introductory framing with some hand-waving in the technical sections — pair it with deeper sources from this topic.

Key takeaways
  • Key claim is supported with on-screen evidence (data, citations, or worked examples)
  • Avoids the most common shallow framing of the topic
  • Specifically covers: building a $1,500 inference server from used parts

76 · Strong

Comparing local embedding models for RAG

@JeffGeerling · 31:00

Why this is here

Scored 76/100. Useful introductory framing with some hand-waving in the technical sections — pair it with deeper sources from this topic.

Key takeaways
  • Key claim is supported with on-screen evidence (data, citations, or worked examples)
  • Avoids the most common shallow framing of the topic
  • Specifically covers: comparing local embedding models for RAG