18 changes: 2 additions & 16 deletions docs.json
@@ -36,7 +36,7 @@
"logo": {
"light": "/logo/light.svg",
"dark": "/logo/dark.svg",
"href": "/"
"href": "https://liquid.ai"
},
"navbar": {
"links": [
@@ -52,20 +52,6 @@
}
},
"navigation": {
"global": {
"anchors": [
{
"anchor": "About Us",
"icon": "building",
"href": "https://www.liquid.ai/company/about"
},
{
"anchor": "Blog",
"icon": "pencil",
"href": "https://www.liquid.ai/company/blog"
}
]
},
"tabs": [
{
"tab": "Documentation",
@@ -202,7 +188,7 @@
]
},
{
"tab": "Guides",
"tab": "Examples",
"groups": [
{
"group": "Get Started",
2 changes: 1 addition & 1 deletion docs/help/faqs.mdx
@@ -69,7 +69,7 @@ For most use cases, Q4_K_M or Q5_K_M provide good quality with significant size
## Fine-tuning

<Accordion title="Can I fine-tune LFM models?">
Yes! Most LFM models support fine-tuning with [TRL](/lfm/fine-tuning/trl) and [Unsloth](/lfm/fine-tuning/unsloth). Check the [Complete Model Library](/lfm/models/complete-library) for trainability information.
Yes! Most LFM models support fine-tuning with [TRL](/docs/fine-tuning/trl) and [Unsloth](/docs/fine-tuning/unsloth). Check the [Model Library](/docs/models/complete-library) for trainability information.
</Accordion>

<Accordion title="What fine-tuning methods are supported?">
135 changes: 15 additions & 120 deletions docs/inference/llama-cpp.mdx
@@ -114,67 +114,25 @@

## Basic Usage

llama.cpp offers three main interfaces for running inference: `llama-cpp-python` (Python bindings), `llama-server` (OpenAI-compatible server), and `llama-cli` (interactive CLI).
llama.cpp offers two main interfaces for running inference: `llama-server` (OpenAI-compatible server) and `llama-cli` (interactive CLI).

<Tabs>
<Tab title="llama-cpp-python">
For Python applications, use the `llama-cpp-python` package.

**Installation:**
```bash
pip install llama-cpp-python
```

For GPU support:
```bash
CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
```

**Model Setup:**
```python
from llama_cpp import Llama

# Load model
llm = Llama(
model_path="lfm2.5-1.2b-instruct-q4_k_m.gguf",
n_ctx=4096,
n_threads=8
)

# Generate text
output = llm(
"What is artificial intelligence?",
max_tokens=512,
temperature=0.7,
top_p=0.9
)
print(output["choices"][0]["text"])
```

**Chat Completions:**
```python
response = llm.create_chat_completion(
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing."}
],
temperature=0.7,
max_tokens=512
)
print(response["choices"][0]["message"]["content"])
```
</Tab>

<Tab title="llama-server">
llama-server provides an OpenAI-compatible API for serving models locally.

**Starting the Server:**
```bash
llama-server -hf LiquidAI/LFM2.5-1.2B-Instruct-GGUF -c 4096 --port 8080
```

The `-hf` flag downloads the model directly from Hugging Face. Alternatively, use a local model file:
```bash
llama-server -m lfm2.5-1.2b-instruct-q4_k_m.gguf -c 4096 --port 8080
```

Key parameters:
* `-m`: Path to GGUF model file
* `-hf`: Hugging Face model ID (downloads automatically)
* `-m`: Path to local GGUF model file
* `-c`: Context length (default: 4096)
* `--port`: Server port (default: 8080)
* `-ngl 99`: Offload layers to GPU (if available)
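
Once the server is running, any OpenAI-compatible client can query it. As a minimal sketch (assuming the default port and the `/v1/chat/completions` endpoint exposed by llama-server):

```bash
# Send a chat request to the local llama-server (default port 8080 assumed)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is artificial intelligence?"}
    ],
    "max_tokens": 256
  }'
```
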
@@ -216,12 +174,18 @@
<Tab title="llama-cli">
llama-cli provides an interactive terminal interface for chatting with models.

```bash
llama-cli -hf LiquidAI/LFM2.5-1.2B-Instruct-GGUF -c 4096 --color -i
```

The `-hf` flag downloads the model directly from Hugging Face. Alternatively, use a local model file:
```bash
llama-cli -m lfm2.5-1.2b-instruct-q4_k_m.gguf -c 4096 --color -i
```

Key parameters:
* `-m`: Path to GGUF model file
* `-hf`: Hugging Face model ID (downloads automatically)
* `-m`: Path to local GGUF model file
* `-c`: Context length
* `--color`: Colored output
* `-i`: Interactive mode
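
For scripted or one-shot use, llama-cli can also run a single prompt and exit. A minimal sketch (the local GGUF filename is illustrative):

```bash
# One-shot generation: pass a prompt with -p and cap the output length with -n
llama-cli -m lfm2.5-1.2b-instruct-q4_k_m.gguf \
  -p "Summarize what a GGUF file is in one sentence." \
  -n 128 -c 4096
```
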
@@ -236,49 +200,12 @@
Control text generation behavior using parameters in the OpenAI-compatible API or command-line flags. Key parameters:

* **`temperature`** (`float`, default 1.0): Controls randomness (0.0 = deterministic, higher = more random). Typical range: 0.1-2.0
* **`top_p`** (`float`, default 1.0): Nucleus sampling - limits to tokens with cumulative probability ≤ top\_p. Typical range: 0.1-1.0
* **`top_k`** (`int`, default 40): Limits to top-k most probable tokens. Typical range: 1-100
* **`max_tokens`** / **`--n-predict`** (`int`): Maximum number of tokens to generate
* **`repetition_penalty`** / **`--repeat-penalty`** (`float`, default 1.1): Penalty for repeating tokens (>1.0 = discourage repetition). Typical range: 1.0-1.5
* **`stop`** (`str` or `list[str]`): Strings that terminate generation when encountered
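
As a hedged sketch of how these parameters map onto a llama-server request (`temperature`, `top_p`, `max_tokens`, and `stop` are standard OpenAI-style fields; `top_k` and `repeat_penalty` are llama.cpp extensions whose exact names may vary by version):

```bash
# Chat completion with explicit sampling parameters against a local llama-server
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Explain quantum computing."}],
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "max_tokens": 512,
    "repeat_penalty": 1.1,
    "stop": ["<|im_end|>"]
  }'
```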

<Accordion title="llama-cpp-python example">
```python
from llama_cpp import Llama

llm = Llama(
model_path="lfm2.5-1.2b-instruct-q4_k_m.gguf",
n_ctx=4096,
n_threads=8
)

# Text generation with sampling parameters
output = llm(
"What is machine learning?",
max_tokens=512,
temperature=0.7,
top_p=0.9,
top_k=40,
repeat_penalty=1.1,
stop=["<|im_end|>", "<|endoftext|>"]
)
print(output["choices"][0]["text"])

# Chat completion with sampling parameters
response = llm.create_chat_completion(
messages=[
{"role": "user", "content": "Explain quantum computing."}
],
temperature=0.7,
top_p=0.9,
top_k=40,
max_tokens=512,
repeat_penalty=1.1
)
print(response["choices"][0]["message"]["content"])
```
</Accordion>

<Accordion title="llama-server (OpenAI-compatible API) example">
```python
from openai import OpenAI
@@ -305,7 +232,7 @@

## Vision Models

LFM2-VL GGUF models can be used for multimodal inference with llama.cpp.

### Quick Start with llama-cli

@@ -407,45 +334,13 @@
```
</Accordion>

<Accordion title="Using llama-cpp-python">
```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Initialize with vision support
# Note: Use the correct chat handler for your model architecture
chat_handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")

llm = Llama(
model_path="lfm2.5-vl-1.6b-q4_k_m.gguf",
chat_handler=chat_handler,
n_ctx=4096
)

# Generate with image
response = llm.create_chat_completion(
messages=[
{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": "file:///path/to/image.jpg"}},
{"type": "text", "text": "Describe this image."}
]
}
],
max_tokens=256
)
print(response["choices"][0]["message"]["content"])
```
</Accordion>
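
For a quick local test without writing any code, recent llama.cpp builds also ship a multimodal CLI. A hedged sketch (binary and flag names assume a current build; the model and projector filenames are illustrative):

```bash
# Describe an image with the multimodal CLI shipped in recent llama.cpp builds
llama-mtmd-cli -m lfm2.5-vl-1.6b-q4_k_m.gguf \
  --mmproj mmproj-model-f16.gguf \
  --image /path/to/image.jpg \
  -p "Describe this image."
```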

<Info>
For a complete working example with step-by-step instructions, see the [llama.cpp Vision Model Colab notebook](https://colab.research.google.com/drive/1q2PjE6O_AahakRlkTNJGYL32MsdUcj7b?usp=sharing).
</Info>

## Converting Custom Models

If you have a fine-tuned model or need to create a GGUF from a Hugging Face model:

```bash
# Clone llama.cpp if you haven't already