4 changes: 2 additions & 2 deletions docs/docs/01-fundamentals/03-frequently-asked-questions.md
@@ -10,11 +10,11 @@ Each hook documentation subpage (useClassification, useLLM, etc.) contains a sup

### How can I run my own AI model?

To run your own model, you need to directly access the underlying [ExecuTorch Module API](https://pytorch.org/executorch/stable/extension-module.html). We provide an experimental [React hook](../02-hooks/03-executorch-bindings/useExecutorchModule.md) along with a [TypeScript alternative](../03-typescript-api/03-executorch-bindings/ExecutorchModule.md), which let you use the aforementioned API without diving into native code. To get a model into a format runnable by the runtime, you'll need to get your hands dirty with some ExecuTorch knowledge. For more guides on exporting models, please refer to the [ExecuTorch tutorials](https://pytorch.org/executorch/stable/tutorials/export-to-executorch-tutorial.html). Once you obtain your model in the `.pte` format, you can run it with `useExecuTorchModule` and `ExecuTorchModule`.
To run your own model, you need to directly access the underlying [ExecuTorch Module API](https://pytorch.org/executorch/stable/extension-module.html). We provide an experimental [React hook](../03-hooks/03-executorch-bindings/useExecutorchModule.md) along with a [TypeScript alternative](../04-typescript-api/03-executorch-bindings/ExecutorchModule.md), which let you use the aforementioned API without diving into native code. To get a model into a format runnable by the runtime, you'll need to get your hands dirty with some ExecuTorch knowledge. For more guides on exporting models, please refer to the [ExecuTorch tutorials](https://pytorch.org/executorch/stable/tutorials/export-to-executorch-tutorial.html). Once you obtain your model in the `.pte` format, you can run it with `useExecuTorchModule` and `ExecuTorchModule`.
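
For orientation, here is a minimal sketch of what running a custom `.pte` model with the experimental hook could look like. It assumes the hook accepts a `modelSource`, exposes an `isReady` flag, and a `forward()` that takes a flat input array plus its shape — the model path, input shape, and exact signature are placeholders, so verify the current API on the linked `useExecutorchModule` page.

```tsx
import { useExecutorchModule } from 'react-native-executorch';

function MyModelExample() {
  // Hypothetical .pte file exported with ExecuTorch — replace with your own model.
  const module = useExecutorchModule({
    modelSource: require('../assets/my_model.pte'),
  });

  const runInference = async () => {
    if (!module.isReady) return;
    // The input layout and shape depend entirely on how the model was exported.
    const input = new Float32Array(1 * 3 * 224 * 224);
    const output = await module.forward(input, [1, 3, 224, 224]);
    console.log(output);
  };

  // ...call runInference() from your UI.
  return null;
}
```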

### Can you do function calling with useLLM?

If your model supports tool calling (i.e., its chat template can process tools), you can use the method explained on the [useLLM page](../02-hooks/01-natural-language-processing/useLLM.md).
If your model supports tool calling (i.e., its chat template can process tools), you can use the method explained on the [useLLM page](../03-hooks/01-natural-language-processing/useLLM.md).

If your model doesn't support it, you can still work around it using context. For details, refer to [this comment](https://github.com/software-mansion/react-native-executorch/issues/173#issuecomment-2775082278).

@@ -498,40 +498,3 @@ Depending on the selected model and the user's device, generation speed can be above
| [Phi 4 Mini](https://huggingface.co/software-mansion/react-native-executorch-phi-4-mini) | 4B | ✅ |
| [SmolLM 2](https://huggingface.co/software-mansion/react-native-executorch-smolLm-2) | 135M, 360M, 1.7B | ✅ |
| [LLaMA 3.2](https://huggingface.co/software-mansion/react-native-executorch-llama-3.2) | 1B, 3B | ✅ |

## Benchmarks

### Model size

| Model | XNNPACK [GB] |
| --------------------- | :----------: |
| LLAMA3_2_1B | 2.47 |
| LLAMA3_2_1B_SPINQUANT | 1.14 |
| LLAMA3_2_1B_QLORA | 1.18 |
| LLAMA3_2_3B | 6.43 |
| LLAMA3_2_3B_SPINQUANT | 2.55 |
| LLAMA3_2_3B_QLORA | 2.65 |

### Memory usage

| Model | Android (XNNPACK) [GB] | iOS (XNNPACK) [GB] |
| --------------------- | :--------------------: | :----------------: |
| LLAMA3_2_1B | 3.2 | 3.1 |
| LLAMA3_2_1B_SPINQUANT | 1.9 | 2.0 |
| LLAMA3_2_1B_QLORA | 2.2 | 2.5 |
| LLAMA3_2_3B | 7.1 | 7.3 |
| LLAMA3_2_3B_SPINQUANT | 3.7 | 3.8 |
| LLAMA3_2_3B_QLORA | 4.0 | 4.1 |

### Inference time

| Model | iPhone 16 Pro (XNNPACK) [tokens/s] | iPhone 13 Pro (XNNPACK) [tokens/s] | iPhone SE 3 (XNNPACK) [tokens/s] | Samsung Galaxy S24 (XNNPACK) [tokens/s] | OnePlus 12 (XNNPACK) [tokens/s] |
| --------------------- | :--------------------------------: | :--------------------------------: | :------------------------------: | :-------------------------------------: | :-----------------------------: |
| LLAMA3_2_1B | 16.1 | 11.4 | ❌ | 15.6 | 19.3 |
| LLAMA3_2_1B_SPINQUANT | 40.6 | 16.7 | 16.5 | 40.3 | 48.2 |
| LLAMA3_2_1B_QLORA | 31.8 | 11.4 | 11.2 | 37.3 | 44.4 |
| LLAMA3_2_3B | ❌ | ❌ | ❌ | ❌ | 7.1 |
| LLAMA3_2_3B_SPINQUANT | 17.2 | 8.2 | ❌ | 16.2 | 19.4 |
| LLAMA3_2_3B_QLORA | 14.5 | ❌ | ❌ | 14.8 | 18.1 |

❌ - Insufficient RAM.
@@ -322,22 +322,3 @@ function App() {
| [whisper-base](https://huggingface.co/openai/whisper-base) | Multilingual |
| [whisper-small.en](https://huggingface.co/openai/whisper-small.en) | English |
| [whisper-small](https://huggingface.co/openai/whisper-small) | Multilingual |

## Benchmarks

### Model size

| Model | XNNPACK [MB] |
| ---------------- | :----------: |
| WHISPER_TINY_EN | 151 |
| WHISPER_TINY | 151 |
| WHISPER_BASE_EN | 290.6 |
| WHISPER_BASE | 290.6 |
| WHISPER_SMALL_EN | 968 |
| WHISPER_SMALL | 968 |

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ------------ | :--------------------: | :----------------: |
| WHISPER_TINY | 410 | 375 |
@@ -116,43 +116,3 @@ function App() {
:::info
For the supported models, the returned embedding vector is normalized, meaning its length is equal to 1. This allows for easier comparison of vectors using cosine similarity: simply calculate the dot product of two vectors to get the cosine similarity score, as sketched below.
:::
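
Because the vectors are already L2-normalized, the dot product alone gives the cosine similarity. The small helper below is purely illustrative and not part of the library's API:

```typescript
// For L2-normalized embeddings, cosine similarity reduces to a plain dot product.
// Illustrative helper only — not part of react-native-executorch.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Embeddings must have the same dimensionality');
  }
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Scores close to 1 indicate semantically similar sentences:
// const score = cosineSimilarity(embeddingA, embeddingB);
```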

## Benchmarks

### Model size

| Model | XNNPACK [MB] |
| -------------------------- | :----------: |
| ALL_MINILM_L6_V2 | 91 |
| ALL_MPNET_BASE_V2 | 438 |
| MULTI_QA_MINILM_L6_COS_V1 | 91 |
| MULTI_QA_MPNET_BASE_DOT_V1 | 438 |
| CLIP_VIT_BASE_PATCH32_TEXT | 254 |

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| -------------------------- | :--------------------: | :----------------: |
| ALL_MINILM_L6_V2 | 95 | 110 |
| ALL_MPNET_BASE_V2 | 405 | 455 |
| MULTI_QA_MINILM_L6_COS_V1 | 120 | 140 |
| MULTI_QA_MPNET_BASE_DOT_V1 | 435 | 455 |
| CLIP_VIT_BASE_PATCH32_TEXT | 200 | 280 |

### Inference time

:::warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
:::

| Model | iPhone 17 Pro (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| -------------------------- | :--------------------------: | :-----------------------: |
| ALL_MINILM_L6_V2 | 7 | 21 |
| ALL_MPNET_BASE_V2 | 24 | 90 |
| MULTI_QA_MINILM_L6_COS_V1 | 7 | 19 |
| MULTI_QA_MPNET_BASE_DOT_V1 | 24 | 88 |
| CLIP_VIT_BASE_PATCH32_TEXT | 14 | 39 |

:::info
Benchmark times for text embeddings are highly dependent on the sentence length. The numbers above are based on a sentence of around 80 tokens. For shorter or longer sentences, inference time may vary accordingly.
:::
@@ -164,31 +164,3 @@ export default function App() {
## Supported models

- [fsmn-vad](https://huggingface.co/funasr/fsmn-vad)

## Benchmarks

### Model size

| Model | XNNPACK [MB] |
| -------- | :----------: |
| FSMN_VAD | 1.83 |

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| -------- | :--------------------: | :----------------: |
| FSMN_VAD | 97 | 45.9 |

### Inference time

<!-- TODO: MEASURE INFERENCE TIME FOR SAMSUNG GALAXY S24 WHEN POSSIBLE -->

:::warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
:::

Inference time was measured on a 60-second audio clip, which can be found [here](https://models.silero.ai/vad_models/en.wav).

| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| -------- | :--------------------------: | :------------------------------: | :------------------------: | :-----------------------: |
| FSMN_VAD | 151 | 171 | 180 | 109 |
@@ -87,27 +87,3 @@ function App() {
| Model | Number of classes | Class list |
| ----------------------------------------------------------------------------------------------------------------- | ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [efficientnet_v2_s](https://pytorch.org/vision/stable/models/generated/torchvision.models.efficientnet_v2_s.html) | 1000 | [ImageNet1k_v1](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/common/rnexecutorch/models/classification/Constants.h) |

## Benchmarks

### Model size

| Model | XNNPACK [MB] | Core ML [MB] |
| ----------------- | :----------: | :----------: |
| EFFICIENTNET_V2_S | 85.6 | 43.9 |

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (Core ML) [MB] |
| ----------------- | :--------------------: | :----------------: |
| EFFICIENTNET_V2_S | 230 | 87 |

### Inference time

:::warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
:::

| Model | iPhone 17 Pro (Core ML) [ms] | iPhone 16 Pro (Core ML) [ms] | iPhone SE 3 (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| ----------------- | :--------------------------: | :--------------------------: | :------------------------: | :-------------------------------: | :-----------------------: |
| EFFICIENTNET_V2_S | 64 | 68 | 217 | 205 | 198 |
@@ -102,31 +102,3 @@ try {
:::info
For the supported models, the returned embedding vector is normalized, meaning its length is equal to 1. This allows for easier comparison of vectors using cosine similarity: simply calculate the dot product of two vectors to get the cosine similarity score, as sketched below.
:::
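
The same dot-product shortcut works for ranking normalized image embeddings against a normalized text query embedding. The sketch below is purely illustrative; the `uri` field and embedding arrays are placeholders rather than library types.

```typescript
// Rank L2-normalized image embeddings against an L2-normalized query embedding.
// Illustrative only — the embeddings would come from your own embedding calls.
function rankImagesByQuery(
  query: number[],
  images: { uri: string; embedding: number[] }[]
) {
  return images
    .map(({ uri, embedding }) => ({
      uri,
      // Dot product equals cosine similarity for normalized vectors.
      score: embedding.reduce((sum, v, i) => sum + v * query[i], 0),
    }))
    .sort((a, b) => b.score - a.score); // highest similarity first
}
```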

## Benchmarks

### Model size

| Model | XNNPACK [MB] |
| --------------------------- | :----------: |
| CLIP_VIT_BASE_PATCH32_IMAGE | 352 |

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| --------------------------- | :--------------------: | :----------------: |
| CLIP_VIT_BASE_PATCH32_IMAGE | 350 | 340 |

### Inference time

:::warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. Performance also depends heavily on image size, because resizing is an expensive operation, especially on low-end devices.
:::

| Model | iPhone 17 Pro (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| --------------------------- | :--------------------------: | :-----------------------: |
| CLIP_VIT_BASE_PATCH32_IMAGE | 18 | 55 |

:::info
Image embedding benchmark times are measured using 224×224 pixel images, as required by the model. All input images, whether larger or smaller, are resized to 224×224 before processing. Resizing is typically fast for small images but may be noticeably slower for very large images, which can increase total inference time.
:::
@@ -87,31 +87,3 @@ function App() {
| Model | Number of classes | Class list |
| -------------------------------------------------------------------------------------------------------------------------------- | ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| [deeplabv3_resnet50](https://pytorch.org/vision/stable/models/generated/torchvision.models.segmentation.deeplabv3_resnet50.html) | 21 | [DeeplabLabel](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/imageSegmentation.ts) |

## Benchmarks

### Model size

| Model | XNNPACK [MB] |
| ----------------- | ------------ |
| DEEPLABV3_RESNET50 | 168 |

### Memory usage

:::warning
Data presented in the following sections is based on inference with non-resized output. When resizing is enabled, expect higher memory usage and longer inference times at higher resolutions.
:::

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ----------------- | ---------------------- | ------------------ |
| DEEPLABV3_RESNET50 | 930 | 660 |

### Inference time

:::warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
:::

| Model | iPhone 16 Pro (Core ML) [ms] | iPhone 14 Pro Max (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] |
| ----------------- | ---------------------------- | -------------------------------- | --------------------------------- |
| DEEPLABV3_RESNET50 | 1000 | 670 | 700 |
@@ -266,45 +266,3 @@ You need to make sure the recognizer model you pass in `recognizerSource` matche
| ------------------------------------------------- | :--------: |
| [CRAFT](https://github.com/clovaai/CRAFT-pytorch) | Detector |
| [CRNN](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |

## Benchmarks

### Model size

| Model | XNNPACK [MB] |
| -------------------------- | :-----------: |
| Detector (CRAFT_QUANTIZED) | 20.9 |
| Recognizer (CRNN) | 18.5 - 25.2\* |

\* - The model weights vary depending on the language.

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ------------------------------------ | :--------------------: | :----------------: |
| Detector (CRAFT) + Recognizer (CRNN) | 1400 | 1320 |

### Inference time

**Image Used for Benchmarking:**

| ![Original image](../../../static/img/harvard.png) | ![Image with detected text boxes](../../../static/img/harvard-boxes.png) |
| --------------------------------------------------- | ------------------------------------------------------------------------- |
| Original image | Image with detected text boxes |

:::warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
:::

**Time measurements:**

Notice that the recognizer models were executed between 3 and 7 times during a single recognition.
The values below represent the averages across all runs for the benchmark image.

| Model | iPhone 17 Pro [ms] | iPhone 16 Pro [ms] | iPhone SE 3 [ms] | Samsung Galaxy S24 [ms] | OnePlus 12 [ms] |
| ------------------------------- | ------------------ | ------------------ | ----------- | ----------------------- | --------------- |
| **Total Inference Time** | 652 | 600 | 2855 | 1092 | 1034 |
| Detector (CRAFT) `forward_800` | 220 | 221 | 1740 | 521 | 492 |
| Recognizer (CRNN) `forward_512` | 45 | 38 | 110 | 40 | 38 |
| Recognizer (CRNN) `forward_256` | 21 | 18 | 54 | 20 | 19 |
| Recognizer (CRNN) `forward_128` | 11 | 9 | 27 | 10 | 10 |