4 changes: 2 additions & 2 deletions docs/docs/01-fundamentals/03-frequently-asked-questions.md
@@ -10,11 +10,11 @@ Each hook documentation subpage (useClassification, useLLM, etc.) contains a sup

### How can I run my own AI model?

To run your own model, you need to directly access the underlying [ExecuTorch Module API](https://pytorch.org/executorch/stable/extension-module.html). We provide an experimental [React hook](../02-hooks/03-executorch-bindings/useExecutorchModule.md) along with a [TypeScript alternative](../03-typescript-api/03-executorch-bindings/ExecutorchModule.md), which let you use the aforementioned API without diving into native code. To get a model into a format runnable by the runtime, you'll need to get your hands dirty with some ExecuTorch knowledge. For more guides on exporting models, please refer to the [ExecuTorch tutorials](https://pytorch.org/executorch/stable/tutorials/export-to-executorch-tutorial.html). Once you obtain your model in the `.pte` format, you can run it with `useExecuTorchModule` and `ExecuTorchModule`.
To run your own model, you need to directly access the underlying [ExecuTorch Module API](https://pytorch.org/executorch/stable/extension-module.html). We provide an experimental [React hook](../03-hooks/03-executorch-bindings/useExecutorchModule.md) along with a [TypeScript alternative](../04-typescript-api/03-executorch-bindings/ExecutorchModule.md), which let you use the aforementioned API without diving into native code. To get a model into a format runnable by the runtime, you'll need to get your hands dirty with some ExecuTorch knowledge. For more guides on exporting models, please refer to the [ExecuTorch tutorials](https://pytorch.org/executorch/stable/tutorials/export-to-executorch-tutorial.html). Once you obtain your model in the `.pte` format, you can run it with `useExecuTorchModule` and `ExecuTorchModule`.
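
For orientation, here is a minimal sketch of what running a custom `.pte` model with the experimental hook could look like. It assumes the hook accepts a `modelSource`, exposes an `isReady` flag, and a `forward()` that takes a flat input array plus its shape — the model path, input shape, and exact signature are placeholders, so verify the current API on the linked `useExecutorchModule` page.

```tsx
import { useExecutorchModule } from 'react-native-executorch';

function MyModelExample() {
  // Hypothetical .pte file exported with ExecuTorch — replace with your own model.
  const module = useExecutorchModule({
    modelSource: require('../assets/my_model.pte'),
  });

  const runInference = async () => {
    if (!module.isReady) return;
    // The input layout and shape depend entirely on how the model was exported.
    const input = new Float32Array(1 * 3 * 224 * 224);
    const output = await module.forward(input, [1, 3, 224, 224]);
    console.log(output);
  };

  // ...call runInference() from your UI.
  return null;
}
```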

### Can you do function calling with useLLM?

If your model supports tool calling (i.e., its chat template can process tools), you can use the method explained on the [useLLM page](../02-hooks/01-natural-language-processing/useLLM.md).
If your model supports tool calling (i.e., its chat template can process tools), you can use the method explained on the [useLLM page](../03-hooks/01-natural-language-processing/useLLM.md).

If your model doesn't support it, you can still work around it using context. For details, refer to [this comment](https://github.com/software-mansion/react-native-executorch/issues/173#issuecomment-2775082278).

@@ -498,40 +498,3 @@ Depending on the selected model and the user's device, generation speed can be above
| [Phi 4 Mini](https://huggingface.co/software-mansion/react-native-executorch-phi-4-mini) | 4B | ✅ |
| [SmolLM 2](https://huggingface.co/software-mansion/react-native-executorch-smolLm-2) | 135M, 360M, 1.7B | ✅ |
| [LLaMA 3.2](https://huggingface.co/software-mansion/react-native-executorch-llama-3.2) | 1B, 3B | ✅ |

## Benchmarks

### Model size

| Model | XNNPACK [GB] |
| --------------------- | :----------: |
| LLAMA3_2_1B | 2.47 |
| LLAMA3_2_1B_SPINQUANT | 1.14 |
| LLAMA3_2_1B_QLORA | 1.18 |
| LLAMA3_2_3B | 6.43 |
| LLAMA3_2_3B_SPINQUANT | 2.55 |
| LLAMA3_2_3B_QLORA | 2.65 |

### Memory usage

| Model | Android (XNNPACK) [GB] | iOS (XNNPACK) [GB] |
| --------------------- | :--------------------: | :----------------: |
| LLAMA3_2_1B | 3.2 | 3.1 |
| LLAMA3_2_1B_SPINQUANT | 1.9 | 2.0 |
| LLAMA3_2_1B_QLORA | 2.2 | 2.5 |
| LLAMA3_2_3B | 7.1 | 7.3 |
| LLAMA3_2_3B_SPINQUANT | 3.7 | 3.8 |
| LLAMA3_2_3B_QLORA | 4.0 | 4.1 |

### Inference time

| Model | iPhone 16 Pro (XNNPACK) [tokens/s] | iPhone 13 Pro (XNNPACK) [tokens/s] | iPhone SE 3 (XNNPACK) [tokens/s] | Samsung Galaxy S24 (XNNPACK) [tokens/s] | OnePlus 12 (XNNPACK) [tokens/s] |
| --------------------- | :--------------------------------: | :--------------------------------: | :------------------------------: | :-------------------------------------: | :-----------------------------: |
| LLAMA3_2_1B | 16.1 | 11.4 | ❌ | 15.6 | 19.3 |
| LLAMA3_2_1B_SPINQUANT | 40.6 | 16.7 | 16.5 | 40.3 | 48.2 |
| LLAMA3_2_1B_QLORA | 31.8 | 11.4 | 11.2 | 37.3 | 44.4 |
| LLAMA3_2_3B | ❌ | ❌ | ❌ | ❌ | 7.1 |
| LLAMA3_2_3B_SPINQUANT | 17.2 | 8.2 | ❌ | 16.2 | 19.4 |
| LLAMA3_2_3B_QLORA | 14.5 | ❌ | ❌ | 14.8 | 18.1 |

❌ - Insufficient RAM.
@@ -322,22 +322,3 @@ function App() {
| [whisper-base](https://huggingface.co/openai/whisper-base) | Multilingual |
| [whisper-small.en](https://huggingface.co/openai/whisper-small.en) | English |
| [whisper-small](https://huggingface.co/openai/whisper-small) | Multilingual |

## Benchmarks

### Model size

| Model | XNNPACK [MB] |
| ---------------- | :----------: |
| WHISPER_TINY_EN | 151 |
| WHISPER_TINY | 151 |
| WHISPER_BASE_EN | 290.6 |
| WHISPER_BASE | 290.6 |
| WHISPER_SMALL_EN | 968 |
| WHISPER_SMALL | 968 |

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ------------ | :--------------------: | :----------------: |
| WHISPER_TINY | 410 | 375 |
@@ -116,43 +116,3 @@ function App() {
:::info
For the supported models, the returned embedding vector is normalized, meaning its length is equal to 1. This allows for easier comparison of vectors using cosine similarity: simply calculate the dot product of two vectors to get the cosine similarity score, as sketched below.
:::
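
Because the vectors are already L2-normalized, the dot product alone gives the cosine similarity. The small helper below is purely illustrative and not part of the library's API:

```typescript
// For L2-normalized embeddings, cosine similarity reduces to a plain dot product.
// Illustrative helper only — not part of react-native-executorch.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Embeddings must have the same dimensionality');
  }
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Scores close to 1 indicate semantically similar sentences:
// const score = cosineSimilarity(embeddingA, embeddingB);
```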

## Benchmarks

### Model size

| Model | XNNPACK [MB] |
| -------------------------- | :----------: |
| ALL_MINILM_L6_V2 | 91 |
| ALL_MPNET_BASE_V2 | 438 |
| MULTI_QA_MINILM_L6_COS_V1 | 91 |
| MULTI_QA_MPNET_BASE_DOT_V1 | 438 |
| CLIP_VIT_BASE_PATCH32_TEXT | 254 |

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| -------------------------- | :--------------------: | :----------------: |
| ALL_MINILM_L6_V2 | 95 | 110 |
| ALL_MPNET_BASE_V2 | 405 | 455 |
| MULTI_QA_MINILM_L6_COS_V1 | 120 | 140 |
| MULTI_QA_MPNET_BASE_DOT_V1 | 435 | 455 |
| CLIP_VIT_BASE_PATCH32_TEXT | 200 | 280 |

### Inference time

:::warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
:::

| Model | iPhone 17 Pro (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| -------------------------- | :--------------------------: | :-----------------------: |
| ALL_MINILM_L6_V2 | 7 | 21 |
| ALL_MPNET_BASE_V2 | 24 | 90 |
| MULTI_QA_MINILM_L6_COS_V1 | 7 | 19 |
| MULTI_QA_MPNET_BASE_DOT_V1 | 24 | 88 |
| CLIP_VIT_BASE_PATCH32_TEXT | 14 | 39 |

:::info
Benchmark times for text embeddings are highly dependent on the sentence length. The numbers above are based on a sentence of around 80 tokens. For shorter or longer sentences, inference time may vary accordingly.
:::
@@ -164,31 +164,3 @@ export default function App() {
## Supported models

- [fsmn-vad](https://huggingface.co/funasr/fsmn-vad)

## Benchmarks

### Model size

| Model | XNNPACK [MB] |
| -------- | :----------: |
| FSMN_VAD | 1.83 |

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| -------- | :--------------------: | :----------------: |
| FSMN_VAD | 97 | 45.9 |

### Inference time

<!-- TODO: MEASURE INFERENCE TIME FOR SAMSUNG GALAXY S24 WHEN POSSIBLE -->

:::warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
:::

Inference time was measured on a 60-second audio clip, which can be found [here](https://models.silero.ai/vad_models/en.wav).

| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| -------- | :--------------------------: | :------------------------------: | :------------------------: | :-----------------------: |
| FSMN_VAD | 151 | 171 | 180 | 109 |
@@ -87,27 +87,3 @@ function App() {
| Model | Number of classes | Class list |
| ----------------------------------------------------------------------------------------------------------------- | ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [efficientnet_v2_s](https://pytorch.org/vision/stable/models/generated/torchvision.models.efficientnet_v2_s.html) | 1000 | [ImageNet1k_v1](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/common/rnexecutorch/models/classification/Constants.h) |

## Benchmarks

### Model size

| Model | XNNPACK [MB] | Core ML [MB] |
| ----------------- | :----------: | :----------: |
| EFFICIENTNET_V2_S | 85.6 | 43.9 |

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (Core ML) [MB] |
| ----------------- | :--------------------: | :----------------: |
| EFFICIENTNET_V2_S | 230 | 87 |

### Inference time

:::warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
:::

| Model | iPhone 17 Pro (Core ML) [ms] | iPhone 16 Pro (Core ML) [ms] | iPhone SE 3 (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| ----------------- | :--------------------------: | :--------------------------: | :------------------------: | :-------------------------------: | :-----------------------: |
| EFFICIENTNET_V2_S | 64 | 68 | 217 | 205 | 198 |
@@ -102,31 +102,3 @@ try {
:::info
For the supported models, the returned embedding vector is normalized, meaning its length is equal to 1. This allows for easier comparison of vectors using cosine similarity: simply calculate the dot product of two vectors to get the cosine similarity score, as sketched below.
:::
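
The same dot-product shortcut works for ranking normalized image embeddings against a normalized text query embedding. The sketch below is purely illustrative; the `uri` field and embedding arrays are placeholders rather than library types.

```typescript
// Rank L2-normalized image embeddings against an L2-normalized query embedding.
// Illustrative only — the embeddings would come from your own embedding calls.
function rankImagesByQuery(
  query: number[],
  images: { uri: string; embedding: number[] }[]
) {
  return images
    .map(({ uri, embedding }) => ({
      uri,
      // Dot product equals cosine similarity for normalized vectors.
      score: embedding.reduce((sum, v, i) => sum + v * query[i], 0),
    }))
    .sort((a, b) => b.score - a.score); // highest similarity first
}
```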

## Benchmarks

### Model size

| Model | XNNPACK [MB] |
| --------------------------- | :----------: |
| CLIP_VIT_BASE_PATCH32_IMAGE | 352 |

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| --------------------------- | :--------------------: | :----------------: |
| CLIP_VIT_BASE_PATCH32_IMAGE | 350 | 340 |

### Inference time

:::warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. Performance also depends heavily on image size, because resizing is an expensive operation, especially on low-end devices.
:::

| Model | iPhone 17 Pro (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| --------------------------- | :--------------------------: | :-----------------------: |
| CLIP_VIT_BASE_PATCH32_IMAGE | 18 | 55 |

:::info
Image embedding benchmark times are measured using 224×224 pixel images, as required by the model. All input images, whether larger or smaller, are resized to 224×224 before processing. Resizing is typically fast for small images but may be noticeably slower for very large images, which can increase total inference time.
:::
@@ -87,31 +87,3 @@ function App() {
| Model | Number of classes | Class list |
| -------------------------------------------------------------------------------------------------------------------------------- | ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| [deeplabv3_resnet50](https://pytorch.org/vision/stable/models/generated/torchvision.models.segmentation.deeplabv3_resnet50.html) | 21 | [DeeplabLabel](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/imageSegmentation.ts) |

## Benchmarks

### Model size

| Model | XNNPACK [MB] |
| ----------------- | ------------ |
| DEEPLABV3_RESNET50 | 168 |

### Memory usage

:::warning
Data presented in the following sections is based on inference with non-resized output. When resizing is enabled, expect higher memory usage and longer inference times at higher resolutions.
:::

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ----------------- | ---------------------- | ------------------ |
| DEEPLABV3_RESNET50 | 930 | 660 |

### Inference time

:::warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
:::

| Model | iPhone 16 Pro (Core ML) [ms] | iPhone 14 Pro Max (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] |
| ----------------- | ---------------------------- | -------------------------------- | --------------------------------- |
| DEEPLABV3_RESNET50 | 1000 | 670 | 700 |
@@ -266,45 +266,3 @@ You need to make sure the recognizer model you pass in `recognizerSource` matche
| ------------------------------------------------- | :--------: |
| [CRAFT](https://github.com/clovaai/CRAFT-pytorch) | Detector |
| [CRNN](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |

## Benchmarks

### Model size

| Model | XNNPACK [MB] |
| -------------------------- | :-----------: |
| Detector (CRAFT_QUANTIZED) | 20.9 |
| Recognizer (CRNN) | 18.5 - 25.2\* |

\* - The model weights vary depending on the language.

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ------------------------------------ | :--------------------: | :----------------: |
| Detector (CRAFT) + Recognizer (CRNN) | 1400 | 1320 |

### Inference time

**Image Used for Benchmarking:**

| ![Original image](../../../static/img/harvard.png) | ![Image with detected text boxes](../../../static/img/harvard-boxes.png) |
| --------------------------------------------------- | ------------------------------------------------------------------------- |
| Original image | Image with detected text boxes |

:::warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
:::

**Time measurements:**

Notice that the recognizer models were executed between 3 and 7 times during a single recognition.
The values below represent the averages across all runs for the benchmark image.

| Model | iPhone 17 Pro [ms] | iPhone 16 Pro [ms] | iPhone SE 3 [ms] | Samsung Galaxy S24 [ms] | OnePlus 12 [ms] |
| ------------------------------- | ------------------ | ------------------ | ----------- | ----------------------- | --------------- |
| **Total Inference Time** | 652 | 600 | 2855 | 1092 | 1034 |
| Detector (CRAFT) `forward_800` | 220 | 221 | 1740 | 521 | 492 |
| Recognizer (CRNN) `forward_512` | 45 | 38 | 110 | 40 | 38 |
| Recognizer (CRNN) `forward_256` | 21 | 18 | 54 | 20 | 19 |
| Recognizer (CRNN) `forward_128` | 11 | 9 | 27 | 10 | 10 |