Multithreading vs. multiprocessing

At the moment, the default is `multiprocessing=False`, and I wonder what the reasoning behind this choice was (or still is).

Browsing the web, I find the following rule of thumb:
* multithreading is good for IO-bound tasks like reading or downloading files
* multiprocessing is good for computationally heavy tasks
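
To make the rule of thumb concrete, here is a minimal sketch that uses only the standard library `concurrent.futures` pools. It is not how `audinterface` dispatches its workers, and `io_task`, `cpu_task`, and `timed_map` are hypothetical names made up for this illustration. The sleep stands in for an IO-bound job, the pure-Python loop for a computationally heavy one.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def io_task(duration):
    # Stand-in for an IO-bound job: sleeping releases the GIL,
    # much like waiting for a disk or network read
    time.sleep(duration)
    return duration


def cpu_task(n):
    # Stand-in for a CPU-bound job: a pure-Python loop holds the GIL
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_map(executor_cls, func, items, num_workers):
    # Run func over items in a pool and return (results, elapsed seconds)
    t0 = time.perf_counter()
    with executor_cls(max_workers=num_workers) as pool:
        results = list(pool.map(func, items))
    return results, time.perf_counter() - t0


if __name__ == "__main__":
    for name, func, items in [
        ("io", io_task, [0.2] * 4),
        ("cpu", cpu_task, [2_000_000] * 4),
    ]:
        for executor_cls in [ThreadPoolExecutor, ProcessPoolExecutor]:
            _, t = timed_map(executor_cls, func, items, num_workers=4)
            print(f"{name}, {executor_cls.__name__}: {t:.2f} s")
```

On a standard CPython build I would expect the threads to help mainly in the `io` case, while the `cpu` case should only gain from processes, which in turn pay a startup and pickling cost. The actual numbers depend on the machine and Python version.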

When doing a simple test:

```python
import time

import audb
import audinterface
import audmath


def process_func(signal, sampling_rate):
    # Cheap per-file computation: RMS level of the signal in dB
    return audmath.db(audmath.rms(signal))


db = audb.load("emodb", version="1.4.1")

# Time every combination of threads/processes and worker count
for multiprocessing in [False, True]:
    for num_workers in [1, 5]:
        interface = audinterface.Feature(
            ["rms"],
            process_func=process_func,
            num_workers=num_workers,
            multiprocessing=multiprocessing,
        )
        # perf_counter() is a monotonic clock meant for measuring intervals
        t0 = time.perf_counter()
        df = interface.process_index(db.files)
        t = time.perf_counter() - t0
        print(f"{multiprocessing=}, {num_workers=}: {t:.2f} s")
```
On the second run, it returns:
```
multiprocessing=False, num_workers=1: 0.16 s
multiprocessing=False, num_workers=5: 0.26 s
multiprocessing=True, num_workers=1: 0.16 s
multiprocessing=True, num_workers=5: 0.11 s
```

Even though we don't do heavy processing here, multiprocessing is the faster option with 5 workers, whereas multithreading with 5 workers is even slower than a single worker. Is this expected?

/cc @ureichel, @ChristianGeng, @frankenjoe, @maxschmitt, @audeerington, @schruefer