Chainer MultiGPU

@mitmul Thank you for pointing out my typo in your PR. I wanted to raise two further issues I am facing [here](https://github.com/ilkarman/DeepLearningFrameworks/blob/mutligpuilia/notebooks/Chainer_MultiGPU.ipynb):

1. Switching from single-GPU to multi-GPU (4x) reduces the training time from 47min15s to 14min43s; however, for some reason the AUC also drops from 0.8028 (which matches all other examples) to 0.56. This does not happen with, for example, [PyTorch](https://github.com/ilkarman/DeepLearningFrameworks/blob/mutligpuilia/notebooks/PyTorch_MultiGPU.ipynb). There is also a difference in validation/main/loss, which ends at 0.23 for multi-GPU but 0.15 for single-GPU.

2. I wondered whether there is an update to the pre-trained DenseNet model so that I no longer have to override CaffeFunction with the `CaffeFunctionDenseNet121` class to reduce the memory footprint? The custom `__call__` lets me use a batch size of 56 instead of 32; however, I am still not able to get as low a memory footprint as with other frameworks, which let me run a batch size of 64.
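
For the first issue, one thing that may be worth ruling out is how per-device gradients are combined. The sketch below is framework-independent NumPy, not Chainer code, and it does not claim to describe what Chainer's updater does. It uses a hypothetical linear model, a hypothetical global batch of 64, and a hypothetical helper `mse_grad` to show that averaging the gradients of 4 equal shards reproduces the full-batch gradient, while summing them gives a gradient 4 times larger (equivalent to a 4x learning rate).

```python
import numpy as np


def mse_grad(w, x, y):
    """Gradient of 0.5 * mean((x @ w - y) ** 2) with respect to w."""
    residual = x @ w - y
    return x.T @ residual / len(y)


rng = np.random.default_rng(0)
n_devices = 4        # matches the 4x GPU setup in this issue
global_batch = 64    # hypothetical; must be divisible by n_devices

# Toy data and weights (assumptions for illustration only).
x = rng.normal(size=(global_batch, 8))
y = rng.normal(size=global_batch)
w = rng.normal(size=8)

# Single-device reference: gradient over the whole batch.
full_grad = mse_grad(w, x, y)

# Data-parallel version: each "device" sees an equal shard of the batch.
shard_grads = [
    mse_grad(w, xs, ys)
    for xs, ys in zip(np.split(x, n_devices), np.split(y, n_devices))
]
avg_grad = np.mean(shard_grads, axis=0)
sum_grad = np.sum(shard_grads, axis=0)

print("average of shard gradients equals full gradient:",
      np.allclose(full_grad, avg_grad))
print("sum of shard gradients equals n_devices * full gradient:",
      np.allclose(sum_grad, n_devices * full_grad))
```

If the multi-GPU run effectively uses a different gradient scale or a different global batch size than the single-GPU run, the learning rate may need adjusting before the two AUC values are comparable. This is only one possible explanation for the gap.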

```
Chainer:  4.1.0
CuPy:  4.1.0
Numpy:  1.14.1
GPU:  ['Tesla V100-PCIE-16GB', 'Tesla V100-PCIE-16GB', 'Tesla V100-PCIE-16GB', 'Tesla V100-PCIE-16GB']
CUDA Version 9.0.176
CuDNN Version  7.0.5
```
