Add VAE to txt-to-img Inference #32

@digiphd

Description

Hey hey!

So I am using some models that either have a VAE baked in or require a separate VAE to be specified at inference time, like this:

from diffusers import AutoencoderKL, StableDiffusionPipeline

model = "CompVis/stable-diffusion-v1-4"
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
pipe = StableDiffusionPipeline.from_pretrained(model, vae=vae)

When I either manually added the VAE or used a model with a VAE baked in as the MODEL_ID, I received the following error, for example with the model dreamlike-art/dreamlike-photoreal-2.0:

'name': 'RuntimeError',
'message': 'Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same'

Traceback (most recent call last):
  File "/api/app.py", line 382, in inference
    images = pipeline(**model_inputs).images
  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/api/diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 606, in __call__
    noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=prompt_embeds).sample
  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/api/diffusers/src/diffusers/models/unet_2d_condition.py", line 475, in forward
    sample = self.conv_in(sample)
  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

Line 382 is in the inference function and looks like this:

images = pipeline(**model_inputs).images

Perhaps we need to add a .half() to the input somewhere, though I'm not sure where.
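
One guess (untested): instead of casting the input, maybe the custom VAE could be loaded in half precision so its weights match the rest of the fp16 pipeline. A rough sketch, using the model IDs from above:

import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Untested guess: load the VAE with torch_dtype=torch.float16 so its weights
# match the fp16 UNet, instead of mixing fp32 and fp16 tensors.
vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
)
pipe = StableDiffusionPipeline.from_pretrained(
    "dreamlike-art/dreamlike-photoreal-2.0", vae=vae, torch_dtype=torch.float16
).to("cuda")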

Any help would be greatly appreciated!

It's the last hurdle I'm facing before I can generate images.

IDEA:
It would be awesome if we could define an optional VAE when making an API call, like this:

model_inputs["callInputs"] = {
    "MODEL_ID": "runwayml/stable-diffusion-v1-5",
    "PIPELINE": "StableDiffusionPipeline",
    "SCHEDULER": self.scheduler,
    "VAE": "stabilityai/sd-vae-ft-mse",
}
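
On the container side, I imagine the handler could do something like this (just a rough sketch; load_model is a hypothetical helper name, and loading everything in fp16 is my assumption):

import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

def load_model(call_inputs: dict):
    # Hypothetical handler: if the call specifies a VAE, load it in the
    # same dtype as the rest of the pipeline and pass it through.
    kwargs = {"torch_dtype": torch.float16}
    vae_id = call_inputs.get("VAE")
    if vae_id:
        kwargs["vae"] = AutoencoderKL.from_pretrained(
            vae_id, torch_dtype=torch.float16
        )
    return StableDiffusionPipeline.from_pretrained(
        call_inputs["MODEL_ID"], **kwargs
    ).to("cuda")

That would keep the VAE override optional while guaranteeing the dtypes always match.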
