DCVC-FM Paper Loss Computation

Hello,

Thank you for your excellent work on this project!

In your DCVC-FM paper, Equation (4) specifies how to compute $\lambda$ based on the randomly selected $q_t$:

$$
\lambda = e^{\ln \lambda_{\min} + \frac{q_t}{q_{\text{num}} - 1} \cdot (\ln \lambda_{\max} - \ln \lambda_{\min})}
$$
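As I read Equation (4), it interpolates $\ln \lambda$ linearly in $q_t$. Equivalently, $\lambda = \lambda_{\min} \cdot (\lambda_{\max} / \lambda_{\min})^{q_t / (q_{\text{num}} - 1)}$, so $q_t = 0$ gives $\lambda_{\min}$ and $q_t = q_{\text{num}} - 1$ gives $\lambda_{\max}$. Below is a minimal Python sketch of that reading. The function name `lambda_from_q` is only illustrative, and I am not assuming any particular values for $\lambda_{\min}$, $\lambda_{\max}$ or $q_{\text{num}}$ from the paper.

```python
import math


def lambda_from_q(q_t: int, q_num: int, lambda_min: float, lambda_max: float) -> float:
    """Log-linear interpolation between lambda_min and lambda_max (my reading of Eq. (4))."""
    if not 0 <= q_t <= q_num - 1:
        raise ValueError("q_t must lie in [0, q_num - 1]")
    t = q_t / (q_num - 1)  # normalized quality position in [0, 1]
    log_lambda = math.log(lambda_min) + t * (math.log(lambda_max) - math.log(lambda_min))
    return math.exp(log_lambda)
```

With the placeholder values $\lambda_{\min} = 1$, $\lambda_{\max} = 100$ and $q_{\text{num}} = 3$, the middle index $q_t = 1$ gives $\lambda = 10$ (up to floating-point rounding), the geometric mean of the two endpoints. During training, I assume "randomly selected" means $q_t$ is drawn uniformly, e.g. `random.randrange(q_num)`.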

This $\lambda$ is then used in the rate-distortion loss described in the "Implementation" section of your paper:

$$
\text{Loss}_{RD} = R + \lambda \cdot \big(k \cdot D_{YUV} + (1 - k) \cdot D_{RGB}\big)
$$

which we aim to minimize during training.
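For completeness, this is how I currently plug $\lambda$ into the loss. It is a direct transcription of the formula above; I am assuming $k \in [0, 1]$ so that the two distortion terms form a convex combination, and `rd_loss` is just an illustrative name.

```python
def rd_loss(rate_bpp: float, d_yuv: float, d_rgb: float, lam: float, k: float) -> float:
    """Loss_RD = R + lambda * (k * D_YUV + (1 - k) * D_RGB)."""
    # Assumption: k is a mixing weight in [0, 1] between the YUV and RGB distortions.
    if not 0.0 <= k <= 1.0:
        raise ValueError("k should lie in [0, 1]")
    return rate_bpp + lam * (k * d_yuv + (1.0 - k) * d_rgb)
```

With $k = 1$ the loss reduces to $R + \lambda \cdot D_{YUV}$, and with $k = 0$ to $R + \lambda \cdot D_{RGB}$. Since it only uses arithmetic, the same expression applies to PyTorch scalar tensors.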

I would be grateful if you could clarify the following:

1. Is the rate $R$ defined as the bits-per-pixel (bpp) loss, i.e., `bpp = bits / pixel_num`, as is common practice?

2. My understanding is that both the YUV420 and the RGB training data are converted to YUV444 during training, and that both distortion terms are then computed in the YUV444 domain. Is that correct?

3. Are the distortion terms $D_{YUV}$ and $D_{RGB}$ computed as the average MSE between the reconstructed and target frames (as with `torch.nn.MSELoss()`), or is a different weight applied to each YUV component?
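To make the three questions concrete, here is a minimal plain-Python sketch of my current understanding. Everything in it is my assumption rather than something taken from the paper or the DCVC code: the rate is the total bits of all coded symbols (computed from their likelihoods) divided by the frame's $H \times W$; YUV420 chroma is brought to 4:4:4 by nearest-neighbour upsampling; RGB is converted with the full-range BT.709 matrix; and the `weights` argument is a hypothetical knob illustrating the per-component alternative from question 3. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Plane = List[List[float]]  # one 2-D component plane, values normalized to [0, 1]


def bpp_from_likelihoods(likelihoods: Sequence[float], height: int, width: int) -> float:
    """Rate in bits per pixel: total bits = -sum(log2 p), divided by H * W."""
    bits = -sum(math.log2(p) for p in likelihoods)
    return bits / (height * width)


def upsample_chroma_2x(plane: Plane) -> Plane:
    """Nearest-neighbour 2x upsampling of a 4:2:0 chroma plane to 4:4:4 size (assumed method)."""
    out: Plane = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]  # repeat each sample horizontally
        out.append(wide)
        out.append(list(wide))  # repeat the row vertically
    return out


def rgb_to_yuv_bt709(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Full-range BT.709 RGB -> YCbCr for one pixel, inputs and outputs in [0, 1]."""
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    u = (b - y) / 1.8556 + 0.5
    v = (r - y) / 1.5748 + 0.5
    return y, u, v


def plane_mse(a: Plane, b: Plane) -> float:
    """Mean squared error over all samples of one plane."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def yuv444_distortion(rec: Sequence[Plane], ref: Sequence[Plane],
                      weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted average of per-component MSE over (Y, U, V) planes.

    In 4:4:4 all three planes have the same size, so equal weights give the same
    value as a plain mean over every sample (what torch.nn.MSELoss() computes).
    """
    per_comp = [plane_mse(r, t) for r, t in zip(rec, ref)]
    return sum(w * d for w, d in zip(weights, per_comp)) / sum(weights)
```

For example, if only the Y plane has an error of 1 per sample, equal weights give a distortion of 1/3, whereas a 6:1:1 weighting (used here purely as an illustration of per-component scaling) gives 0.75. Question 3 is essentially asking which of these two behaviours the paper uses.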

Thank you for your time and support!
