Hello,
Thank you for your excellent work on this project!
In your DCVC-FM paper, Equation (4) specifies how to compute the loss, which we aim to minimize during training.
I would be grateful if you could clarify the following:
- Is the rate $R$ defined as the bits-per-pixel (bpp) loss, i.e., `bpp = bits / pixel_num`, as is typically done?
- My understanding is that both YUV420 and RGB are converted to YUV444 during training, and that the distortion terms are then computed in the YUV444 domain. Is that correct?
- For the distortion terms $D_{YUV}$ and $D_{RGB}$, are they computed as the average MSE between the reconstructed and the target frame (similar to `torch.nn.MSELoss()`), or is there a different scaling applied for each individual YUV component?
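To make the questions concrete, here is a minimal plain-Python sketch of the two readings I have in mind. It is only my assumption of what the terms might mean, not taken from the paper or the repository: the function names are hypothetical, and the `(6/8, 1/8, 1/8)` weights are just an illustrative example of per-component scaling.

```python
def bpp(total_bits, height, width):
    """Rate as bits per pixel: estimated bits divided by the pixel count."""
    return total_bits / (height * width)


def mse(recon, target):
    """Mean squared error over two flat sequences of samples
    (the same quantity as MSELoss with mean reduction)."""
    assert len(recon) == len(target)
    return sum((r - t) ** 2 for r, t in zip(recon, target)) / len(recon)


def plain_yuv_mse(recon_yuv, target_yuv):
    """Reading A: a single average over all Y, U and V samples together."""
    recon = [s for plane in recon_yuv for s in plane]
    target = [s for plane in target_yuv for s in plane]
    return mse(recon, target)


def weighted_yuv_mse(recon_yuv, target_yuv, weights=(6 / 8, 1 / 8, 1 / 8)):
    """Reading B: per-component MSE combined with weights.
    The default weights are a hypothetical example, not the paper's values."""
    return sum(w * mse(r, t) for w, r, t in zip(weights, recon_yuv, target_yuv))
```

With YUV444 planes of equal size, reading A weights Y, U and V equally, while reading B lets the luma error dominate, so the two give different losses for the same reconstruction.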
Thank you for your time and support!