
Question about coverage mechanism implementation #157

@iamxpy

Description


I am trying to figure out the implementation of the coverage mechanism, and after debugging for a while, I still cannot understand why the procedure for producing the coverage vector in decode mode is NOT the same as in training/eval mode.

Related code is here: this line

Note that this attention decoder passes each decoder input through a linear layer with the previous step's context vector to get a modified version of the input. If initial_state_attention is False, on the first decoder step the "previous context vector" is just a zero vector. If initial_state_attention is True, we use initial_state to (re)calculate the previous step's context vector. We set this to False for train/eval mode (because we call attention_decoder once for all decoder steps) and True for decode mode (because we call attention_decoder once for each decoder step).
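As a side note, here is a minimal, runnable sketch of the "pass the decoder input through a linear layer with the previous step's context vector" idea mentioned above. The sizes and weights are made up purely for illustration; the actual linear layer lives inside attention_decoder.

```python
import numpy as np

# Hedged sketch (illustrative only): the decoder input is concatenated with the
# previous step's context vector and passed through a linear layer before it
# enters the decoder cell.
input_size, attn_size = 128, 256
W = np.random.randn(input_size + attn_size, input_size)  # illustrative linear weights
b = np.zeros(input_size)

inp = np.random.randn(input_size)   # current decoder input (word embedding)
prev_context = np.zeros(attn_size)  # zero vector on the first step when
                                    # initial_state_attention is False
x = np.concatenate([inp, prev_context]) @ W + b  # modified input fed to the cell
```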

IMHO, the training and decoding procedures mismatch to some extent in such an implementation (please correct me if I am wrong).

For example:

Let H be all encoder hidden states (a list of tensors), then,

In training/eval mode, every decode step uses the attention network only once:

Input: H, current_decoder_hidden_state, previous_coverage (None for the first decode step)

Output: next coverage, next context, and attention weights (i.e. attn_dist in the code).

In decode mode, every step applies the attention mechanism twice:

(1) The first time:

Input: H, previous_decoder_hidden_state, previous_coverage (0s for the first decode step)

Output: recalculated previous context and next coverage (the attention weights are discarded here)

(2) The second time:

Input: H, current_decoder_hidden_state, next coverage

Output: next context and attention weights (next coverage is NOT updated here)
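To make the comparison concrete, here is a minimal, runnable toy simulation of the two schedules described above. The toy_attention function is a hypothetical stand-in, not the repo's learned attention; only the call order and the coverage updates mirror the description.

```python
import numpy as np

def toy_attention(state, coverage, enc_states):
    # Toy stand-in for the attention network (hypothetical; the real network
    # is a learned attention with a coverage feature).
    scores = enc_states @ state - coverage           # coverage discourages re-attending
    attn_dist = np.exp(scores - scores.max())
    attn_dist /= attn_dist.sum()
    context = attn_dist @ enc_states
    return context, attn_dist, coverage + attn_dist  # coverage accumulates attn_dist

rng = np.random.default_rng(0)
enc_states = rng.normal(size=(5, 4))                 # H: 5 encoder positions, dim 4
dec_states = [rng.normal(size=4) for _ in range(3)]  # decoder states for 3 steps

# Training/eval schedule: one attention call per step; coverage is driven by
# the CURRENT decoder state.
cov_train = np.zeros(5)
for s in dec_states:
    _, _, cov_train = toy_attention(s, cov_train, enc_states)

# Decode schedule: two attention calls per step; coverage is updated in the
# first call (which sees the PREVIOUS decoder state) and frozen in the second.
cov_decode = np.zeros(5)
prev_state = np.zeros(4)                             # stand-in for initial_state
for s in dec_states:
    _, _, cov_decode = toy_attention(prev_state, cov_decode, enc_states)  # updates coverage
    _, attn_dist, _ = toy_attention(s, cov_decode, enc_states)            # coverage frozen
    prev_state = s

print(np.allclose(cov_train, cov_decode))            # False: the two schedules diverge
```

In this toy version the accumulated coverage vectors differ because, under the decode schedule, coverage accumulates attention distributions computed from the previous decoder state rather than the current one, which seems to be exactly the mismatch described above.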
