Description
I am trying to understand the implementation of the coverage mechanism, and after debugging for a while I still cannot figure out why the procedure for producing the coverage vector in decode mode is NOT the same as in training/eval mode.
The relevant code is here: this line
Note that this attention decoder passes each decoder input through a linear layer with the previous step's context vector to get a modified version of the input. If initial_state_attention is False, on the first decoder step the "previous context vector" is just a zero vector. If initial_state_attention is True, we use initial_state to (re)calculate the previous step's context vector. We set this to False for train/eval mode (because we call attention_decoder once for all decoder steps) and True for decode mode (because we call attention_decoder once for each decoder step).
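To check my understanding of that comment, here is how I would sketch the initial_state_attention branch. The function and argument names below are mine, made up for illustration; this is not the actual code in attention_decoder.py:

```python
def previous_context_before_step0(initial_state_attention, initial_state,
                                  encoder_states, prev_coverage,
                                  attention, zero_context):
    """My reading of how the 'previous context vector' is obtained before
    the first decoder step handled by an attention_decoder call."""
    if initial_state_attention:
        # Decode mode: re-run attention on initial_state to (re)calculate the
        # previous step's context vector; as far as I can tell, this call is
        # also what advances the coverage vector.
        context, _, coverage = attention(initial_state, encoder_states, prev_coverage)
    else:
        # Train/eval mode: on the first decoder step the "previous context
        # vector" is just a zero vector and coverage stays untouched.
        context, coverage = zero_context, prev_coverage
    return context, coverage
```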
IMHO, the training and decode procedures mismatch to some extent in this implementation (please correct me if I am wrong).
For example (see the toy sketch after this list):
Let H be all encoder hidden states (a list of tensors). Then:
In training/eval mode, each decode step uses the attention network only once:
Input: H, current_decoder_hidden_state, previous_coverage (None for the first decode step)
Output: next coverage, next context, and attention weights (i.e. attn_dist in the code).
In decode mode, every step applies the attention mechanism twice:
(1) The first time:
Input: H, previous_decoder_hidden_state, previous_coverage (0s for the first decode step)
Output: modified previous context and next coverage (the attention weights are discarded here)
(2) The second time:
Input: H, current_decoder_hidden_state, next coverage
Output: next context and attention weights (next coverage is NOT updated here)
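To make the comparison concrete, here is a toy numpy sketch of both paths as I read them. toy_attention, the shapes, and all variable names are invented for illustration and heavily simplified (no decoder cell, no linear layers), so this is only my mental model of the control flow, not the repo's TensorFlow code:

```python
import numpy as np

def toy_attention(decoder_state, H, coverage):
    """Very simplified coverage attention: one score per encoder state;
    coverage is accumulated as the running sum of attention distributions."""
    scores = H @ decoder_state + (0.0 if coverage is None else coverage)
    attn_dist = np.exp(scores - scores.max())
    attn_dist /= attn_dist.sum()
    context = attn_dist @ H
    new_coverage = (np.zeros_like(attn_dist) if coverage is None else coverage) + attn_dist
    return context, attn_dist, new_coverage

hidden = 4
H = np.random.randn(6, hidden)                 # 6 encoder hidden states
decoder_states = np.random.randn(3, hidden)    # pretend these come from the decoder cell

# Training/eval mode: one attention_decoder call covers all decoder steps,
# so attention runs (and coverage is updated) exactly once per step.
coverage = None                                # None for the first decode step
for state in decoder_states:
    context, attn_dist, coverage = toy_attention(state, H, coverage)

# Decode mode: attention_decoder is called once per step with
# initial_state_attention=True, so attention runs twice inside each call.
prev_state = decoder_states[0]                 # previous step's decoder state
curr_state = decoder_states[1]                 # current step's decoder state
prev_coverage = np.zeros(6)                    # zeros for the first decode step

# (1) First call: recompute the previous context; this is the call that
#     advances coverage in decode mode.
prev_context, _, next_coverage = toy_attention(prev_state, H, prev_coverage)

# (2) Second call: compute the context/attention actually used for the
#     prediction, but coverage is NOT updated again here.
context, attn_dist, _ = toy_attention(curr_state, H, next_coverage)
```

If this sketch matches the intended behavior, my question is whether the coverage vector seen by the second call in decode mode is guaranteed to match the one used at the corresponding step in training/eval mode.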