```python
###============== Point-text Matching ===================###

text_input_ids_world = concat_all_gather(text_tokens.input_ids)  # [bs, 32]
text_attention_mask_world = concat_all_gather(text_tokens.attention_mask)  # [bs, 32]
point_embeds_world = all_gather_with_grad(point_embeds)  # [bs, 257, 1408]
with torch.no_grad():
    sim_t2p[:, rank * bs : rank * bs + bs].fill_diagonal_(-10000)
    sim_p2t[:, rank * bs : rank * bs + bs].fill_diagonal_(-10000)

    weights_t2p = F.softmax(sim_t2p, dim=1)
    weights_p2t = F.softmax(sim_p2t, dim=1)

# select a negative point for each text
point_embeds_neg = []
for b in range(bs):
    neg_idx = torch.multinomial(weights_t2p[b], 1).item()
    point_embeds_neg.append(point_embeds_world[neg_idx])
point_embeds_neg = torch.stack(point_embeds_neg, dim=0)

# select a negative text for each point
text_ids_neg = []
text_atts_neg = []
for b in range(bs):
    neg_idx = torch.multinomial(weights_p2t[b], 1).item()
    text_ids_neg.append(text_input_ids_world[neg_idx])
    text_atts_neg.append(text_attention_mask_world[neg_idx])

text_ids_neg = torch.stack(text_ids_neg, dim=0)
text_atts_neg = torch.stack(text_atts_neg, dim=0)
```
Because the negatives are drawn from a softmax over the similarity scores, `neg_idx` tends to select the most similar point sample for each text sample, and the most similar text sample for each point sample.
Why the "most similar" instead of the "least similar"?
GPT4Point/lavis/models/gpt4point_models/gpt4point_qformer.py, lines 156 to 183 in 3ed52d9
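For context, the softmax-weighted `torch.multinomial` sampling in the excerpt can be sketched in isolation. This is a minimal single-GPU sketch (so `rank = 0` and the world tensors coincide with the batch); `bs` and the random similarity matrix are made up for illustration and are not the real model's scores:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
bs = 4

# Hypothetical text-to-point similarity matrix for one GPU; the diagonal
# holds each text's own (positive) point, as in the excerpt above.
sim_t2p = torch.randn(bs, bs)
sim_t2p.fill_diagonal_(-10000)  # mask positives so they get ~zero weight

# Rows sum to 1; the masked diagonal contributes essentially zero mass,
# while high-similarity (hard) negatives keep the largest weights.
weights_t2p = F.softmax(sim_t2p, dim=1)

# Sample one negative point index per text, exactly as in the loop above.
neg_idx = [torch.multinomial(weights_t2p[b], 1).item() for b in range(bs)]

# The sampled negative is never the positive (diagonal) index, and indices
# with higher similarity are drawn proportionally more often.
assert all(neg_idx[b] != b for b in range(bs))
```

Note that `torch.multinomial` samples stochastically in proportion to the weights rather than taking a hard `argmax`, so it *favors* the most similar in-batch sample without always picking it.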