Conv Processor #1

@BUPT-BownZ

Description

Your team's work is very interesting and addresses a gap in multi-scale research for pathological image processing. I have a question: when your inputs are x5, x10, and x20 magnification patches, the CLAM-ResNet-derived features would likely have dimensions like (1, 100, 1024), (1, 200, 1024), and (1, 400, 1024). How do you ensure that the height (H) and width (W) of these features are aligned before feeding them into the Transformer? For instance:

The (1, 100, 1024) feature, after padding, might be reshaped into a 10x10 grid.
The (1, 400, 1024) feature might become a 20x20 grid.
If their spatial dimensions (H, W) differ, how does the subsequent Conv Processor handle multi-scale feature fusion?
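For concreteness, here is a minimal sketch of one common way to do this (not necessarily your implementation): zero-pad each token sequence up to the next perfect square, reshape it into a (1, C, H, W) grid, and then bilinearly resample all grids to a shared spatial size before the conv-based fusion. The `tokens_to_grid`/`align_scales` names and the concatenation fusion are my own assumptions for illustration.

```python
import math

import torch
import torch.nn.functional as F

def tokens_to_grid(feats: torch.Tensor) -> torch.Tensor:
    """Pad (1, N, C) patch features to a perfect square and reshape to (1, C, H, W)."""
    _, n, c = feats.shape
    side = math.ceil(math.sqrt(n))
    pad = side * side - n
    feats = F.pad(feats, (0, 0, 0, pad))  # zero-pad along the token dimension
    return feats.transpose(1, 2).reshape(1, c, side, side)

def align_scales(grids, size: int):
    """Bilinearly resample each (1, C, H, W) grid to a shared (size, size)."""
    return [
        F.interpolate(g, size=(size, size), mode="bilinear", align_corners=False)
        for g in grids
    ]

x5 = tokens_to_grid(torch.randn(1, 100, 1024))   # 100 tokens -> 10x10 grid
x10 = tokens_to_grid(torch.randn(1, 200, 1024))  # 200 tokens -> padded to 15x15
x20 = tokens_to_grid(torch.randn(1, 400, 1024))  # 400 tokens -> 20x20 grid

aligned = align_scales([x5, x10, x20], size=20)  # all now (1, 1024, 20, 20)
fused = torch.cat(aligned, dim=1)                # (1, 3072, 20, 20) for a conv fuser
```

Is this roughly the approach the Conv Processor takes, or does it fuse scales without resampling to a common H and W?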
