Your team's work is very interesting and fills a gap in multi-scale research for pathological image processing. I have a question: when the inputs are x5, x10, and x20 magnification patches, the features extracted by the CLAM ResNet backbone would likely have shapes like (1, 100, 1024), (1, 200, 1024), and (1, 400, 1024). How do you align the height (H) and width (W) of these features before feeding them into the Transformer? For instance:
The (1, 100, 1024) feature, after padding, might be reshaped into a 10x10 grid.
The (1, 400, 1024) feature might become a 20x20 grid.
If their spatial dimensions (H, W) differ, how does the subsequent Conv Processor handle multi-scale feature fusion?
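For concreteness, here is a minimal sketch of what I imagine the padding/reshape and alignment steps might look like. This is only my guess at the mechanism, not your actual implementation: the `tokens_to_grid` helper, the zero-padding to the next perfect square, and the bilinear upsampling to the largest grid before a fusing convolution are all assumptions on my part.

```python
import math
import torch
import torch.nn.functional as F

def tokens_to_grid(feats: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: pad a (1, N, C) token sequence to the next
    perfect square and reshape it into a (1, C, H, W) grid, H = W = ceil(sqrt(N))."""
    _, n, c = feats.shape
    side = math.ceil(math.sqrt(n))
    pad = side * side - n
    if pad > 0:
        feats = F.pad(feats, (0, 0, 0, pad))  # zero-pad along the token axis
    return feats.reshape(1, side, side, c).permute(0, 3, 1, 2)

# Dummy stand-ins for the CLAM-derived features at x5 / x10 / x20 magnification
f5  = torch.randn(1, 100, 1024)   # -> 10x10 grid
f10 = torch.randn(1, 200, 1024)   # -> padded to 225 tokens -> 15x15 grid
f20 = torch.randn(1, 400, 1024)   # -> 20x20 grid

grids = [tokens_to_grid(f) for f in (f5, f10, f20)]

# One possible alignment (my assumption): upsample every grid to the largest
# (20x20) one, then concatenate channel-wise and fuse with a convolution.
target = grids[-1].shape[-2:]
aligned = [F.interpolate(g, size=target, mode="bilinear", align_corners=False)
           for g in grids]
fused_in = torch.cat(aligned, dim=1)                       # (1, 3*1024, 20, 20)
conv = torch.nn.Conv2d(3 * 1024, 1024, kernel_size=3, padding=1)
fused = conv(fused_in)                                     # (1, 1024, 20, 20)
print(fused.shape)
```

If your Conv Processor instead keeps each scale at its native grid size, or aligns scales by downsampling rather than upsampling, I'd be very interested in how that fusion is done.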