
Conversation

@HAOCHENYE (Collaborator)

No description provided.

@HAOCHENYE force-pushed the yehc/training_with_hf branch 3 times, most recently from 0a929ca to 84bbe79 on January 28, 2026 08:44
The previous `clean_param_name` only matched "._checkpoint_wrapped_module",
which includes the leading **.**. However, for a layer wrapped by the
checkpoint wrapper at the top level, the name starts with
"_checkpoint_wrapped_module" without a dot, so it could not be cleaned
because the expected prefix is missing.
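The fix described above can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: it splits the parameter name on dots and drops every `_checkpoint_wrapped_module` segment, which handles both the "._checkpoint_wrapped_module" case and a name that starts with the bare segment.

```python
# Hypothetical sketch of the described fix (the real implementation
# in the PR may use a different mechanism, e.g. a regex).
_CKPT_PREFIX = "_checkpoint_wrapped_module"

def clean_param_name(name: str) -> str:
    # Splitting on "." and filtering handles the segment anywhere in
    # the name, including at the very start where no dot precedes it.
    return ".".join(p for p in name.split(".") if p != _CKPT_PREFIX)

# Mid-name occurrence (the case the old regex already handled):
print(clean_param_name("model._checkpoint_wrapped_module.layers.0.weight"))
# Leading occurrence (the case the old regex missed):
print(clean_param_name("_checkpoint_wrapped_module.layers.0.weight"))
```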


ghstack-source-id: 9c8da53
Pull-Request: InternLM#1452
…educe code duplication

ghstack-source-id: cf0d79c
Pull-Request: InternLM#1453
ghstack-source-id: 383761a
Pull-Request: InternLM#1457
`torch.autograd.grad` raises an error if any tensor in `inputs` does not
require a gradient, e.g., the frozen `lm_head`. This commit fixes it with
a simple control flow.
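A minimal sketch of such a guard (assumed names; not the PR's actual code): filter out tensors with `requires_grad=False` before calling `torch.autograd.grad`, then map the returned gradients back to their original positions, leaving `None` for frozen tensors.

```python
import torch

def safe_grad(loss, inputs):
    """Like torch.autograd.grad(loss, inputs), but tolerates frozen tensors.

    torch.autograd.grad raises if any entry in `inputs` has
    requires_grad=False, so only trainable tensors are passed through;
    frozen ones get None in the result.
    """
    trainable = [t for t in inputs if t.requires_grad]
    grads = torch.autograd.grad(loss, trainable) if trainable else ()
    it = iter(grads)
    return [next(it) if t.requires_grad else None for t in inputs]

w = torch.randn(3, requires_grad=True)
frozen = torch.randn(3)  # e.g. a frozen lm_head weight
loss = (w * 2).sum() + frozen.sum()
grads = safe_grad(loss, [w, frozen])  # grads[1] is None instead of an error
```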


ghstack-source-id: ec3804f
Pull-Request: InternLM#1458
@HAOCHENYE force-pushed the yehc/training_with_hf branch from 84bbe79 to df0fa00 on January 28, 2026 09:29