feat(pi_lock): feat pi_lock and fix lock #1926

Merged
fslongjin merged 9 commits into DragonOS-Community:master from oeasy1412:fix-lock
May 28, 2026

@oeasy1412 (Member) commented May 21, 2026

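1. Introduce pi_lock to replace inner_lock, aligning with the lock order of Linux task_rq_lock()

The original implementation used RwLock<InnerSchedInfo> to protect process state, priority, and scheduling policy. That lock was too coarse-grained and did not match Linux semantics.

  • Atomic state field: remove InnerSchedInfo (state + sleep) and add state_atomic: AtomicU32, using the same bit encoding as Linux (TASK_RUNNING=0x0000, TASK_INTERRUPTIBLE=0x0001, TASK_UNINTERRUPTIBLE=0x0002, TASK_STOPPED=0x0004, TASK_DEAD_MARKER=0x0080, with the exit code stored in the upper 20 bits).
  • New pi_lock: SpinLock<PiProtected>: protects cpus_allowed and nr_cpus_allowed, and centralizes the fields guarded by pi_lock.
  • Atomic scheduling policy and priority: sched_policy: RwLock<SchedPolicy> → AtomicU8; prio_data: RwLock<PrioData> → split into prio: AtomicI32, static_prio: AtomicI32, and normal_prio: AtomicI32.
  • Lock order: pi_lock → rq_lock (aligning with Linux task_rq_lock()); on release, rq_lock is dropped first, then pi_lock.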
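
The state encoding listed above can be sketched as follows. This is a minimal illustration, not the PR's code: the constants come from the description, while EXIT_CODE_SHIFT, the masks, the helper functions, and the SchedState type are hypothetical names. It also assumes that "upper 20 bits" means bits 12 to 31 of the 32-bit word.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// State bits as listed in the PR description.
pub const TASK_RUNNING: u32 = 0x0000;
pub const TASK_INTERRUPTIBLE: u32 = 0x0001;
pub const TASK_UNINTERRUPTIBLE: u32 = 0x0002;
pub const TASK_STOPPED: u32 = 0x0004;
pub const TASK_DEAD_MARKER: u32 = 0x0080;

// Assumption: "upper 20 bits" means bits 12..=31 of the u32.
const EXIT_CODE_SHIFT: u32 = 12;
const EXIT_CODE_MASK: u32 = 0xF_FFFF;
const STATE_MASK: u32 = (1 << EXIT_CODE_SHIFT) - 1;

/// Pack the dead marker together with an exit code (truncated to 20 bits).
pub fn encode_dead(exit_code: u32) -> u32 {
    TASK_DEAD_MARKER | ((exit_code & EXIT_CODE_MASK) << EXIT_CODE_SHIFT)
}

/// Return the exit code if the raw value carries the dead marker.
pub fn decode_exit_code(raw: u32) -> Option<u32> {
    if (raw & TASK_DEAD_MARKER) != 0 {
        Some(raw >> EXIT_CODE_SHIFT)
    } else {
        None
    }
}

/// Hypothetical holder for the atomic state word.
pub struct SchedState {
    state_atomic: AtomicU32,
}

impl SchedState {
    pub fn new() -> Self {
        Self { state_atomic: AtomicU32::new(TASK_RUNNING) }
    }

    pub fn set(&self, raw: u32) {
        self.state_atomic.store(raw, Ordering::Release);
    }

    pub fn get(&self) -> u32 {
        self.state_atomic.load(Ordering::Acquire)
    }

    /// Runnable means that none of the low state bits is set.
    pub fn is_runnable(&self) -> bool {
        (self.get() & STATE_MASK) == TASK_RUNNING
    }
}
```

Keeping state and exit code in one word lets a reader observe both with a single atomic load, which is presumably why the RwLock is no longer needed for this field.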

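2. Fix rwlock preemption/interrupt ordering

read_irqsave/write_irqsave/upgradeable_read_irqsave disabled interrupts again on every iteration of the spin loop, which left the interrupt state inconsistent.

  • Call save_and_disable_irq() + preempt_disable() once, outside the loop; inside the loop, only try to acquire the lock.
  • Fix guard conversion (downgrade_to_read): use mem::forget(self) to skip the original guard's preempt_enable, so that the new guard takes over.
  • Correct the Drop order: every guard restores interrupts first (irq_guard.take()) and then enables preemption (preempt_enable()).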
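
The acquisition and release order described in this item can be modeled in user space. The sketch below is an assumption-laden mock, not the kernel code: save_and_disable_irq, preempt_disable, and preempt_enable are stand-ins that only record events in a thread-local log, RawLock and WriteGuard are hypothetical types, and releasing the lock before restoring interrupts is assumed by analogy with the spinlock item.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicBool, Ordering};

thread_local! {
    // Records the order of operations so it can be inspected afterwards.
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

fn log(ev: &'static str) {
    LOG.with(|l| l.borrow_mut().push(ev));
}

/// Return the recorded events and clear the log.
pub fn take_log() -> Vec<&'static str> {
    LOG.with(|l| std::mem::take(&mut *l.borrow_mut()))
}

// Mock IRQ guard: "restores interrupts" when dropped.
struct IrqGuard;

fn save_and_disable_irq() -> IrqGuard {
    log("irq_save");
    IrqGuard
}

impl Drop for IrqGuard {
    fn drop(&mut self) {
        log("irq_restore");
    }
}

fn preempt_disable() {
    log("preempt_disable");
}

fn preempt_enable() {
    log("preempt_enable");
}

pub struct RawLock {
    locked: AtomicBool,
}

pub struct WriteGuard<'a> {
    lock: &'a RawLock,
    irq_guard: Option<IrqGuard>,
}

impl RawLock {
    pub const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }

    pub fn write_irqsave(&self) -> WriteGuard<'_> {
        // Done exactly once, before spinning.
        let irq_guard = save_and_disable_irq();
        preempt_disable();
        // The loop body only tries to take the lock.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        log("lock");
        WriteGuard { lock: self, irq_guard: Some(irq_guard) }
    }
}

impl Drop for WriteGuard<'_> {
    fn drop(&mut self) {
        self.lock.locked.store(false, Ordering::Release);
        log("unlock");
        drop(self.irq_guard.take()); // restore interrupts first
        preempt_enable(); // then re-enable preemption
    }
}
```

Acquiring and dropping one guard records irq_save, preempt_disable, lock, unlock, irq_restore, preempt_enable, in that order.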

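3. Fix the spinlock unlock order, aligning with Linux spin_unlock_irqrestore / spin_unlock_bh

  • Remove preempt_enable() from SpinLock::unlock() and move it into SpinLockGuard::drop(), so that the order is: unlock → restore interrupts → enable preemption.
  • Reorder the fields of SpinLockBhGuard so that guard comes before bh, which makes Rust's Drop order: unlock → restore BH (aligning with spin_unlock_bh).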
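
The field reordering works because Rust drops struct fields in declaration order. A minimal sketch of that rule follows; BhGuardSketch and Noisy are hypothetical types, and only the field names guard and bh are taken from the description.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Pushes its name to the shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Hypothetical mirror of the guard layout: `guard` is declared before `bh`.
#[allow(dead_code)]
struct BhGuardSketch {
    guard: Noisy,
    bh: Noisy,
}

/// Build the struct, drop it, and report the order in which fields were dropped.
pub fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    // The order inside the literal does not matter; the declaration order does.
    let g = BhGuardSketch {
        bh: Noisy { name: "restore_bh", log: log.clone() },
        guard: Noisy { name: "unlock", log: log.clone() },
    };
    drop(g);
    let out = log.borrow().clone();
    out
}
```

Calling drop_order() yields "unlock" followed by "restore_bh", the same order the item above asks of SpinLockBhGuard.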

4. Refactor __schedule() to align with Linux signal-check semantics

  • Remove the is_mark_sleep check (the old InnerSchedInfo.sleep flag).
  • Match smp_mb__after_spinlock() semantics: insert fence(Ordering::SeqCst) after acquiring rq_lock.
  • Reposition the counters: nr_uninterruptible++ before deactivate_task; nr_iowait++ after deactivate_task.
  • Adjust activate_task/deactivate_task: remove the nr_iowait/nr_uninterruptible/IDLE_CPUS bookkeeping from them and move it to the callers.

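5. Fix child policy/priority inheritance in sched_fork()

  • The child process inherits the parent's normal_prio, static_prio, and policy.
  • Use atomic operations instead of reader-writer locks.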

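6. Fix child flag inheritance in copy_flags()

The child process no longer inherits the parent's run-time state flags (NEED_SCHEDULE, EXITING, WAKEKILL, SIGNALED, NEED_MIGRATE, and so on); it inherits only RANDOMIZE (the ASLR setting). The KTHREAD flag is set explicitly from the creation parameters.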
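
The inheritance rule above amounts to masking the parent's flags down to RANDOMIZE and then adding KTHREAD on request. The sketch below uses a hand-rolled flag type to stay self-contained; the bit positions, the ProcessFlags helper methods, and the is_kthread parameter are illustrative assumptions rather than the kernel's definitions.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct ProcessFlags(u32);

impl ProcessFlags {
    // Bit positions are illustrative assumptions, not the kernel's values.
    pub const KTHREAD: Self = Self(1 << 0);
    pub const NEED_SCHEDULE: Self = Self(1 << 1);
    pub const EXITING: Self = Self(1 << 2);
    pub const WAKEKILL: Self = Self(1 << 3);
    pub const SIGNALED: Self = Self(1 << 4);
    pub const NEED_MIGRATE: Self = Self(1 << 5);
    pub const RANDOMIZE: Self = Self(1 << 6);

    pub const fn contains(self, other: Self) -> bool {
        (self.0 & other.0) == other.0
    }

    pub const fn union(self, other: Self) -> Self {
        Self(self.0 | other.0)
    }

    pub const fn intersection(self, other: Self) -> Self {
        Self(self.0 & other.0)
    }
}

/// Compute the child's flags: keep only RANDOMIZE from the parent and set
/// KTHREAD solely from the creation parameter (hypothetical `is_kthread`).
pub fn copy_flags(parent: ProcessFlags, is_kthread: bool) -> ProcessFlags {
    let mut child = parent.intersection(ProcessFlags::RANDOMIZE);
    if is_kthread {
        child = child.union(ProcessFlags::KTHREAD);
    }
    child
}
```

Using an allow-list mask rather than clearing individual flags means that any run-time flag added later is excluded from inheritance by default.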

7. Rename functions (fix spelling)

  • check_preempt_currnet → check_preempt_current
  • wakeup_new_task → wake_up_new_task

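8. Nix build system: add GDB debugging support

  • QEMU allocates a GDB port dynamically; QEMU_GDB_WAIT=1 pauses startup until a debugger connects.
  • Add a gdb-${target} app/package that uses rust-gdb (x86_64) or gdb-multiarch (other architectures).
  • Add integration with the dunitest test framework.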

@fslongjin
Member

@codex review

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19d239331a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread kernel/src/process/mod.rs Outdated
Comment on lines +617 to +619
// Running on a remote CPU → send an IPI so that it enters __schedule() as soon as possible and is deactivated there
if is_current && !update_clock {
kick_cpu(target_cpu).ok();

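P1: Before stopping a task that is current on a remote CPU, remove it from the rq or take the blocking schedule path

In the branch where the target task is running on a remote CPU, stop_task() only sends kick_cpu() and relies on a later deactivate in __schedule() (see the comment about deactivating there). However, the interrupt path enters __schedule(SM_PREEMPT), and that path does not run the deactivate logic for non-runnable tasks (deactivate happens only in the !SM_MASK_PREEMPT branch). As a result, a current task that has been set to Stopped may stay on the runqueue and be picked to run again, which breaks stop semantics (typical case: sending a job-control stop signal to a thread that is running on a remote CPU).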


@oeasy1412
Member Author

1. Introduce pi_lock to replace inner_lock, aligning with the Linux task_rq_lock() lock order

The original implementation used RwLock<InnerSchedInfo> to protect process state, priority, and scheduling policy, resulting in overly coarse locking and inconsistency with Linux semantics.

  • Atomic state field: Remove InnerSchedInfo (which contained state + sleep) and add state_atomic: AtomicU32, using the same bit encoding as Linux (TASK_RUNNING=0x0000, TASK_INTERRUPTIBLE=0x0001, TASK_UNINTERRUPTIBLE=0x0002, TASK_STOPPED=0x0004, TASK_DEAD_MARKER=0x0080, with the exit code stored in the upper 20 bits).
  • New pi_lock: SpinLock<PiProtected>: Protects cpus_allowed and nr_cpus_allowed, centralizing fields guarded by pi_lock.
  • Atomic scheduling policy and priority: sched_policy: RwLock<SchedPolicy> → AtomicU8; prio_data: RwLock<PrioData> → split into prio: AtomicI32, static_prio: AtomicI32, normal_prio: AtomicI32.
  • Lock ordering discipline: pi_lock → rq_lock (aligning with Linux task_rq_lock()). On release, rq_lock is dropped before pi_lock.
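
The lock ordering rule can be sketched with standard-library mutexes standing in for the kernel spinlocks. All type names below (TaskSketch, RunQueueSketch, TaskRqGuard) are hypothetical, and task_rq_lock here only borrows the name of the Linux helper; it is not the DragonOS implementation.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical task: pi_lock guards an affinity-related field.
pub struct TaskSketch {
    pub pi_lock: Mutex<u32>, // stands in for nr_cpus_allowed
}

/// Hypothetical runqueue: the lock guards a list of queued values.
pub struct RunQueueSketch {
    pub lock: Mutex<Vec<u32>>,
}

/// Holds both guards. `rq` is declared first, so it is released first
/// when the struct is dropped; `pi` is released last.
pub struct TaskRqGuard<'a> {
    pub rq: MutexGuard<'a, Vec<u32>>,
    pub pi: MutexGuard<'a, u32>,
}

/// Acquire in the documented order: pi_lock, then the runqueue lock.
pub fn task_rq_lock<'a>(task: &'a TaskSketch, rq: &'a RunQueueSketch) -> TaskRqGuard<'a> {
    let pi = task.pi_lock.lock().unwrap();
    let rq = rq.lock.lock().unwrap();
    TaskRqGuard { rq, pi }
}
```

Returning both guards in one struct keeps the release order tied to the field layout, so callers cannot accidentally drop pi_lock before rq_lock.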

2. Fix rwlock preemption/interrupt ordering

The original read_irqsave/write_irqsave/upgradeable_read_irqsave re-disabled interrupts on every iteration of the spin loop, leading to inconsistent interrupt states.

  • Perform save_and_disable_irq() + preempt_disable() outside the loop, and only attempt to acquire the lock inside the loop.
  • Fix guard conversion (downgrade_to_read): Use mem::forget(self) to skip the original guard’s preempt_enable, letting the new guard take over.
  • Correct Drop order: All guards restore interrupts first (irq_guard.take()) and then re-enable preemption (preempt_enable()).

3. Fix spinlock unlock order, aligning with Linux spin_unlock_irqrestore / spin_unlock_bh

  • Remove preempt_enable() from SpinLock::unlock() and move it to SpinLockGuard::drop() to guarantee the order: unlock → restore interrupts → enable preemption.
  • Reorder fields in SpinLockBhGuard so that guard comes before bh, ensuring Rust’s Drop order is: unlock → restore BH (aligning with spin_unlock_bh).

4. Refactor __schedule() to align with Linux signal-check semantics

  • Remove the is_mark_sleep check (the old InnerSchedInfo.sleep flag).
  • Match smp_mb__after_spinlock() semantics: Insert fence(Ordering::SeqCst) after acquiring rq_lock.
  • Reposition counters: nr_uninterruptible++ before deactivate_task; nr_iowait++ after deactivate_task.
  • Adjust activate_task/deactivate_task: Remove the nr_iowait/nr_uninterruptible/IDLE_CPUS bookkeeping from these functions and move it to the call sites.
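
As a rough illustration of the last two bullets, the sketch below keeps deactivate_task limited to dequeue accounting and lets the caller update the other counters in the stated order. RqCounters, TaskInfo, and block_prev are hypothetical names, and plain integers replace whatever counter types the kernel actually uses.

```rust
#[derive(Default, Debug, PartialEq)]
pub struct RqCounters {
    pub nr_running: i64,
    pub nr_uninterruptible: i64,
    pub nr_iowait: i64,
}

/// Hypothetical view of the task properties that matter here.
pub struct TaskInfo {
    pub uninterruptible: bool,
    pub in_iowait: bool,
}

impl RqCounters {
    /// Only performs the dequeue accounting; no iowait or
    /// uninterruptible bookkeeping happens in here any more.
    pub fn deactivate_task(&mut self, _task: &TaskInfo) {
        self.nr_running -= 1;
    }
}

/// Caller-side bookkeeping in the order given in the description.
pub fn block_prev(rq: &mut RqCounters, prev: &TaskInfo) {
    if prev.uninterruptible {
        rq.nr_uninterruptible += 1; // before deactivate_task
    }
    rq.deactivate_task(prev);
    if prev.in_iowait {
        rq.nr_iowait += 1; // after deactivate_task
    }
}
```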

5. Fix child policy/priority inheritance in sched_fork()

  • The child process inherits the parent’s normal_prio, static_prio, and policy.
  • Use atomic operations instead of reader-writer locks.
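
A minimal sketch of lock-free inheritance with the atomic fields named in item 1 follows. SchedFields is a hypothetical container, and initializing the child's prio from the parent's normal_prio is an assumption modeled on Linux, where sched_fork() avoids leaking a priority-inheritance boost to the child; the description itself only lists normal_prio, static_prio, and policy.

```rust
use std::sync::atomic::{AtomicI32, AtomicU8, Ordering};

/// Hypothetical container for the scheduling fields made atomic by this PR.
pub struct SchedFields {
    pub policy: AtomicU8,
    pub prio: AtomicI32,
    pub static_prio: AtomicI32,
    pub normal_prio: AtomicI32,
}

/// Build the child's fields from the parent's without taking any lock.
pub fn sched_fork(parent: &SchedFields) -> SchedFields {
    let normal = parent.normal_prio.load(Ordering::Relaxed);
    SchedFields {
        policy: AtomicU8::new(parent.policy.load(Ordering::Relaxed)),
        // Assumption: start from normal_prio so that a temporarily
        // boosted parent prio is not copied to the child.
        prio: AtomicI32::new(normal),
        static_prio: AtomicI32::new(parent.static_prio.load(Ordering::Relaxed)),
        normal_prio: AtomicI32::new(normal),
    }
}
```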

6. Fix child flag inheritance in copy_flags()

The child process no longer inherits the parent’s run-time state flags (such as NEED_SCHEDULE, EXITING, WAKEKILL, SIGNALED, NEED_MIGRATE), and only inherits RANDOMIZE (ASLR configuration). The KTHREAD flag is set explicitly through the creation parameters.

7. Rename functions (fix spelling)

  • check_preempt_currnet → check_preempt_current
  • wakeup_new_task → wake_up_new_task

8. Nix build system: Add GDB debugging support

  • QEMU dynamically allocates a GDB port, with support for QEMU_GDB_WAIT=1 to pause and wait for a connection.
  • Add gdb-${target} app/package, using rust-gdb (x86_64) or gdb-multiarch (other architectures).
  • Add integration with the dunitest test framework.

@fslongjin
Member

@codex review

@chatgpt-codex-connector

💡 Codex Review

} else if !update_clock {
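// Running on a remote CPU: dequeue + kick
// Dequeue asynchronously here; when the remote CPU receives the kick and runs __schedule(SM_PREEMPT),
// the stopped task is no longer in the red-black tree, so pick_next_task will not select it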
rq.deactivate_task(
pcb.clone(),
DequeueFlag::DEQUEUE_STOPPED | DequeueFlag::DEQUEUE_NOCLOCK,
);

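P1: Avoid dequeuing the remote current task twice

When SIGSTOP or a job-control stop hits a task that is running on another CPU, this code first dequeues the remote rq.current() from the runqueue and sends a kick. But when the remote CPU then enters __schedule(SM_NONE), it sees that the same task is still Stopped and unconditionally calls rq.deactivate_task(prev, DEQUEUE_SLEEP | ...) again on the non-preempt path in kernel/src/sched/mod.rs. The same scheduling entity is therefore dequeued twice, which underflows or corrupts runqueue state such as nr_running and the CFS tree. The remote current should only be marked as needing a reschedule and kicked, so that the single dequeue happens in its own __schedule() path.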
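
The reviewer's suggestion (mark and kick remotely, dequeue exactly once on the owning CPU) can be modeled with a toy runqueue. Everything in this sketch is hypothetical: StopTask, StopRq, and both functions are illustrative names, the kicks counter stands in for kick_cpu(), and the on_rq check is an added safeguard that the review does not mention. It does not show how the PR finally resolved the issue.

```rust
/// Toy task state for the stop path.
pub struct StopTask {
    pub stopped: bool,
    pub need_resched: bool,
    pub on_rq: bool,
}

/// Toy runqueue: a running-task count plus a counter of kicks sent.
pub struct StopRq {
    pub nr_running: u32,
    pub kicks: u32,
}

/// Remote side: mark the task and kick its CPU, but do not dequeue.
pub fn stop_remote_current(rq: &mut StopRq, t: &mut StopTask) {
    t.stopped = true;
    t.need_resched = true;
    rq.kicks += 1; // stands in for kick_cpu(target_cpu)
}

/// Owning CPU: the only place that dequeues the stopped task.
pub fn schedule_on_owner(rq: &mut StopRq, prev: &mut StopTask) {
    if prev.stopped && prev.on_rq {
        prev.on_rq = false;
        rq.nr_running -= 1;
    }
    prev.need_resched = false;
}
```

With the dequeue confined to one function and guarded by on_rq, running the owner's schedule path a second time leaves nr_running unchanged instead of underflowing it.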


@fslongjin
Member

@codex review

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9694d3eb21


Comment thread kernel/src/process/mod.rs Outdated
Member

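One additional, non-blocking review point on the build configuration: this PR changes the default of qemuConfig.debug in flake.nix from false to true, so ordinary nix run .#start-* / yolo launches also open the QEMU GDB TCP stub by default. The new gdb-${target} debugging capability is reasonable in itself, but enabling the debug port by default changes ordinary runs, CI, and the security boundary.

I suggest keeping the default debug = false and making the GDB stub an explicit opt-in (for example a separate debug app, a parameter, or an environment setting), or splitting it into a separate PR for discussion. This point is decoupled from the root-cause fixes to scheduling and flags in this round and does not need to be mixed into the kernel fix.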

oeasy1412 and others added 8 commits May 27, 2026 17:08
Signed-off-by: aLinChe <1129332011@qq.com>
Signed-off-by: aLinChe <1129332011@qq.com>
Signed-off-by: aLinChe <1129332011@qq.com>
Signed-off-by: aLinChe <1129332011@qq.com>
Signed-off-by: aLinChe <1129332011@qq.com>
Signed-off-by: aLinChe <1129332011@qq.com>
Signed-off-by: longjin <longjin@dragonos.org>
Signed-off-by: longjin <longjin@dragonos.org>
Signed-off-by: aLinChe <1129332011@qq.com>
@fslongjin fslongjin merged commit b1db90d into DragonOS-Community:master May 28, 2026
15 of 22 checks passed
2 participants