-
Notifications
You must be signed in to change notification settings - Fork 0
panic/netfilter/tools: taint buffer, xt match align, litmus #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Compute the taint string buffer length from the taint flag descriptors\nand formatting overhead to avoid truncation in verbose output.
Convert xt_DSCP/xt_HL/xt_RATEEST/xt_TCPMSS to register as matches with\nlowercase aliases and matchinfo structs, and sync associated UAPI headers\n(xt_*, ipt_*, ip6t_HL) to match definitions.
Refresh the poonce lock litmus test name and description and drop\nsmp_mb__after_spinlock to reflect the observable ordering.
📝 WalkthroughWalkthroughThe pull request refactors several netfilter xtables modules (DSCP, TTL/HL, TCPMSS, RATEEST) from target-based implementations to match-based implementations. Public API headers are updated with new struct definitions, enum values, and naming conventions. Additionally, kernel panic taint flag definitions are refactored using macro expansion, and a memory model test case is revised. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@include/uapi/linux/netfilter/xt_TCPMSS.h`:
- Around line 7-10: The struct xt_tcpmss_match_info is indented with spaces
instead of tabs; change the leading spaces before the struct definition and its
member lines to tabs to comply with Linux kernel style, ensuring the struct name
and member declarations (__u16 mss_min, mss_max; __u8 invert;) remain unchanged
and only whitespace indentation is replaced with tabs.
In `@net/netfilter/xt_TCPMSS.c`:
- Around line 53-68: The loop parsing TCP options can read past the options
buffer because it accesses op[i+1], op[i+2], and op[i+3] without ensuring those
indexes are < optlen; update the parsing in the loop (the code using variables
i, op, optlen, and constants TCPOPT_MSS/TCPOLEN_MSS) to validate bounds before
any multi-byte access: before reading op[i+1] require i+1 < optlen, and before
reading the MSS value require i + TCPOLEN_MSS <= optlen; if a
malformed/too-short option is detected, treat it as non-matching and advance or
break safely (e.g., stop parsing or skip the rest) rather than reading
out-of-bounds, and ensure the increment logic that currently does i += op[i+1] ?
: 1 is only used after confirming op[i+1] is valid and non-zero.
In `@tools/memory-model/litmus-tests/Z6.0`+pooncelock+poonceLock+pombonce.litmus:
- Line 1: The test header string "Z6.0+pooncelock+pooncelock+pombonce" does not
match the actual filename "Z6.0+pooncelock+poonceLock+pombonce.litmus"; update
one to match the other per LKMM conventions: either rename the file to
"Z6.0+pooncelock+pooncelock+pombonce.litmus" to match the header, or change the
header line to "Z6.0+pooncelock+poonceLock+pombonce" so it exactly matches the
filename (preserve case and spelling).
🧹 Nitpick comments (3)
include/uapi/linux/netfilter/xt_CONNMARK.h (1)
17-20: Consider namespacing the shift direction enum.The
D_SHIFT_LEFTandD_SHIFT_RIGHTidentifiers use a genericD_prefix that could conflict with other definitions in the global namespace. Consider using a more specific prefix likeXT_CONNMARK_SHIFT_LEFTfor consistency with the other enums in this header.Proposed fix
enum { - D_SHIFT_LEFT = 0, - D_SHIFT_RIGHT, + XT_CONNMARK_SHIFT_LEFT = 0, + XT_CONNMARK_SHIFT_RIGHT, };include/uapi/linux/netfilter_ipv4/ipt_ECN.h (1)
16-16: Minor: End guard comment inconsistent with header guard.The
#ifndefguard is_IPT_ECN_Hbut the end comment says/* IPT_ECN_H */(missing leading underscore).Suggested fix
-#endif /* IPT_ECN_H */ +#endif /* _IPT_ECN_H */net/netfilter/xt_TCPMSS.c (1)
77-94: Consider adding a checkentry function to validate match info.The registration lacks a
checkentrycallback. Without validation, userspace could configuremss_min > mss_max, resulting in a rule that never matches. A simple validation would improve robustness.Example checkentry implementation
static int tcpmss_mt_check(const struct xt_mtchk_param *par) { const struct xt_tcpmss_match_info *info = par->matchinfo; if (info->mss_min > info->mss_max) return -EINVAL; return 0; }Then add
.checkentry = tcpmss_mt_checkto both registration entries.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
include/uapi/linux/netfilter/xt_CONNMARK.hinclude/uapi/linux/netfilter/xt_DSCP.hinclude/uapi/linux/netfilter/xt_MARK.hinclude/uapi/linux/netfilter/xt_RATEEST.hinclude/uapi/linux/netfilter/xt_TCPMSS.hinclude/uapi/linux/netfilter_ipv4/ipt_ECN.hinclude/uapi/linux/netfilter_ipv4/ipt_TTL.hinclude/uapi/linux/netfilter_ipv6/ip6t_HL.hkernel/panic.cnet/netfilter/xt_DSCP.cnet/netfilter/xt_HL.cnet/netfilter/xt_RATEEST.cnet/netfilter/xt_TCPMSS.ctools/memory-model/litmus-tests/Z6.0+pooncelock+poonceLock+pombonce.litmus
🔇 Additional comments (36)
tools/memory-model/litmus-tests/Z6.0+pooncelock+poonceLock+pombonce.litmus (3)
4-8: LGTM!The result change from "Never" to "Sometimes" correctly reflects the weaker ordering guarantees after removing
smp_mb__after_spinlock(). The description clearly explains that lock-protected accesses across processes are not necessarily observed in order by a third process not holding that lock.
21-29: LGTM!The removal of
smp_mb__after_spinlock()is the key change enabling this test to demonstrate that spinlock ordering is not transitively visible to non-lock-holding observers. The critical section logic is correct.
40-40: LGTM!The exists clause correctly captures the ordering anomaly: P1 observes P0's write to
y, but P2 (not holding the lock) can still observex=0despitez=2indicating P2 executed after P1's critical section. This demonstrates the intended memory model behavior.kernel/panic.c (4)
653-668: LGTM!The
TAINT_FLAGmacro correctly generates designated initializers for thetaint_flagsarray, and the stringification of taint names for descriptions is appropriate.
672-678: LGTM!Clever use of enum to compute the total descriptor length at compile time. The
sizeof(#taint) - 1correctly accounts for the null terminator exclusion.
680-694: LGTM!The buffer size calculations correctly account for all formatting scenarios:
- Verbose mode: prefix + flags with separators + descriptions
- Non-verbose mode: prefix + one character per flag
- Untainted case: "Not tainted"
The
+1inTAINT_BUF_LENcorrectly reserves space for the null terminator.
725-735: LGTM!Using the computed
TAINT_BUF_LENinstead of a hardcoded orsizeof-based size is more robust and self-documenting. TheBUILD_BUG_ONat line 728 provides a compile-time sanity check ensuring the array size matches the expected flag count.include/uapi/linux/netfilter/xt_MARK.h (1)
2-16: LGTM - Clean UAPI header definition.The header properly defines the target and match info structures with appropriate types. The
xt_mark_tginfo2andxt_mark_mtinfo1structs follow established netfilter conventions.include/uapi/linux/netfilter/xt_CONNMARK.h (1)
22-35: LGTM - Well-structured target and match info definitions.The structs follow established netfilter conventions with appropriate type usage. The separation of
tginfo1,tginfo2(versioned target info), andmtinfo1(match info) is consistent with other xt_* modules.include/uapi/linux/netfilter_ipv6/ip6t_HL.h (1)
11-22: This UAPI change includes a backward compatibility wrapper. The oldip6t_HL.hheader was converted to a thin deprecated wrapper that includes the canonicalip6t_hl.h, allowing existing userspace code using the old name to continue compiling (with a deprecation pragma warning). This is a coordinated header cleanup to unify identically-named files and avoid case-insensitive filesystem issues, not a breaking change. The implementation aligns with kernel UAPI stability guarantees.Likely an incorrect or invalid review comment.
include/uapi/linux/netfilter/xt_TCPMSS.h (1)
2-12: The old API names (xt_tcpmss_infoandXT_TCPMSS_CLAMP_PMTU) do not appear anywhere in the current codebase, and no references to them exist in kernel modules or commit history. The current API usingxt_tcpmss_match_infois consistently referenced in kernel modules (net/netfilter/xt_tcpmss.candnet/netfilter/xt_TCPMSS.c). However, the repository's limited git history (2 commits) prevents confirming whether these symbols were actually removed in a prior commit or never existed in this repository.include/uapi/linux/netfilter/xt_RATEEST.h (3)
8-22: LGTM!The enum definitions are well-structured: flags use explicit bit shifts enabling bitwise combinations, and modes use sequential values appropriate for switch-based dispatch.
24-37: LGTM!The struct uses appropriate fixed-width types for UAPI stability. The
aligned(8)attribute on kernel-internal pointers ensures consistent ABI across architectures, and these fields are correctly excluded from userspace viausersizein the match registration.
2-3: No action required for header guard rename.The header guard change from
_XT_RATEEST_TARGET_Hto_XT_RATEEST_MATCH_Hdoes not impact UAPI stability. Linux kernel policy treats header guards as internal implementation details; backward compatibility is enforced at the ABI/API level (struct layouts, enums, symbols, sizes) rather than at the literal macro token level. The kernel's UAPI checker validates these real compatibility concerns and would flag actual breaking changes.Likely an incorrect or invalid review comment.
include/uapi/linux/netfilter_ipv4/ipt_ECN.h (1)
5-14: LGTM!Clean backward-compatible aliasing pattern. The
#definealias and anonymous enum maintain API compatibility for existing userspace code while delegating to the canonicalxt_ecn.hdefinitions.net/netfilter/xt_TCPMSS.c (1)
96-108: LGTM!Module initialization, cleanup, and metadata are correctly updated for the target-to-match migration.
net/netfilter/xt_RATEEST.c (4)
14-70: LGTM!The match function correctly handles delta/absolute modes and comparison operations. The checkentry validation ensures exactly one of ABS/REL is set, so
est2is only accessed when REL mode guarantees it was looked up.
72-114: LGTM!The checkentry function provides thorough validation: ensures mutually exclusive flags, requires at least one metric type, validates comparison modes, and properly cleans up
est1ifest2lookup fails.
116-123: LGTM!Destroy function correctly releases resources, mirroring the checkentry allocation pattern.
125-154: LGTM!Registration correctly uses
NFPROTO_UNSPECfor protocol-agnostic matching, andusersizeproperly excludes kernel-internal pointers from userspace copy operations.include/uapi/linux/netfilter/xt_DSCP.h (3)
16-18: LGTM!The DSCP constants are correctly defined per RFC2474: mask covers the upper 6 bits, shift is 2, and max value is 63 (0x3f).
11-12: Header guard rename does not affect userspace code.Header guards are preprocessing directives used only during compilation to prevent multiple inclusion. Changing the guard from
_XT_DSCP_TARGET_Hto_XT_DSCP_Hhas no impact on userspace code that includes this header—the header's API remains accessible. Header guard names are internal implementation details, not part of the public ABI.Likely an incorrect or invalid review comment.
20-30: UAPI struct field reordering may affect binary compatibility.The struct
xt_tos_match_infoshows field ordertos_mask,tos_value,invert, which differs from some earlier versions wheretos_valueandtos_maskwere ordered differently. If this file is part of a recent merge or cleanup (early 2025), confirm whether the field reordering affects existing userspace consumers or if backward compatibility measures (version guards, padding, or aliases) are already in place.net/netfilter/xt_HL.c (5)
1-23: LGTM: Module metadata and includes are properly structured.The header comments, includes, and module metadata are correctly updated to reflect the match-based implementation. The
MODULE_ALIASentries foript_ttlandip6t_hlprovide appropriate backward compatibility.
25-42: LGTM: TTL match function is correctly implemented.The function properly extracts the TTL from the IPv4 header and performs the comparison based on the configured mode. The fall-through to
return falsehandles any unexpected mode values safely.
44-61: LGTM: IPv6 Hop Limit match function is correctly implemented.The function correctly mirrors the IPv4 TTL matching logic for IPv6, using the appropriate header accessor and comparison constants.
82-93: LGTM: Module initialization and cleanup are correctly implemented.The init and exit functions properly use
xt_register_matchesandxt_unregister_matchesfor the match-based registration pattern.
63-80: The registration array is correctly structured.Struct definitions verified:
ipt_ttl_infoandip6t_hl_infoare properly defined in the UAPI headers with simple, straightforward field definitions (__u8 modeand__u8 ttl/hop_limit). Thesizeof()operations in the registration array are correct and will function as intended. No validation issues.net/netfilter/xt_DSCP.c (8)
1-22: LGTM: Module metadata and includes are properly updated.The header includes and module aliases are correctly updated for the match-based implementation. The lowercase aliases (
ipt_dscp,ip6t_dscp,ipt_tos,ip6t_tos) follow the naming convention for match modules.
24-31: LGTM: IPv4 DSCP match function is correctly implemented.The inversion logic using
^ !!info->invertis idiomatic and correctly handles the match negation case.
33-40: LGTM: IPv6 DSCP match function correctly mirrors the IPv4 implementation.
42-50: LGTM: DSCP validation function correctly bounds-checks the DSCP value.
52-62: LGTM: TOS match function handles both IPv4 and IPv6 correctly.Using
xt_family(par)to determine the protocol family and applying the mask/value comparison pattern is the correct approach for a dual-stack match implementation.
64-97: LGTM: Match registration array is correctly structured.The registration correctly separates DSCP (with bounds-checking via
dscp_mt_check) from TOS (which accepts any mask/value combination). Thetosentries appropriately use revision 1 to distinguish from legacy implementations.
99-110: LGTM: Module initialization and cleanup are correctly implemented.
14-14: Verify that the renamed headerxt_dscp.hexists and defines the required structures.The include path changed from the uppercase convention to lowercase. Ensure the header file exists at
include/uapi/linux/netfilter/xt_dscp.hand definesstruct xt_dscp_infoandstruct xt_tos_match_info.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| struct xt_tcpmss_match_info { | ||
| __u16 mss_min, mss_max; | ||
| __u8 invert; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation uses spaces instead of tabs.
Linux kernel coding style requires tabs for indentation. This struct uses 4 spaces.
Proposed fix
struct xt_tcpmss_match_info {
- __u16 mss_min, mss_max;
- __u8 invert;
+ __u16 mss_min, mss_max;
+ __u8 invert;
};📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| struct xt_tcpmss_match_info { | |
| __u16 mss_min, mss_max; | |
| __u8 invert; | |
| }; | |
| struct xt_tcpmss_match_info { | |
| __u16 mss_min, mss_max; | |
| __u8 invert; | |
| }; |
🤖 Prompt for AI Agents
In `@include/uapi/linux/netfilter/xt_TCPMSS.h` around lines 7 - 10, The struct
xt_tcpmss_match_info is indented with spaces instead of tabs; change the leading
spaces before the struct definition and its member lines to tabs to comply with
Linux kernel style, ensuring the struct name and member declarations (__u16
mss_min, mss_max; __u8 invert;) remain unchanged and only whitespace indentation
is replaced with tabs.
| for (i = 0; i < optlen; ) { | ||
| if (op[i] == TCPOPT_MSS | ||
| && (optlen - i) >= TCPOLEN_MSS | ||
| && op[i+1] == TCPOLEN_MSS) { | ||
| u_int16_t mssval; | ||
|
|
||
| mssval = (op[i+2] << 8) | op[i+3]; | ||
|
|
||
| return (mssval >= info->mss_min && | ||
| mssval <= info->mss_max) ^ info->invert; | ||
| } | ||
| if (op[i] < 2) | ||
| i++; | ||
| else | ||
| i += op[i+1] ? : 1; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential out-of-bounds read when parsing TCP options.
When op[i] >= 2 (multi-byte option), the code accesses op[i+1] at line 67 without checking that i+1 < optlen. If a malformed packet has a multi-byte option code at the last byte of the options area, this could read one byte past the buffer.
Suggested fix
for (i = 0; i < optlen; ) {
if (op[i] == TCPOPT_MSS
&& (optlen - i) >= TCPOLEN_MSS
&& op[i+1] == TCPOLEN_MSS) {
u_int16_t mssval;
mssval = (op[i+2] << 8) | op[i+3];
return (mssval >= info->mss_min &&
mssval <= info->mss_max) ^ info->invert;
}
if (op[i] < 2)
i++;
- else
- i += op[i+1] ? : 1;
+ else {
+ if (i + 1 >= optlen)
+ break;
+ i += op[i+1] ? : 1;
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for (i = 0; i < optlen; ) { | |
| if (op[i] == TCPOPT_MSS | |
| && (optlen - i) >= TCPOLEN_MSS | |
| && op[i+1] == TCPOLEN_MSS) { | |
| u_int16_t mssval; | |
| mssval = (op[i+2] << 8) | op[i+3]; | |
| return (mssval >= info->mss_min && | |
| mssval <= info->mss_max) ^ info->invert; | |
| } | |
| if (op[i] < 2) | |
| i++; | |
| else | |
| i += op[i+1] ? : 1; | |
| } | |
| for (i = 0; i < optlen; ) { | |
| if (op[i] == TCPOPT_MSS | |
| && (optlen - i) >= TCPOLEN_MSS | |
| && op[i+1] == TCPOLEN_MSS) { | |
| u_int16_t mssval; | |
| mssval = (op[i+2] << 8) | op[i+3]; | |
| return (mssval >= info->mss_min && | |
| mssval <= info->mss_max) ^ info->invert; | |
| } | |
| if (op[i] < 2) | |
| i++; | |
| else { | |
| if (i + 1 >= optlen) | |
| break; | |
| i += op[i+1] ? : 1; | |
| } | |
| } |
🤖 Prompt for AI Agents
In `@net/netfilter/xt_TCPMSS.c` around lines 53 - 68, The loop parsing TCP options
can read past the options buffer because it accesses op[i+1], op[i+2], and
op[i+3] without ensuring those indexes are < optlen; update the parsing in the
loop (the code using variables i, op, optlen, and constants
TCPOPT_MSS/TCPOLEN_MSS) to validate bounds before any multi-byte access: before
reading op[i+1] require i+1 < optlen, and before reading the MSS value require i
+ TCPOLEN_MSS <= optlen; if a malformed/too-short option is detected, treat it
as non-matching and advance or break safely (e.g., stop parsing or skip the
rest) rather than reading out-of-bounds, and ensure the increment logic that
currently does i += op[i+1] ? : 1 is only used after confirming op[i+1] is valid
and non-zero.
| @@ -1,12 +1,11 @@ | |||
| C Z6.0+pooncelock+poonceLock+pombonce | |||
| C Z6.0+pooncelock+pooncelock+pombonce | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Header/filename mismatch.
The header declares Z6.0+pooncelock+pooncelock+pombonce but the file is named Z6.0+pooncelock+poonceLock+pombonce.litmus. LKMM conventions require these to match exactly for tooling (herd7, klitmus) to work correctly.
Either rename the file to match the header or revert the header to preserve the original mixed-case naming.
🤖 Prompt for AI Agents
In `@tools/memory-model/litmus-tests/Z6.0`+pooncelock+poonceLock+pombonce.litmus
at line 1, The test header string "Z6.0+pooncelock+pooncelock+pombonce" does not
match the actual filename "Z6.0+pooncelock+poonceLock+pombonce.litmus"; update
one to match the other per LKMM conventions: either rename the file to
"Z6.0+pooncelock+pooncelock+pombonce.litmus" to match the header, or change the
header line to "Z6.0+pooncelock+poonceLock+pombonce" so it exactly matches the
filename (preserve case and spelling).
When forward-porting Rust Binder to 6.18, I neglected to take commit fb56fdf ("mm/list_lru: split the lock to per-cgroup scope") into account, and apparently I did not end up running the shrinker callback when I sanity tested the driver before submission. This leads to crashes like the following: ============================================ WARNING: possible recursive locking detected 6.18.0-mainline-maybe-dirty #1 Tainted: G IO -------------------------------------------- kswapd0/68 is trying to acquire lock: ffff956000fa18b0 (&l->lock){+.+.}-{2:2}, at: lock_list_lru_of_memcg+0x128/0x230 but task is already holding lock: ffff956000fa18b0 (&l->lock){+.+.}-{2:2}, at: rust_helper_spin_lock+0xd/0x20 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&l->lock); lock(&l->lock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kswapd0/68: #0: ffffffff90d2e260 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0x597/0x1160 #1: ffff956000fa18b0 (&l->lock){+.+.}-{2:2}, at: rust_helper_spin_lock+0xd/0x20 #2: ffffffff90cf3680 (rcu_read_lock){....}-{1:2}, at: lock_list_lru_of_memcg+0x2d/0x230 To fix this, remove the spin_lock() call from rust_shrink_free_page(). Cc: stable <stable@kernel.org> Fixes: eafedbc ("rust_binder: add Rust Binder driver") Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://patch.msgid.link/20251202-binder-shrink-unspin-v1-1-263efb9ad625@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…ked_inode()
In btrfs_read_locked_inode() we are calling btrfs_init_file_extent_tree()
while holding a path with a read locked leaf from a subvolume tree, and
btrfs_init_file_extent_tree() may do a GFP_KERNEL allocation, which can
trigger reclaim.
This can create a circular lock dependency which lockdep warns about with
the following splat:
[6.1433] ======================================================
[6.1574] WARNING: possible circular locking dependency detected
[6.1583] 6.18.0+ #4 Tainted: G U
[6.1591] ------------------------------------------------------
[6.1599] kswapd0/117 is trying to acquire lock:
[6.1606] ffff8d9b6333c5b8 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1625]
but task is already holding lock:
[6.1633] ffffffffa4ab8ce0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x195/0xc60
[6.1646]
which lock already depends on the new lock.
[6.1657]
the existing dependency chain (in reverse order) is:
[6.1667]
-> #2 (fs_reclaim){+.+.}-{0:0}:
[6.1677] fs_reclaim_acquire+0x9d/0xd0
[6.1685] __kmalloc_cache_noprof+0x59/0x750
[6.1694] btrfs_init_file_extent_tree+0x90/0x100
[6.1702] btrfs_read_locked_inode+0xc3/0x6b0
[6.1710] btrfs_iget+0xbb/0xf0
[6.1716] btrfs_lookup_dentry+0x3c5/0x8e0
[6.1724] btrfs_lookup+0x12/0x30
[6.1731] lookup_open.isra.0+0x1aa/0x6a0
[6.1739] path_openat+0x5f7/0xc60
[6.1746] do_filp_open+0xd6/0x180
[6.1753] do_sys_openat2+0x8b/0xe0
[6.1760] __x64_sys_openat+0x54/0xa0
[6.1768] do_syscall_64+0x97/0x3e0
[6.1776] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[6.1784]
-> #1 (btrfs-tree-00){++++}-{3:3}:
[6.1794] lock_release+0x127/0x2a0
[6.1801] up_read+0x1b/0x30
[6.1808] btrfs_search_slot+0x8e0/0xff0
[6.1817] btrfs_lookup_inode+0x52/0xd0
[6.1825] __btrfs_update_delayed_inode+0x73/0x520
[6.1833] btrfs_commit_inode_delayed_inode+0x11a/0x120
[6.1842] btrfs_log_inode+0x608/0x1aa0
[6.1849] btrfs_log_inode_parent+0x249/0xf80
[6.1857] btrfs_log_dentry_safe+0x3e/0x60
[6.1865] btrfs_sync_file+0x431/0x690
[6.1872] do_fsync+0x39/0x80
[6.1879] __x64_sys_fsync+0x13/0x20
[6.1887] do_syscall_64+0x97/0x3e0
[6.1894] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[6.1903]
-> #0 (&delayed_node->mutex){+.+.}-{3:3}:
[6.1913] __lock_acquire+0x15e9/0x2820
[6.1920] lock_acquire+0xc9/0x2d0
[6.1927] __mutex_lock+0xcc/0x10a0
[6.1934] __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1944] btrfs_evict_inode+0x20b/0x4b0
[6.1952] evict+0x15a/0x2f0
[6.1958] prune_icache_sb+0x91/0xd0
[6.1966] super_cache_scan+0x150/0x1d0
[6.1974] do_shrink_slab+0x155/0x6f0
[6.1981] shrink_slab+0x48e/0x890
[6.1988] shrink_one+0x11a/0x1f0
[6.1995] shrink_node+0xbfd/0x1320
[6.1002] balance_pgdat+0x67f/0xc60
[6.1321] kswapd+0x1dc/0x3e0
[6.1643] kthread+0xff/0x240
[6.1965] ret_from_fork+0x223/0x280
[6.1287] ret_from_fork_asm+0x1a/0x30
[6.1616]
other info that might help us debug this:
[6.1561] Chain exists of:
&delayed_node->mutex --> btrfs-tree-00 --> fs_reclaim
[6.1503] Possible unsafe locking scenario:
[6.1110] CPU0 CPU1
[6.1411] ---- ----
[6.1707] lock(fs_reclaim);
[6.1998] lock(btrfs-tree-00);
[6.1291] lock(fs_reclaim);
[6.1581] lock(&delayed_node->mutex);
[6.1874]
*** DEADLOCK ***
[6.1716] 2 locks held by kswapd0/117:
[6.1999] #0: ffffffffa4ab8ce0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x195/0xc60
[6.1294] #1: ffff8d998344b0e0 (&type->s_umount_key#40){++++}- {3:3}, at: super_cache_scan+0x37/0x1d0
[6.1596]
stack backtrace:
[6.1183] CPU: 11 UID: 0 PID: 117 Comm: kswapd0 Tainted: G U 6.18.0+ #4 PREEMPT(lazy)
[6.1185] Tainted: [U]=USER
[6.1186] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[6.1187] Call Trace:
[6.1187] <TASK>
[6.1189] dump_stack_lvl+0x6e/0xa0
[6.1192] print_circular_bug.cold+0x17a/0x1c0
[6.1194] check_noncircular+0x175/0x190
[6.1197] __lock_acquire+0x15e9/0x2820
[6.1200] lock_acquire+0xc9/0x2d0
[6.1201] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1204] __mutex_lock+0xcc/0x10a0
[6.1206] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1208] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1211] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1213] __btrfs_release_delayed_node.part.0+0x39/0x2f0
[6.1215] btrfs_evict_inode+0x20b/0x4b0
[6.1217] ? lock_acquire+0xc9/0x2d0
[6.1220] evict+0x15a/0x2f0
[6.1222] prune_icache_sb+0x91/0xd0
[6.1224] super_cache_scan+0x150/0x1d0
[6.1226] do_shrink_slab+0x155/0x6f0
[6.1228] shrink_slab+0x48e/0x890
[6.1229] ? shrink_slab+0x2d2/0x890
[6.1231] shrink_one+0x11a/0x1f0
[6.1234] shrink_node+0xbfd/0x1320
[6.1236] ? shrink_node+0xa2d/0x1320
[6.1236] ? shrink_node+0xbd3/0x1320
[6.1239] ? balance_pgdat+0x67f/0xc60
[6.1239] balance_pgdat+0x67f/0xc60
[6.1241] ? finish_task_switch.isra.0+0xc4/0x2a0
[6.1246] kswapd+0x1dc/0x3e0
[6.1247] ? __pfx_autoremove_wake_function+0x10/0x10
[6.1249] ? __pfx_kswapd+0x10/0x10
[6.1250] kthread+0xff/0x240
[6.1251] ? __pfx_kthread+0x10/0x10
[6.1253] ret_from_fork+0x223/0x280
[6.1255] ? __pfx_kthread+0x10/0x10
[6.1257] ret_from_fork_asm+0x1a/0x30
[6.1260] </TASK>
This is because:
1) The fsync task is holding an inode's delayed node mutex (for a
directory) while calling __btrfs_update_delayed_inode() and that needs
to do a search on the subvolume's btree (therefore read lock some
extent buffers);
2) The lookup task, at btrfs_lookup(), triggered reclaim with the
GFP_KERNEL allocation done by btrfs_init_file_extent_tree() while
holding a read lock on a subvolume leaf;
3) The reclaim triggered kswapd which is doing inode eviction for the
directory inode the fsync task is using as an argument to
btrfs_commit_inode_delayed_inode() - but in that call chain we are
trying to read lock the same leaf that the lookup task is holding
while calling btrfs_init_file_extent_tree() and doing the GFP_KERNEL
allocation.
Fix this by calling btrfs_init_file_extent_tree() after we don't need the
path anymore and release it in btrfs_read_locked_inode().
Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/linux-btrfs/6e55113a22347c3925458a5d840a18401a38b276.camel@linux.intel.com/
Fixes: 8679d26 ("btrfs: initialize inode::file_extent_tree after i_mode has been set")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The GPIO controller is configured as non-sleeping but it uses generic
pinctrl helpers which use a mutex for synchronization.
This can cause the following lockdep splat with shared GPIOs enabled on
boards which have multiple devices using the same GPIO:
BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:591
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 12, name:
kworker/u16:0
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
6 locks held by kworker/u16:0/12:
#0: ffff0001f0018d48 ((wq_completion)events_unbound#2){+.+.}-{0:0},
at: process_one_work+0x18c/0x604
#1: ffff8000842dbdf0 (deferred_probe_work){+.+.}-{0:0}, at:
process_one_work+0x1b4/0x604
#2: ffff0001f18498f8 (&dev->mutex){....}-{4:4}, at:
__device_attach+0x38/0x1b0
#3: ffff0001f75f1e90 (&gdev->srcu){.+.?}-{0:0}, at:
gpiod_direction_output_raw_commit+0x0/0x360
#4: ffff0001f46e3db8 (&shared_desc->spinlock){....}-{3:3}, at:
gpio_shared_proxy_direction_output+0xd0/0x144 [gpio_shared_proxy]
#5: ffff0001f180ee90 (&gdev->srcu){.+.?}-{0:0}, at:
gpiod_direction_output_raw_commit+0x0/0x360
irq event stamp: 81450
hardirqs last enabled at (81449): [<ffff8000813acba4>]
_raw_spin_unlock_irqrestore+0x74/0x78
hardirqs last disabled at (81450): [<ffff8000813abfb8>]
_raw_spin_lock_irqsave+0x84/0x88
softirqs last enabled at (79616): [<ffff8000811455fc>]
__alloc_skb+0x17c/0x1e8
softirqs last disabled at (79614): [<ffff8000811455fc>]
__alloc_skb+0x17c/0x1e8
CPU: 2 UID: 0 PID: 12 Comm: kworker/u16:0 Not tainted
6.19.0-rc4-next-20260105+ #11975 PREEMPT
Hardware name: Hardkernel ODROID-M1 (DT)
Workqueue: events_unbound deferred_probe_work_func
Call trace:
show_stack+0x18/0x24 (C)
dump_stack_lvl+0x90/0xd0
dump_stack+0x18/0x24
__might_resched+0x144/0x248
__might_sleep+0x48/0x98
__mutex_lock+0x5c/0x894
mutex_lock_nested+0x24/0x30
pinctrl_get_device_gpio_range+0x44/0x128
pinctrl_gpio_direction+0x3c/0xe0
pinctrl_gpio_direction_output+0x14/0x20
rockchip_gpio_direction_output+0xb8/0x19c
gpiochip_direction_output+0x38/0x94
gpiod_direction_output_raw_commit+0x1d8/0x360
gpiod_direction_output_nonotify+0x7c/0x230
gpiod_direction_output+0x34/0xf8
gpio_shared_proxy_direction_output+0xec/0x144 [gpio_shared_proxy]
gpiochip_direction_output+0x38/0x94
gpiod_direction_output_raw_commit+0x1d8/0x360
gpiod_direction_output_nonotify+0x7c/0x230
gpiod_configure_flags+0xbc/0x480
gpiod_find_and_request+0x1a0/0x574
gpiod_get_index+0x58/0x84
devm_gpiod_get_index+0x20/0xb4
devm_gpiod_get_optional+0x18/0x30
rockchip_pcie_probe+0x98/0x380
platform_probe+0x5c/0xac
really_probe+0xbc/0x298
Fixes: 936ee26 ("gpio/rockchip: add driver for rockchip gpio")
Cc: stable@vger.kernel.org
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Closes: https://lore.kernel.org/all/d035fc29-3b03-4cd6-b8ec-001f93540bc6@samsung.com/
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20260106090011.21603-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
…te in qfq_reset `qfq_class->leaf_qdisc->q.qlen > 0` does not imply that the class itself is active. Two qfq_class objects may point to the same leaf_qdisc. This happens when: 1. one QFQ qdisc is attached to the dev as the root qdisc, and 2. another QFQ qdisc is temporarily referenced (e.g., via qdisc_get() / qdisc_put()) and is pending to be destroyed, as in function tc_new_tfilter. When packets are enqueued through the root QFQ qdisc, the shared leaf_qdisc->q.qlen increases. At the same time, the second QFQ qdisc triggers qdisc_put and qdisc_destroy: the qdisc enters qfq_reset() with its own q->q.qlen == 0, but its class's leaf qdisc->q.qlen > 0. Therefore, the qfq_reset would wrongly deactivate an inactive aggregate and trigger a null-deref in qfq_deactivate_agg: [ 0.903172] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 0.903571] #PF: supervisor write access in kernel mode [ 0.903860] #PF: error_code(0x0002) - not-present page [ 0.904177] PGD 10299b067 P4D 10299b067 PUD 10299c067 PMD 0 [ 0.904502] Oops: Oops: 0002 [#1] SMP NOPTI [ 0.904737] CPU: 0 UID: 0 PID: 135 Comm: exploit Not tainted 6.19.0-rc3+ #2 NONE [ 0.905157] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 [ 0.905754] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2)) [ 0.906046] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0 Code starting with the faulting instruction =========================================== 0: 0f 84 4d 01 00 00 je 0x153 6: 48 89 70 18 mov %rsi,0x18(%rax) a: 8b 4b 10 mov 0x10(%rbx),%ecx d: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx 14: 48 8b 78 08 mov 0x8(%rax),%rdi 18: 48 d3 e2 shl %cl,%rdx 1b: 48 21 f2 and %rsi,%rdx 1e: 48 2b 13 sub (%rbx),%rdx 21: 48 8b 30 mov (%rax),%rsi 24: 48 d3 ea shr %cl,%rdx 27: 8b 4b 18 mov 0x18(%rbx),%ecx ... [ 0.907095] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246 [ 0.907368] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000 [ 0.907723] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 0.908100] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000 [ 0.908451] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000 [ 0.908804] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880 [ 0.909179] FS: 000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000 [ 0.909572] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.909857] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0 [ 0.910247] PKRU: 55555554 [ 0.910391] Call Trace: [ 0.910527] <TASK> [ 0.910638] qfq_reset_qdisc (net/sched/sch_qfq.c:357 net/sched/sch_qfq.c:1485) [ 0.910826] qdisc_reset (include/linux/skbuff.h:2195 include/linux/skbuff.h:2501 include/linux/skbuff.h:3424 include/linux/skbuff.h:3430 net/sched/sch_generic.c:1036) [ 0.911040] __qdisc_destroy (net/sched/sch_generic.c:1076) [ 0.911236] tc_new_tfilter (net/sched/cls_api.c:2447) [ 0.911447] rtnetlink_rcv_msg (net/core/rtnetlink.c:6958) [ 0.911663] ? __pfx_rtnetlink_rcv_msg (net/core/rtnetlink.c:6861) [ 0.911894] netlink_rcv_skb (net/netlink/af_netlink.c:2550) [ 0.912100] netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344) [ 0.912296] ? __alloc_skb (net/core/skbuff.c:706) [ 0.912484] netlink_sendmsg (net/netlink/af_netlink.c:1894) [ 0.912682] sock_write_iter (net/socket.c:727 (discriminator 1) net/socket.c:742 (discriminator 1) net/socket.c:1195 (discriminator 1)) [ 0.912880] vfs_write (fs/read_write.c:593 fs/read_write.c:686) [ 0.913077] ksys_write (fs/read_write.c:738) [ 0.913252] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) [ 0.913438] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131) [ 0.913687] RIP: 0033:0x424c34 [ 0.913844] Code: 89 02 48 c7 c0 ff ff ff ff eb bd 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 2d 44 09 00 00 74 13 b8 01 00 00 00 0f 05 9 Code starting with the faulting instruction =========================================== 0: 89 02 mov %eax,(%rdx) 2: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax 9: eb bd jmp 0xffffffffffffffc8 b: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) 12: 00 00 00 15: 90 nop 16: f3 0f 1e fa endbr64 1a: 80 3d 2d 44 09 00 00 cmpb $0x0,0x9442d(%rip) # 0x9444e 21: 74 13 je 0x36 23: b8 01 00 00 00 mov $0x1,%eax 28: 0f 05 syscall 2a: 09 .byte 0x9 [ 0.914807] RSP: 002b:00007ffea1938b78 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [ 0.915197] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000424c34 [ 0.915556] RDX: 000000000000003c RSI: 000000002af378c0 RDI: 0000000000000003 [ 0.915912] RBP: 00007ffea1938bc0 R08: 00000000004b8820 R09: 0000000000000000 [ 0.916297] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffea1938d28 [ 0.916652] R13: 00007ffea1938d38 R14: 00000000004b3828 R15: 0000000000000001 [ 0.917039] </TASK> [ 0.917158] Modules linked in: [ 0.917316] CR2: 0000000000000000 [ 0.917484] ---[ end trace 0000000000000000 ]--- [ 0.917717] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2)) [ 0.917978] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0 Code starting with the faulting instruction =========================================== 0: 0f 84 4d 01 00 00 je 0x153 6: 48 89 70 18 mov %rsi,0x18(%rax) a: 8b 4b 10 mov 0x10(%rbx),%ecx d: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx 14: 48 8b 78 08 mov 0x8(%rax),%rdi 18: 48 d3 e2 shl %cl,%rdx 1b: 48 21 f2 and %rsi,%rdx 1e: 48 2b 13 sub (%rbx),%rdx 21: 48 8b 30 mov (%rax),%rsi 24: 48 d3 ea shr %cl,%rdx 27: 8b 4b 18 mov 0x18(%rbx),%ecx ... [ 0.918902] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246 [ 0.919198] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000 [ 0.919559] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 0.919908] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000 [ 0.920289] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000 [ 0.920648] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880 [ 0.921014] FS: 000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000 [ 0.921424] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.921710] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0 [ 0.922097] PKRU: 55555554 [ 0.922240] Kernel panic - not syncing: Fatal exception [ 0.922590] Kernel Offset: disabled Fixes: 0545a30 ("pkt_sched: QFQ - quick fair queue scheduler") Signed-off-by: Xiang Mei <xmei5@asu.edu> Link: https://patch.msgid.link/20260106034100.1780779-1-xmei5@asu.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Summary
Testing
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.