Message-ID: <3edfd3ac-8127-41c2-afc5-3967b8b45410@kzalloc.com>
Date: Tue, 19 Aug 2025 01:46:10 +0900
From: Yunseong Kim <ysk@...lloc.com>
To: Eric Dumazet <edumazet@...gle.com>, "David S. Miller"
 <davem@...emloft.net>, Florian Westphal <fw@...len.de>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 Clark Williams <clrkwllms@...nel.org>, Steven Rostedt <rostedt@...dmis.org>,
 LKML <linux-kernel@...r.kernel.org>, linux-rt-devel@...ts.linux.dev,
 netdev@...r.kernel.org
Subject: [RFC] net: inet: Potential sleep in atomic context in
 inet_twsk_hashdance_schedule on PREEMPT_RT

Hi everyone,

I'm looking at the inet_twsk_hashdance_schedule() function in
net/ipv4/inet_timewait_sock.c and noticed a pattern that could be
problematic for PREEMPT_RT kernels.

The code in question is:

 void inet_twsk_hashdance_schedule(struct inet_timewait_sock *tw,
                                   struct sock *sk,
                                   struct inet_hashinfo *hashinfo,
                                   int timeo)
 {
     ...
     local_bh_disable();
     spin_lock(&bhead->lock);
     spin_lock(&bhead2->lock);
     ...
 }

The problem is the sequence of local_bh_disable() followed by spin_lock().
On a PREEMPT_RT-enabled kernel, spin_lock() is replaced by a sleeping
rt_mutex-based lock. However, local_bh_disable() creates an atomic context
by incrementing preempt_count, and sleeping in atomic context is forbidden.

If the spinlock is contended, this code would attempt to sleep inside an
atomic context, triggering a "BUG: sleeping function called from invalid
context" splat.

While this pattern is correct on non-RT kernels (and is essentially what
spin_lock_bh() expands to there), it runs into exactly the
sleeping-in-atomic problem described above on PREEMPT_RT.
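
To illustrate that equivalence, here is a simplified sketch (not the actual
kernel implementation; the function name is just illustrative) of what
spin_lock_bh() boils down to on a non-RT kernel:

 #include <linux/spinlock.h>

 /* Simplified sketch of non-RT spin_lock_bh() behavior. */
 static inline void sketch_spin_lock_bh(spinlock_t *lock)
 {
 	local_bh_disable();	/* increments the softirq part of preempt_count */
 	spin_lock(lock);	/* a real, non-sleeping spinlock on !PREEMPT_RT */
 }

On PREEMPT_RT, spin_lock() in the body above would instead be a sleeping
lock, which is the conflict described earlier.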

A possible fix would be to replace this sequence with calls to
spin_lock_bh(). Given that two separate locks are acquired, the most direct
change would look like this:

 spin_lock_bh(&bhead->lock);	/* replaces local_bh_disable() + spin_lock() */
 /* BH is already disabled here by the spin_lock_bh() above */
 spin_lock(&bhead2->lock);
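
If this route is taken, the unlock path would need the mirrored change
(again only a sketch; the exact placement depends on the rest of the
function):

 spin_unlock(&bhead2->lock);
 spin_unlock_bh(&bhead->lock);	/* re-enables bottom halves last */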

Or, to be more explicit and safe if the logic ever changes:

 spin_lock_bh(&bhead->lock);
 spin_lock_bh(&bhead2->lock);

However, since spin_lock_bh() on the first lock already disables bottom
halves, the second lock only needs to be a plain spin_lock().
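
For completeness, the BH-disable count nests, so the doubled spin_lock_bh()
variant would simply need two matching spin_unlock_bh() calls on the way
out:

 spin_unlock_bh(&bhead2->lock);	/* BH still disabled after this */
 spin_unlock_bh(&bhead->lock);	/* BH re-enabled here */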

I would like to ask for your thoughts on this. Is my understanding correct,
and would a patch to change this locking pattern be welcome?

It's possible the PREEMPT_RT implications were not a primary concern at
the time.

Thanks for your time and guidance.

Best regards,
Yunseong Kim
