netdev - Re: Incomplete fix for recent bug in tc / hfsc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <5e4490da-3f6c-4331-af9c-0e6d32b6fc75@gmail.com>
Date: Tue, 24 Jun 2025 12:43:27 +0200
From: Lion Ackermann <nnamrec@...il.com>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: netdev@...r.kernel.org, Jamal Hadi Salim <jhs@...atatu.com>,
 Jiri Pirko <jiri@...nulli.us>
Subject: Re: Incomplete fix for recent bug in tc / hfsc

Hi,

On 6/24/25 11:24 AM, Lion Ackermann wrote:
> Hi,
> 
> On 6/24/25 6:41 AM, Cong Wang wrote:
>> On Mon, Jun 23, 2025 at 12:41:08PM +0200, Lion Ackermann wrote:
>>> Hello,
>>>
>>> I noticed the fix for a recent bug in sch_hfsc in the tc subsystem is
>>> incomplete:
>>>     sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
>>>     https://lore.kernel.org/all/20250518222038.58538-2-xiyou.wangcong@gmail.com/
>>>
>>> This patch also included a test which landed:
>>>     selftests/tc-testing: Add an HFSC qlen accounting test
>>>
>>> Basically running the included test case on a sanitizer kernel or with
>>> slub_debug=P will directly reveal the UAF:
>>
>> Interesting, I have SLUB debugging enabled in my kernel config too:
>>
>> CONFIG_SLUB_DEBUG=y
>> CONFIG_SLUB_DEBUG_ON=y
>> CONFIG_SLUB_RCU_DEBUG=y
>>
>> But I didn't catch this bug.
>>  
> 
> Technically the class deletion step which triggered the sanitizer was not
> present in your testcase. The testcase only left the stale pointer which was
> never accessed though.
> 
>>> To be completely honest I do not quite understand the rationale behind the
>>> original patch. The problem is that the backlog corruption propagates to
>>> the parent _before_ parent is even expecting any backlog updates.
>>> Looking at f.e. DRR: Child is only made active _after_ the enqueue completes.
>>> Because HFSC is messing with the backlog before the enqueue completed, 
>>> DRR will simply make the class active even though it should have already
>>> removed the class from the active list due to qdisc_tree_backlog_flush.
>>> This leaves the stale class in the active list and causes the UAF.
>>>
>>> Looking at other qdiscs the way DRR handles child enqueues seems to resemble
>>> the common case. HFSC calling dequeue in the enqueue handler violates
>>> expectations. In order to fix this either HFSC has to stop using dequeue or
>>> all classful qdiscs have to be updated to catch this corner case where
>>> child qlen was zero even though the enqueue succeeded. Alternatively HFSC
>>> could signal enqueue failure if it sees child dequeue dropping packets to
>>> zero? I am not sure how this all plays out with the re-entrant case of
>>> netem though.
>>
>> I think this may be the same bug report from Mingi in the security
>> mailing list. I will take a deep look after I go back from Open Source
>> Summit this week. (But you are still very welcome to work on it by
>> yourself, just let me know.)
>>
>> Thanks!
> 
>> My suggestion is we go back to a proposal i made a few moons back (was
>> this in a discussion with you? i dont remember): create a mechanism to
>> disallow certain hierarchies of qdiscs based on certain attributes,
>> example in this case disallow hfsc from being the ancestor of "qdiscs that may
>> drop during peek" (such as netem). Then we can just keep adding more
>> "disallowed configs" that will be rejected via netlink. Similar idea
>> is being added to netem to disallow double duplication, see:
>> https://lore.kernel.org/netdev/20250622190344.446090-1-will@willsroot.io/
>>
>> cheers,
>> jamal
> 
> I vaguely remember Jamal's proposal from a while back, and I believe there was 
> some example code for this approach already? 
> Since there is another report you have a better overview, so it is probably 
> best you look at it first. In the meantime I can think about the solution a 
> bit more and possibly draft something if you wish.
> 
> Thanks,
> Lion

Actually I was intrigued, what do you think about addressing the root of the
use-after-free only and ignore the backlog corruption (kind of). After the 
recent patches where qlen_notify may get called multiple times, we could simply
loosen qdisc_tree_reduce_backlog to always notify when the qdisc is empty.
Since deletion of all qdiscs will run qdisc_reset / qdisc_purge_queue at one
point or another, this should always catch left-overs. And we need not care
about all the complexities involved of keeping the backlog right and / or
prevent certain hierarchies which seems rather tedious.
This requires some more testing, but I was imagining something like this:

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -780,15 +780,12 @@ static u32 qdisc_alloc_handle(struct net_device *dev)
 
 void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
 {
-	bool qdisc_is_offloaded = sch->flags & TCQ_F_OFFLOADED;
 	const struct Qdisc_class_ops *cops;
 	unsigned long cl;
 	u32 parentid;
 	bool notify;
 	int drops;
 
-	if (n == 0 && len == 0)
-		return;
 	drops = max_t(int, n, 0);
 	rcu_read_lock();
 	while ((parentid = sch->parent)) {
@@ -797,17 +794,8 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
 
 		if (sch->flags & TCQ_F_NOPARENT)
 			break;
-		/* Notify parent qdisc only if child qdisc becomes empty.
-		 *
-		 * If child was empty even before update then backlog
-		 * counter is screwed and we skip notification because
-		 * parent class is already passive.
-		 *
-		 * If the original child was offloaded then it is allowed
-		 * to be seem as empty, so the parent is notified anyway.
-		 */
-		notify = !sch->q.qlen && !WARN_ON_ONCE(!n &&
-						       !qdisc_is_offloaded);
+		/* Notify parent qdisc only if child qdisc becomes empty. */
+		notify = !sch->q.qlen;
 		/* TODO: perform the search on a per txq basis */
 		sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
 		if (sch == NULL) {
@@ -816,6 +804,9 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
 		}
 		cops = sch->ops->cl_ops;
 		if (notify && cops->qlen_notify) {
+			/* Note that qlen_notify must be idempotent as it may get called
+			 * multiple times.
+			 */
 			cl = cops->find(sch, parentid);
 			cops->qlen_notify(sch, cl);
 		}

Thanks,
Lion