[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAM0EoMm+xgb0vkTDMAWy9xCvTF+XjGQ1xO5A2REajmBN1DKu1Q@mail.gmail.com>
Date: Wed, 25 Jun 2025 10:22:12 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Lion Ackermann <nnamrec@...il.com>
Cc: Cong Wang <xiyou.wangcong@...il.com>, netdev@...r.kernel.org,
Jiri Pirko <jiri@...nulli.us>, Mingi Cho <mincho@...ori.io>
Subject: Re: Incomplete fix for recent bug in tc / hfsc
On Tue, Jun 24, 2025 at 6:43 AM Lion Ackermann <nnamrec@...il.com> wrote:
>
> Hi,
>
> On 6/24/25 11:24 AM, Lion Ackermann wrote:
> > Hi,
> >
> > On 6/24/25 6:41 AM, Cong Wang wrote:
> >> On Mon, Jun 23, 2025 at 12:41:08PM +0200, Lion Ackermann wrote:
> >>> Hello,
> >>>
> >>> I noticed the fix for a recent bug in sch_hfsc in the tc subsystem is
> >>> incomplete:
> >>> sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
> >>> https://lore.kernel.org/all/20250518222038.58538-2-xiyou.wangcong@gmail.com/
> >>>
> >>> This patch also included a test which landed:
> >>> selftests/tc-testing: Add an HFSC qlen accounting test
> >>>
> >>> Basically running the included test case on a sanitizer kernel or with
> >>> slub_debug=P will directly reveal the UAF:
> >>
> >> Interesting, I have SLUB debugging enabled in my kernel config too:
> >>
> >> CONFIG_SLUB_DEBUG=y
> >> CONFIG_SLUB_DEBUG_ON=y
> >> CONFIG_SLUB_RCU_DEBUG=y
> >>
> >> But I didn't catch this bug.
> >>
> >
> > Technically the class deletion step which triggered the sanitizer was not
> > present in your testcase. The testcase only left the stale pointer which was
> > never accessed though.
> >
> >>> To be completely honest I do not quite understand the rationale behind the
> >>> original patch. The problem is that the backlog corruption propagates to
> >>> the parent _before_ parent is even expecting any backlog updates.
> >>> Looking at f.e. DRR: Child is only made active _after_ the enqueue completes.
> >>> Because HFSC is messing with the backlog before the enqueue completed,
> >>> DRR will simply make the class active even though it should have already
> >>> removed the class from the active list due to qdisc_tree_backlog_flush.
> >>> This leaves the stale class in the active list and causes the UAF.
> >>>
> >>> Looking at other qdiscs the way DRR handles child enqueues seems to resemble
> >>> the common case. HFSC calling dequeue in the enqueue handler violates
> >>> expectations. In order to fix this either HFSC has to stop using dequeue or
> >>> all classful qdiscs have to be updated to catch this corner case where
> >>> child qlen was zero even though the enqueue succeeded. Alternatively HFSC
> >>> could signal enqueue failure if it sees child dequeue dropping packets to
> >>> zero? I am not sure how this all plays out with the re-entrant case of
> >>> netem though.
> >>
> >> I think this may be the same bug report from Mingi in the security
> >> mailing list. I will take a deep look after I go back from Open Source
> >> Summit this week. (But you are still very welcome to work on it by
> >> yourself, just let me know.)
> >>
> >> Thanks!
> >
> >> My suggestion is we go back to a proposal i made a few moons back (was
> >> this in a discussion with you? i dont remember): create a mechanism to
> >> disallow certain hierarchies of qdiscs based on certain attributes,
> >> example in this case disallow hfsc from being the ancestor of "qdiscs that may
> >> drop during peek" (such as netem). Then we can just keep adding more
> >> "disallowed configs" that will be rejected via netlink. Similar idea
> >> is being added to netem to disallow double duplication, see:
> >> https://lore.kernel.org/netdev/20250622190344.446090-1-will@willsroot.io/
> >>
> >> cheers,
> >> jamal
> >
> > I vaguely remember Jamal's proposal from a while back, and I believe there was
> > some example code for this approach already?
> > Since there is another report you have a better overview, so it is probably
> > best you look at it first. In the meantime I can think about the solution a
> > bit more and possibly draft something if you wish.
> >
> > Thanks,
> > Lion
>
> Actually I was intrigued, what do you think about addressing the root of the
> use-after-free only and ignore the backlog corruption (kind of). After the
> recent patches where qlen_notify may get called multiple times, we could simply
> loosen qdisc_tree_reduce_backlog to always notify when the qdisc is empty.
> Since deletion of all qdiscs will run qdisc_reset / qdisc_purge_queue at one
> point or another, this should always catch left-overs. And we need not care
> about all the complexities involved of keeping the backlog right and / or
> prevent certain hierarchies which seems rather tedious.
> This requires some more testing, but I was imagining something like this:
>
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -780,15 +780,12 @@ static u32 qdisc_alloc_handle(struct net_device *dev)
>
> void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
> {
> - bool qdisc_is_offloaded = sch->flags & TCQ_F_OFFLOADED;
> const struct Qdisc_class_ops *cops;
> unsigned long cl;
> u32 parentid;
> bool notify;
> int drops;
>
> - if (n == 0 && len == 0)
> - return;
> drops = max_t(int, n, 0);
> rcu_read_lock();
> while ((parentid = sch->parent)) {
> @@ -797,17 +794,8 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
>
> if (sch->flags & TCQ_F_NOPARENT)
> break;
> - /* Notify parent qdisc only if child qdisc becomes empty.
> - *
> - * If child was empty even before update then backlog
> - * counter is screwed and we skip notification because
> - * parent class is already passive.
> - *
> - * If the original child was offloaded then it is allowed
> - * to be seem as empty, so the parent is notified anyway.
> - */
> - notify = !sch->q.qlen && !WARN_ON_ONCE(!n &&
> - !qdisc_is_offloaded);
> + /* Notify parent qdisc only if child qdisc becomes empty. */
> + notify = !sch->q.qlen;
> /* TODO: perform the search on a per txq basis */
> sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
> if (sch == NULL) {
> @@ -816,6 +804,9 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
> }
> cops = sch->ops->cl_ops;
> if (notify && cops->qlen_notify) {
> + /* Note that qlen_notify must be idempotent as it may get called
> + * multiple times.
> + */
> cl = cops->find(sch, parentid);
> cops->qlen_notify(sch, cl);
> }
>
I believe this will fix the issue. My concern is we are not solving
the root cause. I also posted a bunch of fixes on related issues for
something Mingi Cho (on Cc) found - see attachments, i am not in favor
of these either.
Most of these setups are nonsensical. After seeing so many of these my
view is we start disallowing such hierarchies.
cheers,
jamal
Download attachment "drr_fix.diff" of type "application/octet-stream" (1813 bytes)
Download attachment "qfq_netem_child_fix.diff" of type "application/octet-stream" (2282 bytes)
Powered by blists - more mailing lists