lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <45876f14-cf28-4177-8ead-bb769fd9e57a@gmail.com>
Date: Mon, 23 Jun 2025 12:41:08 +0200
From: Lion Ackermann <nnamrec@...il.com>
To: netdev@...r.kernel.org
Cc: Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang
 <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>
Subject: Incomplete fix for recent bug in tc / hfsc

Hello,

I noticed the fix for a recent bug in sch_hfsc in the tc subsystem is
incomplete:
    sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
    https://lore.kernel.org/all/20250518222038.58538-2-xiyou.wangcong@gmail.com/

This patch also included a test which landed:
    selftests/tc-testing: Add an HFSC qlen accounting test

Basically running the included test case on a sanitizer kernel or with
slub_debug=P will directly reveal the UAF:

    # this is from the test case:
    ip link set dev lo up
    tc qdisc add dev lo root handle 1: drr
    tc filter add dev lo parent 1: basic classid 1:1
    tc class add dev lo parent 1: classid 1:1 drr
    tc qdisc add dev lo parent 1:1 handle 2: hfsc def 1
    tc class add dev lo parent 2: classid 2:1 hfsc rt m1 8 d 1 m2 0
    tc qdisc add dev lo parent 2:1 handle 3: netem
    tc qdisc add dev lo parent 3:1 handle 4: blackhole

    # .. and slightly modified to show UAF:
    echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
    tc class delete dev lo classid 1:1
    echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888


[ ... ]
[   11.405423] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b83: 0000 [#1] PREEMPT SMP NOPTI
[   11.407511] CPU: 5 PID: 456 Comm: socat Not tainted 6.6.93 #1
[   11.408496] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.fc42 04/01/2014
[   11.410008] RIP: 0010:drr_dequeue+0x30/0x230
[   11.410690] Code: 55 41 54 4c 8d a7 80 01 00 00 55 48 89 fd 53 48 8b 87 80 01 00 00 49 39 c4 0f 84 84 00 00 00 48 8b 9d 80 01 00 00 48 8b 7b 10 <48> 8b 47 18 48 8b 40 38 ff d0 0f 1f 00 48 85 c0 74 57 8b 50 28 8b
[   11.414042] RSP: 0018:ffffc90000dff920 EFLAGS: 00010287
[   11.415081] RAX: ffff888104097950 RBX: ffff888104097950 RCX: ffff888106af3300
[   11.416878] RDX: 0000000000000001 RSI: ffff888103adb200 RDI: 6b6b6b6b6b6b6b6b
[   11.418429] RBP: ffff888103bbb000 R08: 0000000000000000 R09: ffffc90000dff9b8
[   11.419933] R10: ffffc90000dffaa0 R11: ffffc90000dffa30 R12: ffff888103bbb180
[   11.421333] R13: 0000000000000000 R14: ffff888103bbb0ac R15: ffff888103bbb000
[   11.422698] FS:  000078f12f836740(0000) GS:ffff88813bd40000(0000) knlGS:0000000000000000
[   11.424250] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   11.425292] CR2: 00005a9725d64000 CR3: 000000010418c004 CR4: 0000000000770ee0
[   11.426584] PKRU: 55555554
[   11.427056] Call Trace:
[   11.427506]  <TASK>
[   11.427901]  __qdisc_run+0x85/0x610
[   11.428497]  __dev_queue_xmit+0x5f9/0xde0
[   11.429195]  ? nf_hook_slow+0x3e/0xc0
[   11.429873]  ip_finish_output2+0x2b7/0x550
[   11.430560]  ip_send_skb+0x82/0x90
[   11.431195]  udp_send_skb+0x15c/0x380
[   11.431880]  udp_sendmsg+0xb91/0xfa0
[   11.432524]  ? __pfx_ip_generic_getfrag+0x10/0x10
[   11.433434]  ? __sys_sendto+0x1e0/0x210
[   11.434216]  __sys_sendto+0x1e0/0x210
[   11.434984]  __x64_sys_sendto+0x20/0x30
[   11.435868]  do_syscall_64+0x5e/0x90
[   11.436631]  ? apparmor_file_permission+0x82/0x180
[   11.437637]  ? vfs_read+0x2fc/0x340
[   11.438520]  ? exit_to_user_mode_prepare+0x1a/0x150
[   11.440008]  ? syscall_exit_to_user_mode+0x27/0x40
[   11.440917]  ? do_syscall_64+0x6a/0x90
[   11.441617]  ? do_user_addr_fault+0x319/0x650
[   11.442524]  ? exit_to_user_mode_prepare+0x1a/0x150
[   11.443499]  entry_SYSCALL_64_after_hwframe+0x78/0xe2


To be completely honest I do not quite understand the rationale behind the
original patch. The problem is that the backlog corruption propagates to
the parent _before_ parent is even expecting any backlog updates.
Looking at f.e. DRR: Child is only made active _after_ the enqueue completes.
Because HFSC is messing with the backlog before the enqueue completed, 
DRR will simply make the class active even though it should have already
removed the class from the active list due to qdisc_tree_backlog_flush.
This leaves the stale class in the active list and causes the UAF.

Looking at other qdiscs the way DRR handles child enqueues seems to resemble
the common case. HFSC calling dequeue in the enqueue handler violates
expectations. In order to fix this either HFSC has to stop using dequeue or
all classful qdiscs have to be updated to catch this corner case where
child qlen was zero even though the enqueue succeeded. Alternatively HFSC
could signal enqueue failure if it sees child dequeue dropping packets to
zero? I am not sure how this all plays out with the re-entrant case of
netem though.

If I can provide more help, please let me know.

Thanks,
Lion

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ