lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <pF5XOOIim0IuEfhI-SOxTgRvNoDwuux7UHKnE_Y5-zVd4wmGvNk2ceHjKb8ORnzw0cGwfmVu42g9dL7XyJLf1NEzaztboTWcm0Ogxuojoeo=@willsroot.io>
Date: Tue, 08 Jul 2025 15:52:31 +0000
From: William Liu <will@...lsroot.io>
To: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Cc: Savy <savy@...t3mfailure.io>, Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>, Paolo Abeni <pabeni@...hat.com>
Subject: [BUG] BUG_ON in htb_lookup_leaf

Hi,

We write to report a way to trigger the BUG_ON condition in htb_lookup_leaf in the dequeue path of the htb qdisc. Using the following reproducer (note that tc is patched to allow sfb to have a 0 value for the max option):

./tc qdisc del dev lo root
./tc qdisc add dev lo root handle 1: htb default 1
./tc class add dev lo parent 1: classid 1:1 htb rate 64bit 
./tc qdisc add dev lo parent 1:1 handle 2: netem
./tc qdisc add dev lo parent 2:1 handle 3: sfb
ping -I lo -f -c1 -s64 -W0.001 127.0.0.1 2>&1 >/dev/null &

We hit the following kernel panic:
[   84.138902] tc (239) used greatest stack depth: 24520 bytes left
[  157.701864] htb: netem qdisc 2: is non-work-conserving?
[  157.704354] ------------[ cut here ]------------
[  157.706230] kernel BUG at net/sched/sch_htb.c:824!
[  157.708206] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
[  157.710410] CPU: 1 UID: 0 PID: 251 Comm: ping Not tainted 6.16.0-rc4-g1f988d0788f5 #145 PREEMPT(voluntary) 
[  157.714168] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[  157.717191] RIP: 0010:htb_lookup_leaf+0x560/0x690
[  157.718445] Code: 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 4c 8b 6c 24 40 e8 b4 4b 59 fd 0f 0b 45 31 ff eb 9e 45 31 c0 e9 85 fe ff ff e8 a0 4b 59 fd <0f> 0b 4c 8b 6c 24 40 e8 94 4b 59 fd 0f 0b eb de 48 89 ef e8 2f
[  157.723203] RSP: 0018:ffff888103e6f148 EFLAGS: 00010293
[  157.724567] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  157.726406] RDX: ffff888102ea0000 RSI: ffffffff842e9610 RDI: ffff888103e6f270
[  157.728248] RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
[  157.730091] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888103782458
[  157.731927] R13: 1ffff110207cde32 R14: 0000000000000000 R15: ffff8881037820e8
[  157.733759] FS:  00007f9c0603f000(0000) GS:ffff88819241c000(0000) knlGS:0000000000000000
[  157.735846] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  157.737335] CR2: 00007f9c06279000 CR3: 000000010fb02000 CR4: 00000000000006f0
[  157.739173] Call Trace:
[  157.739834]  <TASK>
[  157.740406]  ? vprintk_emit+0x237/0x730
[  157.741433]  ? __pfx_vprintk_emit+0x10/0x10
[  157.742531]  ? lock_release+0xc4/0x290
[  157.743525]  ? __pfx_htb_lookup_leaf+0x10/0x10
[  157.744712]  htb_dequeue+0x1b16/0x22a0
[  157.745717]  ? __pfx_htb_dequeue+0x10/0x10
[  157.746813]  __qdisc_run+0x1bc/0x1a90
[  157.747786]  ? __pfx_htb_enqueue+0x10/0x10
[  157.748872]  __dev_queue_xmit+0x278f/0x4120
[  157.749981]  ? check_path.constprop.0+0x24/0x50
[  157.751173]  ? __pfx___dev_queue_xmit+0x10/0x10
[  157.752372]  ? __lock_acquire+0x16c2/0x2b10
[  157.753473]  ? lock_acquire+0x14c/0x2e0
[  157.754489]  ? find_held_lock+0x2b/0x80
[  157.755508]  ? mark_held_locks+0x40/0x70
[  157.756546]  ip_finish_output2+0x1275/0x1ee0
[  157.757676]  ? __pfx_ip_finish_output2+0x10/0x10
[  157.758906]  ? __pfx_ip_dst_mtu_maybe_forward+0x10/0x10
[  157.760263]  ? ip_output+0x61f/0xe20
[  157.761211]  ? find_held_lock+0x2b/0x80
[  157.762229]  __ip_finish_output.part.0+0x348/0x7e0
[  157.763479]  ip_output+0x298/0xe20
[  157.764386]  ? __pfx_ip_output+0x10/0x10
[  157.765424]  ? __pfx_ip_finish_output+0x10/0x10
[  157.766619]  ? __pfx_ip_output+0x10/0x10
[  157.767652]  ip_push_pending_frames+0x2f8/0x5a0
[  157.768849]  raw_sendmsg+0x12a6/0x3350
[  157.769855]  ? __pfx_raw_sendmsg+0x10/0x10
[  157.770933]  ? mark_held_locks+0x40/0x70
[  157.771959]  ? find_held_lock+0x2b/0x80
[  157.772973]  ? filemap_map_pages+0xd2c/0x13b0
[  157.774117]  ? lock_release+0xc4/0x290
[  157.775109]  ? sock_has_perm+0x2b3/0x360
[  157.776146]  ? find_held_lock+0x2b/0x80
[  157.777163]  ? __might_fault+0xe4/0x190
[  157.778177]  ? __might_fault+0x155/0x190
[  157.779212]  ? __check_object_size+0xa7/0x8a0
[  157.780364]  ? __pfx_raw_sendmsg+0x10/0x10
[  157.781444]  inet_sendmsg+0x11d/0x140
[  157.782417]  __sys_sendto+0x43d/0x520
[  157.783390]  ? __pfx___sys_sendto+0x10/0x10
[  157.784542]  ? count_memcg_events_mm.constprop.0+0xfa/0x300
[  157.786003]  ? lock_release+0xc4/0x290
[  157.787012]  __x64_sys_sendto+0xe1/0x1c0
[  157.788069]  ? trace_irq_enable.constprop.0+0xc2/0x110
[  157.789429]  do_syscall_64+0x64/0x2d0
[  157.790412]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  157.791745] RIP: 0033:0x7f9c062bf046
[  157.792698] Code: 0e 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 49
[  157.797469] RSP: 002b:00007ffcfb72c948 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[  157.799431] RAX: ffffffffffffffda RBX: 00007ffcfb72e0d0 RCX: 00007f9c062bf046
[  157.801281] RDX: 0000000000000048 RSI: 000055eb00449950 RDI: 0000000000000003
[  157.803134] RBP: 000055eb00449950 R08: 00007ffcfb73034c R09: 0000000000000010
[  157.804987] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000048
[  157.806838] R13: 00007ffcfb72e090 R14: 00007ffcfb72c950 R15: 0000001d00000001
[  157.808696]  </TASK>
[  157.809300] Modules linked in:
[  157.810181] ---[ end trace 0000000000000000 ]---
[  157.811504] RIP: 0010:htb_lookup_leaf+0x560/0x690
[  157.813126] Code: 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 4c 8b 6c 24 40 e8 b4 4b 59 fd 0f 0b 45 31 ff eb 9e 45 31 c0 e9 85 fe ff ff e8 a0 4b 59 fd <0f> 0b 4c 8b 6c 24 40 e8 94 4b 59 fd 0f 0b eb de 48 89 ef e8 2f
[  157.818692] RSP: 0018:ffff888103e6f148 EFLAGS: 00010293
[  157.820315] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  157.822461] RDX: ffff888102ea0000 RSI: ffffffff842e9610 RDI: ffff888103e6f270
[  157.824639] RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
[  157.826829] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888103782458
[  157.828915] R13: 1ffff110207cde32 R14: 0000000000000000 R15: ffff8881037820e8
[  157.831098] FS:  00007f9c0603f000(0000) GS:ffff88819241c000(0000) knlGS:0000000000000000
[  157.833561] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  157.835331] CR2: 00007f9c06279000 CR3: 000000010fb02000 CR4: 00000000000006f0
[  157.837536] Kernel panic - not syncing: Fatal exception in interrupt
[  157.840023] Kernel Offset: disabled
[  157.840956] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---


The following is the ftrace function_graph, with the BUG_ON substituted for a call to a placeholder function "htb_marker":

# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0)               |  htb_enqueue() {
 0) + 13.635 us   |    netem_enqueue();
 0)   4.719 us    |    htb_activate_prios();
 0) # 2249.199 us |  }
 0)               |  htb_dequeue() {
 0)   2.355 us    |    htb_lookup_leaf();
 0)               |    netem_dequeue() {
 0) + 11.061 us   |      sfb_enqueue();
 0)               |      qdisc_tree_reduce_backlog() {
 0)               |        qdisc_lookup_rcu() {
 0)   1.873 us    |          qdisc_match_from_root();
 0)   6.292 us    |        }
 0)   1.894 us    |        htb_search();
 0)               |        htb_qlen_notify() {
 0)   2.655 us    |          htb_deactivate_prios();
 0)   6.933 us    |        }
 0) + 25.227 us   |      }
 0)   1.983 us    |      sfb_dequeue();
 0) + 86.553 us   |    }
 0) # 2932.761 us |    qdisc_warn_nonwc();
 0)               |    htb_lookup_leaf() {
 0) # 1268.829 us |      htb_marker();
 0) # 1275.412 us |    }
 0) # 6453.144 us |  }
 ------------------------------------------

The root cause is the following:

1. htb_dequeue calls htb_dequeue_tree which calls the dequeue handler on the selected leaf qdisc: https://elixir.bootlin.com/linux/v6.16-rc4/source/net/sched/sch_htb.c#L909
2. netem_deqeueue calls enqueue on the child qdisc (in this case sfb)
3. Since sfb's max value is 0, it drops the packet and returns a failure value: https://elixir.bootlin.com/linux/v6.16-rc4/source/net/sched/sch_sfb.c#L349
4. Because of this, netem_dequeue calls qdisc_tree_reduce_backlog, and since qlen is now 0, it calls htb_qlen_notify -> htb_deactivate -> htb_deactiviate_prios -> htb_remove_class_from_row -> htb_safe_rb_erase
5. As this is the only class in the selected hprio rbtree, __rb_change_child in __rb_erase_augmented sets the rb_root pointer to null (https://elixir.bootlin.com/linux/v6.16-rc4/source/include/linux/rbtree_augmented.h#L242)
6. Because sfb dropped the packet, the original dequeue handler from step 1 returns null, which causes htb_dequeue_tree to call htb_lookup_leaf with the same hprio rbtree, and fail the BUG_ON

A potential fix I see is to just replace the BUG_ON with returning NULL. I can make a patch if this solution seems satisfactory. Please feel free to let me know of any questions or issues.

On another side note, when triaging this issue, I encountered another WARNING that can be hit if an htb child is configured with a cake with 1b as the memlimit here: https://elixir.bootlin.com/linux/v6.16-rc4/source/net/sched/sch_htb.c#L595. I don't think the `!cl->leaf.q->q.qlen` condition is needed here, as htb_dequeue_tree gracefully handles that case anyways. 

Best,
Will
Savy

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ