netdev - Re: [BUG] net/sched: Race Condition and Null Dereference in codel_change, pie_change, fq_pie_change, fq_codel_change, hhf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aA1kmZ/Hs0a33l5j@pop-os.localdomain>
Date: Sat, 26 Apr 2025 15:56:25 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: Will <willsroot@...tonmail.com>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Savy <savy@...t3mfailure.io>, jhs@...atatu.com, jiri@...nulli.us
Subject: Re: [BUG] net/sched: Race Condition and Null Dereference in
 codel_change, pie_change, fq_pie_change, fq_codel_change, hhf_change

Hi Will,

On Fri, Apr 25, 2025 at 02:14:07PM +0000, Will wrote:
> Hi all,
> 
> We've encountered and triaged the following race condition that occurs across 5 different qdiscs: codel, pie, fq_pie, fq_codel, and hhf. It came up on a modified version of Syzkaller we're working on for a research project. It works on upstream (02ddfb981de88a2c15621115dd7be2431252c568), the 6.6 LTS branch, and the 6.1 LTS branch and has existed since at least 2016 (and earlier too for some of the other listed qdiscs): https://github.com/torvalds/linux/commit/2ccccf5fb43ff62b2b9.
> 
> We will take codel_change as the main example here, as the other vulnerable qdiscs change functions follow the same pattern. When the limit changes, the qdisc attempts to shrink the queue size back under the limit: https://elixir.bootlin.com/linux/v6.15-rc3/source/net/sched/sch_codel.c#L146. However, this is racy against a qdisc's dequeue function. This limit check could pass and the qdisc will attempt to dequeue the head, but the actual qdisc's dequeue function (codel_qdisc_dequeue in this case) could run. This would lead to the dequeued skb being null in the change function when only one packet remains in the queue, so when the function calls qdisc_pkt_len(skb), a null dereference exception would occur.
> 

Thanks for your detailed report, reproducer and patch!

I have two questions here:

1. Why do you say it is racy? We have sch_tree_lock() held when flushing
the packets in the backlog, it should be sufficient to prevent
concurrent ->dequeue().

2. I don't see immediately from your report why we could get a NULL from
__qdisc_dequeue_head(), because unless sch->q.qlen is wrong, we should
always have packets in the queue until we reach 0 (you specifically used
0 as the limit here).

The reason why I am asking is that if we had any of them wrong here, we
would have a biger trouble than just missing a NULL check.


Best regards,
Cong Wang