netdev - Re: [BUG] net/sched: Race Condition and Null Dereference in codel_change, pie_change, fq_pie_change, fq_codel_change, hhf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <_Q1nNg0fi4eK6GzUh37pUz0t4_ce3LMlVRJ0daPQvjmM2OvYVmCpi1nSoYIIJllt8NxeSm02laUHa5-i3HHAAB3KXsf-flCsartZej5ykZg=@protonmail.com>
Date: Sun, 04 May 2025 15:35:58 +0000
From: Will <willsroot@...tonmail.com>
To: Savy <savy@...t3mfailure.io>
Cc: Cong Wang <xiyou.wangcong@...il.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>, jhs@...atatu.com, jiri@...nulli.us
Subject: Re: [BUG] net/sched: Race Condition and Null Dereference in codel_change, pie_change, fq_pie_change, fq_codel_change, hhf_change

Hi Cong,

Just following up on this - are there any updates for a fix?

Thank you,
Will

On Tuesday, April 29th, 2025 at 1:41 PM, Savy <savy@...t3mfailure.io> wrote:

> 
> 
> 
> On Monday, April 28th, 2025 at 7:53 PM, Cong Wang xiyou.wangcong@...il.com wrote:
> 
> > Excellent analysis!
> > 
> > Do you mind testing the following patch?
> > 
> > Note:
> > 
> > 1) We can't just test NULL, because otherwise we would leak the skb's
> > in gso_skb list.
> > 
> > 2) I am totally aware that maybe there are some other cases need the
> > same fix, but I want to be conservative here since this will be
> > targeting for -stable. It is why I intentionally keep my patch minimum.
> > 
> > Thanks!
> > 
> > --------------->
> 
> 
> Hi Cong,
> 
> Thank you for the reply. We have tested your patch and can confirm that it resolves the issue.
> However, regarding point [1], we conducted some tests to verify if there is a skb leak in the gso_skb list,
> but the packet remains in the list only for a limited amount of time.
> 
> In our POC we set a very low TBF rate, so when the Qdisc runs out of tokens,
> it reschedules itself via qdisc watchdog after approximately 45 seconds.
> 
> Returning to the example above, here is what happens when the watchdog timer fires:
> 
> [ ... ]
> 
> Packet 2 is sent:
> 
> [ ... ]
> 
> tbf_dequeue()
> qdisc_peek_dequeued()
> skb_peek(&sch->gso_skb) // sch->gso_skb is empty
> 
> codel_qdisc_dequeue() // Codel qlen is 1
> qdisc_dequeue_head()
> // Packet 2 is removed from the queue
> // Codel qlen = 0
> __skb_queue_head(&sch->gso_skb, skb); // Packet 2 is added to gso_skb list
> 
> sch->q.qlen++ // Codel qlen = 1
> 
> 
> // TBF runs out of tokens and reschedules itself for later
> qdisc_watchdog_schedule_ns()
> 
> codel_change() // Patched, (!skb) break;, does not crash
> 
> // ... ~45 seconds later the qdisc watchdog timer fires
> 
> tbf_dequeue()
> qdisc_peek_dequeued()
> skb_peek(&sch->gso_skb) // sch->gso_skb is not empty (contains Packet 2)
> 
> // TBF now has enough tokens
> qdisc_dequeue_peeked()
> skb = __skb_dequeue(&sch->gso_skb) // Packet 2 is removed from the gso_skb list
> 
> sch->q.qlen-- // Codel qlen = 0
> 
> 
> Notice how the gso_skb list is correctly cleaned up when the watchdog timer fires.
> We also examined some edge cases, such as when the watchdog is canceled
> and there are still packets left in the gso_skb list, and it is always cleaned up:
> 
> Qdisc destruction case:
> 
> tbf_destroy()
> qdisc_put()
> __qdisc_destroy()
> qdisc_reset()
> __skb_queue_purge(&qdisc->gso_skb);
> 
> 
> Qdisc reset case:
> 
> tbf_reset()
> qdisc_reset()
> __skb_queue_purge(&qdisc->gso_skb);
> 
> 
> Perhaps the skb leak you mentioned occurs in another edge case that we overlooked?
> In any case, we believe your patch is technically more correct,
> as it makes sense to clean up packets in the gso_skb list first when the limit changes.
> 
> Best regards,
> Savy
> Will