netdev - Re: [BUG] net/sched: Race Condition and Null Dereference in codel_change, pie_change, fq_pie_change, fq_codel_change, hhf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <EBHeQZeq5AJteszZoHrsiJv6EGOnuByQ-XNejgA9WiqQ8g2jIXowzoGjuJowDcV6xi9xBgyMTwNlS8wN0zUOlRl4Bl2Mv-x883IKCvdySyU=@syst3mfailure.io>
Date: Tue, 29 Apr 2025 13:41:19 +0000
From: Savy <savy@...t3mfailure.io>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: Will <willsroot@...tonmail.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>, jhs@...atatu.com, jiri@...nulli.us
Subject: Re: [BUG] net/sched: Race Condition and Null Dereference in codel_change, pie_change, fq_pie_change, fq_codel_change, hhf_change

On Monday, April 28th, 2025 at 7:53 PM, Cong Wang <xiyou.wangcong@...il.com> wrote:

> 
> 
> Excellent analysis!
> 
> Do you mind testing the following patch?
> 
> Note:
> 
> 1) We can't just test NULL, because otherwise we would leak the skb's
> in gso_skb list.
> 
> 2) I am totally aware that maybe there are some other cases need the
> same fix, but I want to be conservative here since this will be
> targeting for -stable. It is why I intentionally keep my patch minimum.
> 
> Thanks!
> 
> --------------->
> 

Hi Cong,

Thank you for the reply. We have tested your patch and can confirm that it resolves the issue.
However, regarding point [1], we conducted some tests to verify if there is a skb leak in the gso_skb list, 
but the packet remains in the list only for a limited amount of time.

In our POC we set a very low TBF rate, so when the Qdisc runs out of tokens, 
it reschedules itself via qdisc watchdog after approximately 45 seconds.

Returning to the example above, here is what happens when the watchdog timer fires:

[ ... ]

Packet 2 is sent:

    [ ... ]

    tbf_dequeue()
        qdisc_peek_dequeued()
            skb_peek(&sch->gso_skb) // sch->gso_skb is empty
            codel_qdisc_dequeue() // Codel qlen is 1
                qdisc_dequeue_head()
                // Packet 2 is removed from the queue
                // Codel qlen = 0
            __skb_queue_head(&sch->gso_skb, skb); // Packet 2 is added to gso_skb list
            sch->q.qlen++ // Codel qlen = 1

        // TBF runs out of tokens and reschedules itself for later
        qdisc_watchdog_schedule_ns() 

    codel_change() // Patched, (!skb) break;, does not crash

    // ... ~45 seconds later the qdisc watchdog timer fires

    tbf_dequeue()
        qdisc_peek_dequeued()
            skb_peek(&sch->gso_skb) // sch->gso_skb is _not_ empty (contains Packet 2)
        // TBF now has enough tokens
        qdisc_dequeue_peeked()
            skb = __skb_dequeue(&sch->gso_skb) // Packet 2 is removed from the gso_skb list
            sch->q.qlen-- // Codel qlen = 0

Notice how the gso_skb list is correctly cleaned up when the watchdog timer fires.
We also examined some edge cases, such as when the watchdog is canceled 
and there are still packets left in the gso_skb list, and it is always cleaned up:

Qdisc destruction case:

    tbf_destroy()
        qdisc_put()
            __qdisc_destroy()
               qdisc_reset()
                   __skb_queue_purge(&qdisc->gso_skb);

Qdisc reset case:

    tbf_reset()
        qdisc_reset()
            __skb_queue_purge(&qdisc->gso_skb);

Perhaps the skb leak you mentioned occurs in another edge case that we overlooked? 
In any case, we believe your patch is technically more correct,
as it makes sense to clean up packets in the gso_skb list first when the limit changes.

Best regards,
Savy
Will