lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <EBHeQZeq5AJteszZoHrsiJv6EGOnuByQ-XNejgA9WiqQ8g2jIXowzoGjuJowDcV6xi9xBgyMTwNlS8wN0zUOlRl4Bl2Mv-x883IKCvdySyU=@syst3mfailure.io>
Date: Tue, 29 Apr 2025 13:41:19 +0000
From: Savy <savy@...t3mfailure.io>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: Will <willsroot@...tonmail.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>, jhs@...atatu.com, jiri@...nulli.us
Subject: Re: [BUG] net/sched: Race Condition and Null Dereference in codel_change, pie_change, fq_pie_change, fq_codel_change, hhf_change


On Monday, April 28th, 2025 at 7:53 PM, Cong Wang <xiyou.wangcong@...il.com> wrote:

> 
> 
> Excellent analysis!
> 
> Do you mind testing the following patch?
> 
> Note:
> 
> 1) We can't just test NULL, because otherwise we would leak the skb's
> in gso_skb list.
> 
> 2) I am totally aware that maybe there are some other cases need the
> same fix, but I want to be conservative here since this will be
> targeting for -stable. It is why I intentionally keep my patch minimum.
> 
> Thanks!
> 
> --------------->
> 

Hi Cong,

Thank you for the reply. We have tested your patch and can confirm that it resolves the issue.
However, regarding point [1], we conducted some tests to verify if there is a skb leak in the gso_skb list, 
but the packet remains in the list only for a limited amount of time.

In our POC we set a very low TBF rate, so when the Qdisc runs out of tokens, 
it reschedules itself via qdisc watchdog after approximately 45 seconds.

Returning to the example above, here is what happens when the watchdog timer fires:

[ ... ]

Packet 2 is sent:

    [ ... ]

    tbf_dequeue()
        qdisc_peek_dequeued()
            skb_peek(&sch->gso_skb) // sch->gso_skb is empty
            codel_qdisc_dequeue() // Codel qlen is 1
                qdisc_dequeue_head()
                // Packet 2 is removed from the queue
                // Codel qlen = 0
            __skb_queue_head(&sch->gso_skb, skb); // Packet 2 is added to gso_skb list
            sch->q.qlen++ // Codel qlen = 1

        // TBF runs out of tokens and reschedules itself for later
        qdisc_watchdog_schedule_ns() 

    codel_change() // Patched, (!skb) break;, does not crash

    // ... ~45 seconds later the qdisc watchdog timer fires

    tbf_dequeue()
        qdisc_peek_dequeued()
            skb_peek(&sch->gso_skb) // sch->gso_skb is _not_ empty (contains Packet 2)
        // TBF now has enough tokens
        qdisc_dequeue_peeked()
            skb = __skb_dequeue(&sch->gso_skb) // Packet 2 is removed from the gso_skb list
            sch->q.qlen-- // Codel qlen = 0

Notice how the gso_skb list is correctly cleaned up when the watchdog timer fires.
We also examined some edge cases, such as when the watchdog is canceled 
and there are still packets left in the gso_skb list, and it is always cleaned up:

Qdisc destruction case:

    tbf_destroy()
        qdisc_put()
            __qdisc_destroy()
               qdisc_reset()
                   __skb_queue_purge(&qdisc->gso_skb);

Qdisc reset case:

    tbf_reset()
        qdisc_reset()
            __skb_queue_purge(&qdisc->gso_skb);

Perhaps the skb leak you mentioned occurs in another edge case that we overlooked? 
In any case, we believe your patch is technically more correct,
as it makes sense to clean up packets in the gso_skb list first when the limit changes.

Best regards,
Savy
Will

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ