lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aA/czQYEtPmMim0G@pop-os.localdomain>
Date: Mon, 28 Apr 2025 12:53:49 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: Will <willsroot@...tonmail.com>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Savy <savy@...t3mfailure.io>, jhs@...atatu.com, jiri@...nulli.us
Subject: Re: [BUG] net/sched: Race Condition and Null Dereference in
 codel_change, pie_change, fq_pie_change, fq_codel_change, hhf_change

On Sun, Apr 27, 2025 at 09:26:43PM +0000, Will wrote:
> Hi Cong,
> 
> Thank you for the reply. On further analysis, we realized that you are correct - it is not a race condition between xxx_change and xxx_dequeue. The root cause is more complicated and actually relates to the parent tbf qdisc. The bug is still a race condition though.
> 
> __qdisc_dequeue_head() can still return null even if sch->q.qlen is non-zero because of qdisc_peek_dequeued, which is the vulnerable qdiscs' peek handler, and tbf_dequeue calls it (https://elixir.bootlin.com/linux/v6.15-rc3/source/net/sched/sch_tbf.c#L280). There, the inner qdisc dequeues content before, adds it back to gso_skb, and increments qlen (https://elixir.bootlin.com/linux/v6.15-rc3/source/include/net/sch_generic.h#L1133). A queue state consistency issue arises when tbf does not have enough tokens (https://elixir.bootlin.com/linux/v6.15-rc3/source/net/sched/sch_tbf.c#L302) for dequeuing. The qlen value will be fixed when sufficient tokens exist and the watchdog fires again. However, there is a window for the inner qdisc to encounter this inconsistency and thus hit the null dereference.
> 
> Savy made this diagram below to showcase the interactions to trigger the bug.
> 
> Packet 1 is sent:
> 
>     tbf_enqueue()
>         qdisc_enqueue()
>             codel_qdisc_enqueue() // Codel qlen is 0
>                 qdisc_enqueue_tail()
>                 // Packet 1 is added to the queue
>                 // Codel qlen = 1
> 
>     tbf_dequeue()
>         qdisc_peek_dequeued()
>             skb_peek(&sch->gso_skb) // sch->gso_skb is empty
>             codel_qdisc_dequeue() // Codel qlen is 1
>                 qdisc_dequeue_head()
>                 // Packet 1 is removed from the queue
>                 // Codel qlen = 0
>             __skb_queue_head(&sch->gso_skb, skb); // Packet 1 is added to gso_skb list
>             sch->q.qlen++ // Codel qlen = 1
>         qdisc_dequeue_peeked()
>             skb = __skb_dequeue(&sch->gso_skb) // Packet 1 is removed from the gso_skb list
>             sch->q.qlen-- // Codel qlen = 0
> 
> Packet 2 is sent:
> 
>     tbf_enqueue()
>         qdisc_enqueue()
>             codel_qdisc_enqueue() // Codel qlen is 0
>                 qdisc_enqueue_tail()
>                 // Packet 2 is added to the queue
>                 // Codel qlen = 1
> 
>     tbf_dequeue()
>         qdisc_peek_dequeued()
>             skb_peek(&sch->gso_skb) // sch->gso_skb is empty
>             codel_qdisc_dequeue() // Codel qlen is 1
>                 qdisc_dequeue_head()
>                 // Packet 2 is removed from the queue
>                 // Codel qlen = 0
>             __skb_queue_head(&sch->gso_skb, skb); // Packet 2 is added to gso_skb list
>             sch->q.qlen++ // Codel qlen = 1
> 
>         // TBF runs out of tokens and reschedules itself for later
>         qdisc_watchdog_schedule_ns()
> 
> 
> Notice here how codel is left in an "inconsistent" state, as sch->q.qlen > 0, but there are no packets left in the codel queue (sch->q.head is NULL)
> 
> At this point, codel_change() can be used to update the limit to 0. However, even if  sch->q.qlen > 0, there are no packets in the queue, so __qdisc_dequeue_head() returns NULL and the null-ptr-deref occurs.
> 

Excellent analysis!

Do you mind testing the following patch?

Note:

1) We can't just test NULL, because otherwise we would leak the skb's
in gso_skb list. 

2) I am totally aware that _maybe_ there are some other cases need the
same fix, but I want to be conservative here since this will be
targeting for -stable. It is why I intentionally keep my patch minimum.

Thanks!

--------------->

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d48c657191cd..5a4840678ce5 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -1031,6 +1031,25 @@ static inline struct sk_buff *__qdisc_dequeue_head(struct qdisc_skb_head *qh)
 	return skb;
 }
 
+static inline struct sk_buff *qdisc_dequeue_internal(struct Qdisc *sch)
+{
+	struct sk_buff *skb;
+
+	skb = __skb_dequeue(&sch->gso_skb);
+	if (skb != NULL) {
+		if (qdisc_is_percpu_stats(sch)) {
+			qdisc_qstats_cpu_backlog_dec(sch, skb);
+			qdisc_qstats_cpu_qlen_dec(sch);
+		} else {
+			qdisc_qstats_backlog_dec(sch, skb);
+			sch->q.qlen--;
+		}
+		return skb;
+	}
+	skb = __qdisc_dequeue_head(&sch->q);
+	return skb;
+}
+
 static inline struct sk_buff *qdisc_dequeue_head(struct Qdisc *sch)
 {
 	struct sk_buff *skb = __qdisc_dequeue_head(&sch->q);
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
index 12dd71139da3..e1bf4919d258 100644
--- a/net/sched/sch_codel.c
+++ b/net/sched/sch_codel.c
@@ -144,10 +144,9 @@ static int codel_change(struct Qdisc *sch, struct nlattr *opt,
 
 	qlen = sch->q.qlen;
 	while (sch->q.qlen > sch->limit) {
-		struct sk_buff *skb = __qdisc_dequeue_head(&sch->q);
+		struct sk_buff *skb = qdisc_dequeue_internal(sch);
 
 		dropped += qdisc_pkt_len(skb);
-		qdisc_qstats_backlog_dec(sch, skb);
 		rtnl_qdisc_drop(skb, sch);
 	}
 	qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen, dropped);
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index 3771d000b30d..b6ed94976e69 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -195,10 +195,9 @@ static int pie_change(struct Qdisc *sch, struct nlattr *opt,
 	/* Drop excess packets if new limit is lower */
 	qlen = sch->q.qlen;
 	while (sch->q.qlen > sch->limit) {
-		struct sk_buff *skb = __qdisc_dequeue_head(&sch->q);
+		struct sk_buff *skb = qdisc_dequeue_internal(sch);
 
 		dropped += qdisc_pkt_len(skb);
-		qdisc_qstats_backlog_dec(sch, skb);
 		rtnl_qdisc_drop(skb, sch);
 	}
 	qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen, dropped);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ