Message-ID: <aA/czQYEtPmMim0G@pop-os.localdomain>
Date: Mon, 28 Apr 2025 12:53:49 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: Will <willsroot@...tonmail.com>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Savy <savy@...t3mfailure.io>, jhs@...atatu.com, jiri@...nulli.us
Subject: Re: [BUG] net/sched: Race Condition and Null Dereference in
codel_change, pie_change, fq_pie_change, fq_codel_change, hhf_change

On Sun, Apr 27, 2025 at 09:26:43PM +0000, Will wrote:
> Hi Cong,
>
> Thank you for the reply. On further analysis, we realized that you are correct: it is not a race between xxx_change and xxx_dequeue. The root cause is more subtle and involves the parent tbf qdisc, but the bug is still a race condition.
>
> __qdisc_dequeue_head() can still return NULL even when sch->q.qlen is non-zero, because of qdisc_peek_dequeued(), the vulnerable qdiscs' peek handler, which tbf_dequeue() calls (https://elixir.bootlin.com/linux/v6.15-rc3/source/net/sched/sch_tbf.c#L280). There, the inner qdisc's packet is dequeued, stashed back on the gso_skb list, and qlen is incremented again (https://elixir.bootlin.com/linux/v6.15-rc3/source/include/net/sch_generic.h#L1133). The inconsistency arises when tbf does not have enough tokens to release the peeked packet (https://elixir.bootlin.com/linux/v6.15-rc3/source/net/sched/sch_tbf.c#L302): the inner qdisc is left with qlen > 0 but an empty queue. The accounting corrects itself once enough tokens accumulate and the watchdog fires again, but until then there is a window in which the inner qdisc can hit the NULL dereference.
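>
> For reference, here is roughly what the two helpers look like (trimmed from include/net/sch_generic.h, v6.15-rc3; the percpu-stats branch is elided). The "peek" really dequeues the child and parks the skb on gso_skb, keeping it counted in qlen:
>
>	/* generic pseudo peek method for non-work-conserving qdisc */
>	static inline struct sk_buff *qdisc_peek_dequeued(struct Qdisc *sch)
>	{
>		struct sk_buff *skb = skb_peek(&sch->gso_skb);
>
>		if (!skb) {
>			/* really dequeue from the child qdisc... */
>			skb = sch->dequeue(sch);
>
>			if (skb) {
>				/* ...but park it on gso_skb and keep it
>				 * accounted in qlen/backlog, so the packet
>				 * still looks queued
>				 */
>				__skb_queue_head(&sch->gso_skb, skb);
>				qdisc_qstats_backlog_inc(sch, skb);
>				sch->q.qlen++;
>			}
>		}
>		return skb;
>	}
>
>	static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
>	{
>		struct sk_buff *skb = skb_peek(&sch->gso_skb);
>
>		if (skb) {
>			/* take the previously peeked skb off gso_skb and
>			 * drop it from the qlen/backlog accounting
>			 */
>			skb = __skb_dequeue(&sch->gso_skb);
>			qdisc_qstats_backlog_dec(sch, skb);
>			sch->q.qlen--;
>		} else {
>			skb = sch->dequeue(sch);
>		}
>		return skb;
>	}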
>
> Savy made the diagram below to show the sequence of interactions that triggers the bug.
>
> Packet 1 is sent:
>
> tbf_enqueue()
> qdisc_enqueue()
> codel_qdisc_enqueue() // Codel qlen is 0
> qdisc_enqueue_tail()
> // Packet 1 is added to the queue
> // Codel qlen = 1
>
> tbf_dequeue()
> qdisc_peek_dequeued()
> skb_peek(&sch->gso_skb) // sch->gso_skb is empty
> codel_qdisc_dequeue() // Codel qlen is 1
> qdisc_dequeue_head()
> // Packet 1 is removed from the queue
> // Codel qlen = 0
> __skb_queue_head(&sch->gso_skb, skb); // Packet 1 is added to gso_skb list
> sch->q.qlen++ // Codel qlen = 1
> qdisc_dequeue_peeked()
> skb = __skb_dequeue(&sch->gso_skb) // Packet 1 is removed from the gso_skb list
> sch->q.qlen-- // Codel qlen = 0
>
> Packet 2 is sent:
>
> tbf_enqueue()
> qdisc_enqueue()
> codel_qdisc_enqueue() // Codel qlen is 0
> qdisc_enqueue_tail()
> // Packet 2 is added to the queue
> // Codel qlen = 1
>
> tbf_dequeue()
> qdisc_peek_dequeued()
> skb_peek(&sch->gso_skb) // sch->gso_skb is empty
> codel_qdisc_dequeue() // Codel qlen is 1
> qdisc_dequeue_head()
> // Packet 2 is removed from the queue
> // Codel qlen = 0
> __skb_queue_head(&sch->gso_skb, skb); // Packet 2 is added to gso_skb list
> sch->q.qlen++ // Codel qlen = 1
>
> // TBF runs out of tokens and reschedules itself for later
> qdisc_watchdog_schedule_ns()
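>
> For completeness, this is the tbf_dequeue() logic that produces the trace above (a trimmed sketch of net/sched/sch_tbf.c; the token-bucket math is elided):
>
>	static struct sk_buff *tbf_dequeue(struct Qdisc *sch)
>	{
>		struct tbf_sched_data *q = qdisc_priv(sch);
>		struct sk_buff *skb;
>
>		/* ->peek() is qdisc_peek_dequeued(): may park the skb
>		 * on the child's gso_skb list
>		 */
>		skb = q->qdisc->ops->peek(q->qdisc);
>		if (skb) {
>			s64 now = ktime_get_ns();
>			s64 toks, ptoks;
>
>			/* token computation elided; toks/ptoks end up
>			 * negative when the bucket is empty
>			 */
>			if (toks >= 0 && ptoks >= 0) {
>				/* enough tokens: really remove the skb
>				 * from the child's gso_skb list
>				 */
>				return qdisc_dequeue_peeked(q->qdisc);
>			}
>			/* not enough tokens: leave the skb parked on
>			 * gso_skb (still counted in the child's qlen)
>			 * and retry when the watchdog fires
>			 */
>			qdisc_watchdog_schedule_ns(&q->watchdog,
>					now + max_t(long, -toks, -ptoks));
>		}
>		return NULL;
>	}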
>
>
> Notice how codel is left in an "inconsistent" state: sch->q.qlen > 0, yet there are no packets left in the codel queue (sch->q.head is NULL).
>
> At this point, codel_change() can be used to lower the limit to 0. Even though sch->q.qlen > 0, there are no packets in the queue, so __qdisc_dequeue_head() returns NULL and the null-ptr-deref occurs.
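>
> Concretely, the crash path is (paraphrasing include/net/sch_generic.h):
>
>	static inline struct sk_buff *__qdisc_dequeue_head(struct qdisc_skb_head *qh)
>	{
>		struct sk_buff *skb = qh->head;
>
>		if (skb) {
>			qh->head = skb->next;
>			qh->qlen--;
>			if (qh->head == NULL)
>				qh->tail = NULL;
>			skb->next = NULL;
>		}
>		/* NULL whenever the list is empty, regardless of
>		 * what sch->q.qlen claims
>		 */
>		return skb;
>	}
>
> codel_change() then feeds the return value straight into qdisc_pkt_len(), i.e. qdisc_skb_cb(skb)->pkt_len, which dereferences the NULL skb.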
>
Excellent analysis!
Do you mind testing the following patch?
Note:
1) We can't just test for NULL, because then we would leak the skb's
sitting on the gso_skb list (see the sketch below).
2) I am aware that there may well be other cases that need the same
fix, but I want to be conservative here since this will be targeted
at -stable. That is why I intentionally kept the patch minimal.
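The naive fix, for comparison, would be just (a sketch of the rejected
approach):

	skb = __qdisc_dequeue_head(&sch->q);
	if (!skb)
		break;

That avoids the crash, but the excess skb's parked on gso_skb would
never be dropped, so the new, lower limit would not actually be
enforced. Hence the helper below drains gso_skb first (with the
matching qlen/backlog accounting) before falling back to the normal
queue.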
Thanks!
--------------->
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d48c657191cd..5a4840678ce5 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -1031,6 +1031,25 @@ static inline struct sk_buff *__qdisc_dequeue_head(struct qdisc_skb_head *qh)
return skb;
}

+static inline struct sk_buff *qdisc_dequeue_internal(struct Qdisc *sch)
+{
+ struct sk_buff *skb;
+
+ skb = __skb_dequeue(&sch->gso_skb);
+ if (skb != NULL) {
+ if (qdisc_is_percpu_stats(sch)) {
+ qdisc_qstats_cpu_backlog_dec(sch, skb);
+ qdisc_qstats_cpu_qlen_dec(sch);
+ } else {
+ qdisc_qstats_backlog_dec(sch, skb);
+ sch->q.qlen--;
+ }
+ return skb;
+ }
+ skb = __qdisc_dequeue_head(&sch->q);
+ return skb;
+}
+
static inline struct sk_buff *qdisc_dequeue_head(struct Qdisc *sch)
{
struct sk_buff *skb = __qdisc_dequeue_head(&sch->q);
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
index 12dd71139da3..e1bf4919d258 100644
--- a/net/sched/sch_codel.c
+++ b/net/sched/sch_codel.c
@@ -144,10 +144,9 @@ static int codel_change(struct Qdisc *sch, struct nlattr *opt,

	qlen = sch->q.qlen;
while (sch->q.qlen > sch->limit) {
- struct sk_buff *skb = __qdisc_dequeue_head(&sch->q);
+		struct sk_buff *skb = qdisc_dequeue_internal(sch);

dropped += qdisc_pkt_len(skb);
- qdisc_qstats_backlog_dec(sch, skb);
rtnl_qdisc_drop(skb, sch);
}
qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen, dropped);
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index 3771d000b30d..b6ed94976e69 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -195,10 +195,9 @@ static int pie_change(struct Qdisc *sch, struct nlattr *opt,
/* Drop excess packets if new limit is lower */
qlen = sch->q.qlen;
while (sch->q.qlen > sch->limit) {
- struct sk_buff *skb = __qdisc_dequeue_head(&sch->q);
+		struct sk_buff *skb = qdisc_dequeue_internal(sch);

dropped += qdisc_pkt_len(skb);
- qdisc_qstats_backlog_dec(sch, skb);
rtnl_qdisc_drop(skb, sch);
}
qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen, dropped);