[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c8110ec8-a868-b531-a230-f4b645f5ac73@applied-asynchrony.com>
Date: Mon, 28 Apr 2025 14:28:14 +0200
From: Holger Hoffstätte <holger@...lied-asynchrony.com>
To: Greg KH <gregkh@...uxfoundation.org>
Cc: "Alan J. Wylie" <alan@...ie.me.uk>, Jamal Hadi Salim <jhs@...atatu.com>,
regressions@...ts.linux.dev, Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...nulli.us>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, Octavian Purdila <tavip@...gle.com>,
Toke Høiland-Jørgensen <toke@...hat.com>,
stable@...r.kernel.org
Subject: Re: [REGRESSION] 6.14.3 panic - kernel NULL pointer dereference in
htb_dequeue
On 2025-04-28 13:45, Greg KH wrote:
> On Tue, Apr 22, 2025 at 07:20:24PM +0200, Holger Hoffstätte wrote:
>> (cc: Greg KH)
>>
>> On 2025-04-22 18:51, Alan J. Wylie wrote:
>>> On Mon, 21 Apr 2025 21:09:27 +0100
>>> "Alan J. Wylie" <alan@...ie.me.uk> wrote:
>>>
>>>> On Mon, 21 Apr 2025 21:47:44 +0200
>>>> Holger Hoffstätte <holger@...lied-asynchrony.com> wrote:
>>>>
>>>>>> I'm afraid that didn't help. Same panic.
>>>>>
>>>>> Bummer :-(
>>>>>
>>>>> Might be something else missing then - so for now the only other
>>>>> thing I'd suggest is to revert the removal of the qlen check in
>>>>> fq_codel.
>>>>
>>>> Like this?
>>>>
>>>> $ git diff sch_fq_codel.c
>>>> diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
>>>> index 6c9029f71e88..4fdf317b82ec 100644
>>>> --- a/net/sched/sch_fq_codel.c
>>>> +++ b/net/sched/sch_fq_codel.c
>>>> @@ -316,7 +316,7 @@ static struct sk_buff *fq_codel_dequeue(struct
>>>> Qdisc *sch) qdisc_bstats_update(sch, skb);
>>>> flow->deficit -= qdisc_pkt_len(skb);
>>>> - if (q->cstats.drop_count) {
>>>> + if (q->cstats.drop_count && sch->q.qlen) {
>>>> qdisc_tree_reduce_backlog(sch, q->cstats.drop_count,
>>>> q->cstats.drop_len);
>>>> q->cstats.drop_count = 0;
>>>> $
>>>>
>>>
>>> It's been about 21 hours and no crash yet. I had an excellent day down
>>> a cave, so there's not been as much Internet traffic as usual, but
>>> there's a good chance the above patch as at least worked around, if not
>>> fixed the issue.
>>
>> Thought so .. \o/
>>
>> I guess now the question is what to do about it. IIUC the fix series [1]
>> addressed some kind of UAF problem, but obviously was not applied
>> correctly or is missing follow-ups. It's also a bit mysterious why
>> adding the HTB patch didn't work.
>>
>> Maybe Cong Wang can advise what to do here?
>>
>> So unless someone else has any ideas: Greg, please revert:
>>
>> 6.14.y/a57fe60ef4cf96bfbb6b58397ec28bdb5a5c6b31
>> ("codel: remove sch->q.qlen check before qdisc_tree_reduce_backlog()")
>>
>> and probably from 6.12 as well.
>
> Why only those 2 branches? What about all others, and mainline?
All branches that received upstream 342debc12183 aka
"codel: remove sch->q.qlen check before qdisc_tree_reduce_backlog()"
should be reverted for now. I previously didn't check them all.
It was part of a series, and the rest of the series is missing (not
picked by autosel). Luckily that does not matter because mainline
is apparently also buggy, so applying the rest of the series
would not help. This is currently still being debugged/worked on.
Just reverting this one commit provably fixes the crash for people
using -stable in e.g. gateway/router setups, with a htb->fq_codel chain.
The crash does not affect people using fq_codel as root qdisc.
Reverting is not the greatest solution since it reintroduces a potential
UAF, which has been there since fq_codel was first committed.
IIUC this potential UAF can only be induced with admin privileges.
I don't know how to explain this any better; all I did was try to
help Alan. :-(
-h
Powered by blists - more mailing lists