[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <3qom4fkg7kp4l3bcgrbivmm2yi2wqrmso7rb5qe3xffjj3k7hz@nc7gx4atzfyq>
Date: Sun, 15 Jun 2025 22:54:37 +0300
From: Fedor Pchelkin <pchelkin@...ras.ru>
To: Christian Lamparter <chunkeey@...il.com>
Cc: Dmitry Antipov <dmantipov@...dex.ru>, linux-wireless@...r.kernel.org,
Christian Lamparter <chunkeey@...glemail.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] wifi: carl9170: do not ping device which has failed to
load firmware
It's better to be back to the original thread. Sorry, I misclicked and
forgot to put In-Reply-To firstly.
That part of the broken thread:
https://lore.kernel.org/linux-wireless/y4ufvifcearf75qds5hlro3rfiadwfwlixz5xg3w6jjozk5sdg@7yyfsdvyehon/T/#u
Quoting your last reply there..
On Sat, 14 Jun 2025 18:33:28 +0200, Christian Lamparter wrote:
> Hi,
>
> On 6/13/25 10:19 PM, Fedor Pchelkin wrote:
> > Dmitry Antipov wrote:
> >> Syzkaller reports [1, 2] crashes caused by an attempts to ping
> >> the device which has failed to load firmware. Since such a device
> >> doesn't pass 'ieee80211_register_hw()', an internal workqueue
> >> managed by 'ieee80211_queue_work()' is not yet created and an
> >> attempt to queue work on it causes null-ptr-deref.
> >>
> >> [1] https://syzkaller.appspot.com/bug?extid=9a4aec827829942045ff
> >> [2] https://syzkaller.appspot.com/bug?extid=0d8afba53e8fb2633217
> >> Fixes: e4a668c59080 ("carl9170: fix spurious restart due to high latency")
> >> Signed-off-by: Dmitry Antipov <dmantipov@...dex.ru>
> >> ---
> >> drivers/net/wireless/ath/carl9170/usb.c | 15 ++++++++++-----
> >> 1 file changed, 10 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/net/wireless/ath/carl9170/usb.c b/drivers/net/wireless/ath/carl9170/usb.c
> >> index a3e03580cd9f..a0bfa0c477ee 100644
> >> --- a/drivers/net/wireless/ath/carl9170/usb.c
> >> +++ b/drivers/net/wireless/ath/carl9170/usb.c
> >> @@ -438,13 +438,18 @@ static void carl9170_usb_rx_complete(struct urb *urb)
> >>
> >> if (atomic_read(&ar->rx_anch_urbs) == 0) {
> >> /*
> >> - * The system is too slow to cope with
> >> - * the enormous workload. We have simply
> >> - * run out of active rx urbs and this
> >> - * unfortunately leads to an unpredictable
> >> - * device.
> >> + * At this point, either the system is too slow to
> >> + * cope with the enormous workload (so we have simply
> >> + * run out of active rx urbs and this unfortunately
> >> + * leads to an unpredictable device), or the device
> >> + * is not fully functional after an unsuccessful
> >> + * firmware loading attempts (so it doesn't pass
> >> + * ieee80211_register_hw() and there is no internal
> >> + * workqueue at all).
> >> */
> >>
> >> + if (WARN_ON_ONCE(!ar->registered))
> >> + return;
> >
> > Is WARN justifiable here if it concerns handling a predefined error
> > condition?
>
> The driver has many more WARN_ON_(ONCE). Most of them are from "back in the day". I think
> carl9170 predates Syzkaller by something like 5 years or less.
Just the fact of presence of other WARNs in the driver is not usually a
pretext to add the new ones. WARNs should be used to point out that
something is going wrong on the kernel level, i.e. be indicators of
kernel bugs (BUG and BUG_ONs are discouraged for the new code, if I
remember correctly).
>
> In this case, it would be good to know if this only happens with syzkaller, or with some
> dogy device (be it the hci, or maybe the ar9170 device itself - they are getting old by now).
> I mean Garbage In => Garbage Out. But yes, it shouldn't crash.
The patch description lacks details on why Syzkaller does ever hit such a
situation, even taking into account that fuzzers love to exaggerate the
likelihood of hitting some weird and impossible-to-hit-in-practice stuff.
Dmitry, if you'd like to, please give some comments on the following..
The path in question is executed when carl9170_usb_submit_rx_urb() fails.
I didn't check the repro, only judging by driver code and crash report,
but presumably it fails due to usb_submit_urb() returning some -ENODEV
because the device is disconnected concurrently. usb_submit_urb() errors
lead to &ar->rx_anch_urbs still remaining to have a zero value.
I think the key reason of the crash is that the device disconnection and
the URB completion callback - carl9170_usb_rx_complete() where the crash
occurs - have a race.
The logs say it's disconnected just before the crash:
[ 87.100458][ T2990] usb 4-1: USB disconnect, device number 11
[ 87.101369][ C1] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000038: 0000 [#1] SMP KASAN PTI
[ 87.101407][ C1] KASAN: null-ptr-deref in range [0x00000000000001c0-0x00000000000001c7]
So it looks like ar->registered being false here is a "correct" failure
condition, i.e. it can be expected when the certain phase of the driver
initialization fails and should be handled without any WARNs.
>
> > I mean, yeah, it avoids a crash in the completion handler but kernels
> > with panic_on_warn - the ones which Syzkaller runs - will still stumble
> > here for no reason.
>
> I bet there is already a precedence for this, if someone knows it or has previous decisions:
> Please join in!
>
> Regards,
> Christian
Powered by blists - more mailing lists