[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <IA3PR11MB8986C779F51731B60D07BEB7E50EA@IA3PR11MB8986.namprd11.prod.outlook.com>
Date: Wed, 10 Sep 2025 09:02:51 +0000
From: "Loktionov, Aleksandr" <aleksandr.loktionov@...el.com>
To: Kohei Enju <enjuk@...zon.com>, "Lifshits, Vitaly"
<vitaly.lifshits@...el.com>
CC: "andrew+netdev@...n.ch" <andrew+netdev@...n.ch>, "Nguyen, Anthony L"
<anthony.l.nguyen@...el.com>, "davem@...emloft.net" <davem@...emloft.net>,
"edumazet@...gle.com" <edumazet@...gle.com>,
"intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
"kohei.enju@...il.com" <kohei.enju@...il.com>, "kuba@...nel.org"
<kuba@...nel.org>, "kurt@...utronix.de" <kurt@...utronix.de>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "pabeni@...hat.com"
<pabeni@...hat.com>, "Kitszel, Przemyslaw" <przemyslaw.kitszel@...el.com>
Subject: RE: [Intel-wired-lan] [PATCH v1 iwl-net] igc: unregister netdev when
igc_led_setup() fails in igc_probe()
> -----Original Message-----
> From: Kohei Enju <enjuk@...zon.com>
> Sent: Wednesday, September 10, 2025 9:52 AM
> To: Lifshits, Vitaly <vitaly.lifshits@...el.com>
> Cc: andrew+netdev@...n.ch; Nguyen, Anthony L
> <anthony.l.nguyen@...el.com>; davem@...emloft.net;
> edumazet@...gle.com; enjuk@...zon.com; intel-wired-
> lan@...ts.osuosl.org; kohei.enju@...il.com; kuba@...nel.org;
> kurt@...utronix.de; netdev@...r.kernel.org; pabeni@...hat.com;
> Kitszel, Przemyslaw <przemyslaw.kitszel@...el.com>; Loktionov,
> Aleksandr <aleksandr.loktionov@...el.com>
> Subject: Re: [Intel-wired-lan] [PATCH v1 iwl-net] igc: unregister
> netdev when igc_led_setup() fails in igc_probe()
>
> + Aleksandr
>
> On Wed, 10 Sep 2025 10:28:17 +0300, Lifshits, Vitaly wrote:
>
> >On 9/8/2025 9:26 AM, Kurt Kanzenbach wrote:
> >> On Sat Sep 06 2025, Kohei Enju wrote:
> >>> Currently igc_probe() doesn't unregister netdev when
> igc_led_setup()
> >>> fails, causing BUG_ON() in free_netdev() and then kernel panics.
> [1]
> >>>
> >>> This behavior can be tested using fault-injection framework. I
> used the
> >>> failslab feature to test the issue. [2]
> >>>
> >>> Call unregister_netdev() when igc_led_setup() fails to avoid the
> kernel
> >>> panic.
> >>>
> >>> [1]
> >>> kernel BUG at net/core/dev.c:12047!
> >>> Oops: invalid opcode: 0000 [#1] SMP NOPTI
> >>> CPU: 0 UID: 0 PID: 937 Comm: repro-igc-led-e Not tainted 6.17.0-
> rc4-enjuk-tnguy-00865-gc4940196ab02 #64 PREEMPT(voluntary)
> >>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-
> debian-1.16.3-2 04/01/2014
> >>> RIP: 0010:free_netdev+0x278/0x2b0
> >>> [...]
> >>> Call Trace:
> >>> <TASK>
> >>> igc_probe+0x370/0x910
> >>> local_pci_probe+0x3a/0x80
> >>> pci_device_probe+0xd1/0x200
> >>> [...]
> >>>
> >>> [2]
> >>> #!/bin/bash -ex
> >>>
> >>> FAILSLAB_PATH=/sys/kernel/debug/failslab/
> >>> DEVICE=0000:00:05.0
> >>> START_ADDR=$(grep " igc_led_setup" /proc/kallsyms \
> >>> | awk '{printf("0x%s", $1)}')
> >>> END_ADDR=$(printf "0x%x" $((START_ADDR + 0x100)))
> >>>
> >>> echo $START_ADDR > $FAILSLAB_PATH/require-start
> >>> echo $END_ADDR > $FAILSLAB_PATH/require-end
> >>> echo 1 > $FAILSLAB_PATH/times
> >>> echo 100 > $FAILSLAB_PATH/probability
> >>> echo N > $FAILSLAB_PATH/ignore-gfp-wait
> >>>
> >>> echo $DEVICE > /sys/bus/pci/drivers/igc/bind
> >>>
> >>> Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
> >>> Signed-off-by: Kohei Enju <enjuk@...zon.com>
> >>
> >> Reviewed-by: Kurt Kanzenbach <kurt@...utronix.de>
> >
> >Thank you for the patch and for identifying this issue!
> >
> >I was wondering whether we could avoid failing the probe in cases
> where
> >igc_led_setup fails. It seems to me that a failure in the LED class
> >functionality shouldn't prevent the device's core functionality from
> >working properly.
>
> Indeed, that also makes sense.
>
> The behavior that igc_probe() succeeds even if igc_led_setup() fails
> also seems good to me, as long as notifying users that igc's led
> functionality is not available.
>
> >
> > From what I understand, errors in this function are not due to
> hardware
> >malfunctions. Therefore, I suggest we remove the error propagation.
> >
> >Alternatively, if feasible, we could consider reordering the function
> >calls so that the LED class setup occurs before the netdev
> registration.
> >
>
> I don't disagree with you, but I would like to hear Kurt and
> Aleksandr's
> opinion. Do you have any preference or suggestions?
>
> I'll revise and work on v2 if needed.
> Thanks!
Just in case /*I'm Alex*/ here are my 2cents:
I’d treat LED setup as best‑effort and not fail probe if it errors.
Warn once, mark LEDs unavailable, and continue. That keeps datapath
up and avoids tricky probe unwind. If we still want to fail on LED errors,
then either (a) fix the unwind (unregister_netdev et al.) or (b) move LED setup before register_netdev().
If LED labels depend on the netdev name, it’s fine to run LED setup after register_netdev().
Since errors are non‑fatal, there’s no unwind complexity.
Keep igc_led_setup() returning an error for internal visibility, but don’t propagate it as probe failure:
err = igc_led_setup(adapter);
if (err) {
netdev_warn_once(netdev,
"LED init failed (%d); continuing without LED support\n",
err);
adapter->leds_available = false;
} else {
adapter->leds_available = true;
}
In remove()/error paths, guard teardown:
if (adapter->leds_available)
igc_led_teardown(adapter);
Keep current order but fully unwind on error:
err = igc_led_setup(adapter);
if (err) {
unregister_netdev(netdev);
/* del NAPI, free queues, etc. in reverse order */
err = -E...;
goto err_free;
}
With the best regards
Alex
Powered by blists - more mailing lists