lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250910075231.99838-1-enjuk@amazon.com>
Date: Wed, 10 Sep 2025 16:52:23 +0900
From: Kohei Enju <enjuk@...zon.com>
To: <vitaly.lifshits@...el.com>
CC: <andrew+netdev@...n.ch>, <anthony.l.nguyen@...el.com>,
	<davem@...emloft.net>, <edumazet@...gle.com>, <enjuk@...zon.com>,
	<intel-wired-lan@...ts.osuosl.org>, <kohei.enju@...il.com>,
	<kuba@...nel.org>, <kurt@...utronix.de>, <netdev@...r.kernel.org>,
	<pabeni@...hat.com>, <przemyslaw.kitszel@...el.com>,
	<aleksandr.loktionov@...el.com>
Subject: Re: [Intel-wired-lan] [PATCH v1 iwl-net] igc: unregister netdev when igc_led_setup() fails in igc_probe()

+ Aleksandr

On Wed, 10 Sep 2025 10:28:17 +0300, Lifshits, Vitaly wrote:

>On 9/8/2025 9:26 AM, Kurt Kanzenbach wrote:
>> On Sat Sep 06 2025, Kohei Enju wrote:
>>> Currently igc_probe() doesn't unregister netdev when igc_led_setup()
>>> fails, causing BUG_ON() in free_netdev() and then kernel panics. [1]
>>>
>>> This behavior can be tested using fault-injection framework. I used the
>>> failslab feature to test the issue. [2]
>>>
>>> Call unregister_netdev() when igc_led_setup() fails to avoid the kernel
>>> panic.
>>>
>>> [1]
>>>   kernel BUG at net/core/dev.c:12047!
>>>   Oops: invalid opcode: 0000 [#1] SMP NOPTI
>>>   CPU: 0 UID: 0 PID: 937 Comm: repro-igc-led-e Not tainted 6.17.0-rc4-enjuk-tnguy-00865-gc4940196ab02 #64 PREEMPT(voluntary)
>>>   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
>>>   RIP: 0010:free_netdev+0x278/0x2b0
>>>   [...]
>>>   Call Trace:
>>>    <TASK>
>>>    igc_probe+0x370/0x910
>>>    local_pci_probe+0x3a/0x80
>>>    pci_device_probe+0xd1/0x200
>>>   [...]
>>>
>>> [2]
>>>   #!/bin/bash -ex
>>>
>>>   FAILSLAB_PATH=/sys/kernel/debug/failslab/
>>>   DEVICE=0000:00:05.0
>>>   START_ADDR=$(grep " igc_led_setup" /proc/kallsyms \
>>>           | awk '{printf("0x%s", $1)}')
>>>   END_ADDR=$(printf "0x%x" $((START_ADDR + 0x100)))
>>>
>>>   echo $START_ADDR > $FAILSLAB_PATH/require-start
>>>   echo $END_ADDR > $FAILSLAB_PATH/require-end
>>>   echo 1 > $FAILSLAB_PATH/times
>>>   echo 100 > $FAILSLAB_PATH/probability
>>>   echo N > $FAILSLAB_PATH/ignore-gfp-wait
>>>
>>>   echo $DEVICE > /sys/bus/pci/drivers/igc/bind
>>>
>>> Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
>>> Signed-off-by: Kohei Enju <enjuk@...zon.com>
>> 
>> Reviewed-by: Kurt Kanzenbach <kurt@...utronix.de>
>
>Thank you for the patch and for identifying this issue!
>
>I was wondering whether we could avoid failing the probe in cases where 
>igc_led_setup fails. It seems to me that a failure in the LED class 
>functionality shouldn't prevent the device's core functionality from 
>working properly.

Indeed, that also makes sense.

The behavior that igc_probe() succeeds even if igc_led_setup() fails
also seems good to me, as long as notifying users that igc's led
functionality is not available.

>
> From what I understand, errors in this function are not due to hardware 
>malfunctions. Therefore, I suggest we remove the error propagation.
>
>Alternatively, if feasible, we could consider reordering the function 
>calls so that the LED class setup occurs before the netdev registration.
>

I don't disagree with you, but I would like to hear Kurt and Aleksandr's
opinion. Do you have any preference or suggestions?

I'll revise and work on v2 if needed.
Thanks!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ