linux-kernel - Re: [Intel-wired-lan] [RFC iwl-net] e1000: Hold RTNL when e1000


Message-ID: <6045c48d-f6cc-4961-819e-917933c3e466@intel.com>
Date: Tue, 22 Oct 2024 14:14:02 -0700
From: Jacob Keller <jacob.e.keller@...el.com>
To: Joe Damato <jdamato@...tly.com>, <netdev@...r.kernel.org>,
	<dmantipov@...dex.ru>, Tony Nguyen <anthony.l.nguyen@...el.com>, "Przemek
 Kitszel" <przemyslaw.kitszel@...el.com>, Andrew Lunn <andrew+netdev@...n.ch>,
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, "Simon
 Horman" <horms@...nel.org>, "moderated list:INTEL ETHERNET DRIVERS"
	<intel-wired-lan@...ts.osuosl.org>, open list <linux-kernel@...r.kernel.org>
Subject: Re: [Intel-wired-lan] [RFC iwl-net] e1000: Hold RTNL when e1000_down
 can be called



On 10/22/2024 1:00 PM, Joe Damato wrote:
> On Tue, Oct 22, 2024 at 05:21:53PM +0000, Joe Damato wrote:
>> e1000_down calls netif_queue_set_napi, which assumes that RTNL is held.
>>
>> There are a few paths for e1000_down to be called in e1000 where RTNL is
>> not currently being held:
>>   - e1000_shutdown (pci shutdown)
>>   - e1000_suspend (power management)
>>   - e1000_reinit_locked (via e1000_reset_task delayed work)
>>
>> Hold RTNL in two places to fix this issue:
>>   - e1000_reset_task
>>   - __e1000_shutdown (which is called from both e1000_shutdown and
>>     e1000_suspend).
> 
> It looks like there's one other spot I missed:
> 
> e1000_io_error_detected (pci error handler) which should also hold
> rtnl_lock:
> 
> +       if (netif_running(netdev)) {
> +               rtnl_lock();
>                 e1000_down(adapter);
> +               rtnl_unlock();
> +       }
> 
> I can send that update in the v2, but I'll wait to see if Intel has suggestions
> on the below.
>  
>> The other paths which call e1000_down seemingly hold RTNL and are OK:
>>   - e1000_close (ndo_stop)
>>   - e1000_change_mtu (ndo_change_mtu)
>>
>> I'm submitting this as an RFC because:
>>   - the e1000_reinit_locked issue appears very similar to commit
>>     21f857f0321d ("e1000e: add rtnl_lock() to e1000_reset_task"), which
>>     fixes a similar issue in e1000e
>>
>> however
>>
>>   - adding rtnl to e1000_reinit_locked seemingly conflicts with an
>>     earlier e1000 commit b2f963bfaeba ("e1000: fix lockdep warning in
>>     e1000_reset_task").
>>
>> Hopefully Intel can weigh in and shed some light on the correct way to
>> go.
>>

From my review, I think we need the RTNL lock around this function. The
deadlocks mentioned in the lockdep fix patch appear to be due to having
an *extra* lock which could then cause issues.
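
To make the locking assumption concrete, here is a rough userspace sketch,
not driver code. Every name in it (model_rtnl_lock, model_down,
model_reset_task_locked, and so on) is hypothetical, and a plain flag
stands in for RTNL. It only illustrates that a helper which assumes the
lock is held sees that assumption satisfied when the reset path takes the
lock around the down call, and violated when it does not:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these exist in the kernel. */
static bool model_rtnl_held;	/* a flag standing in for RTNL */
static int model_down_calls;	/* how many times the down path ran */

static void model_rtnl_lock(void)
{
	/* This toy lock is not recursive: taking it twice is a bug. */
	assert(!model_rtnl_held);
	model_rtnl_held = true;
}

static void model_rtnl_unlock(void)
{
	assert(model_rtnl_held);
	model_rtnl_held = false;
}

/* Plays the role of a helper that assumes its caller holds the lock. */
static bool model_queue_set_napi(void)
{
	return model_rtnl_held;
}

/* Plays the role of the down path; reports whether the assumption held. */
static bool model_down(void)
{
	model_down_calls++;
	return model_queue_set_napi();
}

/* Reset path that takes the lock around the down call. */
static bool model_reset_task_locked(void)
{
	bool ok;

	model_rtnl_lock();
	ok = model_down();
	model_rtnl_unlock();
	return ok;
}

/* Reset path that calls down without taking the lock. */
static bool model_reset_task_unlocked(void)
{
	return model_down();
}
```

Calling model_reset_task_locked() reports that the assumption held, while
model_reset_task_unlocked() reports that it did not. The real driver has
other locks and flags in play that this model ignores.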

>> Fixes: 8f7ff18a5ec7 ("e1000: Link NAPI instances to queues and IRQs")
>> Reported-by: Dmitry Antipov <dmantipov@...dex.ru>
>> Closes: https://lore.kernel.org/netdev/8cf62307-1965-46a0-a411-ff0080090ff9@yandex.ru/
>> Signed-off-by: Joe Damato <jdamato@...tly.com>