netdev - Re: [PATCH] net: usbnet: Avoid potential RCU stall on LINK

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250718161825.65912e37@kernel.org>
Date: Fri, 18 Jul 2025 16:18:25 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: John Ernberg <john.ernberg@...ia.se>
Cc: Oliver Neukum <oneukum@...e.com>, Andrew Lunn <andrew+netdev@...n.ch>,
 "David S . Miller" <davem@...emloft.net>, Eric Dumazet
 <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, Ming Lei
 <ming.lei@...onical.com>, "netdev@...r.kernel.org"
 <netdev@...r.kernel.org>, "linux-usb@...r.kernel.org"
 <linux-usb@...r.kernel.org>, "linux-kernel@...r.kernel.org"
 <linux-kernel@...r.kernel.org>, "stable@...r.kernel.org"
 <stable@...r.kernel.org>
Subject: Re: [PATCH] net: usbnet: Avoid potential RCU stall on LINK_CHANGE
 event

On Fri, 18 Jul 2025 09:07:26 +0000 John Ernberg wrote:
> > Thanks for the analysis, I think I may have misread the code.
> > What I was saying is that we are restoring the carrier while
> > we are still processing the previous carrier off event in
> > the workqueue. My thinking was that if we deferred the
> > netif_carrier_on() to the workqueue this race couldn't happen.
> > 
> > usbnet_bh() already checks netif_carrier_ok() - we're kinda duplicating
> > the carrier state with this RX_PAUSED workaround.
> > 
> > I don't feel strongly about this, but deferring the carrier_on()
> > the the workqueue would be a cleaner solution IMO.
> >   
> 
> I've been thinking about this idea, but I'm concerned for the opposite 
> direction. I cannot think of a way to fully guarantee that the carrier 
> isn't turned on again incorrectly if an off gets queued.
> 
> The most I came up with was adding an extra flag bit to set carrier on, 
> and then test_and_clear_bit() it in the __handle_link_change() function.
> And also clear_bit() in the usbnet_link_change() function if an off 
> arrives. I cannot convince myself that there isn't a way for that to go 
> sideways. But perhaps that would be robust enough?

I think it should be robust enough.. Unless my grep skills are failing
me - no drivers which call usbnet_link_change() twiddle the link state
directly.

Give it a go, if you think your initial patch is cleaner -- it's fine.

> I've also considered the possibility of just not re-submitting the INTR 
> poll URB until the last one was fully processed when handling a link 
> change. But that might cause havoc with ASIX and Sierra devices as they 
> are calling usbnet_link_change() in other ways than through the 
> .status-callback. I don't have any of these devices so I cannot test 
> them for regressions. So this path feels quite dangerous.
> With a sub-driver property to enable this behavior it might work out?

Yeah, that seems more involved.