lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <3186253.aeNJFYEL58@saruman> Date: Mon, 29 Aug 2022 09:16:33 +0100 From: James Hogan <jhogan@...nel.org> To: Vinicius Costa Gomes <vinicius.gomes@...el.com> Cc: Paul Menzel <pmenzel@...gen.mpg.de>, Tony Nguyen <anthony.l.nguyen@...el.com>, Jesse Brandeburg <jesse.brandeburg@...el.com>, netdev@...r.kernel.org, intel-wired-lan@...ts.osuosl.org, Sasha Neftin <sasha.neftin@...el.com>, Aleksandr Loktionov <aleksandr.loktionov@...el.com> Subject: Re: [WIP v2] igc: fix deadlock caused by taking RTNL in RPM resume path On Saturday, 13 August 2022 18:18:25 BST James Hogan wrote: > On Saturday, 13 August 2022 01:05:41 BST Vinicius Costa Gomes wrote: > > James Hogan <jhogan@...nel.org> writes: > > > On Thursday, 11 August 2022 21:25:24 BST Vinicius Costa Gomes wrote: > > >> It was reported a RTNL deadlock in the igc driver that was causing > > >> problems during suspend/resume. > > >> > > >> The solution is similar to commit ac8c58f5b535 ("igb: fix deadlock > > >> caused by taking RTNL in RPM resume path"). > > >> > > >> Reported-by: James Hogan <jhogan@...nel.org> > > >> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@...el.com> > > >> --- > > >> Sorry for the noise earlier, my kernel config didn't have runtime PM > > >> enabled. > > > > > > Thanks for looking into this. > > > > > > This is identical to the patch I've been running for the last week. The > > > deadlock is avoided, however I now occasionally see an assertion from > > > netif_set_real_num_tx_queues due to the lock not being taken in some > > > cases > > > via the runtime_resume path, and a suspicious > > > rcu_dereference_protected() > > > warning (presumably due to the same issue of the lock not being taken). > > > See here for details: > > > https://lore.kernel.org/netdev/4765029.31r3eYUQgx@saruman/ > > > > Oh, sorry. I missed the part that the rtnl assert splat was already > > using similar/identical code to what I got/copied from igb. > > > > So what this seems to be telling us is that the "fix" from igb is only > > hiding the issue, > > I suppose the patch just changes the assumption from "lock will never be > held on runtime resume path" (incorrect, deadlock) to "lock will always be > held on runtime resume path" (also incorrect, probably racy). > > > and we would need to remove the need for taking the > > RTNL for the suspend/resume paths in igc and igb? (as someone else said > > in that igb thread, iirc) > > (I'll defer to others on this. I'm pretty unfamiliar with networking code > and this particular lock.) I'd be great to have this longstanding issue properly fixed rather than having to carry a patch locally that may not be lock safe. Also, any tips for diagnosing the issue of the network link not coming back up after resume? I sometimes have to unload and reload the driver module to get it back again. Cheers James
Powered by blists - more mailing lists