Date:   Thu, 04 Aug 2022 23:07:34 +0100
From:   James Hogan <jhogan@...nel.org>
To:     Paul Menzel <pmenzel@...gen.mpg.de>
Cc:     Tony Nguyen <anthony.l.nguyen@...el.com>,
        Jesse Brandeburg <jesse.brandeburg@...el.com>,
        Vinicius Costa Gomes <vinicius.gomes@...el.com>,
        intel-wired-lan@...ts.osuosl.org,
        Sasha Neftin <sasha.neftin@...el.com>,
        Aleksandr Loktionov <aleksandr.loktionov@...el.com>,
        netdev@...r.kernel.org
Subject: Re: [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset

On Thursday, 4 August 2022 22:41:01 BST James Hogan wrote:
> On Thursday, 4 August 2022 14:27:24 BST Paul Menzel wrote:
> > On 04.08.22 at 15:03, James Hogan wrote:
> > > On Thursday, 28 July 2022 18:36:31 BST James Hogan wrote:
> > >> On Wednesday, 27 July 2022 15:37:09 BST Vinicius Costa Gomes wrote:
> > >>> Yeah, I agree that it seems like the issue is something else. I would
> > >>> suggest starting with the "simple" things: enabling 'CONFIG_PROVE_LOCKING'
> > >>> and looking at the first splat. It could be that what you are seeing is
> > >>> caused by a deadlock somewhere else.
> > >> 
> > >> This is revealing, I think (I re-enabled PCIE_PTM and enabled
> > >> PROVE_LOCKING).
> > >> 
> > >> In this case it happened within minutes of boot, but a few previous
> > >> attempts with several suspend cycles with the same kernel didn't detect
> > >> the same thing.
> > > 
> > > I hate to nag, but any thoughts on the lockdep recursive locking warning
> > > below? It seems to indicate a recursive taking of rtnl_mutex in dev_ethtool
> > > and igc_resume, which would certainly seem to point the finger squarely
> > > back at the igc driver.
> > 
> > I hope the developers respond quickly. If it is indeed a
> > regression and you do not want to wait for the developers, you could
> > try to bisect the issue. To speed up the test cycles, I recommend trying
> > to reproduce the issue in QEMU/KVM, passing through the
> > network device.
> 
> Unfortunately it's new hardware for me, so I don't know if there's a good
> working version of the driver. I've only had constant pain with it so far:
> frequent failed resumes and hangs on shutdown.
> 
> However I just did a bit more research and found these dead threads from a
> year ago which appear to pinpoint the issue:
> https://lore.kernel.org/all/20210420075406.64105-1-acelan.kao@canonical.com/
> https://lore.kernel.org/all/20210809032809.1224002-1-acelan.kao@canonical.com/

And I just found this patch from December which may have been masked by the 
PTM issues:
https://lore.kernel.org/netdev/20211201185731.236130-1-vinicius.gomes@intel.com/

I'll build and run with that for a few days and see how it goes.

Cheers
James
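
A rough userspace analogy (Python, not kernel C) of the lockdep "recursive locking" report discussed in this thread. It assumes the situation James reads from the splat: rtnl_mutex is already held on the dev_ethtool path when igc_resume tries to take it again on the same call chain. `fake_dev_ethtool` and `fake_igc_resume` are hypothetical stand-ins, and `threading.Lock` merely plays the role of the non-recursive rtnl_mutex. None of this is the igc driver's actual code.

```python
import threading

# Hypothetical stand-in for rtnl_mutex. Like a kernel mutex, threading.Lock is
# NOT recursive: a thread that already holds it cannot acquire it a second time.
rtnl_mutex = threading.Lock()


def fake_igc_resume(lock: threading.Lock) -> bool:
    """Hypothetical stand-in for a resume callback that takes the lock itself.

    A plain blocking acquire() would hang forever if the caller already holds
    the lock (a self-deadlock). A short timeout lets the demo report that
    instead of hanging.
    """
    acquired = lock.acquire(timeout=0.1)
    if acquired:
        lock.release()
    return acquired


def fake_dev_ethtool(lock: threading.Lock) -> bool:
    """Hypothetical stand-in for a path that holds the lock and then, on the
    same thread, ends up invoking the resume callback."""
    with lock:
        return fake_igc_resume(lock)


if __name__ == "__main__":
    ok = fake_dev_ethtool(rtnl_mutex)
    print("nested acquire succeeded" if ok else "nested acquire would deadlock")
```

Run as a script, it prints `nested acquire would deadlock` after about 0.1 s. Without the timeout, the inner `acquire()` would block forever. That is the kind of self-deadlock lockdep's recursive-locking check is meant to flag.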


Powered by blists - more mailing lists