Message-ID: <20231127152641.GA5149@wunner.de>
Date:   Mon, 27 Nov 2023 16:26:41 +0100
From:   Lukas Wunner <lukas@...ner.de>
To:     Vidya Sagar <vidyas@...dia.com>
Cc:     Bjorn Helgaas <helgaas@...nel.org>,
        Lorenzo Pieralisi <lpieralisi@...nel.org>,
        Sathyanarayanan Kuppuswamy 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        "kbusch@...nel.org" <kbusch@...nel.org>,
        Vikram Sethi <vsethi@...dia.com>,
        Krishna Thota <kthota@...dia.com>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        sagar.tv@...il.com
Subject: Re: Race b/w pciehp and FirmwareFirst DPC->EDR flow - Reg

Hi Vidya,

Sorry for the delay; I'm still catching up on e-mails after Plumbers...

On Fri, Nov 10, 2023 at 10:31:55PM +0530, Vidya Sagar wrote:
> > - System doesn't have support for in-band PD and supports only OOB PD
> >   where writing to a private register would set the PD state

We already have an inband_presence_disabled flag in struct controller
which is set if the In-Band PD Disable Supported bit in the Slot
Capabilities 2 Register is set.  The flag may also be set through the
inband_presence_disabled_dmi_table[].

Currently the only place where the flag makes a difference is on
slot bringup:  pciehp_check_link_status() doesn't wait for the
Presence Detect State bit to become set.
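For readers less familiar with pciehp, the flag logic described above can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel code: `struct ctrl_model`, `model_init()` and `model_link_ready()` are hypothetical names, the DMI table match is reduced to a boolean, and the real pciehp_check_link_status() waits for presence (with a timeout) rather than checking a single Slot Status snapshot. The bit values follow the PCIe Slot Capabilities 2 and Slot Status register layouts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register bits per the PCIe spec (same values as Linux pci_regs.h) */
#define PCI_EXP_SLTCAP2_IBPD 0x00000001u /* In-Band PD Disable Supported */
#define PCI_EXP_SLTSTA_PDS   0x0040u     /* Presence Detect State */

/* Hypothetical, heavily simplified stand-in for pciehp's struct controller */
struct ctrl_model {
    bool inband_presence_disabled;
};

/*
 * Hypothetical: the flag is set if Slot Capabilities 2 advertises
 * In-Band PD Disable Supported, or if a DMI quirk entry matches
 * (modelled here as a plain boolean).
 */
static void model_init(struct ctrl_model *ctrl, uint32_t slot_cap2,
                       bool dmi_quirk_match)
{
    ctrl->inband_presence_disabled =
        (slot_cap2 & PCI_EXP_SLTCAP2_IBPD) != 0 || dmi_quirk_match;
}

/*
 * Hypothetical snapshot version of the bringup check: the link must be
 * active, and Presence Detect State is only required when in-band
 * presence detection is still enabled.
 */
static bool model_link_ready(const struct ctrl_model *ctrl, bool link_active,
                             uint16_t slot_status)
{
    if (!link_active)
        return false;
    if (!ctrl->inband_presence_disabled)
        return (slot_status & PCI_EXP_SLTSTA_PDS) != 0;
    return true; /* presence is not consulted when in-band PD is disabled */
}
```

With the flag set, an active link is enough for bringup to proceed even if Presence Detect State reads 0; without it, bringup also requires presence to be reported.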

I'm wondering whether we also need to disregard PDC events if In-Band
PD is disabled.  I'm not sure whether the behavior you're seeing is
caused by a quirk of the hardware or is expected if In-Band PD is
disabled.  Probably the former.  I think a code change would generally
only be acceptable in the latter case, though.
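The idea floated above, ignoring PDC when In-Band PD is disabled, might look roughly like the following hypothetical event filter. It only illustrates the question being asked and is not a proposed patch: `model_filter_events()` and `model_handles_presence_or_link_change()` are invented names, and the sketch assumes that Data Link Layer State Changed (DLLSC) events would still be handled, so that link-down and link-up continue to trigger removal and rescan.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot Status bits per the PCIe spec (same values as Linux pci_regs.h) */
#define PCI_EXP_SLTSTA_PDC   0x0008u /* Presence Detect Changed */
#define PCI_EXP_SLTSTA_DLLSC 0x0100u /* Data Link Layer State Changed */

/*
 * Hypothetical: drop PDC from the pending events when in-band presence
 * detection is disabled, leaving DLLSC and all other bits untouched.
 */
static uint32_t model_filter_events(uint32_t events,
                                    bool inband_presence_disabled)
{
    if (inband_presence_disabled)
        events &= ~(uint32_t)PCI_EXP_SLTSTA_PDC;
    return events;
}

/*
 * Hypothetical: would the interrupt thread take the
 * presence-or-link-change (removal/rescan) path for these events?
 */
static bool model_handles_presence_or_link_change(uint32_t events)
{
    return (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC)) != 0;
}
```

With the filter applied, a PDC-only interrupt would no longer take the removal path, while a DLLSC event still would.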


> > 10. Since PDC (Presence Detect Change) bit is also set for the first
> >     interrupt, IST attempts to remove the devices (as part of
> >     pciehp_handle_presence_or_link_change())
> > 
> > At this point, there is a race between the device driver that is
> > trying to work with the device (through pci_error_handlers callback)
> > and the IST that is trying to remove the device.
> > To be fair to pciehp_handle_presence_or_link_change(), after removing
> > the devices, it checks for the link-up/PD being '1' and scans the
> > devices again if the device is still available. But unfortunately,
> > IST is deadlocked (with the device driver) while removing the devices
> > itself and won't go to the next step.

Could you provide stack traces of the two deadlocked tasks?
Right now I don't quite understand why they're deadlocked.

Are you getting hung task messages in dmesg?
They should include stack traces.

Also, which kernel version are we talking about?

Thanks,

Lukas
