lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231018094435.GA21090@wunner.de>
Date:   Wed, 18 Oct 2023 11:44:35 +0200
From:   Lukas Wunner <lukas@...ner.de>
To:     Ricky WU <ricky_wu@...ltek.com>
Cc:     Kai-Heng Feng <kai.heng.feng@...onical.com>,
        "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        "linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
        "linux-mmc@...r.kernel.org" <linux-mmc@...r.kernel.org>,
        Kees Cook <keescook@...omium.org>,
        Tony Luck <tony.luck@...el.com>,
        "Guilherme G. Piccoli" <gpiccoli@...lia.com>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] PCI: pciehp: Prevent child devices from doing RPM on
 PCIe Link Down

[cc -= unrelated mailing lists bpf, kernel-hardening]

On Tue, Oct 17, 2023 at 10:25:39AM +0000, Ricky WU wrote:
> > On Mon, Oct 16, 2023 at 12:01:31PM +0800, Kai-Heng Feng wrote:
> > > When inserting an SD7.0 card to Realtek card reader, it can trigger
> > > PCI slot Link down and causes the following error:
> > 
> > Why does *inserting* a card cause a Link Down?
> > 
> > 
> > > [   63.898861] pcieport 0000:00:1c.0: pciehp: Slot(8): Link Down
> > > [   63.912118] BUG: unable to handle page fault for address:
> > ffffb24d403e5010
> > [...]
> > > [   63.912198]  ? asm_exc_page_fault+0x27/0x30
> > > [   63.912203]  ? ioread32+0x2e/0x70
> > > [   63.912206]  ? rtsx_pci_write_register+0x5b/0x90 [rtsx_pci]
> > > [   63.912217]  rtsx_set_l1off_sub+0x1c/0x30 [rtsx_pci]
> > > [   63.912226]  rts5261_set_l1off_cfg_sub_d0+0x36/0x40 [rtsx_pci]
> > > [   63.912234]  rtsx_pci_runtime_idle+0xc7/0x160 [rtsx_pci]
> > > [   63.912243]  ? __pfx_pci_pm_runtime_idle+0x10/0x10
> > > [   63.912246]  pci_pm_runtime_idle+0x34/0x70
> > > [   63.912248]  rpm_idle+0xc4/0x2b0
> > > [   63.912251]  pm_runtime_work+0x93/0xc0
> > > [   63.912254]  process_one_work+0x21a/0x430
> > > [   63.912258]  worker_thread+0x4a/0x3c0
> > 
> > This looks like pcr->remap_addr is accessed after it has been iounmap'ed in
> > rtsx_pci_remove() or before it has been iomap'ed in rtsx_pci_probe().
> > 
> > Is the card reader itself located below a hotplug port and unplugged here?
> 
> Yes it is card reader unplug itself, because sd7.0 card is not used
> rtsx_pcr, it use nvme driver

I can only guess here as the dmesg and lspci output I requested
hasn't been provided:

I assume that this card reader may contain a PCIe switch with a
regular SD card reader below a first Downstream Port (which is
hotplug-capable).  If an SD express card is inserted, the regular
SD card reader disappears from the Downstream Port and the
inserted card appears as an NVMe drive (possibly below a
second Downstream Port, or below the first Downstream Port).

The commit message is somewhat misleading in that it links the
unhandled page fault to card insertion.  That may be the trigger,
but the root cause appears to be that a runtime PM request is
performed asynchronously after the SD card reader has iounmap'ed
pcr->remap_addr.

If that is indeed the root cause (as I suspect), the fix needs to
be placed somewhere else.

pciehp is only one of several hotplug drivers and fixing this only
in pciehp may leave the other hotplug drivers vulnerable to the same
issue.

Unfortunately the information provided so far is insufficient to
recommend a better fix.  It would be necessary to debug why the
asynchronous RPM request is not canceled even though rtsx_pci_remove()
runtime resumes the device before iounmap'ing pcr->remap_addr.
Perhaps there's a bug in runtime PM code that causes asynchronous
requests to not be canceled upon a pm_runtime_get_sync()?
Or perhaps a wmb() is necessary in pci_device_remove() after setting
pci_dev->driver = NULL?

Thanks,

Lukas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ