linux-kernel - Re: [PATCH 2/5 V5] PCI: pciehp: check and wait port status out of DPC before handling DLLSC and PDC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKF3qh2a+1E9yahNT9DuTmOkW-Fn+k=kakXkk_VKxZmrUcqPqA@mail.gmail.com>
Date:   Mon, 28 Sep 2020 16:49:44 +0800
From:   Ethan Zhao <xerces.zhao@...il.com>
To:     Andy Shevchenko <andy.shevchenko@...il.com>
Cc:     Ethan Zhao <haifeng.zhao@...el.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>, Oliver <oohall@...il.com>,
        ruscur@...sell.cc, Lukas Wunner <lukas@...ner.de>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Stuart Hayes <stuart.w.hayes@...il.com>,
        Alexandru Gagniuc <mr.nuke.me@...il.com>,
        Mika Westerberg <mika.westerberg@...ux.intel.com>,
        linux-pci <linux-pci@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        pei.p.jia@...el.com, ashok.raj@...ux.intel.com,
        Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@...el.com>
Subject: Re: [PATCH 2/5 V5] PCI: pciehp: check and wait port status out of DPC
 before handling DLLSC and PDC

On Mon, Sep 28, 2020 at 4:45 PM Andy Shevchenko
<andy.shevchenko@...il.com> wrote:
>
> On Mon, Sep 28, 2020 at 7:10 AM Ethan Zhao <haifeng.zhao@...el.com> wrote:
>
> We didn't settle on the v4, why v5?

We could fix it with v6, v5 is used to fix other things.

>
> > When root port has DPC capability and it is enabled, then triggered by
> > errors, DPC DLLSC and PDC interrupts will be sent to DPC driver, pciehp
> > driver at the same time.
> > That will cause following result:
> >
> > 1. Link and device are recovered by hardware DPC and software DPC driver,
> >    device
> >    isn't removed, but the pciehp might treat it as device was hot removed.
> >
> > 2. Race condition happens bettween pciehp_unconfigure_device() called by
> >    pciehp_ist() in pciehp driver and pci_do_recovery() called by
> >    dpc_handler in DPC driver. no luck, there is no lock to protect
> >    pci_stop_and_remove_bus_device()
> >    against pci_walk_bus(), they hold different samphore and mutex,
> >    pci_stop_and_remove_bus_device holds pci_rescan_remove_lock, and
> >    pci_walk_bus() holds pci_bus_sem.
> >
> > This race condition is not purely code analysis, it could be triggered by
> > following command series:
> >
> >   # setpci -s 64:02.0 0x196.w=000a // 64:02.0 rootport has DPC capability
> >   # setpci -s 65:00.0 0x04.w=0544  // 65:00.0 NVMe SSD populated in port
> >   # mount /dev/nvme0n1p1 nvme
> >
> > One shot will cause system panic and NULL pointer reference happened.
> > (tested on stable 5.8 & ICS(Ice Lake SP platform, see
> > https://en.wikichip.org/wiki/intel/microarchitectures/ice_lake_(server))
>
> ...
>
> >    Buffer I/O error on dev nvme0n1p1, logical block 3328, async page read
> >    BUG: kernel NULL pointer dereference, address: 0000000000000050
> >    #PF: supervisor read access in kernel mode
> >    #PF: error_code(0x0000) - not-present page
> >    PGD 0
> >    Oops: 0000 [#1] SMP NOPTI
> >    CPU: 12 PID: 513 Comm: irq/124-pcie-dp Not tainted 5.8.0 el8.x86_64+ #1
> >    RIP: 0010:report_error_detected.cold.4+0x7d/0xe6
> >    Code: b6 d0 e8 e8 fe 11 00 e8 16 c5 fb ff be 06 00 00 00 48 89 df e8 d3
> >    65 ff ff b8 06 00 00 00 e9 75 fc ff ff 48 8b 43 68 45 31 c9 <48> 8b 50
> >    50 48 83 3a 00 41 0f 94 c1 45 31 c0 48 85 d2 41 0f 94 c0
> >    RSP: 0018:ff8e06cf8762fda8 EFLAGS: 00010246
> >    RAX: 0000000000000000 RBX: ff4e3eaacf42a000 RCX: ff4e3eb31f223c01
> >    RDX: ff4e3eaacf42a140 RSI: ff4e3eb31f223c00 RDI: ff4e3eaacf42a138
> >    RBP: ff8e06cf8762fdd0 R08: 00000000000000bf R09: 0000000000000000
> >    R10: 000000eb8ebeab53 R11: ffffffff93453258 R12: 0000000000000002
> >    R13: ff4e3eaacf42a130 R14: ff8e06cf8762fe2c R15: ff4e3eab44733828
> >    FS:  0000000000000000(0000) GS:ff4e3eab1fd00000(0000) knl
> >    GS:0000000000000000
> >    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >    CR2: 0000000000000050 CR3: 0000000f8f80a004 CR4: 0000000000761ee0
> >    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >    PKRU: 55555554
> >    Call Trace:
> >    ? report_normal_detected+0x20/0x20
> >    report_frozen_detected+0x16/0x20
> >    pci_walk_bus+0x75/0x90
> >    ? dpc_irq+0x90/0x90
> >    pcie_do_recovery+0x157/0x201
> >    ? irq_finalize_oneshot.part.47+0xe0/0xe0
> >    dpc_handler+0x29/0x40
> >    irq_thread_fn+0x24/0x60
> >    irq_thread+0xea/0x170
> >    ? irq_forced_thread_fn+0x80/0x80
> >    ? irq_thread_check_affinity+0xf0/0xf0
> >    kthread+0x124/0x140
> >    ? kthread_park+0x90/0x90
> >    ret_from_fork+0x1f/0x30
> >    Modules linked in: nft_fib_inet.........
> >    CR2: 0000000000000050
>
> Do not pollute the commit messages with irrelevant information.

That is whole panic, Andy, someone like the whole Oops, If I removed something,
Will someone say 'why not integral，why removed somehting  ‘
>
> --
> With Best Regards,
> Andy Shevchenko