linux-kernel - Re: [RFC PATCH] nvme: avoid race-conditions when enabling devices


Message-ID: <20180321154807.GD22254@ming.t460p>
Date:   Wed, 21 Mar 2018 23:48:09 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Marta Rybczynska <mrybczyn@...ray.eu>
Cc:     keith busch <keith.busch@...el.com>, axboe@...com, hch@....de,
        sagi@...mberg.me, linux-nvme@...ts.infradead.org,
        linux-kernel@...r.kernel.org, bhelgaas@...gle.com,
        linux-pci@...r.kernel.org, Pierre-Yves Kerbrat <pkerbrat@...ray.eu>
Subject: Re: [RFC PATCH] nvme: avoid race-conditions when enabling devices

On Wed, Mar 21, 2018 at 01:10:31PM +0100, Marta Rybczynska wrote:
> > On Wed, Mar 21, 2018 at 12:00:49PM +0100, Marta Rybczynska wrote:
> >> NVMe driver uses threads for the work at device reset, including enabling
> >> the PCIe device. When multiple NVMe devices are initialized, their reset
> >> works may be scheduled in parallel. Then pci_enable_device_mem can be
> >> called in parallel on multiple cores.
> >> 
> >> This causes a loop that enables all upstream bridges in
> >> pci_enable_bridge(). pci_enable_bridge() performs multiple operations,
> >> including __pci_set_master() and architecture-specific functions that
> >> call ones like pci_enable_resources(). Both __pci_set_master()
> >> and pci_enable_resources() read the PCI_COMMAND field in the PCIe
> >> configuration space and change it. This is done as read/modify/write.
> >> 
> >> Imagine that the PCIe tree looks like:
> >> A - B - switch -  C - D
> >>                \- E - F
> >> 
> >> D and F are two NVMe disks, and none of the devices from B onward is
> >> enabled or has bus mastering set. If their reset works are scheduled in
> >> parallel, the two modifications of PCI_COMMAND may happen in parallel
> >> without locking, and the system may end up with part of the PCIe tree
> >> not enabled.
> > 
> > Then it looks like serialized reset should be used, and I did see that
> > commit 79c48ccf2fe ("nvme-pci: serialize pci resets") fixes the issue of
> > 'failed to mark controller state' in the reset stress test.
> > 
> > But that commit only covers the case of PCI reset from the sysfs
> > attribute, and maybe other cases need to be dealt with in a similar way too.
> > 
> 
> It seems to me that the serialized reset works for multiple resets of the
> same device, doesn't it? Our problem is linked to resets of different devices
> that share the same PCIe tree.

Given that reset shouldn't be a frequent action, it might be fine to serialize
all resets from different devices.
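
As a rough illustration of the idea, here is a minimal userspace sketch, not
kernel code. It assumes a single global lock shared by all reset paths;
bridge_command, reset_lock, reset_work() and run_parallel_resets() are
hypothetical names made up for the example, and the two workers each set one
bit as a simplification of what the real enable path does. Only the
PCI_COMMAND bit values follow the PCI specification. Holding the lock across
the whole read/modify/write means neither update can be lost:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Bit values as defined for the PCI_COMMAND register. */
#define PCI_COMMAND_MEMORY 0x2 /* respond to memory space accesses */
#define PCI_COMMAND_MASTER 0x4 /* enable bus mastering */

/* Hypothetical stand-in for PCI_COMMAND of a shared upstream bridge. */
static uint16_t bridge_command;

/* Hypothetical global lock that serializes resets of all devices. */
static pthread_mutex_t reset_lock = PTHREAD_MUTEX_INITIALIZER;

/* One reset work: set a single bit with read/modify/write under the lock. */
static void *reset_work(void *arg)
{
	uint16_t bit = *(const uint16_t *)arg;
	uint16_t cmd;

	pthread_mutex_lock(&reset_lock);
	cmd = bridge_command;   /* read */
	cmd |= bit;             /* modify */
	bridge_command = cmd;   /* write */
	pthread_mutex_unlock(&reset_lock);
	return NULL;
}

/* Run two reset works in parallel and return the final register value. */
uint16_t run_parallel_resets(void)
{
	static const uint16_t bits[2] = { PCI_COMMAND_MEMORY, PCI_COMMAND_MASTER };
	pthread_t workers[2];
	int i, rc;

	bridge_command = 0;
	for (i = 0; i < 2; i++) {
		rc = pthread_create(&workers[i], NULL, reset_work, (void *)&bits[i]);
		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < 2; i++)
		pthread_join(workers[i], NULL);
	return bridge_command;
}
```

Without the lock, both workers could read the old value before either writes,
and the later write would drop the other worker's bit. With the lock, the
returned value always has both bits set. A finer-grained lock might be enough,
but a single global one is the simplest form of "serialize all resets".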

Thanks,
Ming