[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20490639ea65ae09bec4ed33cf07a69b61935959.camel@kernel.crashing.org>
Date: Sat, 18 Aug 2018 13:41:02 +1000
From: Benjamin Herrenschmidt <benh@...nel.crashing.org>
To: Lukas Wunner <lukas@...ner.de>, Bjorn Helgaas <helgaas@...nel.org>
Cc: linux-pci@...r.kernel.org, Hari Vyas <hari.vyas@...adcom.com>,
Ray Jui <ray.jui@...adcom.com>,
Srinath Mannam <srinath.mannam@...adcom.com>,
Guenter Roeck <linux@...ck-us.net>,
Jens Axboe <axboe@...nel.dk>,
Konstantin Khlebnikov <khlebnikov@...dex-team.ru>,
Marta Rybczynska <mrybczyn@...ray.eu>,
Pierre-Yves Kerbrat <pkerbrat@...ray.eu>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 2/6] pci: Set pci_dev->is_added before calling
device_add
On Fri, 2018-08-17 at 20:15 +0200, Lukas Wunner wrote:
>
> The two steps (enumeration and driver attachment) are protected by
> pci_lock_rescan_remove(). Initially, when a pci_dev is newly allocated
> with kzalloc(), is_added is 0. The transition from 0 to 1 happens while
> under pci_lock_rescan_remove(). When that lock is released, is_added is
> always 1, barring a device_attach() error, in which case it will remain
> at 0.
>
> AFAICS, there is no second mutex needed to ensure synchronization of
> is_added, the existing mutex should be sufficient and the only problem
> are RMW races when accessing adjacent flags. Those were correctly addressed
> by Hari's patch. Benjamin seems to be alleging that concurrency issues
> exist beyond the RMW races, I fail to see them, sorry.
Im saying that fixing those issues using atomic bitops is a generally
unsafe practice even if it happens to work in a specific case.
In this one, I argue that the root problem was simply that is_added was
set in the wrong place. The "fix" from Hari touches 9 files, adds
horrible relative includes to an architecture and generally bloats
things while a single 3 liner would have been sufficient.
The current use of is_added is asymetric (it's cleared during
device_attach but set during detach) which is bogus and the entire race
stems from the fact that it is set after device_attach returns.
So setting it early is, imho, the right fix, is much simpler, and fixes
the odd imbalance we have to begin with.
Ben.
Powered by blists - more mailing lists