[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4f06d727-d424-44f8-bd80-53c452b289d3@kernel.org>
Date: Fri, 3 Nov 2023 13:48:30 +0100
From: Konrad Dybcio <konradybcio@...nel.org>
To: Hector Martin <marcan@...can.st>, iommu@...ts.linux.dev,
Asahi Linux <asahi@...ts.linux.dev>,
LKML <linux-kernel@...r.kernel.org>
Cc: Janne Grunau <j@...nau.net>, Rob Herring <robh+dt@...nel.org>,
Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
Robin Murphy <robin.murphy@....com>,
Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
Subject: Re: Race between of_iommu_configure() and iommu_probe_device()
On 3.11.2023 12:55, Hector Martin wrote:
> I just hit a crash in of_iommu_xlate() -> apple_dart_of_xlate() because
> dev->iommu was NULL. of_iommu_xlate() first calls iommu_fwspec_init
> which calls dev_iommu_get(), which allocates that member if NULL. That
> means it got freed in between, but the only thing that can do that is
> dev_iommu_free(), which is called from __iommu_probe_device() in the
> error path. That is serialized via a static lock, but not against the
> xlate stuff.
>
> I think the specific sequence of events was as follows:
>
> - IOMMU driver has not probed yet
> - Device driver tries to probe, and gets deferred via of_iommu_xlate()
> -> driver_deferred_probe_check_state() because there are no IOMMU ops yet
> - IOMMU driver probes
> - IOMMU driver registration triggers device probes
> - IOMMU device probe fails, because there is no fwnode/OF data yet (e.g.
> apple_dart_probe_device returns ENODEV if dev_iommu_priv_get() returns
> NULL, and that is set in apple_dart_of_xlate())
> - __iommu_probe_device is in the error exit path, and at this exact
> point a parallel device probe is running of_iommu_xlate()
> - of_iommu_xlate() calls iommu_fwspec_init(), which ensures dev->iommu
> is non-NULL, which at this point it is
> - immediately after that, __iommu_probe_device() calls dev_iommu_free()
> since it is in the process of erroring out. This frees and sets
> dev->iommu to NULL.
> - of_iommu_xlate() calls ops->of_xlate()
> - apple_dart_of_xlate() calls dev_iommu_priv_set(), which crashes
> because dev->iommu is now NULL.
>
> As far as I can tell it's not just the specific driver xlate call
> setting priv that's the problem here, but there is one big race between
> the entire fwspec codepath (accessing dev->iommu->fwspec) and
> __iommu_probe_device() (allocating and freeing dev->iommu).
>
> Thinking about this whole thing is making my brain hurt. Thoughts? How
> do we fix this?
FWIW I've been getting inexplicable boot-time crashes that sometimes
spew out a fraction of a log line like:
[x.yyyyyyyy] addr.iommu
on some Qualcomm devices every now and then for quite some time..
Not very common though. Might be this, might be something else..
Konrad
Powered by blists - more mailing lists