lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4f06d727-d424-44f8-bd80-53c452b289d3@kernel.org>
Date:   Fri, 3 Nov 2023 13:48:30 +0100
From:   Konrad Dybcio <konradybcio@...nel.org>
To:     Hector Martin <marcan@...can.st>, iommu@...ts.linux.dev,
        Asahi Linux <asahi@...ts.linux.dev>,
        LKML <linux-kernel@...r.kernel.org>
Cc:     Janne Grunau <j@...nau.net>, Rob Herring <robh+dt@...nel.org>,
        Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
        Robin Murphy <robin.murphy@....com>,
        Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
Subject: Re: Race between of_iommu_configure() and iommu_probe_device()



On 3.11.2023 12:55, Hector Martin wrote:
> I just hit a crash in of_iommu_xlate() -> apple_dart_of_xlate() because
> dev->iommu was NULL. of_iommu_xlate() first calls iommu_fwspec_init
> which calls dev_iommu_get(), which allocates that member if NULL. That
> means it got freed in between, but the only thing that can do that is
> dev_iommu_free(), which is called from __iommu_probe_device() in the
> error path. That is serialized via a static lock, but not against the
> xlate stuff.
> 
> I think the specific sequence of events was as follows:
> 
> - IOMMU driver has not probed yet
> - Device driver tries to probe, and gets deferred via of_iommu_xlate()
> -> driver_deferred_probe_check_state() because there are no IOMMU ops yet
> - IOMMU driver probes
> - IOMMU driver registration triggers device probes
> - IOMMU device probe fails, because there is no fwnode/OF data yet (e.g.
> apple_dart_probe_device returns ENODEV if dev_iommu_priv_get() returns
> NULL, and that is set in apple_dart_of_xlate())
> - __iommu_probe_device is in the error exit path, and at this exact
> point a parallel device probe is running of_iommu_xlate()
> - of_iommu_xlate() calls iommu_fwspec_init(), which ensures dev->iommu
> is non-NULL, which at this point it is
> - immediately after that, __iommu_probe_device() calls dev_iommu_free()
> since it is in the process of erroring out. This frees and sets
> dev->iommu to NULL.
> - of_iommu_xlate() calls ops->of_xlate()
> - apple_dart_of_xlate() calls dev_iommu_priv_set(), which crashes
> because dev->iommu is now NULL.
> 
> As far as I can tell it's not just the specific driver xlate call
> setting priv that's the problem here, but there is one big race between
> the entire fwspec codepath (accessing dev->iommu->fwspec) and
> __iommu_probe_device() (allocating and freeing dev->iommu).
> 
> Thinking about this whole thing is making my brain hurt. Thoughts? How
> do we fix this?
FWIW I've been getting inexplicable boot-time crashes that sometimes
spew out a fraction of a log line like:

[x.yyyyyyyy] addr.iommu

on some Qualcomm devices every now and then for quite some time..
Not very common though. Might be this, might be something else..

Konrad

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ