Date:	Fri, 01 Feb 2013 11:31:59 -0700
From:	Shuah Khan <shuah.khan@...com>
To:	Joerg Roedel <joro@...tes.org>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	stable <stable@...r.kernel.org>,
	iommu@...ts.linux-foundation.org, shuahkhan@...il.com
Subject: Re: IO_PAGE_FAULTs on unity mapped regions during amd_iommu_init()
 in Linux 3.4

On Fri, 2013-02-01 at 14:00 +0100, Joerg Roedel wrote:
> Hi Shuah,
> 
> On Thu, Jan 31, 2013 at 11:33:30AM -0700, Shuah Khan wrote:
> > Access to these ranges continues to work with no errors until AMD IOMMU
> > driver disables and re-enables IOMMU in enable_iommus(). These faults
> > don't persist and appear between the enable_iommus() call and before
> > amd_iommu_init() gets done printing "AMD-Vi: Lazy IO/TLB flushing
> > enabled" message.
> 
> Hmm, okay. I had a look into the v3.4 sources. This looks like a race
> condition. The IOMMUs are enabled in amd_iommu_init_hardware() but the
> unity-mapped regions are created later in amd_iommu_init_dma_ops(). This
> leaves a small window where the page-faults happen that you see.
> 
> But I am not sure why this doesn't hit on 3.7 and above. The race is
> still there. Anyway, definitely something that needs to be fixed.
> 

Hi Joerg,

Yes, 3.7 has the same window of opportunity for this race condition,
but I couldn't figure out why it doesn't happen there. On 3.7 the
window between amd_iommu_init_hardware() and amd_iommu_init_dma_ops()
might actually be wider than the window in 3.4.

I think understanding why it doesn't happen on 3.7 is probably the key.
On 3.6, I experimented with back-porting your "Split device table
initialization" patch (33f28c59e18d83fd2aeef258d211be66b9b80eb3) from
3.7, along with the patch that moved iommu_init from subsys_initcall()
to arch_initcall(), and that solved the problem on 3.6. I am attaching
those patches. I can't easily back-port either one of them to 3.4,
though.

That experiment made me think that this problem has something to do with
when device_table gets initialized vs. when dma_ops are initialized.
However, the point at which unity mapped regions are created is the same
in 3.4 and 3.7.

If you look at the 3.4 initialization sequence closely, you will notice
that init_device_table() gets called before init_iommu_all() and
init_memory_definitions() get done.

Another big difference is that in 3.4 init_device_table() sets the
DEV_ENTRY_VALID and DEV_ENTRY_TRANSLATION bits much earlier than 3.7
does. In 3.7 these bits get set in init_device_table_dma(), which is
called much later.
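
The ordering difference described above can be illustrated with a toy
model. This is only a sketch and not kernel code: struct toy_iommu,
the step names, and both step orders are hypothetical, and the model
assumes (unverified here) that an enabled IOMMU passes DMA through
untranslated while the device table entry's valid/translation bits are
still clear. Under that assumption, setting the bits before the unity
mappings exist opens a fault window, while setting them afterwards does
not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or functions exist in the kernel. */
enum dma_result { DMA_PASSTHROUGH, DMA_TRANSLATED, DMA_IO_PAGE_FAULT };

struct toy_iommu {
    bool enabled;         /* models the IOMMU hardware being enabled */
    bool dte_translating; /* models DEV_ENTRY_VALID + DEV_ENTRY_TRANSLATION */
    bool unity_mapped;    /* models the unity mappings being in place */
};

enum init_step { STEP_SET_DTE_BITS, STEP_ENABLE_IOMMU, STEP_MAP_UNITY };

/* Apply one (hypothetical) initialization step to the model state. */
void apply_step(struct toy_iommu *io, enum init_step step)
{
    switch (step) {
    case STEP_SET_DTE_BITS: io->dte_translating = true; break;
    case STEP_ENABLE_IOMMU: io->enabled = true;         break;
    case STEP_MAP_UNITY:    io->unity_mapped = true;    break;
    }
}

/* Outcome of a device DMA access to a unity-mapped range. */
enum dma_result device_dma(const struct toy_iommu *io)
{
    /* ASSUMPTION: without the DTE bits set, DMA is not translated. */
    if (!io->enabled || !io->dte_translating)
        return DMA_PASSTHROUGH;
    return io->unity_mapped ? DMA_TRANSLATED : DMA_IO_PAGE_FAULT;
}

/* Count faults when the device issues one DMA access after every step. */
int count_faults(const enum init_step *order, size_t n)
{
    struct toy_iommu io = { false, false, false };
    int faults = 0;

    assert(order != NULL);
    for (size_t i = 0; i < n; i++) {
        apply_step(&io, order[i]);
        if (device_dma(&io) == DMA_IO_PAGE_FAULT)
            faults++;
    }
    return faults;
}

/* Hypothetical orders: DTE bits set early (3.4-like) vs. late (3.7-like). */
const enum init_step early_dte_order[] = {
    STEP_SET_DTE_BITS, STEP_ENABLE_IOMMU, STEP_MAP_UNITY
};
const enum init_step late_dte_order[] = {
    STEP_ENABLE_IOMMU, STEP_MAP_UNITY, STEP_SET_DTE_BITS
};
```

In this model, count_faults(early_dte_order, 3) reports a fault in the
window between enabling the IOMMU and mapping the unity regions, while
count_faults(late_dte_order, 3) reports none. Whether the real hardware
and the real 3.7 ordering behave this way is exactly the open question.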

init_unity_mappings_for_device() has a strong dependency on the PCI
subsystem having been initialized. Is it possible to move it up closer
to amd_iommu_init_hardware()?

I have a system on which I can reproduce the problem easily, and I have
tried making a few changes to the initialization sequence, with no
results. Any thoughts on what other changes I should be looking at to
solve the problem, besides the ones I already tried?

Thanks,
-- Shuah

View attachment "0001-iommu-amd-delay-dma-init-right-before-dma_ops-are-in.patch" of type "text/x-patch" (2475 bytes)

View attachment "iommu-moving-initialization-earlier.patch" of type "text/x-patch" (374 bytes)
