lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.00.0912170916420.15740@localhost.localdomain>
Date:	Thu, 17 Dec 2009 09:27:56 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Alain Knaff <alain@...ff.lu>
cc:	markh@...pro.net, fdutils@...tils.linux.lu,
	linux-kernel@...r.kernel.org
Subject: Re: DMA cache consistency bug introduced in 2.6.28 (Was: Re: [Fdutils]
 Cannot format floppies under kernel 2.6.*?)



On Thu, 17 Dec 2009, Alain Knaff wrote:
> 
> 1. initial contents:  33 44 55 66
> 2. one DMA transfer is performed
> 3. program changes buffer to: 77 88 99 aa
> 4. new DMA transfer is performed => instead it transmits 33 88 99 aa
>    (i.e. first byte is from previous contents)
> 
> This used to work in 2.6.27.41, but broke in 2.6.28 . It doesn't happen on
> all hardware though.

Do you have a list of hardware it works on? Especially chipsets.

On x86, where all caches are supposed to be totally coherent (except for 
I$ under very special circumstances), the above should never be able to 
happen. At least not unless there is really buggy hardware involved.

> It does indeed seem to be related to a DMA-side cache (rather than the
> processor's cache not being flushed to main memory), as doing lots of
> memory intensive work (kernel compilation) between 2 and 3 doesn't fix the
> problem.

I'm not entirely surprised. Actual CPU bugs are pretty rare in the x86 
world. But chipset bugs? Another thing entirely. There are buffers and 
caches there, and those are sometimes software-visible. The most obvious 
case of that is just the IOMMU's themselves, but from your description I 
don't think you actually change the DMA _mappings_ do you? Just the 
actual buffer (that was then mapped earlier)?

So I don't think it's the IOMMU code itself necessarily, although an IOMMU 
may well be involved (eg I could easily see a few cachelines worth of 
actual DMA data caching going on in the whole IOMMU too)

And to some degree the floppy driver might be _more_ likely to see some 
kinds of bugs, because it uses that crazy legacy DMA engine. So it's not 
going to go through the regular PCI DMA hardware paths, it's going to go 
through its own special paths that nobody else uses any more (and thus has 
probably not had as much testing).

> In the diff between 2.6.27.41 and 2.6.28, I noticed a lot of changes in
> arch/x86/kernel/amd_iommu.c and related files, could any of these have
> triggered this behavior?

Could it have triggered? Sure. Chipset caches are often flushed by certain 
trivial operations (often the caches are small, and operations like "any 
PIO access" will make sure they are flushed). Different IOMMU flush 
patterns could easily account for it.

But I think we'd like to see a list of hardware where this can be 
triggered, and quite frankly, a 'git bisect' would be absolutely wonderful 
especially if the list of hardware is not showing any really obvious 
patterns (and I assume they aren't all _that_ obvious, or you'd have 
mentioned them).

			Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ