linux-kernel - Re: Implementing NVMHCI...


Message-ID: <alpine.LFD.2.00.0904111524000.4583@localhost.localdomain>
Date:	Sat, 11 Apr 2009 15:33:28 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Grant Grundler <grundler@...gle.com>
cc:	Alan Cox <alan@...rguk.ukuu.org.uk>, Jeff Garzik <jeff@...zik.org>,
	Linux IDE mailing list <linux-ide@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Jens Axboe <jens.axboe@...cle.com>,
	Arjan van de Ven <arjan@...radead.org>
Subject: Re: Implementing NVMHCI...

On Sat, 11 Apr 2009, Grant Grundler wrote:
> 
> Why does it matter what the sector size is?
> I'm failing to see what the fuss is about.
> 
> We've abstracted the DMA mapping/SG list handling enough that the
> block size should make no more difference than it does for the
> MTU size of a network.

The VM is not ready or willing to do more than 4kB pages for any normal 
caching scheme.

> And the linux VM does handle bigger than 4k pages (several architectures
> have implemented it) - even if x86 only supports 4k as base page size.

4k is not just the "supported" base page size, it's the only sane one. 
Bigger pages waste memory like mad on any normal load due to 
fragmentation. Only basically single-purpose servers are worth doing 
bigger pages for.
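
To put rough numbers on the fragmentation point, here is a minimal sketch 
that counts the memory left unused when files are cached in whole pages. 
The file sizes are hypothetical, picked only for illustration, and the 
helper name cache_waste is an assumption, not anything from the kernel.

```python
def cache_waste(file_sizes, page_size):
    """Bytes left unused when each file is cached in whole pages."""
    waste = 0
    for size in file_sizes:
        pages = -(-size // page_size)  # ceiling division
        waste += pages * page_size - size
    return waste


# Hypothetical file sizes in bytes, chosen only for illustration.
sizes = [100, 1500, 5000, 20000]

waste_4k = cache_waste(sizes, 4096)     # 4 kB pages
waste_64k = cache_waste(sizes, 65536)   # 64 kB pages
print(waste_4k, waste_64k)
```

For these four files the unused tail space is 10264 bytes with 4 kB pages 
and 235544 bytes with 64 kB pages. Real workloads differ, but the waste 
grows with page size whenever most cached objects are small.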

> Block size just defines the granularity of the device's address space in
> the same way the VM base page size defines the Virtual address space.

.. and the point is, if you have granularity that is bigger than 4kB, you 
lose binary compatibility on x86, for example. The 4kB thing is encoded in 
mmap() semantics.
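
The compatibility problem can be sketched with plain arithmetic. mmap() 
accepts any file offset that is a multiple of the page size, so existing 
binaries are free to map at every 4 kB boundary. The sketch below lists 
the offsets that stop lining up with block boundaries once the block size 
exceeds the page size. The function names and the 32 kB file size are 
assumptions made for the example.

```python
PAGE_SIZE = 4096  # x86 base page size, as discussed in the text


def mappable_offsets(file_size, page_size=PAGE_SIZE):
    """File offsets a program may pass to mmap(): every page boundary."""
    return list(range(0, file_size, page_size))


def misaligned_offsets(file_size, block_size, page_size=PAGE_SIZE):
    """Page-aligned offsets that do not start on a block boundary."""
    return [off for off in mappable_offsets(file_size, page_size)
            if off % block_size != 0]


# Hypothetical 32 kB file on devices with 4 kB and 8 kB blocks.
ok_4k = misaligned_offsets(32768, 4096)
bad_8k = misaligned_offsets(32768, 8192)
print(ok_4k, bad_8k)
```

With 4 kB blocks the list is empty. With 8 kB blocks every other page 
boundary (4096, 12288, 20480, 28672) falls in the middle of a block, which 
is where the workarounds come in.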

In other words, if you have sector size >4kB, your hardware is CRAP. It's 
unusable sh*t. No ifs, buts or maybes about it.

Sure, we can work around it. We can work around it by doing things like 
read-modify-write cycles with bounce buffers (and where DMA remapping can 
be used to avoid the copy). Or we can work around it by saying that if you 
mmap files on such a filesystem, your mmap's will have to have 8kB 
alignment semantics, and the hardware is only useful for servers.
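
The first workaround, a read-modify-write cycle through a bounce buffer, 
can be sketched as follows. This is a toy in-memory model, not kernel 
code: the BigSectorDevice class, write_page helper, and the 8 kB sector 
size are all assumptions for illustration, and it always copies rather 
than using DMA remapping.

```python
class BigSectorDevice:
    """Toy in-memory device that only transfers whole sectors."""

    def __init__(self, sector_size, num_sectors):
        self.sector_size = sector_size
        self.data = bytearray(sector_size * num_sectors)
        self.reads = 0
        self.writes = 0

    def read_sector(self, index):
        self.reads += 1
        start = index * self.sector_size
        return bytearray(self.data[start:start + self.sector_size])

    def write_sector(self, index, buf):
        if len(buf) != self.sector_size:
            raise ValueError("device only accepts whole sectors")
        self.writes += 1
        start = index * self.sector_size
        self.data[start:start + self.sector_size] = buf


def write_page(dev, byte_offset, page):
    """Write a page smaller than a sector via read-modify-write."""
    index, within = divmod(byte_offset, dev.sector_size)
    if within + len(page) > dev.sector_size:
        raise ValueError("page must not straddle a sector boundary")
    bounce = dev.read_sector(index)            # read: whole sector into bounce buffer
    bounce[within:within + len(page)] = page   # modify: patch in the 4 kB page
    dev.write_sector(index, bounce)            # write: whole sector back out


dev = BigSectorDevice(sector_size=8192, num_sectors=4)
write_page(dev, 4096, b"\xaa" * 4096)
print(dev.reads, dev.writes)
```

A single 4 kB page write costs one full-sector read plus one full-sector 
write here, which is the overhead the workaround pays for hiding the 
larger sector size.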

Or we can just tell people what a total piece of shit the hardware is.

So if you're involved with any such hardware or know people who are, you 
might give people strong hints that sector sizes >4kB will not be taken 
seriously by a huge number of people. Maybe it's not too late to head the 
crap off at the pass.

Btw, this is not a new issue. Sandisk and some other totally clueless SSD 
manufacturers tried to convince people that 64kB access sizes were the 
RightThing(tm) to do. The reason? Their SSD's were crap, and couldn't do 
anything better, so they tried to blame software.

Then Intel came out with their controller, and now the same people who 
tried to sell their sh*t-for-brain SSD's are finally admitting that 
it was crap hardware.

Do you really want to go through that one more time?

			Linus