Date:	Sun, 12 Apr 2009 12:35:49 -0600
From:	Robert Hancock <hancockrwd@...il.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Szabolcs Szakacsits <szaka@...s-3g.com>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Grant Grundler <grundler@...gle.com>,
	Linux IDE mailing list <linux-ide@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Jens Axboe <jens.axboe@...cle.com>,
	Arjan van de Ven <arjan@...radead.org>
Subject: Re: Implementing NVMHCI...

Linus Torvalds wrote:
> IOW, when you allocate a new 32kB cluster, you will have to allocate 8 
> pages to do IO on it (since you'll have to initialize the diskspace), but 
> you can still literally treat those pages as _individual_ pages, and you 
> can write them out in any order, and you can free them (and then look them 
> up) one at a time.
> 
> Notice? The cluster size really only ends up being a disk-space allocation 
> issue, not an issue for actually caching the end result or for the actual 
> size of the IO.

Right... I didn't realize we were actually that smart (not writing out
the entire cluster when dirtying one page), but I guess it makes sense.
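
Just to make the distinction concrete, here is a toy sketch - made-up
structures and helpers, nothing resembling the real fs/mm code - where
allocation happens per 32kB cluster but dirtying and writeback stay per
4kB page:

#include <stdbool.h>

#define PAGE_SIZE         4096
#define CLUSTER_SIZE      32768
#define PAGES_PER_CLUSTER (CLUSTER_SIZE / PAGE_SIZE)

struct cluster {
	unsigned long disk_off;          /* allocated as one 32kB unit */
	void *pages[PAGES_PER_CLUSTER];  /* cached one page at a time */
	bool dirty[PAGES_PER_CLUSTER];   /* dirtied one page at a time */
};

/* Hypothetical helper: write one 4kB page at a byte offset. */
void dev_write_page(unsigned long byte_off, const void *page);

/* Writeback touches only the dirty pages; the 32kB cluster size never
 * shows up in the IO itself, only in the on-disk allocation. */
void writeback_cluster(struct cluster *c)
{
	int i;

	for (i = 0; i < PAGES_PER_CLUSTER; i++) {
		if (!c->dirty[i])
			continue;
		dev_write_page(c->disk_off + i * PAGE_SIZE, c->pages[i]);
		c->dirty[i] = false;
	}
}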

> 
> The hardware sector size is very different. If you have a 32kB hardware 
> sector size, that implies that _all_ IO has to be done with that 
> granularity. Now you can no longer treat the eight pages as individual 
> pages - you _have_ to write them out and read them in as one entity. If 
> you dirty one page, you effectively dirty them all. You can not drop and 
> re-allocate pages one at a time any more.
> 
> 				Linus

I suspect that in this case, trying to gang together multiple pages
inside the VM and handle them as one unit all the way through would be
insanity. My guess is that the only sane way to do it is a
read-modify-write approach when writing out the data (in the block
layer, maybe?), where the read can be optimized away if the pages for
the entire hardware sector are already in cache, or if the write is
large enough to replace the entire sector. I assume we already do
something like this in the md code for cases like software RAID 5 with
a stripe size larger than 4KB.
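
Something like the following is roughly what I have in mind - purely an
illustrative sketch, with hypothetical helpers (dev_read_sector() and
friends) rather than real block-layer interfaces:

#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE        4096
#define HW_SECTOR_SIZE   32768
#define PAGES_PER_SECTOR (HW_SECTOR_SIZE / PAGE_SIZE)

/* Hypothetical device ops: IO only happens in full hardware sectors. */
void dev_read_sector(unsigned long sec, void *buf);
void dev_write_sector(unsigned long sec, const void *buf);

/* Hypothetical cache lookup: if page 'idx' of sector 'sec' is cached
 * and up to date, copy it into 'dst' and return true. */
bool copy_cached_page(unsigned long sec, int idx, void *dst);

/* Write 'npages' dirty pages, starting at page index 'first' within
 * hardware sector 'sec'.  The read is skipped when the write covers
 * the whole sector or every other page is already in cache. */
void rmw_write(unsigned long sec, int first, int npages, void *pages[])
{
	char buf[HW_SECTOR_SIZE];
	bool need_read = false;
	int i;

	/* Pages outside the write must come from cache or from disk. */
	for (i = 0; i < PAGES_PER_SECTOR && !need_read; i++) {
		if (i >= first && i < first + npages)
			continue;
		if (!copy_cached_page(sec, i, buf + i * PAGE_SIZE))
			need_read = true;
	}

	if (need_read)
		dev_read_sector(sec, buf);       /* the "read" in R-M-W */

	/* "Modify": overlay the dirty pages onto the sector buffer. */
	for (i = 0; i < npages; i++)
		memcpy(buf + (first + i) * PAGE_SIZE, pages[i], PAGE_SIZE);

	/* "Write": one full-sector IO, the only size the device takes. */
	dev_write_sector(sec, buf);
}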

That would obviously have some performance drawbacks compared to a
smaller sector size. But if the device is bound and determined to use
bigger sectors internally one way or the other, and the alternative is
the drive doing R-M-W internally to emulate smaller sectors - which for
some devices seems to be the case - maybe it makes more sense to do it
in the kernel, where we have more information and can do it more
efficiently. (Though, at least on the normal ATA disk side of things,
4K is the biggest number I've heard tossed about for a future expanded
sector size; flash devices like this may be another story.)
