Message-ID: <49E21E8A.2040005@gmail.com>
Date:	Sun, 12 Apr 2009 11:02:02 -0600
From:	Robert Hancock <hancockrwd@...il.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Szabolcs Szakacsits <szaka@...s-3g.com>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Grant Grundler <grundler@...gle.com>,
	Linux IDE mailing list <linux-ide@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Jens Axboe <jens.axboe@...cle.com>,
	Arjan van de Ven <arjan@...radead.org>
Subject: Re: Implementing NVMHCI...

Linus Torvalds wrote:
> 
> On Sun, 12 Apr 2009, Szabolcs Szakacsits wrote:
>> I did not hear about NTFS using >4kB sectors yet but technically 
>> it should work.
>>
>> The atomic building units (sector size, block size, etc) of NTFS are 
>> entirely parametric. The maximum values could be bigger than the 
>> currently "configured" maximum limits. 
> 
> It's probably trivial to make ext3 support 16kB blocksizes (if it doesn't 
> already).
> 
> That's not the problem. The "filesystem layout" part is just a parameter.
> 
> The problem is then trying to actually access such a filesystem, in 
> particular trying to write to it, or trying to mmap() small chunks of it. 
> The FS layout is the trivial part.
> 
>> At present the limits are set in the BIOS Parameter Block in the NTFS
>> Boot Sector. This is 2 bytes for the "Bytes Per Sector" and 1 byte for 
>> "Sectors Per Block". So >4kB sector size should work since 1993.
>>
>> 64kB+ sector size could be possible by bootstrapping NTFS drivers 
>> in a different way. 
> 
> Try it. And I don't mean "try to create that kind of filesystem". Try to 
> _use_ it. Does Windows actually support using it, or is it just a matter
> of "the filesystem layout is _specified_ for up to 64kB block sizes"?
> 
> And I really don't know. Maybe Windows does support it. I'm just very 
> suspicious. I think there's a damn good reason why NTFS supports larger 
> block sizes in theory, BUT EVERYBODY USES A 4kB BLOCKSIZE DESPITE THAT!

I can't find any mention that any formattable block size can't be used, 
other than the fact that "The maximum default cluster size under Windows 
NT 3.51 and later is 4K due to the fact that NTFS file compression is 
not possible on drives with a larger allocation size. So format will 
never use larger than 4k clusters unless the user specifically overrides 
the defaults".

It could be there are other downsides to >4K cluster sizes as well, but 
that's the reason they state.
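
The two BPB fields mentioned above can be read straight out of a boot sector image. A minimal Python sketch, assuming the classic layout: bytes per sector as a 2-byte little-endian value at offset 0x0B and sectors per cluster as a single byte at offset 0x0D (FAT uses the same offsets), with the sectors-per-cluster byte treated as a plain count. The 4 KiB compression cutoff is taken from the statement quoted above; the function and constant names are hypothetical.

```python
import struct

# Classic BPB offsets (shared by NTFS and FAT boot sectors).
BYTES_PER_SECTOR_OFFSET = 0x0B     # 2-byte little-endian field
SECTORS_PER_CLUSTER_OFFSET = 0x0D  # 1-byte field, read here as a plain count

# Assumption: cutoff taken from the quoted "NTFS file compression" statement.
NTFS_COMPRESSION_MAX_CLUSTER = 4096


def parse_cluster_geometry(boot_sector: bytes) -> tuple[int, int, int]:
    """Return (bytes_per_sector, sectors_per_cluster, cluster_size)."""
    if len(boot_sector) < 512:
        raise ValueError("boot sector must be at least 512 bytes")
    (bytes_per_sector,) = struct.unpack_from(
        "<H", boot_sector, BYTES_PER_SECTOR_OFFSET)
    sectors_per_cluster = boot_sector[SECTORS_PER_CLUSTER_OFFSET]
    return (bytes_per_sector, sectors_per_cluster,
            bytes_per_sector * sectors_per_cluster)


def compression_possible(cluster_size: int) -> bool:
    """True if the cluster size is within the quoted compression limit."""
    return cluster_size <= NTFS_COMPRESSION_MAX_CLUSTER


# Build a synthetic boot sector describing 64 KiB clusters.
sector = bytearray(512)
sector[3:11] = b"NTFS    "  # OEM ID, for realism only
struct.pack_into("<H", sector, BYTES_PER_SECTOR_OFFSET, 512)
sector[SECTORS_PER_CLUSTER_OFFSET] = 128
bps, spc, cluster = parse_cluster_geometry(bytes(sector))
print(bps, spc, cluster, compression_possible(cluster))  # 512 128 65536 False
```

Reading the layout this way only shows what the format can describe; as Linus points out, it says nothing about whether such a filesystem is usable.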

What about FAT? It supports cluster sizes up to 32K at least (possibly 
up to 256K as well, although somewhat nonstandard), and that works. We 
support that in Linux, don't we?

> 
> Because it really is a hard problem. It's really pretty nasty to have your 
> cache blocking be smaller than the actual filesystem blocksize (the other 
> way is much easier, although it's certainly not pleasant either - Linux 
> supports it because we _have_ to, but if sector-size of hardware had 
> traditionally been 4kB, I'd certainly also argue against adding complexity 
> just to make it smaller, the same way I argue against making it much 
> larger).
> 
> And don't get me wrong - we could (fairly) trivially make the 
> PAGE_CACHE_SIZE be bigger - even eventually go so far as to make it a 
> per-mapping thing, so that you could have some filesystems with that 
> bigger sector size and some with smaller ones. I think Andrea had patches 
> that did a fair chunk of it, and that _almost_ worked.
> 
> But it ABSOLUTELY SUCKS. If we did a 16kB page-cache-size, it would 
> absolutely blow chunks. It would be disgustingly horrible. Putting the 
> kernel source tree on such a filesystem would waste about 75% of all 
> memory (the median size of a source file is just about 4kB), so your page 
> cache would be effectively cut in a quarter for a lot of real loads.
> 
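
The 75% figure above is simple arithmetic: a file only partly fills its last page-cache page, and without sub-page allocation the rest of that page is wasted. A minimal sketch, assuming every cached file occupies whole pages and ignoring metadata and readahead; `page_cache_waste` is a hypothetical helper, not a kernel interface.

```python
def page_cache_waste(file_sizes: list[int], page_size: int) -> float:
    """Fraction of page-cache memory left unused when each cached file
    occupies whole pages (no sub-page allocation)."""
    # Round every file up to a whole number of pages (integer ceiling division).
    allocated = sum(-(-size // page_size) * page_size for size in file_sizes)
    used = sum(file_sizes)
    return 1 - used / allocated if allocated else 0.0


# A 4 kB file, roughly the median source file size quoted above:
for page_size in (4096, 16384):
    print(page_size, f"{page_cache_waste([4096], page_size):.0%}")
# 4096 0%
# 16384 75%
```

Real trees have a spread of file sizes, so the exact loss depends on the size distribution, but small files dominate it.
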
> And to fix up _that_, you'd need to now do things like sub-page 
> allocations, and now your page-cache size isn't even fixed per filesystem, 
> it would be per-file, and the filesystem (and the drivers!) would have to 
> handle the cases of getting those 4kB partial pages (and do r-m-w IO after 
> all if your hardware sector size is >4kB).
> 
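
The read-modify-write cost mentioned above can be modelled with a toy device whose smallest writable unit is a 16 KiB sector: updating one 4 KiB page means reading the whole sector, patching it in memory, and writing all of it back. A minimal sketch; `LargeSectorDevice` and `write_page` are hypothetical names, and the model ignores caching and crash consistency and handles only pages that fit inside a single sector.

```python
class LargeSectorDevice:
    """Toy block device that can only transfer whole `sector_size` units."""

    def __init__(self, num_sectors: int, sector_size: int = 16384) -> None:
        self.sector_size = sector_size
        self.storage = bytearray(num_sectors * sector_size)
        self.sector_reads = 0
        self.sector_writes = 0

    def read_sector(self, index: int) -> bytearray:
        self.sector_reads += 1
        start = index * self.sector_size
        return bytearray(self.storage[start:start + self.sector_size])

    def write_sector(self, index: int, data: bytes) -> None:
        if len(data) != self.sector_size:
            raise ValueError("device only accepts whole sectors")
        self.sector_writes += 1
        start = index * self.sector_size
        self.storage[start:start + self.sector_size] = data


def write_page(dev: LargeSectorDevice, offset: int, page: bytes) -> None:
    """Write a page that may be smaller than a sector, using read-modify-write."""
    sector_index, within = divmod(offset, dev.sector_size)
    if within + len(page) > dev.sector_size:
        raise ValueError("sketch assumes the page does not straddle sectors")
    if within == 0 and len(page) == dev.sector_size:
        dev.write_sector(sector_index, page)  # full sector: no read needed
        return
    sector = dev.read_sector(sector_index)        # read
    sector[within:within + len(page)] = page      # modify
    dev.write_sector(sector_index, bytes(sector))  # write


dev = LargeSectorDevice(num_sectors=4)
write_page(dev, 4096, b"\xab" * 4096)  # one 4 KiB page inside sector 0
print(dev.sector_reads, dev.sector_writes)  # 1 1
```

When the page exactly covers a sector the read is skipped, which is why hardware sectors no larger than the page-cache granularity avoid this extra I/O.
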
> IOW, there are simple things we can do - but they would SUCK. And there 
> are really complicated things we could do - and they would _still_ SUCK, 
> plus now I pretty much guarantee that your system would also be a lot less 
> stable. 
> 
> It really isn't worth it. It's much better for everybody to just be aware 
> of the incredible level of pure suckage of a general-purpose disk that has 
> hardware sectors >4kB. Just educate people that it's not good. Avoid the 
> whole insane suckage early, rather than be disappointed in hardware that 
> is total and utter CRAP and just causes untold problems.
> 
> Now, for specialty uses, things are different. CD-ROMs have had 2kB 
> sector sizes for a long time, and the reason it was never as big of a 
> problem isn't that they are still smaller than 4kB - it's that they are 
> read-only, and use special filesystems. And people _know_ they are 
> special. Yes, even when you write to them, it's a very special op. You'd 
> never try to put NTFS on a CD-ROM, and everybody knows it's not a disk 
> replacement.
> 
> In _those_ kinds of situations, a 64kB block isn't much of a problem. We 
> can do read-only media (where "read-only" doesn't have to be absolute: the 
> important part is that writing is special), and never have problems. 
> That's easy. Almost all the problems with block-size go away if you think 
> reading is 99.9% of the load. 
> 
> But if you want to see it as a _disk_ (ie replacing SSDs or rotational 
> media), 4kB blocksize is the maximum sane one for Linux/x86 (or, indeed, 
> any "Linux/not-just-database-server" - it really isn't so much about x86, 
> as it is about large cache granularity causing huge memory fragmentation 
> issues).
> 
> 			Linus