Date:	Fri, 29 Dec 2006 16:11:44 -0800 (PST)
From:	Linus Torvalds <torvalds@...l.org>
To:	Andrew Morton <akpm@...l.org>
cc:	Segher Boessenkool <segher@...nel.crashing.org>,
	David Miller <davem@...emloft.net>, nickpiggin@...oo.com.au,
	kenneth.w.chen@...el.com, guichaz@...oo.fr, hugh@...itas.com,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	ranma@...edrich.de, gordonfarquharson@...il.com,
	a.p.zijlstra@...llo.nl, tbm@...ius.com, arjan@...radead.org,
	andrei.popa@...eo.ro
Subject: Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)



On Fri, 29 Dec 2006, Andrew Morton wrote:
> 
> They're extra.  As in "can be optimised away".

Sure. Don't use buffer heads.

> The buffer_head is not an IO container.  It is the kernel's core
> representation of a disk block.

Please come back from the 90's.

The buffer heads are nothing but a mapping of where the hardware block is. 
If you use it for anything else, you're basically screwed.

> JBD implements physical block-based journalling, so it is 100% appropriate
> that JBD deal with these disk blocks using their buffer_head
> representation.

And as long as it does that, you just have to face the fact that it's 
going to perform like crap, including what you call "extra" writes, and 
what I call "deal with it".

Btw, you can make pages be physically indexed too, but they obviously
 (a) won't be coherent with any virtual mapping laid on top of it
 (b) will be _physical_, so any readahead etc will be based on physical 
     addresses too.

> I thought I fixed the performance problem?

No, you papered over it, for the reasonably common case where things were 
physically contiguous - exactly by using a physical page cache, so now it 
can do read-ahead based on that. Then, because the pages contain buffer 
heads, the directory accesses can look up buffers, and if it was all 
physically contiguous, it all works fine.

But if you actually want virtually indexed caching (and all _users_ want 
it), it really doesn't work.
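
[ Editorial illustration, not part of the original message. The toy model 
  below is a sketch of the readahead point made above, under assumed names: 
  `block_map`, `logical_readahead`, `physical_readahead` and `useful` are 
  all hypothetical and are not kernel interfaces. A file is modelled as a 
  list that maps each logical block index to a made-up physical block 
  number. Readahead that follows file offsets always prefetches blocks of 
  the same file. Readahead that follows disk addresses only does so when 
  the file happens to be laid out contiguously. ]

```python
def logical_readahead(block_map, index, window):
    """Physical blocks prefetched when readahead follows file offsets."""
    end = min(index + 1 + window, len(block_map))
    return [block_map[i] for i in range(index + 1, end)]


def physical_readahead(block_map, index, window):
    """Physical blocks prefetched when readahead follows disk addresses."""
    start = block_map[index] + 1
    return list(range(start, start + window))


def useful(prefetched, block_map):
    """Count the prefetched blocks that actually belong to the file."""
    owned = set(block_map)
    return sum(1 for block in prefetched if block in owned)


# Hypothetical layouts: logical block i lives at physical block block_map[i].
contiguous = [100, 101, 102, 103, 104]
fragmented = [100, 517, 42, 900, 301]

for name, layout in (("contiguous", contiguous), ("fragmented", fragmented)):
    by_offset = useful(logical_readahead(layout, 0, 3), layout)
    by_address = useful(physical_readahead(layout, 0, 3), layout)
    print(name, "useful blocks by offset:", by_offset, "by address:", by_address)
```

[ In this model the two schemes prefetch the same blocks for the 
  contiguous layout, which is the "reasonably common case" mentioned 
  above. For the fragmented layout, address-based readahead fetches 
  neighbours that are not part of the file at all. ]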

> Somewhat nastily, but as ext3 directories are metadata it is appropriate
> that modifications to them be done in terms of buffer_heads (ie: blocks).

No. There is nothing "appropriate" about using buffer_heads for metadata. 

It's quite proper - and a hell of a lot more efficient - to use virtual 
page-caching for metadata too.

Look at the ext2 readdir() implementation, and compare it to the crapola 
horror that is ext3. Guess what? ext2 uses virtually indexed metadata, and 
as a result it is simpler, smaller and a LOT faster than ext3 in 
accessing that metadata.

Face it, Andrew, you're wrong on this one. Really. Just take a look at 
ext2_readdir(). 

[ I'm not saying that ext2_readdir() is _beautiful_. If it had been 
  written with the page cache in mind, it would probably have been done 
  very differently. And it doesn't do any readahead, probably because 
  nobody cared enough, but it should be trivial to add, and it would 
  automatically "do the right thing" just because it's much easier at the 
  page cache level.

  But I _am_ saying that compared to ext3, the ext2 readdir is a work of 
  art. ]

"metadata" has _zero_ to do with "physically indexed". There is no 
correlation what-so-ever. If you think there is a correlation, it's all in 
your mind.

		Linus