[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.0701070957080.3661@woody.osdl.org>
Date: Sun, 7 Jan 2007 10:17:38 -0800 (PST)
From: Linus Torvalds <torvalds@...l.org>
To: Christoph Hellwig <hch@...radead.org>
cc: Willy Tarreau <w@....eu>, "H. Peter Anvin" <hpa@...or.com>,
git@...r.kernel.org, nigel@...el.suspend2.net,
"J.H." <warthog9@...nel.org>,
Randy Dunlap <randy.dunlap@...cle.com>,
Andrew Morton <akpm@...l.org>, Pavel Machek <pavel@....cz>,
kernel list <linux-kernel@...r.kernel.org>,
webmaster@...nel.org
Subject: Re: How git affects kernel.org performance
On Sun, 7 Jan 2007, Christoph Hellwig wrote:
>
> On Sun, Jan 07, 2007 at 10:03:36AM +0100, Willy Tarreau wrote:
> > The problem is that I have no sufficient FS knowledge to argument why
> > it helps here. It was a desperate attempt to fix the problem for us
> > and it definitely worked well.
>
> XFS does rather efficient btree directories, and it does sophisticated
> readahead for directories. I suspect that's what is helping you there.
The sad part is that this is a long-standing issue, and the directory
reading code in ext3 really _should_ be able to do ok.
A year or two ago I did a totally half-assed code for the non-hashed
readdir that improved performance by an order of magnitude for ext3 for a
test-case of mine, but it was subtly buggy and didn't do the hashed case
AT ALL. Andrew fixed it up so that it at least wasn't subtly buggy any
more, but in the process it also lost all capability of doing fragmented
directories (so it doesn't help very much any more under exactly the
situation that is the worst case), and it still doesn't do the hashed
directory case.
It's my personal pet peeve with ext3 (as Andrew can attest). And it's
really sad, because I don't think it is fundamental per se, but the way
the directory handling and jdb are done, it's apparently very hard to fix.
(It's clearly not _impossible_ to do: I think that it should be possible
to treat ext3 directories the same way we treat files, except they would
always be in "data=journal" mode. But I understand ext2, not ext3 (and
absolutely not jbd), so I'm not going to be able to do anything about it
personally).
Anyway, I think that disabling hashing can actually help. And I suspect
that even with hashing enabled, there should be some quick hack for making
the directory reading at least be able to do multiple outstanding reads in
parallel, instead of reading the blocks totally synchronously ("read five
blocks, then wait for the one we care" rather than the current "read one
block at a time, wait for it, read the next one, wait for it.."
situation).
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists