linux-kernel - Re: [00/17] Large Blocksize Support V3

Message-ID: <m1irb8u4pd.fsf@ebiederm.dsl.xmission.com>
Date:	Fri, 04 May 2007 06:57:18 -0600
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	David Chinner <dgc@....com>, clameter@....com,
	linux-kernel@...r.kernel.org, Mel Gorman <mel@...net.ie>,
	William Lee Irwin III <wli@...omorphy.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	Badari Pulavarty <pbadari@...il.com>,
	Maxim Levitsky <maximlevitsky@...il.com>
Subject: Re: [00/17] Large Blocksize Support V3

Andrew Morton <akpm@...ux-foundation.org> writes:

> On Fri, 27 Apr 2007 18:03:21 +1000 David Chinner <dgc@....com> wrote:
>
>> > > > > You basically have to
>> > > > > jump through nasty, nasty hoops, to handle corner cases that are
>> > > > > introduced because the generic code can no longer reliably lock
>> > > > > out access to a filesystem block.
>> > > 
>> > > This way lies insanity.
>> > 
>> > You're addressing Christoph's straw man here.
>> 
>> No, I'm speaking from years of experience working on a
>> page/buffer/chunk cache capable of using both large pages and
>> aggregating multiple pages. It has, at times, almost driven me
>> insane and I don't want to go back there.
>
> We're talking about two separate things here - let us not conflate them.
>
> 1: The arguably-crippled HBA which wants bigger SG lists.
>
> 2: The late-breaking large-blocksizes-in-the-fs thing.

Well, from other parts of the conversation, there is a third issue:
  3: large-sectorsize-on-disk.

There are a handful of devices in the kernel that could benefit
and be cleaned up a great deal if they could assume they always
received data in their sg lists that were full sectors.  Nothing
needs to be physically contiguous to handle that case though.
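
A rough userspace sketch of the property such a driver would like to
assume, under one reading of "full sectors": the bytes described by the
list add up to a whole number of sectors, however they are split across
entries.  The struct and function names here are hypothetical and are
not the kernel's scatterlist API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one scatter-gather entry (length only). */
struct sg_piece {
    size_t length;   /* bytes described by this entry */
};

/*
 * Return true if the entries together describe a whole number of
 * sectors.  Individual entries may be smaller than a sector and need
 * not be physically adjacent; only the total is checked.
 */
bool sg_covers_full_sectors(const struct sg_piece *sg, size_t n,
                            size_t sector_size)
{
    size_t total = 0;

    if (sector_size == 0)
        return false;
    for (size_t i = 0; i < n; i++)
        total += sg[i].length;
    return total != 0 && total % sector_size == 0;
}
```

With two 4096-byte entries and an 8192-byte sector this returns true
even though the two pages may live anywhere in memory; a lone 4096-byte
entry would be rejected.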

If we support large sector sizes for raw block devices we would
still have an issue of what to do with filesystems that want
to live on them directly.

> None of this multiple-page-locking stuff we're discussing here is relevant
> to the HBA performance problem.  It's pretty simple (I think) for us to
> ensure that, for the great majority of the time, contiguous pages in a file
> are also physically contiguous.  Problem solved, HBA go nice and quick,
> move on.

I suspect we will still need Jens' work on Linux scatter-gather lists
of more than 128 pages to fully take advantage of this.
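
To make the interaction concrete, here is a small userspace model of
how physical contiguity shrinks a scatter-gather list and where a fixed
cap on entries bites.  It is only a sketch: the types, the function
name and the (size_t)-1 "does not fit" return are assumptions for
illustration, not the block layer's actual code.

```c
#include <stddef.h>

/* Hypothetical model of one segment; not the kernel's struct scatterlist. */
struct sg_segment {
    unsigned long first_pfn;  /* first page frame number in the segment */
    size_t nr_pages;          /* physically contiguous pages it covers */
};

/*
 * Walk the page frame numbers backing consecutive file pages and merge
 * physically adjacent pages into one segment.  Returns the number of
 * segments written, or (size_t)-1 if more than max_segs are needed.
 */
size_t build_segments(const unsigned long *pfns, size_t nr_pages,
                      struct sg_segment *segs, size_t max_segs)
{
    size_t nsegs = 0;

    for (size_t i = 0; i < nr_pages; i++) {
        /* Extend the last segment when this page directly follows it. */
        if (nsegs > 0 &&
            segs[nsegs - 1].first_pfn + segs[nsegs - 1].nr_pages == pfns[i]) {
            segs[nsegs - 1].nr_pages++;
            continue;
        }
        if (nsegs == max_segs)
            return (size_t)-1;   /* list is full: request must be split */
        segs[nsegs].first_pfn = pfns[i];
        segs[nsegs].nr_pages = 1;
        nsegs++;
    }
    return nsegs;
}
```

Four pages at frames 100..103 collapse into a single segment, while
four scattered frames need four entries and overflow a two-entry list.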

> Now, we have this second and completely unrelated requirement:
> supporting fs-blocksize > PAGE_SIZE.  One way to address this is via the
> mangle-multiple-pages-into-one approach.  And it's obviously the best way
> to do it, if mangle-multiple-pages is already available.

Yep.

> But I don't know how important requirement 2 is.  XFS already has
> presumably-working private code to do it, and there is simplification and
> perhaps modest performance gain in the block allocator to be had here.
>
> And other filesystems (ie: ext4) _might_ use it.  But ext4 is extent-based,
> so perhaps it's not worth churning the on-disk format to get a bit of a
> boost in the block allocator.
>
> So I _think_ what this boils down to is offering some simplifications in
> XFS, by adding complexications to core VFS and MM.  I dunno if that's a
> good deal.

Agreed.

When we are doing things optimistically and absolutely require large pages
this approach seems pretty sane.   When we start requiring large 64k
pages I get nervous.

> So...  tell us why you want feature 2?

A good question.

Eric