linux-kernel - Re: [RFC] fsblock

Message-ID: <467F6BC6.60209@yahoo.com.au>
Date:	Mon, 25 Jun 2007 17:16:22 +1000
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Andi Kleen <andi@...stfloor.org>
CC:	Nick Piggin <npiggin@...e.de>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Linux Memory Management List <linux-mm@...ck.org>,
	linux-fsdevel@...r.kernel.org
Subject: Re: [RFC] fsblock

Andi Kleen wrote:
> Nick Piggin <npiggin@...e.de> writes:
> 
>>- Structure packing. A page gets a number of buffer heads that are
>>  allocated in a linked list. fsblocks are allocated contiguously, so
>>  cacheline footprint is smaller in the above situation.
> 
> 
> It would be interesting to test if that makes a difference for 
> database benchmarks running over file systems. Databases
> eat a lot of cache, so in theory any cache improvements
> in the kernel, which often runs cache-cold, should be beneficial.
> 
> But I guess it would need at least ext2 to test; Minix is probably not
> good enough.

Yeah, you are right. ext2 would be cool to port as it would be
a reasonable platform for basic performance testing and comparisons.
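
To make the structure-packing point quoted above concrete, here is a minimal user-space sketch contrasting the two layouts. It assumes that buffer heads are allocated one at a time and chained into a per-page ring, while a page's fsblocks come from a single contiguous allocation. `struct toy_bh`, `struct toy_fsblock` and the helpers are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the kernel's buffer_head or fsblock. */
struct toy_bh {
	unsigned long blocknr;
	struct toy_bh *this_page;	/* next buffer of the same page (circular) */
};

struct toy_fsblock {
	unsigned long blocknr;
};

/* Free a ring built by alloc_bh_ring(). */
void free_bh_ring(struct toy_bh *head)
{
	struct toy_bh *bh = head;

	if (!head)
		return;
	do {
		struct toy_bh *next = bh->this_page;
		free(bh);
		bh = next;
	} while (bh != head);
}

/* Buffer-head style: one separate allocation per block, linked into a ring. */
struct toy_bh *alloc_bh_ring(unsigned long first, int n)
{
	struct toy_bh *head = NULL, *tail = NULL;

	assert(n > 0);
	for (int i = 0; i < n; i++) {
		struct toy_bh *bh = malloc(sizeof(*bh));

		if (!bh) {
			if (tail) {
				tail->this_page = head;	/* close the ring so it can be freed */
				free_bh_ring(head);
			}
			return NULL;
		}
		bh->blocknr = first + i;
		bh->this_page = NULL;
		if (!head)
			head = bh;
		else
			tail->this_page = bh;
		tail = bh;
	}
	tail->this_page = head;
	return head;
}

/* fsblock style: all of a page's blocks in one contiguous allocation. */
struct toy_fsblock *alloc_fsblock_array(unsigned long first, int n)
{
	struct toy_fsblock *blocks;

	assert(n > 0);
	blocks = calloc((size_t)n, sizeof(*blocks));
	if (!blocks)
		return NULL;
	for (int i = 0; i < n; i++)
		blocks[i].blocknr = first + i;
	return blocks;
}
```

Walking the ring touches one separately allocated object per block, whereas the array keeps neighbouring blocks adjacent in memory, which is the cacheline-footprint argument being made.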

> In general have you benchmarked the CPU overhead of old vs new code? 
> e.g. when we went to BIO, scalability went up, but CPU costs
> of a single request also went up. It would be nice to not continue
> or better reverse that trend.

At the moment there are still a few silly things in the code, such
as always calling the insert_mapping indirect function (which is
the get_block equivalent). And it still does a bit more
read-modify-write (RMW) than it should.

Also, it always goes to the pagecache radix-tree to find fsblocks,
whereas the buffer layer has a per-CPU cache front-end... so in
that regard, fsblock is really designed with lockless pagecache
in mind, where find_get_page is much faster even in the serial case
(though fsblock shouldn't exactly be slow with the current pagecache).
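
A cache front-end of the kind described above works roughly like a tiny most-recently-used array consulted before the slower lookup. The sketch below models a single CPU's cache in user space; `toy_slow_lookup` is a hypothetical stand-in for the pagecache radix-tree walk, the cache size of 4 is arbitrary, and the real per-CPU machinery (preemption and interrupt handling, invalidation) is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_LRU_SIZE 4	/* arbitrary size chosen for the sketch */

struct toy_entry {
	unsigned long key;
	void *val;
};

/* Models ONE CPU's cache; no per-CPU or preemption handling here. */
struct toy_lru {
	struct toy_entry e[TOY_LRU_SIZE];	/* e[0] is most recently used */
	int n;
};

static unsigned long slow_lookups;
static char backing[64];

/* Hypothetical stand-in for the pagecache radix-tree walk. */
static void *toy_slow_lookup(unsigned long key)
{
	slow_lookups++;
	return &backing[key % sizeof(backing)];
}

void *toy_cached_lookup(struct toy_lru *c, unsigned long key)
{
	struct toy_entry hit;
	int keep;

	assert(c->n >= 0 && c->n <= TOY_LRU_SIZE);
	for (int i = 0; i < c->n; i++) {
		if (c->e[i].key == key) {
			/* Hit: move the entry to the front. */
			hit = c->e[i];
			memmove(&c->e[1], &c->e[0], (size_t)i * sizeof(c->e[0]));
			c->e[0] = hit;
			return hit.val;
		}
	}

	/* Miss: do the slow lookup, insert at front, drop the oldest if full. */
	hit.key = key;
	hit.val = toy_slow_lookup(key);
	keep = c->n < TOY_LRU_SIZE ? c->n : TOY_LRU_SIZE - 1;
	memmove(&c->e[1], &c->e[0], (size_t)keep * sizeof(c->e[0]));
	c->e[0] = hit;
	if (c->n < TOY_LRU_SIZE)
		c->n++;
	return hit.val;
}
```

Repeated lookups of the same block hit the small array and never reach the slow path, which is the advantage the buffer layer has until the pagecache lookup itself becomes cheap.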

However, I don't think there are any fundamental performance
problems with fsblock. It even uses one less layer of locking to
do regular IO compared with buffer.c, so in theory it might even
have some advantage.

Single threaded performance of request submission is something I
will definitely try to keep optimal.

>>- Large block support. I can mount and run an 8K block size minix3 fs on
>>  my 4K page system and it didn't require anything special in the fs. We
>>  can go up to about 32MB blocks now, and gigabyte+ blocks would only
>>  require one more bit in the fsblock flags. fsblock_superpage blocks
>>  are > PAGE_CACHE_SIZE, midpage ==, and subpage <.
> 
> 
> Can it be cleanly ifdefed or optimized away?

Yeah, it pretty well stays out of the way when using <= PAGE_CACHE_SIZE
size blocks, generally just a single test and branch of an already-used
cacheline. It can be optimised away completely by commenting out
#define BLOCK_SUPERPAGE_SUPPORT from fsblock.h.
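
The size classes and the compile-time switch described above can be sketched as follows. This assumes a 4K PAGE_CACHE_SIZE and uses hypothetical `toy_` names; only the size comparisons (superpage >, midpage ==, subpage <) come from the text, and `TOY_BLOCK_SUPERPAGE_SUPPORT` merely mirrors the role described for `BLOCK_SUPERPAGE_SUPPORT`.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096UL		/* assumed PAGE_CACHE_SIZE */
#define TOY_BLOCK_SUPERPAGE_SUPPORT	/* comment out to compile support away */

struct toy_fsblock {
	unsigned long size;		/* block size in bytes */
};

static inline bool toy_block_subpage(const struct toy_fsblock *b)
{
	return b->size < TOY_PAGE_SIZE;
}

static inline bool toy_block_midpage(const struct toy_fsblock *b)
{
	return b->size == TOY_PAGE_SIZE;
}

#ifdef TOY_BLOCK_SUPERPAGE_SUPPORT
static inline bool toy_block_superpage(const struct toy_fsblock *b)
{
	return b->size > TOY_PAGE_SIZE;
}
#else
/* Constant false: the compiler can drop every superpage branch. */
static inline bool toy_block_superpage(const struct toy_fsblock *b)
{
	(void)b;
	return false;
}
#endif

/* Pages spanned by a block (1 for sub-page and mid-page blocks). */
unsigned long toy_block_pages(const struct toy_fsblock *b)
{
	if (toy_block_superpage(b)) {	/* the single test and branch */
		assert(b->size % TOY_PAGE_SIZE == 0);
		return b->size / TOY_PAGE_SIZE;
	}
	return 1;
}
```

With the macro undefined, `toy_block_superpage()` returns a constant, so an optimising compiler can remove the branch in its callers entirely.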

> Unless the fragmentation
> problem is solved it would seem rather pointless to me. Also, I personally
> still think the right way to approach this is a larger softpage size.

It does not suffer from a fragmentation problem. It will do scatter-gather
IO if the pagecache of that block is not contiguous. My naming
may be a little confusing: fsblock_superpage (a function that
returns true if the given fsblock is larger than PAGE_CACHE_SIZE)
refers only to whether the fsblock is larger than a page; it has no
connection to VM superpages.
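
Scatter-gather IO for a block whose pagecache pages are not physically contiguous can be sketched as coalescing adjacent pages into segments. This is a minimal user-space illustration under stated assumptions: pages are represented by hypothetical page-aligned addresses, `struct toy_seg` stands in for an IO segment descriptor (in the kernel, roughly a bio segment), and a 4K page size is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL	/* assumed page size */

/* Hypothetical IO segment: a physically contiguous run of bytes. */
struct toy_seg {
	unsigned long addr;
	unsigned long len;
};

/*
 * Build segments for one large block from the addresses of its pages,
 * in block order. segs must have room for npages entries (worst case:
 * no two pages adjacent). Returns the number of segments produced.
 */
size_t toy_build_sg(const unsigned long *page_addr, size_t npages,
		    struct toy_seg *segs)
{
	size_t nsegs = 0;

	for (size_t i = 0; i < npages; i++) {
		assert(page_addr[i] % TOY_PAGE_SIZE == 0);
		if (nsegs > 0 &&
		    segs[nsegs - 1].addr + segs[nsegs - 1].len == page_addr[i]) {
			/* Physically follows the previous page: extend the run. */
			segs[nsegs - 1].len += TOY_PAGE_SIZE;
		} else {
			/* Discontiguous: start a new segment. */
			segs[nsegs].addr = page_addr[i];
			segs[nsegs].len = TOY_PAGE_SIZE;
			nsegs++;
		}
	}
	return nsegs;
}
```

A block backed by fully contiguous pages collapses to a single segment, while a fragmented one simply produces more segments, so no contiguous higher-order allocation is ever required.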

Don't get me wrong, I think soft page size is a good idea for other
reasons as well (less page metadata and page operations), and that
8 or 16K would probably be a good sweet spot for today's x86 systems.

-- 
SUSE Labs, Novell Inc.