linux-ext4 - Re: [RFC] improve malloc for large filesystems


Message-ID: <B5982325-9332-4F55-A989-9D51F172F500@whamcloud.com>
Date:   Wed, 20 Nov 2019 18:22:28 +0000
From:   Alex Zhuravlev <azhuravlev@...mcloud.com>
To:     "Theodore Y. Ts'o" <tytso@....edu>
CC:     Alex Zhuravlev <azhuravlev@...mcloud.com>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: [RFC] improve malloc for large filesystems



Thanks for the feedback.

> On 20 Nov 2019, at 21:13, Theodore Y. Ts'o <tytso@....edu> wrote:
> 
> Hi Alex,
> 
> A couple of comments.  First, please separate this patch so that these
> two separate pieces of functionality can be reviewed and tested
> separately:

Sure, that makes sense.

> 
> As far as the prefetch is concerned, please note that the bitmap is first
> read into the buffer cache via read_block_bitmap_nowait(), but then it
> needs to be copied into buddy bitmap pages where it is cached along
> side the buddy bitmap.  (The copy in the buddy bitmap is a combination
> of the on-disk block allocation bitmap plus any outstanding
> preallocations.)  From that copy of block bitmap, we then generate the
> buddy bitmap and as a side effect, initialize the statistics
> (grp->bb_first_free, grp->bb_largest_free_order, grp->bb_counters[]).
> 
> It is these statistics that we need to be able to make allocation
> decisions for a particular block group.  So perhaps we should drive
> the readahead of the bitmaps from ext4_mb_init_group() /
> ext4_mb_init_cache(), and make sure that we actually initialize the
> ext4_group_info structure, and not just read the bitmap into buffer
> cache and hope it gets used before memory pressure pushes it out of
> the buddy cache.
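
To make the derived statistics concrete, here is a minimal userspace sketch in Python, not the kernel code. It assumes a block bitmap given as a list of 0/1 values (1 = in use) and splits each free run into naturally aligned power-of-two chunks, which is roughly what building a buddy bitmap amounts to. The names buddy_stats() and split_aligned() are hypothetical; the three returned values only loosely mirror bb_first_free, bb_largest_free_order and bb_counters[], and preallocations are ignored.

```python
def split_aligned(start, length, max_order):
    """Yield (start, order) chunks covering [start, start + length).

    Each chunk is 2**order blocks long and starts on a multiple of its size.
    """
    while length > 0:
        # Largest order permitted by the alignment of `start`.
        align_order = (start & -start).bit_length() - 1 if start else max_order
        # Largest order that still fits in the remaining length.
        fit_order = length.bit_length() - 1
        order = min(align_order, fit_order, max_order)
        yield start, order
        start += 1 << order
        length -= 1 << order


def buddy_stats(bitmap, max_order):
    """Return (first_free, largest_free_order, counters) for a 0/1 bitmap.

    counters[k] is the number of free, aligned chunks of 2**k blocks.
    first_free is None and largest_free_order is -1 if nothing is free.
    """
    counters = [0] * (max_order + 1)
    first_free = None
    i, n = 0, len(bitmap)
    while i < n:
        if bitmap[i]:          # block in use, skip it
            i += 1
            continue
        j = i                  # find the end of this free run
        while j < n and not bitmap[j]:
            j += 1
        if first_free is None:
            first_free = i
        for _, order in split_aligned(i, j - i, max_order):
            counters[order] += 1
        i = j
    largest = max((o for o, c in enumerate(counters) if c), default=-1)
    return first_free, largest, counters
```

For the 16-block bitmap [1,1,0,0, 0,0,0,0, 1,0,0,0, 0,0,0,1] with max_order=4, this returns (2, 2, [2, 3, 1, 0, 0]): the first free block is 2, and the largest aligned free chunk is 4 blocks. The point of the sketch is that none of these numbers exist until the bitmap has been walked, which is why having only the raw bitmap in the buffer cache is not yet enough to make an allocation decision.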

Indeed, but the point is that the majority of the time is spent on the IO
itself, so having the bitmap in the buffer cache should already help, right?
Not that I’m against buddy initialisation, but it would add extra complexity
for not that much of a performance gain, IMO.

Memory pressure is a good point though. Do you think touching the bitmap
bh/page would be enough to prevent it from being dropped early?
I can introduce another IO completion routine to schedule buddy init.
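
As a rough illustration of that last idea, here is a userspace sketch in Python of chaining the initialisation onto read completion. Everything in it is an assumption made for illustration: read_bitmap() stands in for the slow bitmap read, the thread pool stands in for asynchronous IO, and the per-group "info" is just a free-block count rather than a real buddy structure.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def read_bitmap(disk, group):
    """Stand-in for the slow bitmap read (hypothetical; no real IO here)."""
    return disk[group]


def prefetch_and_init(disk, groups, workers=4):
    """Read bitmaps concurrently and build per-group info on completion."""
    group_info = {}
    lock = threading.Lock()

    def on_read_done(group, future):
        # Runs when the read finishes, so the derived info is built
        # right away instead of waiting for the first allocation.
        bitmap = future.result()
        info = {"free": sum(1 for used in bitmap if not used)}
        with lock:
            group_info[group] = info

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for g in groups:
            fut = pool.submit(read_bitmap, disk, g)
            # Bind g now; a bare closure would see only the last value.
            fut.add_done_callback(lambda f, g=g: on_read_done(g, f))
    # Leaving the with-block waits for all reads and their callbacks.
    return group_info
```

With this shape the reads still overlap, and the derived per-group data exists as soon as each read completes, so it no longer matters whether the raw bitmap is later dropped under memory pressure.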

> Andreas has suggested going even farther, and perhaps storing this
> derived information from the allocation bitmaps someplace convenient
> on disk.  This is an on-disk format change, so we would want to think
> very carefully before going down that path.  Especially since if we're
> going to go this far, perhaps we should consider using an on-disk
> b-tree to store the allocation information, which could be more
> efficient than using allocation bitmaps plus buddy bitmaps.

This is what I normally try to avoid, but in general I have no objection.

Thanks, Alex