Date:   Thu, 24 May 2018 10:42:43 +0200
From:   Jan Kara <jack@...e.cz>
To:     Andreas Dilger <adilger@...ger.ca>
Cc:     Jan Kara <jack@...e.cz>, "Theodore Y. Ts'o" <tytso@....edu>,
        Wang Shilong <wangshilong1991@...il.com>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Wang Shilong <wshilong@....com>
Subject: Re: [PATCH] ext4: add an interface to load block bitmaps

On Wed 23-05-18 11:12:29, Andreas Dilger wrote:
> On May 23, 2018, at 9:30 AM, Jan Kara <jack@...e.cz> wrote:
> >>>> It turned out that loading block bitmaps can add
> >>>> noticeable latency here; also, on a heavily fragmented
> >>>> filesystem we may need to load many bitmaps to find
> >>>> free blocks.
> >>>> 
> >>>> To improve this situation, we have a patch that loads block
> >>>> bitmaps into memory and pins them until unmount, or until
> >>>> we release the memory on purpose. This stabilizes write
> >>>> performance and improves performance on a heavily
> >>>> fragmented filesystem.
> >>> 
> >>> This is true, but I wonder how realistic this is on real production
> >>> systems.  For a 1 TiB file system, pinning all of the block bitmaps
> >>> will require 32 megabytes of memory.  Is that really realistic for
> >>> your use case?
> >> 
> >> In the case of Lustre servers, they typically have 128GB of RAM or
> >> more, and are serving a few hundred TB of storage each these days,
> >> but do not have any other local users, so the ~6GB RAM usage isn't a
> >> huge problem if it improves the performance/consistency. The real
> >> issue is that the bitmaps do not get referenced often enough compared
> >> to the 10GB/s of data flowing through the server, so they are pushed
> >> out of memory too quickly.
> > 
> > OK, and that 10GB/s is mostly use once data?
> 
> Pretty much, yes.  There isn't much chance to re-use the data when it
> can only fit into RAM for a few seconds.

Fair enough.

> > So I could imagine we cache free block information in a more efficient format
> > (something like you or Ted describe), provide a proper shrinker for it
> > (possibly biasing it to make reclaim less likely), and enable it always...
> > That way users don't have to configure it and we don't have to be afraid of
> > eating too much memory at the expense of something else.
> 
> One of the problems with this approach is that having compressed bitmaps
> wouldn't necessarily help the performance issue.  This would allow the
> initial block allocation to proceed without reading the bitmap from disk,
> but then the bitmap still needs to be updated and written to disk in that
> transaction.
>
> I guess one possibility is to reconstruct a full bitmap from the compressed
> in-memory bitmap for the write, but this also carries some risk if there
> is an error - the reconstructed bitmap is much more likely to have a large
> corruption because it is generated from a compressed version, compared to
> a (likely) small corruption in the full bitmap.

Yes, generating the bitmap buffer (so that it can be modified & journalled)
from the compressed data is what I had in mind. But note that even without
that, at least the initial search for free blocks would not have to load
the bitmaps of groups that turn out to be unsuitable for the allocation in
the end. I'm not sure how much that would help your workload, but for a
fragmented filesystem I expect it would actually be a big win.

I agree that generating the bitmap buffer from the compressed in-memory
data sounds a bit scary but I'd be more afraid of programming bugs than
plain memory corruption. So at least initially we could have a paranoia
mode where we load the bitmap we are going to modify from disk and make
sure it's consistent with the compressed version.
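The paranoia mode could be as simple as rebuilding the group's bitmap from the compressed data and memcmp'ing it against the buffer read from disk. A rough userspace sketch (names and the free-extent representation are my assumptions, not anything from ext4; set bits mean "in use"):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed compressed form: a list of extents of FREE blocks per group. */
struct free_extent {
	uint32_t start;	/* first free block, relative to the group */
	uint32_t len;	/* number of free blocks */
};

/* Rebuild a group's block bitmap from the compressed free-extent list:
 * start from "all in use", then clear the bits covered by free extents. */
static void rebuild_bitmap(uint8_t *bitmap, size_t bitmap_bytes,
			   const struct free_extent *ext, size_t nr_ext)
{
	memset(bitmap, 0xff, bitmap_bytes);
	for (size_t i = 0; i < nr_ext; i++)
		for (uint32_t b = ext[i].start; b < ext[i].start + ext[i].len; b++)
			bitmap[b / 8] &= ~(1u << (b % 8));
}

/* Paranoia check: returns 0 iff the reconstructed bitmap matches the
 * one loaded from disk, i.e. the compressed cache is consistent. */
static int verify_against_disk(const uint8_t *disk_bitmap,
			       const struct free_extent *ext, size_t nr_ext,
			       size_t bitmap_bytes)
{
	uint8_t rebuilt[4096];	/* one 4 KiB bitmap block per group */

	assert(bitmap_bytes <= sizeof(rebuilt));
	rebuild_bitmap(rebuilt, bitmap_bytes, ext, nr_ext);
	return memcmp(rebuilt, disk_bitmap, bitmap_bytes);
}
```

A mismatch would mean a programming bug in the compressed-cache bookkeeping rather than on-disk corruption, which is exactly the class of error this mode is meant to catch early.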

> Of course, the other question is the complexity of implementing this.
> Pinning the bitmaps is a trivial change that can be applied to a wide range
> of kernel versions, while adding compressed bitmaps will add a lot more
> code and complexity.  I'm not against that, but it would take longer to do.

Yes, it's more complexity, but pinning is OTOH a user-visible API, so it
carries maintenance overhead as well: we'd have to support it indefinitely.
And your hack with dumpe2fs works well enough (ugly as it is) that I don't
think we need to rush some half-baked solution...

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
