linux-ext4 - Re: [PATCH 2/2] ext4: skip non-loaded groups at cr=0/1


Message-ID: <7F6AF0FC-2E52-4FC5-9663-C8874BA7B98E@whamcloud.com>
Date:   Wed, 20 May 2020 19:59:09 +0000
From:   Alex Zhuravlev <azhuravlev@...mcloud.com>
To:     Andreas Dilger <adilger@...ger.ca>
CC:     Alex Zhuravlev <azhuravlev@...mcloud.com>,
        Ritesh Harjani <riteshh@...ux.ibm.com>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH 2/2] ext4: skip non-loaded groups at cr=0/1



> On 20 May 2020, at 22:34, Andreas Dilger <adilger@...ger.ca> wrote:
> 
> On May 20, 2020, at 2:40 AM, Alex Zhuravlev <azhuravlev@...mcloud.com> wrote:
>> 
>>> On 17 May 2020, at 10:55, Andreas Dilger <adilger@...ger.ca> wrote:
>>> 
>>> The question is whether this situation is affecting only a few inode
>>> allocations for a short time after mount, or does this persist for a long
>>> time?  I think that it _should_ be only a short time, because these other
>>> threads should all start prefetch on their preferred groups, so even if a
>>> few inodes have their blocks allocated in the "wrong" group, it shouldn't
>>> be a long term problem since the prefetched bitmaps will finish loading
>>> and allow the blocks to be allocated, or skipped if group is fragmented.
>> 
>> Yes, that’s the idea - there is a short window when buddy data is being
>> populated. And for each “cluster” (not just a single group) prefetching
>> will be initiated by allocation.
>> It’s possible that some number of inodes will get “bad” blocks right after
>> mount.
>> If you think this is a bad scenario I can introduce a couple more things:
>> 1) the prefetching thread that has been discussed a few times
>> 2) let mballoc wait for the goal group to get ready - this is essentially one
>>   more check in ext4_mb_good_group()
> 
> IMHO, this is an acceptable "cache warmup" behavior, not really different
> than mballoc doing limited scanning when looking for any other allocation.
> Since we already separate inode table blocks and data blocks into separate
> groups due to flex_bg, I don't think any group is "better" than another,
> so long as the allocations are avoiding worst-case fragmentation (i.e. a
> series of one-block allocations).

I tend to agree, but refreshed the patch to enable waiting for the goal group
(one more check). Extra waiting for one group during warmup should be fine, IMO.
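
For illustration only, the decision being discussed can be modelled in user
space roughly as below. This is a sketch, not the patch: struct group_model,
enum scan_decision and model_good_group() are hypothetical names, and the
real ext4_mb_good_group() looks at much more state than two flags. The
assumption is that cr=0/1 passes skip groups whose buddy data is not loaded
yet, except for the goal group, which is waited for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-group allocator state (not a kernel struct). */
struct group_model {
	bool buddy_loaded;	/* buddy data already built in memory */
	bool fragmented;	/* no suitably large free extent (simplified) */
};

enum scan_decision {
	SCAN_USE,	/* try to allocate from this group */
	SCAN_SKIP,	/* move on to the next group */
	SCAN_WAIT,	/* wait for the buddy data, then re-evaluate */
};

/*
 * Simplified model of the check: cr is the scan pass ("criteria"),
 * is_goal says whether this is the allocation's goal group.
 */
static enum scan_decision
model_good_group(const struct group_model *grp, int cr, bool is_goal)
{
	assert(grp != NULL);
	assert(cr >= 0);

	if (!grp->buddy_loaded) {
		/* The "one more check": the goal group is worth waiting for. */
		if (is_goal)
			return SCAN_WAIT;
		/* Cheap, optimistic passes skip groups that are not ready. */
		if (cr < 2)
			return SCAN_SKIP;
		/* Later passes load the group before judging it. */
		return SCAN_WAIT;
	}

	/* Loaded but fragmented groups are passed over in the early passes. */
	if (cr < 2 && grp->fragmented)
		return SCAN_SKIP;

	return SCAN_USE;
}
```

In this model a cold non-goal group at cr=0 gives SCAN_SKIP, while the same
group as the goal gives SCAN_WAIT, which is the only behavioural difference
the extra check is meant to add during warmup.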

Thanks, Alex