linux-kernel - Re: [PATCH] xfs: account extra freespace btree splits for multiple allocations

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Y31saO9m0ixZKdkV@B-P7TQMD6M-0146.local>
Date:   Wed, 23 Nov 2022 08:42:16 +0800
From:   Gao Xiang <hsiangkao@...ux.alibaba.com>
To:     Dave Chinner <david@...morbit.com>
Cc:     Oliver Sang <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev,
        lkp@...el.com, Zirong Lang <zlang@...hat.com>,
        linux-xfs@...r.kernel.org, ying.huang@...el.com,
        feng.tang@...el.com, zhengjun.xing@...ux.intel.com,
        fengwei.yin@...el.com, "Darrick J. Wong" <djwong@...nel.org>,
        Dave Chinner <dchinner@...hat.com>,
        Brian Foster <bfoster@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] xfs: account extra freespace btree splits for multiple
 allocations

On Wed, Nov 23, 2022 at 09:42:43AM +1100, Dave Chinner wrote:
> On Tue, Nov 22, 2022 at 06:03:03PM +0800, Oliver Sang wrote:
> > hi Gao Xiang,
> > 
> > On Tue, Nov 22, 2022 at 09:33:38AM +0800, Gao Xiang wrote:
> > > On Tue, Nov 22, 2022 at 09:09:34AM +0800, kernel test robot wrote:
> > > > 
> > > > please be noted we noticed Gao Xiang and Dave Chinner have already had lots of
> > > > discussion around this patch, which seems there is maybe new version later.
> > > > we just sent out this report FYI the possible performance impact of this patch.
> > > > 
> > > > 
> > > > Greeting,
> > > > 
> > > > FYI, we noticed a -15.1% regression of fxmark.ssd_xfs_MWCM_72_directio.works/sec due to commit:
> > > 
> > > Thanks for your report!
> > > 
> > > At a glance, I have no idea why this commit can have performance
> > > impacts.  Is the result stable?
> > 
> > in our tests, the result is quite stable.
> >      45589           -15.1%      38687 ±  2%  fxmark.ssd_xfs_MWCM_72_directio.works/sec
> > 
> > and detail data is as below:
> > for this commit:
> >   "fxmark.ssd_xfs_MWCM_72_directio.works/sec": [
> >     39192.224368,
> >     39665.690567,
> >     38980.680601,
> >     37298.99538,
> >     37483.256377,
> >     39504.606569
> >   ],
> > 
> > for parent:
> >   "fxmark.ssd_xfs_MWCM_72_directio.works/sec": [
> >     45381.458009,
> >     45314.376204,
> >     45724.688965,
> >     45751.955937,
> >     45614.323267,
> >     45747.216475
> >   ],
> 
> This MWCM workload uses a shared directory. Every worker thread (72
> of them) iterates creating a new file, writes 4kB of data to it and
> then closes it. There is no synchronisation between worker threads.
> 
> The worker threads will lockstep on the directory lock for file
> creation, they will all attempt to allocate data in the same AG as
> the file is created. Hence writeback will race with file creation
> for AG locks, too.  Once the first AG is full, they will all attempt
> to allocate in the next AG (file creation and writeback).
> 
> IOWs, this workload will race to fill AGs, will exercise the "AG
> full so skip to next AG" allocator fallbacks, etc.

Glad to know about that.  I didn't look into the MWCM workload before.

> 
> Changing where/how AGs are considered full will impact how the AG
> selection is made. I'm betting that there's a mismatch between the
> code that selects the initial AG for allocation (from
> xfs_bmap_btalloc() via the nullfb case) and the code that selects
> the actual AG for allocation (xfs_alloc_vextent() w/ NEAR_BNO
> policy) as a result of this change. This then results in
> xfs_alloc_vextent() trying to initially allocate from an AG that
> xfs_alloc_fix_freelist() considers to be full, so it skips the
> initial selected AG and starts searching for an AG it can allocate
> into.

I can imagine, but I didn't think out several block reservation could
cause such huge impacts.

> 
> Combine that with AGF lock contention from 70+ tasks all trying to
> allocate in the same location...

Yeah, anyway, I will reconfirm on our side about this workload
as well.

Thanks,
Gao Xiang

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@...morbit.com