lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20220509000326.GN1098723@dread.disaster.area>
Date:   Mon, 9 May 2022 10:03:26 +1000
From:   Dave Chinner <david@...morbit.com>
To:     Carel Si <beibei.si@...el.com>
Cc:     lkp@...el.com, linux-kernel@...r.kernel.org, lkp@...ts.01.org,
        fengwei.yin@...el.com, linux-xfs@...r.kernel.org,
        oliver.sang@...el.com
Subject: Re: [LKP] Re: [xfs]  32678f1513:  aim7.jobs-per-min -5.6% regression

On Sat, May 07, 2022 at 07:09:46PM +0800, Carel Si wrote:
> Hi Dave,
> 
> On Sat, May 07, 2022 at 07:29:24AM +1000, Dave Chinner wrote:
> > On Fri, May 06, 2022 at 05:22:50PM +0800, kernel test robot wrote:
> > > 
> > > 
> > > Greeting,
> > > 
> > > FYI, we noticed a -5.6% regression of aim7.jobs-per-min due to commit:
> > > 
> > > 
> > > commit: 32678f151338b9a321e9e27139a63c81f353acb7 ("[PATCH 1/4] xfs: detect self referencing btree sibling pointers")
> > > url: https://github.com/intel-lab-lkp/linux/commits/Dave-Chinner/xfs-fix-random-format-verification-issues/20220502-162206
> > > base: https://git.kernel.org/cgit/fs/xfs/xfs-linux.git for-next
> > > patch link: https://lore.kernel.org/linux-xfs/20220502082018.1076561-2-david@fromorbit.com
> > 
> > Well, that answers the concern I had about the impact of 
> > changing the way endian conversions were done in that patch.
> > 
> > > a44a027a8b2a20fe 32678f151338b9a321e9e27139a 
> > > ---------------- --------------------------- 
> > >          %stddev     %change         %stddev
> > >              \          |                \  
> > >     464232            -5.6%     438315        aim7.jobs-per-min
> > ....
> > >       0.13 ±  5%      +0.2        0.33 ±  6%  perf-profile.children.cycles-pp.__xfs_btree_check_sblock
> > ....
> > >       0.11 ±  4%      +0.2        0.30 ±  5%  perf-profile.self.cycles-pp.__xfs_btree_check_sblock
> > 
> > Because there is it, right at the bottom of the profile.
> > 
> > Can you try the patch below and see if that fixes the issue?
> 
> We tested below patch, it didn't fix the issue, still has -6.4% regression [1] 
> comparing to a44a027a8b ("Merge tag 'large-extent-counters-v9' of 
> https://github.com/chandanr/linux into xfs-5.19-for-next"). 

Really? It made the regression *worse*?

Oh, wait, *that* wasn't in the last set of profiles:

....
>      35.30 ±  4%      +1.3       36.59 ±  3%      +1.9       37.24 ±  3%  perf-profile.children.cycles-pp.osq_lock
>      36.88 ±  4%      +1.3       38.19 ±  2%      +1.9       38.83 ±  3%  perf-profile.children.cycles-pp.rwsem_optimistic_spin
>      37.40 ±  4%      +1.4       38.77 ±  2%      +2.0       39.35 ±  3%  perf-profile.children.cycles-pp.rwsem_down_write_slowpath
....
>       1.89 ±  3%      +0.1        1.96 ±  4%      +0.0        1.90 ±  2%  perf-profile.self.cycles-pp.rwsem_spin_on_owner
>       0.11 ±  4%      +0.2        0.30 ±  5%      +0.2        0.29        perf-profile.self.cycles-pp.__xfs_btree_check_sblock
>      35.08 ±  4%      +1.3       36.33 ±  2%      +1.9       37.01 ±  3%  perf-profile.self.cycles-pp.osq_lock

This test is hammering an rwsem which is them spinning on exclusive
write locks somewhere. That's where all the increase in system time
has come from, and that's what's causing the difference in
perofrmance. I bet this test is hammering single file write IO from
all 96 CPUs at once, and that's where all the problems start.

IOWs, this likley has nothing to with the btree validation change,
and everything to do with the system being driven into a lock and
cacheline contention corner. When there is lock contention like this
on a large CPU count system, we can't infer anything at all from any
other measurement because the cacheline contention skews
everything...

I'm not going to look at this any further, and I think I'm going to
ignore AIM7 write test regressions in future until the test is
reconfigured to avoid this sort of "100 cpus all trying to do
buffered writes to the same file" contention issue.

Cheers,

Dave.

-- 
Dave Chinner
david@...morbit.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ