Message-ID: <YbJWBGaGAW/MenOn@localhost.localdomain>
Date:   Thu, 9 Dec 2021 14:16:20 -0500
From:   Josef Bacik <josef@...icpanda.com>
To:     Valentin Schneider <valentin.schneider@....com>
Cc:     peterz@...radead.org, vincent.guittot@...aro.org,
        torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
        linux-btrfs@...r.kernel.org, guro@...com, clm@...com
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

On Thu, Dec 09, 2021 at 05:22:05PM +0000, Valentin Schneider wrote:
> On 06/12/21 09:48, Valentin Schneider wrote:
> > On 03/12/21 14:00, Josef Bacik wrote:
> >> On Fri, Dec 03, 2021 at 12:03:27PM +0000, Valentin Schneider wrote:
> >>> Could you give the 4 top patches, i.e. those above
> >>> 8c92606ab810 ("sched/cpuacct: Make user/system times in cpuacct.stat more precise")
> >>> a try?
> >>>
> >>> https://git.gitlab.arm.com/linux-arm/linux-vs.git -b mainline/sched/nohz-next-update-regression
> >>>
> >>> I gave that a quick test on the platform that caused me to write the patch
> >>> you bisected and looks like it didn't break the original fix. If the above
> >>> counter-measures aren't sufficient, I'll have to go poke at your
> >>> reproducers...
> >>>
> >>
> >> It's better but still around 6% regression.  If I compare these patches to the
> >> average of the last few days worth of runs you're 5% better than before, so
> >> progress but not completely erased.
> >>
> >
> > Hmph, time for me to reproduce this locally then. Thanks!
> 
> I carved out a partition out of an Ampere eMAG's HDD to play with BTRFS
> via fsperf; this is what I get for the bisected commit (baseline is
> bisected patchset's immediate parent, aka v5.15-rc4) via a handful of
> ./fsperf -p before-regression -c btrfs -n 100 -t emptyfiles500k
> 
>   write_clat_ns_p99     195395.92     198790.46      4797.01    1.74%
>   write_iops             17305.79      17471.57       250.66    0.96%
> 
>   write_clat_ns_p99     195395.92     197694.06      4797.01    1.18%
>   write_iops             17305.79      17533.62       250.66    1.32%
> 
>   write_clat_ns_p99     195395.92     197903.67      4797.01    1.28%
>   write_iops             17305.79      17519.71       250.66    1.24%
> 
> If I compare against tip/sched/core however:
> 
>   write_clat_ns_p99     195395.92     202936.32      4797.01    3.86%
>   write_iops             17305.79      17065.46       250.66   -1.39%
> 
>   write_clat_ns_p99     195395.92     204349.44      4797.01    4.58%
>   write_iops             17305.79      17097.79       250.66   -1.20%
> 
>   write_clat_ns_p99     195395.92     204169.05      4797.01    4.49%
>   write_iops             17305.79      17112.29       250.66   -1.12%
> 
> tip/sched/core + my patches:
> 
>   write_clat_ns_p99     195395.92     205721.60      4797.01    5.28%
>   write_iops             17305.79      16947.59       250.66   -2.07%
> 
>   write_clat_ns_p99     195395.92     203358.04      4797.01    4.07%
>   write_iops             17305.79      16953.24       250.66   -2.04%
> 
>   write_clat_ns_p99     195395.92     201830.40      4797.01    3.29%
>   write_iops             17305.79      17041.18       250.66   -1.53%
> 
> So tip/sched/core seems to have a much worse regression, and my patches
> are making things worse on that system...
> 
> I've started a bisection to see where the above leads me, unfortunately
> this machine needs more babysitting than I thought so it's gonna take a
> while.
> 
> @Josef any chance you could see if the above also applies to you? tip lives
> at https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git, though from
> where my bisection is taking me it looks like you should see that against
> Linus' tree as well.
> 

This has made us all curious, so we're all fucking around with schbench to see
if we can make it show up without needing to use fsperf.  Maybe that'll help
with the bisect, because I had to bisect twice to land on your patches, and I
only emailed when I could see the change right before and right after your
patch.  It would not surprise me at all if there's something else here that's
causing us pain.
> Thanks,
> Valentin
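
For anyone wanting to sanity-check the quoted numbers, here is a minimal sketch that recomputes the last column of the result lines as a relative difference against the baseline. It assumes the columns are metric, baseline, current, standard deviation and percent difference; the thread does not label them, so treat that reading as an assumption. The names pct_diff, mean_pct_diff, BASELINE_P99 and P99_RUNS are made up for illustration and are not part of fsperf.

```python
# Sketch only: recompute the relative-difference column of the quoted results.
# Assumption: the columns are metric, baseline, current, stdev, diff%.


def pct_diff(baseline, current):
    """Relative change of current versus baseline, in percent."""
    return (current - baseline) / baseline * 100.0


def mean_pct_diff(baseline, values):
    """Average relative change over several runs against one baseline."""
    diffs = [pct_diff(baseline, v) for v in values]
    return sum(diffs) / len(diffs)


# write_clat_ns_p99 baseline, copied from the quoted runs
BASELINE_P99 = 195395.92

# write_clat_ns_p99 "current" values, three runs per kernel, copied from the thread
P99_RUNS = {
    "bisected commit": [198790.46, 197694.06, 197903.67],
    "tip/sched/core": [202936.32, 204349.44, 204169.05],
    "tip/sched/core + patches": [205721.60, 203358.04, 201830.40],
}

if __name__ == "__main__":
    for name, values in P99_RUNS.items():
        print(f"{name:26s} {mean_pct_diff(BASELINE_P99, values):6.2f}%")
```

Averaging the three runs per kernel is just one convenient way to compare the groups; it says nothing about whether the differences are significant relative to the run-to-run noise.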
