Date:   Thu, 09 Dec 2021 17:22:05 +0000
From:   Valentin Schneider <valentin.schneider@....com>
To:     Josef Bacik <josef@...icpanda.com>
Cc:     peterz@...radead.org, vincent.guittot@...aro.org,
        torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
        linux-btrfs@...r.kernel.org
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch

On 06/12/21 09:48, Valentin Schneider wrote:
> On 03/12/21 14:00, Josef Bacik wrote:
>> On Fri, Dec 03, 2021 at 12:03:27PM +0000, Valentin Schneider wrote:
>>> Could you give the 4 top patches, i.e. those above
>>> 8c92606ab810 ("sched/cpuacct: Make user/system times in cpuacct.stat more precise")
>>> a try?
>>>
>>> https://git.gitlab.arm.com/linux-arm/linux-vs.git -b mainline/sched/nohz-next-update-regression
>>>
>>> I gave that a quick test on the platform that caused me to write the patch
>>> you bisected and looks like it didn't break the original fix. If the above
>>> counter-measures aren't sufficient, I'll have to go poke at your
>>> reproducers...
>>>
>>
>> It's better but still around 6% regression.  If I compare these patches to the
>> average of the last few days worth of runs you're 5% better than before, so
>> progress but not completely erased.
>>
>
> Hmph, time for me to reproduce this locally then. Thanks!

I carved a partition out of an Ampere eMAG's HDD to play with BTRFS via
fsperf. This is what I get for the bisected commit (the baseline is the
bisected patchset's immediate parent, aka v5.15-rc4) over a handful of runs of
./fsperf -p before-regression -c btrfs -n 100 -t emptyfiles500k

  metric                 baseline       current        stdev     diff
  write_clat_ns_p99     195395.92     198790.46      4797.01    1.74%
  write_iops             17305.79      17471.57       250.66    0.96%

  write_clat_ns_p99     195395.92     197694.06      4797.01    1.18%
  write_iops             17305.79      17533.62       250.66    1.32%

  write_clat_ns_p99     195395.92     197903.67      4797.01    1.28%
  write_iops             17305.79      17519.71       250.66    1.24%

If I compare against tip/sched/core however:

  write_clat_ns_p99     195395.92     202936.32      4797.01    3.86%
  write_iops             17305.79      17065.46       250.66   -1.39%

  write_clat_ns_p99     195395.92     204349.44      4797.01    4.58%
  write_iops             17305.79      17097.79       250.66   -1.20%

  write_clat_ns_p99     195395.92     204169.05      4797.01    4.49%
  write_iops             17305.79      17112.29       250.66   -1.12%

tip/sched/core + my patches:

  write_clat_ns_p99     195395.92     205721.60      4797.01    5.28%
  write_iops             17305.79      16947.59       250.66   -2.07%

  write_clat_ns_p99     195395.92     203358.04      4797.01    4.07%
  write_iops             17305.79      16953.24       250.66   -2.04%

  write_clat_ns_p99     195395.92     201830.40      4797.01    3.29%
  write_iops             17305.79      17041.18       250.66   -1.53%

So tip/sched/core seems to have a much worse regression, and my patches
are making things worse on that system...
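
For anyone wanting to cross-check the numbers above, here is a rough sketch
that recomputes the last column and averages the three runs per kernel. It
assumes the columns are baseline, current, stdev and relative diff, and that
the stdev belongs to the baseline; the names (RUNS, pct_diff, summarize) are
made up for illustration and are not part of fsperf.

```python
from statistics import mean

# Baseline values copied from the tables (v5.15-rc4).
BASELINE_P99_NS = 195395.92
BASELINE_IOPS = 17305.79
# Assumption: the third column is the baseline's standard deviation.
STDEV_P99_NS = 4797.01

# "current" column of each run, grouped by the kernel under test.
RUNS = {
    "bisected commit": {
        "p99": [198790.46, 197694.06, 197903.67],
        "iops": [17471.57, 17533.62, 17519.71],
    },
    "tip/sched/core": {
        "p99": [202936.32, 204349.44, 204169.05],
        "iops": [17065.46, 17097.79, 17112.29],
    },
    "tip/sched/core + patches": {
        "p99": [205721.60, 203358.04, 201830.40],
        "iops": [16947.59, 16953.24, 17041.18],
    },
}


def pct_diff(baseline, current):
    """Relative change of current against baseline, in percent."""
    return (current - baseline) / baseline * 100.0


def summarize(runs):
    """Average the per-run diffs for each kernel."""
    summary = {}
    for name, samples in runs.items():
        summary[name] = {
            "p99_pct": mean(pct_diff(BASELINE_P99_NS, v) for v in samples["p99"]),
            "iops_pct": mean(pct_diff(BASELINE_IOPS, v) for v in samples["iops"]),
            # Mean p99 shift expressed in baseline standard deviations.
            "p99_sigmas": (mean(samples["p99"]) - BASELINE_P99_NS) / STDEV_P99_NS,
        }
    return summary


if __name__ == "__main__":
    for name, stats in summarize(RUNS).items():
        print(f"{name:26s} p99 {stats['p99_pct']:+.2f}% "
              f"({stats['p99_sigmas']:.2f} stdev)  iops {stats['iops_pct']:+.2f}%")
```

With only three runs per kernel this is a sanity check rather than a
statistical test, but it puts the p99 shift next to the baseline stdev.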

I've started a bisection to see where the above leads me. Unfortunately,
this machine needs more babysitting than I thought, so it's gonna take a
while.

@Josef any chance you could see if the above also applies to you? tip lives
at https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git, though from
where my bisection is taking me it looks like you should see that against
Linus' tree as well.

Thanks,
Valentin
