Message-ID: <a61df03d-2e36-9e91-ff02-2f48eb660181@leemhuis.info>
Date:   Tue, 30 Nov 2021 08:16:40 +0100
From:   Thorsten Leemhuis <regressions@...mhuis.info>
To:     Josef Bacik <josef@...icpanda.com>, valentin.schneider@....com,
        "regressions@...ts.linux.dev" <regressions@...ts.linux.dev>
Cc:     peterz@...radead.org, vincent.guittot@...aro.org,
        torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
        linux-btrfs@...r.kernel.org
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance
 patch

Hi, this is your Linux kernel regression tracker speaking.

Top-posting for once, to make this easily accessible to everyone.

Adding the regression mailing list to the list of recipients, as it
should be in the loop for all regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

To be sure this issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced 7fd7a9e0caba
#regzbot ignore-activity

Reminder: when fixing the issue, please add a 'Link:' tag with the URL
to the report (the parent of this mail); regzbot will then automatically
mark the regression as resolved once the fix lands in the appropriate
tree. For more details about regzbot, see the footer.

I'm sending this to everyone who got the initial report, to make them all
aware of the tracking. I also hope that messages like this motivate people
to directly involve at least the regression mailing list, and ideally
regzbot too, when dealing with regressions, as messages like this one
wouldn't be needed then.

Don't worry: as long as further messages regarding this regression are
intended just for regzbot, I'll send them only to the lists (with a tag
in the subject so people can filter them out). With a bit of luck no such
messages will be needed anyway.

Ciao, Thorsten, your Linux kernel regression tracker.

---
Additional information about regzbot:

If you want to know more about regzbot, check out its web interface, the
getting started guide, and/or the reference documentation:

https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

The last two documents explain how you can interact with regzbot
yourself if you want to.

Hint for reporters: when reporting a regression, it's in your interest to
tell #regzbot about it in the report, as that ensures the regression gets
on the radar of regzbot and the regression tracker. That way they will
make sure the report doesn't fall through the cracks unnoticed.

Hint for developers: you normally don't need to care about regzbot once
it's involved. Fix the issue as you normally would; just remember to
include a 'Link:' tag to the report in the commit message, as explained
in Documentation/process/submitting-patches.rst. That aspect was recently
made more explicit in commit 1f57bd42b77c:
https://git.kernel.org/linus/1f57bd42b77c


On 29.11.21 18:03, Josef Bacik wrote:
> 
> Our nightly performance testing found a performance regression when we rebased
> our devel tree onto v5.16-rc.  This took me a few days to bisect down, but this
> patch
> 
> 7fd7a9e0caba ("sched/fair: Trigger nohz.next_balance updates when a CPU goes NOHZ-idle")
> 
> is the one that introduces the regression.  My performance testing box is a
> 2-socket machine with model name "Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90GHz",
> for a total of 12 CPUs reported in cpuinfo.  It has 128GiB of RAM, and these
> perf tests are run against both an SSD and a spinning-rust device; the
> regression is consistent across both configurations.  You can see the
> historical graph of the p99 write completion latencies for this specific test:
> 
> http://toxicpanda.com/performance/emptyfiles500k_write_clat_ns_p99.png
> 
> Or, for something a little more braindead (untarring firefox), you can see an
> increase in the runtime:
> 
> http://toxicpanda.com/performance/untarfirefox_elapsed.png
> 
> These two tests are single-threaded; the regression doesn't appear to affect
> multi-threaded tests.  For a simple reproducer, download a tarball of the
> firefox sources and untar it onto a clean btrfs file system.  On my machine
> the untar takes ~1-2 seconds longer after this commit than before it.  For a
> less simple test, create a clean btrfs file system and run
> 
> fio --name emptyfiles500k --create_on_open=1 --nrfiles=31250 --readwrite=write \
> 	--ioengine=filecreate --fallocate=none --filesize=4k \
> 	--openfiles=1 --alloc-size 98304 --allrandrepeat=1 --randseed=12345 \
> 	--directory <mount point>
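The "Write clat ns p99" metric named in this report corresponds to fio's 99th-percentile write completion latency in nanoseconds. One way to extract it mechanically is to add fio's `--output-format=json` flag to the command above and parse the result. The sketch below assumes the fio 3.x JSON layout, where per-job percentiles live under `jobs[i]["write"]["clat_ns"]["percentile"]` keyed by strings such as `"99.000000"`; the function name `write_clat_p99_ns` is hypothetical and is not part of fio or fsperf.

```python
import json


def write_clat_p99_ns(fio_json_text: str, job_index: int = 0) -> float:
    """Return the p99 write completion latency (ns) for one job in fio JSON output.

    Assumes fio 3.x JSON output, where clat percentiles are reported under
    jobs[i]["write"]["clat_ns"]["percentile"] with fixed-precision string keys.
    """
    data = json.loads(fio_json_text)
    job = data["jobs"][job_index]
    # fio keys the percentile table with strings like "99.000000".
    return float(job["write"]["clat_ns"]["percentile"]["99.000000"])
```

For example, run fio with `--output-format=json --output=result.json`, then call `write_clat_p99_ns(open("result.json").read())` once per run and feed the numbers into whatever comparison you use.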
> 
> The metric to look at is "Write clat ns p99": you'll see a 5-10% increase in
> that latency.  If you want to run our tests directly, they're relatively easy
> to set up.  Clone the fsperf repo
> 
> https://github.com/josefbacik/fsperf
> 
> Then, in the fsperf directory, edit local.cfg and add
> 
> [main]
> directory=/mnt/test
> 
> [btrfs]
> device=/dev/sdc
> iosched=none
> mkfs=mkfs.btrfs -f
> mount=mount -o noatime
> 
> Then run the following on the baseline kernel:
> 
> ./fsperf -p regression -c btrfs -n 10 emptyfiles500k
> 
> This will run the test 10 times and save the results to the database.  Then you
> can boot into your changed kernel and run
> 
> ./fsperf -p regression -c btrfs -n 10 -t emptyfiles500k
> 
> This will run the test 10 times, average the results, compare the average to
> the baseline, and print out the values; you'll see the increased latency there.
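The baseline-versus-test comparison can be illustrated as a plain mean-and-percent-change calculation. This is a sketch under assumptions, not fsperf's actual code: `percent_change` is a hypothetical helper, and it treats each run as a single number (for example one p99 latency in nanoseconds per run).

```python
from statistics import mean


def percent_change(baseline_runs, test_runs):
    """Percent change of the mean of test_runs relative to the mean of baseline_runs.

    Hypothetical helper for illustration; not taken from fsperf.
    Raises statistics.StatisticsError if either list is empty.
    """
    base = mean(baseline_runs)
    if base == 0:
        raise ValueError("baseline mean is zero; percent change is undefined")
    # Multiply before dividing to keep round numbers exact in floating point.
    return (mean(test_runs) - base) * 100.0 / base
```

For a latency metric, a positive result means the changed kernel is slower; a value between 5 and 10 would match the increase reported here. With only 10 runs per kernel, run-to-run noise can be of a similar size, so also comparing the spread of the two sets (for instance with `statistics.stdev`) is a sensible extra check.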
> 
> I can reproduce this at will; if you want to just throw patches at me, I'm happy
> to run them and let you know what happens.  I'm attaching my .config as well in
> case it's needed, but the HZ and PREEMPT settings are
> 
> CONFIG_NO_HZ_COMMON=y
> CONFIG_NO_HZ_FULL=y
> CONFIG_NO_HZ=y
> CONFIG_HZ_1000=y
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_COUNT=y
> CONFIG_PREEMPTION=y
> CONFIG_PREEMPT_DYNAMIC=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_HAVE_PREEMPT_DYNAMIC=y
> CONFIG_PREEMPT_NOTIFIERS=y
> CONFIG_DEBUG_PREEMPT=y
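To confirm that a test kernel was built with the same HZ and PREEMPT options, the symbols listed above can be checked against that kernel's .config. The helper below is a hypothetical sketch: it assumes the usual .config line formats (`CONFIG_FOO=value` and `# CONFIG_FOO is not set`) and treats a symbol that is absent as "n". For diffing two complete configs, the kernel tree's own `scripts/diffconfig` is an alternative.

```python
# Options listed in the report, copied from the quoted .config excerpt.
REPORTED_OPTIONS = {
    "CONFIG_NO_HZ_COMMON": "y",
    "CONFIG_NO_HZ_FULL": "y",
    "CONFIG_NO_HZ": "y",
    "CONFIG_HZ_1000": "y",
    "CONFIG_PREEMPT": "y",
    "CONFIG_PREEMPT_COUNT": "y",
    "CONFIG_PREEMPTION": "y",
    "CONFIG_PREEMPT_DYNAMIC": "y",
    "CONFIG_PREEMPT_RCU": "y",
    "CONFIG_HAVE_PREEMPT_DYNAMIC": "y",
    "CONFIG_PREEMPT_NOTIFIERS": "y",
    "CONFIG_DEBUG_PREEMPT": "y",
}

NOT_SET_SUFFIX = " is not set"


def parse_kconfig(text: str) -> dict:
    """Map CONFIG_* symbols to their values; '# CONFIG_X is not set' maps to 'n'."""
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            opts[line[2:-len(NOT_SET_SUFFIX)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            # Split on the first '=' only, so quoted values may contain '='.
            name, _, value = line.partition("=")
            opts[name] = value
    return opts


def config_mismatches(text: str, expected=REPORTED_OPTIONS) -> dict:
    """Return {symbol: (expected, actual)} for every expected option that differs."""
    actual = parse_kconfig(text)
    return {
        name: (want, actual.get(name, "n"))
        for name, want in expected.items()
        if actual.get(name, "n") != want
    }
```

An empty result from `config_mismatches(open(".config").read())` would indicate that all of the listed options match the reporter's settings.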
> 
> Thanks,
> 
> Josef
> 
