Message-ID: <c560a38d-8313-51fb-b1ec-e904bd8836bc@tessares.net>
Date: Mon, 14 Sep 2020 22:21:42 +0200
From: Matthieu Baerts <matthieu.baerts@...sares.net>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
Michael Larabel <Michael@...haellarabel.com>
Cc: Matthew Wilcox <willy@...radead.org>,
Amir Goldstein <amir73il@...il.com>,
Ted Ts'o <tytso@...gle.com>,
Andreas Dilger <adilger.kernel@...ger.ca>,
Ext4 Developers List <linux-ext4@...r.kernel.org>,
Jan Kara <jack@...e.cz>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: Kernel Benchmarking
Hello everyone,
On 14/09/2020 19:47, Linus Torvalds wrote:
> Michael et al,
> Ok, I redid my failed "hybrid mode" patch from scratch (original
> patch never sent out, I never got it to a working point).
>
> Having learnt from my mistake, this time instead of trying to mix the
> old and the new code, instead I just extended the new code, and wrote
> a _lot_ of comments about it.
>
> I also made it configurable, using a "page_lock_unfairness" knob,
> which this patch defaults to 1000 (which is basically infinite).
> That's just a value that says how many times we'll try the old unfair
> case, so "1000" means "we'll re-queue up to a thousand times before we
> say enough is enough" and zero is the fair mode that shows the
> performance problems.
Thank you for the new patch and all the work from everybody around this!
Sorry to jump into this thread, but I wanted to share my issue, which is
also linked to the same commit:
2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
I have a simple test environment[1] using Docker and virtme[2] with an
almost-default kernel config, validating some tests for the MPTCP
upstream project[3]. Some of these tests use a modified version of
packetdrill[4].
Recently, some of these packetdrill tests have been failing after 2
minutes (the timeout) instead of completing in a few seconds (~6
seconds). No packets are exchanged at all during these two minutes.
A git bisect also pointed me to 2a9127fcf229.
I can run the same test 10 times without any issue with the parent
commit (the v5.8 tag), but with 2a9127fcf229 I hit the timeout most of
the time.
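For context, the way the hang shows up is essentially a run-with-timeout
wrapper around the test command. A minimal sketch of that check (the
actual packetdrill invocation lives in the CI scripts[1]; the command
here is only a placeholder):

```python
import subprocess

def run_with_timeout(cmd, limit=120):
    """Run a test command; report 'timeout' if it exceeds the limit."""
    try:
        subprocess.run(cmd, timeout=limit, check=True)
        return "ok"
    except subprocess.TimeoutExpired:
        return "timeout"

# With the parent commit the test completes in a few seconds ("ok");
# with 2a9127fcf229 it usually hits the 120 s limit ("timeout").
```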
Of course, as soon as I try to add some debug info on the userspace or
kernel side, I can no longer reproduce the timeout. But without debug,
it is easy for me to check whether the issue is there or not. My issue
does not seem to be linked to a small file being read multiple times
from a filesystem: only a few bytes should be transferred by
packetdrill, and when there is a timeout it happens even before that,
because I don't see any packets being exchanged at all. I don't think
packetdrill does much I/O before transferring a few packets to a "tun"
interface, but I didn't analyse further.
With your new patch and the default value, I no longer have the issue.
> I've only (lightly) tested those two extremes, I think the interesting
> range is likely in the 1-5 range.
>
> So you can do
>
> echo 0 > /proc/sys/vm/page_lock_unfairness
> .. run test ..
>
> and you should get the same numbers as without this patch (within
> noise, of course).
On my side, I still have the issue with 0. That is expected, so it
looks right!
> Or do
>
> echo 5 > /proc/sys/vm/page_lock_unfairness
> .. run test ..
>
> and get numbers for "we accept some unfairness, but if we have to
> requeue more than five times, we force the fair mode".
Already with 1, it is fine on my side: no more timeouts! Same with 5. I
am not measuring performance, only checking that packetdrill runs
without hitting the timeout. With 1 and 5, the tests finish in a normal
time, which is really good: no timeout in 10 runs, each of them started
from a fresh VM. Patch tested with success!
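For what it's worth, my reading of the knob's semantics from your
description above, as a toy model only (not the kernel code):

```python
def should_force_fair(requeues, unfairness):
    # With page_lock_unfairness == N, up to N unfair requeues are
    # allowed before the fair mode is forced; 0 means always fair.
    return requeues >= unfairness

assert should_force_fair(0, 0)      # 0: fair from the start
assert not should_force_fair(4, 5)  # 5: still unfair after 4 requeues
assert should_force_fair(5, 5)      # ...but fair after the fifth
```

That matches what I see: 0 reproduces the timeout, and already 1 avoids it.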
I would be glad to help by validating new modifications or providing
more info. My setup is also easy to put in place: a Docker image is
built with all the required tools to start the same VM as the one I
have. All the scripts are in a public repository[1].
Please tell me if I can help!
Cheers,
Matt
[1]
https://github.com/multipath-tcp/mptcp_net-next/blob/scripts/ci/virtme.sh and
https://github.com/multipath-tcp/mptcp_net-next/blob/scripts/ci/Dockerfile.virtme.sh
[2] https://git.kernel.org/pub/scm/utils/kernel/virtme/virtme.git
[3] https://github.com/multipath-tcp/mptcp_net-next/wiki
[4] https://github.com/multipath-tcp/packetdrill
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net