Message-ID: <20231123143452.erzar3sqhg37hjxz@revolver>
Date: Thu, 23 Nov 2023 09:34:52 -0500
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Bagas Sanjaya <bagasdotme@...il.com>
Cc: Chun Ng <chunn@...dia.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux Regressions <regressions@...ts.linux.dev>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Memory Management List <linux-mm@...ck.org>,
Ankita Garg <ankitag@...dia.com>,
Suren Baghdasaryan <surenb@...gle.com>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>
Subject: Re: [REGRESSION]: mmap performance regression starting with k-6.1
* Bagas Sanjaya <bagasdotme@...il.com> [231123 00:07]:
> On Wed, Nov 22, 2023 at 08:03:19PM +0000, Chun Ng wrote:
> > Hi,
> >
> > Recently I observed a performance regression in the mmap(..) system call. I tried both vanilla kernels and Raspberry Pi kernels on a Raspberry Pi 4 box, and the results are pretty consistent among them.
> >
> > Bisection showed that the regression starts from k-6.1, and the latest vanilla k-6.7 is still showing the same regression.
This is almost certainly the maple tree. The tree is slower on writes
than the rbtree, so if the benchmark mmaps/munmaps in a tight loop you
will see this slowdown. What this benchmark measures is the speed of
inserting and removing a VMA, which is not a realistic pattern: we
usually use the mapping between adding and removing it.
What this gains us is the ability to remove contention on the mmap lock
during page faults. If you were to test contention around that lock,
you would see a slowdown until v6.4, where per-VMA locking started to
show up. Further benchmarking would show more types of fault handling
moving outside of the mmap lock in each release until (I believe) v6.6,
where most (or all?) types are supported.
Although this is expected, I am still looking to reduce the impact on
any real workloads that may suffer. For example, I've been reducing the
number of allocations.
> >
> > The test program calls mmap/munmap for a 4K page with the MAP_ANON and MAP_PRIVATE flags, and ftrace is used to measure the time spent in the do_mmap(..) call. Measured times of a sample run with different vanilla kernel versions are:
> > k-5.10 and k-6.0: ~157us
> > k-6.1: ~194us
> > k-6.7: ~214us
I would have expected v6.7 to remain closer to v6.1, but that may depend
on the minor versions you have been testing and what fixes have landed
there.
> > Results are pretty consistent across multiple runs, with a small percentage variance. Ftrace shows that the latency of mmap_region(...) has increased since k-6.1. For an application that makes frequent mmap(..) calls, the accumulated extra latency is very noticeable.
> >
> > Please find the ftrace results and kernel config files in this folder:
> > https://drive.google.com/drive/folders/1qy8YTBqxu8Gdbs7IigYbSd4FXldId5sd?usp=drive_link
> >
> > The test program can be found in here:
> > https://drive.google.com/file/d/1tG6_BbQMCHwfKebvAIAg_xqbM_lpPcuM/view?usp=sharing
> >
> > Info on the testing environment:
> > cpufreq_governor: performance
> > Test machine: Raspberry Pi 4, 8GB DDR
> > SCHED_FIFO with priority 99 for running the test program
> >
> > Vanilla kernels are not tainted. However, on k-6.0 and k-6.7, I had to patch the drivers/clk/bcm/clk-raspberrypi.c file with the version from the Raspberry Pi kernel tree for the CPU frequency governor to work.
> >
>
> The next step is to find the commit that introduces your regression with
> `git bisect`. If you haven't done so, see
> Documentation/admin-guide/bug-bisect.rst for instructions.
>
> Anyway, I'm adding this regression to regzbot:
>
> #regzbot ^introduced: v6.0..v6.1
>
> Thanks.
>
> --
> An old man doll... just what I always wanted! - Clara