Message-ID: <Ymu9acl18pTA5GU6@ziqianlu-desk1>
Date:   Fri, 29 Apr 2022 18:26:49 +0800
From:   Aaron Lu <aaron.lu@...el.com>
To:     Yang Shi <shy828301@...il.com>
CC:     "ying.huang@...el.com" <ying.huang@...el.com>,
        Michal Hocko <mhocko@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>
Subject: Re: [PATCH] mm: swap: determine swap device by using page nid

On Fri, Apr 22, 2022 at 10:00:59AM -0700, Yang Shi wrote:
> On Thu, Apr 21, 2022 at 11:24 PM Aaron Lu <aaron.lu@...el.com> wrote:
> >
> > On Thu, Apr 21, 2022 at 04:34:09PM +0800, ying.huang@...el.com wrote:
> > > On Thu, 2022-04-21 at 16:17 +0800, Aaron Lu wrote:
> > > > On Thu, Apr 21, 2022 at 03:49:21PM +0800, ying.huang@...el.com wrote:
> >
> > ... ...
> >
> > > > > For swap-in latency, we can use pmbench, which can output latency
> > > > > information.
> > > > >
> > > >
> > > > OK, I'll give pmbench a run, thanks for the suggestion.
> > >
> > > Better to construct a scenario with more swapin than swapout.  For
> > > example, start a memory eater, then kill it later.
> >
> > What about vm-scalability/case-swapin?
> > https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-swapin
> >
> > I think you are pretty familiar with it but still:
> > 1) it starts $nr_task processes and each mmaps a $size/$nr_task area,
> >    then consumes the memory; after this, it waits for a signal;
> > 2) start another process to consume $size memory to push the memory in
> >    step 1) to swap device;
> > 3) kick processes in step 1) to start accessing their memory, thus
> >    triggering swapins. The metric of this test case is the swapin throughput.
> >
> > I plan to restrict the cgroup's limit to $size.
> >
> > Considering there is only one NVMe drive attached to node 0, I will run
> > the test as described before:
> > 1) bind processes to run on node 0, allocate on node 1 to test the
> >    performance when reclaimer's node id is the same as swap device's.
> > 2) bind processes to run on node 1, allocate on node 0 to test the
> >    performance when page's node id is the same as swap device's.
> >
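
For reference, a rough Python sketch of the three case-swapin steps quoted
above. The real test is a shell script; here size/nr_task only stand in for
the $size/$nr_task knobs, and the node binding and cgroup limit are not
handled, so treat it as an illustration of the flow, not a replacement:

#!/usr/bin/env python3
# Sketch of case-swapin: workers fault in memory (step 1), an eater pushes
# it to swap (step 2), then the workers are kicked to re-access it and the
# swapin throughput is reported (step 3).
import mmap
import os
import time
from multiprocessing import Barrier, Process

PAGE = mmap.PAGESIZE
ANON = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
size = 8 << 30                  # $size: total working set (assumed value)
nr_task = 4                     # $nr_task (assumed value)

def toucher(chunk, ready, go):
    buf = mmap.mmap(-1, chunk, flags=ANON)
    for off in range(0, chunk, PAGE):       # step 1: consume the memory
        buf[off] = 1
    ready.wait()                            # tell the parent we are done
    go.wait()                               # wait to be kicked
    t0 = time.time()
    for off in range(0, chunk, PAGE):       # step 3: re-access -> swapins
        _ = buf[off]
    mb = chunk / 2**20
    print(f"pid {os.getpid()}: {mb / (time.time() - t0):.1f} MB/s swapped in")

def eater(amount):
    buf = mmap.mmap(-1, amount, flags=ANON)
    for off in range(0, amount, PAGE):      # step 2: push step-1 memory out
        buf[off] = 1

if __name__ == "__main__":
    ready, go = Barrier(nr_task + 1), Barrier(nr_task + 1)
    workers = [Process(target=toucher, args=(size // nr_task, ready, go))
               for _ in range(nr_task)]
    for p in workers:
        p.start()
    ready.wait()                            # all workers hold their memory
    e = Process(target=eater, args=(size,))
    e.start()
    e.join()                                # memory eater forces swap-out
    go.wait()                               # kick the workers
    for p in workers:
        p.join()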

Thanks to Tim, who found me a server with a single Optane disk attached
to node 0.

Let's use task0_mem0 to denote tasks bound to node 0 and memory bound
to node 0 through cgroup cpuset. For the above swapin case, when
nr_task=1:
task0_mem0 throughput: [571652, 587158, 594316], avg=584375 -> baseline
task0_mem1 throughput: [582944, 583752, 589026], avg=585240    +0.15%
task1_mem0 throughput: [569349, 577459, 581107], avg=575971    -1.4%
task1_mem1 throughput: [564482, 570664, 571466], avg=568870    -2.6%

task0_mem1 is slightly better than task1_mem0.
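
For completeness, the deltas above can be reproduced from the raw numbers
(relative to the task0_mem0 average); a small check script:

# Re-derive the averages and deltas, relative to the task0_mem0 baseline.
runs = {
    "task0_mem0": [571652, 587158, 594316],
    "task0_mem1": [582944, 583752, 589026],
    "task1_mem0": [569349, 577459, 581107],
    "task1_mem1": [564482, 570664, 571466],
}
base = sum(runs["task0_mem0"]) / len(runs["task0_mem0"])
for name, vals in runs.items():
    avg = sum(vals) / len(vals)
    print(f"{name}: avg={avg:.0f} ({(avg / base - 1) * 100:+.2f}%)")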

I also gave nr_task=8 and nr_task=16 a run; the results are almost the
same across all four cases.

> > Ying and Yang,
> >
> > Let me know what you think about the case used and the way the test is
> > conducted.
> 
> Looks fine to me. To measure the latency, you could also try the below
> bpftrace script:
>

Trying to install bpftrace on an old distro (Ubuntu 16.04) is a real
pain, so I gave up... But I managed to get an old bcc installed. Using
the provided funclatency script to profile swap_readpage() for 30
seconds, there is no obvious difference in the histogram.
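
What funclatency does for a single function boils down to roughly the
following; this is a minimal sketch with bcc's Python bindings, not the
exact invocation I used:

#!/usr/bin/env python
# Log2 histogram of swap_readpage() latency via kprobe + kretprobe.
from time import sleep
from bcc import BPF

prog = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);
BPF_HISTOGRAM(dist);

int trace_entry(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp == 0)
        return 0;                        /* missed the entry probe */
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    dist.increment(bpf_log2l(delta_us));
    start.delete(&tid);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="swap_readpage", fn_name="trace_entry")
b.attach_kretprobe(event="swap_readpage", fn_name="trace_return")

sleep(30)                                # profile for 30 seconds
b["dist"].print_log2_hist("usecs")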

So for now, the existing results don't show a big difference.
Theoretically, for an IO device, when swapping out a remote page, using
the remote swap device that is on the same node as the page can reduce
interconnect traffic and improve performance. I think this is the main
motivation for this code change?
At swap-in time, it's hard to say which node the task will run on
anyway, so it's hard to say which swap placement is beneficial.
