linux-kernel - Re: [PATCH -mm -v3] mm, swap: Sort swap entries before free

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170427043545.GA1726@bbox>
Date:   Thu, 27 Apr 2017 13:35:45 +0900
From:   Minchan Kim <minchan@...nel.org>
To:     "Huang, Ying" <ying.huang@...el.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Hugh Dickins <hughd@...gle.com>,
        Shaohua Li <shli@...nel.org>, Rik van Riel <riel@...hat.com>
Subject: Re: [PATCH -mm -v3] mm, swap: Sort swap entries before free

On Wed, Apr 26, 2017 at 08:42:10PM +0800, Huang, Ying wrote:
> Minchan Kim <minchan@...nel.org> writes:
> 
> > On Fri, Apr 21, 2017 at 08:29:30PM +0800, Huang, Ying wrote:
> >> "Huang, Ying" <ying.huang@...el.com> writes:
> >> 
> >> > Minchan Kim <minchan@...nel.org> writes:
> >> >
> >> >> On Wed, Apr 19, 2017 at 04:14:43PM +0800, Huang, Ying wrote:
> >> >>> Minchan Kim <minchan@...nel.org> writes:
> >> >>> 
> >> >>> > Hi Huang,
> >> >>> >
> >> >>> > On Fri, Apr 07, 2017 at 02:49:01PM +0800, Huang, Ying wrote:
> >> >>> >> From: Huang Ying <ying.huang@...el.com>
> >> >>> >> 
> >> >>> >>  void swapcache_free_entries(swp_entry_t *entries, int n)
> >> >>> >>  {
> >> >>> >>  	struct swap_info_struct *p, *prev;
> >> >>> >> @@ -1075,6 +1083,10 @@ void swapcache_free_entries(swp_entry_t *entries, int n)
> >> >>> >>  
> >> >>> >>  	prev = NULL;
> >> >>> >>  	p = NULL;
> >> >>> >> +
> >> >>> >> +	/* Sort swap entries by swap device, so each lock is only taken once. */
> >> >>> >> +	if (nr_swapfiles > 1)
> >> >>> >> +		sort(entries, n, sizeof(entries[0]), swp_entry_cmp, NULL);
> >> >>> >
> >> >>> > Let's think on other cases.
> >> >>> >
> >> >>> > There are two swaps and they are configured by priority so a swap's usage
> >> >>> > would be zero unless other swap used up. In case of that, this sorting
> >> >>> > is pointless.
> >> >>> >
> >> >>> > As well, nr_swapfiles is never decreased so if we enable multiple
> >> >>> > swaps and then disable until a swap is remained, this sorting is
> >> >>> > pointelss, too.
> >> >>> >
> >> >>> > How about lazy sorting approach? IOW, if we found prev != p and,
> >> >>> > then we can sort it.
> >> >>> 
> >> >>> Yes.  That should be better.  I just don't know whether the added
> >> >>> complexity is necessary, given the array is short and sort is fast.
> >> >>
> >> >> Huh?
> >> >>
> >> >> 1. swapon /dev/XXX1
> >> >> 2. swapon /dev/XXX2
> >> >> 3. swapoff /dev/XXX2
> >> >> 4. use only one swap
> >> >> 5. then, always pointless sort.
> >> >
> >> > Yes.  In this situation we will do unnecessary sorting.  What I don't
> >> > know is whether the unnecessary sorting will hurt performance in real
> >> > life.  I can do some measurement.
> >> 
> >> I tested the patch with 1 swap device and 1 process to eat memory
> >> (remove the "if (nr_swapfiles > 1)" for test).  I think this is the
> >> worse case because there is no lock contention.  The memory freeing time
> >> increased from 1.94s to 2.12s (increase ~9.2%).  So there is some
> >> overhead for some cases.  I change the algorithm to something like
> >> below,
> >> 
> >>  void swapcache_free_entries(swp_entry_t *entries, int n)
> >>  {
> >>  	struct swap_info_struct *p, *prev;
> >>  	int i;
> >> +	swp_entry_t entry;
> >> +	unsigned int prev_swp_type;
> >>  
> >>  	if (n <= 0)
> >>  		return;
> >>  
> >> +	prev_swp_type = swp_type(entries[0]);
> >> +	for (i = n - 1; i > 0; i--) {
> >> +		if (swp_type(entries[i]) != prev_swp_type)
> >> +			break;
> >> +	}
> >
> > That's really what I want to avoid. For many swap usecases,
> > it adds unnecessary overhead.
> >
> >> +
> >> +	/* Sort swap entries by swap device, so each lock is only taken once. */
> >> +	if (i)
> >> +		sort(entries, n, sizeof(entries[0]), swp_entry_cmp, NULL);
> >>  	prev = NULL;
> >>  	p = NULL;
> >>  	for (i = 0; i < n; ++i) {
> >> -		p = swap_info_get_cont(entries[i], prev);
> >> +		entry = entries[i];
> >> +		p = swap_info_get_cont(entry, prev);
> >>  		if (p)
> >> -			swap_entry_free(p, entries[i]);
> >> +			swap_entry_free(p, entry);
> >>  		prev = p;
> >>  	}
> >>  	if (p)
> >> 
> >> With this patch, the memory freeing time increased from 1.94s to 1.97s.
> >> I think this is good enough.  Do you think so?
> >
> > What I mean is as follows(I didn't test it at all):
> >
> > With this, sort entries if we found multiple entries in current
> > entries. It adds some condition checks for non-multiple swap
> > usecase but it would be more cheaper than the sorting.
> > And it adds a [un]lock overhead for multiple swap usecase but
> > it should be a compromise for single-swap usecase which is more
> > popular.
> >
> 
> How about the following solution?  It can avoid [un]lock overhead and
> double lock issue for multiple swap user case and has good performance
> for one swap user case too.

How worse with approach I suggested compared to as-is?
Unless it's too bad, let's not add more complicated thing to just
enhance the minor usecase in such even *slow* path.
It adds code size/maintainance overead.
With your suggestion, it might enhance a bit with speicific benchmark
but not sure it's really worth for real practice.

> 
> Best Regards,
> Huang, Ying
> 
> From 7bd903c42749c448ef6acbbdee8dcbc1c5b498b9 Mon Sep 17 00:00:00 2001
> From: Huang Ying <ying.huang@...el.com>
> Date: Thu, 23 Feb 2017 13:05:20 +0800
> Subject: [PATCH -v5] mm, swap: Sort swap entries before free
> 
> To reduce the lock contention of swap_info_struct->lock when freeing
> swap entry.  The freed swap entries will be collected in a per-CPU
> buffer firstly, and be really freed later in batch.  During the batch
> freeing, if the consecutive swap entries in the per-CPU buffer belongs
> to same swap device, the swap_info_struct->lock needs to be
> acquired/released only once, so that the lock contention could be
> reduced greatly.  But if there are multiple swap devices, it is
> possible that the lock may be unnecessarily released/acquired because
> the swap entries belong to the same swap device are non-consecutive in
> the per-CPU buffer.
> 
> To solve the issue, the per-CPU buffer is sorted according to the swap
> device before freeing the swap entries.  Test shows that the time
> spent by swapcache_free_entries() could be reduced after the patch.
> 
> With the patch, the memory (some swapped out) free time reduced
> 13.6% (from 2.59s to 2.28s) in the vm-scalability swap-w-rand test
> case with 16 processes.  The test is done on a Xeon E5 v3 system.  The
> swap device used is a RAM simulated PMEM (persistent memory) device.
> To test swapping, the test case creates 16 processes, which allocate
> and write to the anonymous pages until the RAM and part of the swap
> device is used up, finally the memory (some swapped out) is freed
> before exit.
> 
> Signed-off-by: Huang Ying <ying.huang@...el.com>
> Acked-by: Tim Chen <tim.c.chen@...el.com>
> Cc: Hugh Dickins <hughd@...gle.com>
> Cc: Shaohua Li <shli@...nel.org>
> Cc: Minchan Kim <minchan@...nel.org>
> Cc: Rik van Riel <riel@...hat.com>
> 
> v5:
> 
> - Use a smarter way to determine whether sort is necessary.
> 
> v4:
> 
> - Avoid unnecessary sort if all entries are from one swap device.
> 
> v3:
> 
> - Add some comments in code per Rik's suggestion.
> 
> v2:
> 
> - Avoid sort swap entries if there is only one swap device.
> ---
>  mm/swapfile.c | 43 ++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 38 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 71890061f653..10e75f9e8ac1 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -37,6 +37,7 @@
>  #include <linux/swapfile.h>
>  #include <linux/export.h>
>  #include <linux/swap_slots.h>
> +#include <linux/sort.h>
>  
>  #include <asm/pgtable.h>
>  #include <asm/tlbflush.h>
> @@ -1065,20 +1066,52 @@ void swapcache_free(swp_entry_t entry)
>  	}
>  }
>  
> +static int swp_entry_cmp(const void *ent1, const void *ent2)
> +{
> +	const swp_entry_t *e1 = ent1, *e2 = ent2;
> +
> +	return (int)(swp_type(*e1) - swp_type(*e2));
> +}
> +
>  void swapcache_free_entries(swp_entry_t *entries, int n)
>  {
>  	struct swap_info_struct *p, *prev;
> -	int i;
> +	int i, m;
> +	swp_entry_t entry;
> +	unsigned int prev_swp_type;
>  
>  	if (n <= 0)
>  		return;
>  
>  	prev = NULL;
>  	p = NULL;
> -	for (i = 0; i < n; ++i) {
> -		p = swap_info_get_cont(entries[i], prev);
> -		if (p)
> -			swap_entry_free(p, entries[i]);
> +	m = 0;
> +	prev_swp_type = swp_type(entries[0]);
> +	for (i = 0; i < n; i++) {
> +		entry = entries[i];
> +		if (likely(swp_type(entry) == prev_swp_type)) {
> +			p = swap_info_get_cont(entry, prev);
> +			if (likely(p))
> +				swap_entry_free(p, entry);
> +			prev = p;
> +		} else if (!m)
> +			m = i;
> +	}
> +	if (p)
> +		spin_unlock(&p->lock);
> +	if (likely(!m))
> +		return;
> +
> +	/* Sort swap entries by swap device, so each lock is only taken once. */
> +	sort(entries + m, n - m, sizeof(entries[0]), swp_entry_cmp, NULL);
> +	prev = NULL;
> +	for (i = m; i < n; i++) {
> +		entry = entries[i];
> +		if (swp_type(entry) == prev_swp_type)
> +			continue;
> +		p = swap_info_get_cont(entry, prev);
> +		if (likely(p))
> +			swap_entry_free(p, entry);
>  		prev = p;
>  	}
>  	if (p)
> -- 
> 2.11.0
>