lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20131204165153.GA2672@hercules>
Date:	Wed, 4 Dec 2013 16:51:53 +0000
From:	Luis Henriques <luis.henriques@...onical.com>
To:	Mel Gorman <mgorman@...e.de>
Cc:	stable@...r.kernel.org, akpm@...ux-foundation.org,
	athorlton@....com, riel@...hat.com, gregkh@...uxfoundation.org,
	torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: numa: return the number of base pages altered by
 protection changes

On Tue, Dec 03, 2013 at 03:04:00PM +0000, Mel Gorman wrote:
> commit 72403b4a0fbdf433c1fe0127e49864658f6f6468 upstream.

Thank you Mel, I'll queue this backport for the 3.11 kernel.

Cheers,
--
Luis


> 
> Commit 0255d4918480 ("mm: Account for a THP NUMA hinting update as
> one PTE update") was added to account for the number of PTE updates
> when marking pages prot_numa.  task_numa_work was using the old
> return value to track how much address space had been updated.
> Altering the return value causes the scanner to do more work than it
> is configured or documented to in a single unit of work.
> 
> This patch reverts that commit and accounts for the number of THP
> updates separately in vmstat.  It is up to the administrator to
> interpret the pair of values correctly.  This is a straight-forward
> operation and likely to only be of interest when actively debugging NUMA
> balancing problems.
> 
> The impact of this patch is that the NUMA PTE scanner will scan slower
> when THP is enabled and workloads may converge slower as a result.  On
> the flip size system CPU usage should be lower than recent tests
> reported.  This is an illustrative example of a short single JVM specjbb
> test
> 
> specjbb
>                        3.12.0                3.12.0
>                       vanilla      acctupdates
> TPut 1      26143.00 (  0.00%)     25747.00 ( -1.51%)
> TPut 7     185257.00 (  0.00%)    183202.00 ( -1.11%)
> TPut 13    329760.00 (  0.00%)    346577.00 (  5.10%)
> TPut 19    442502.00 (  0.00%)    460146.00 (  3.99%)
> TPut 25    540634.00 (  0.00%)    549053.00 (  1.56%)
> TPut 31    512098.00 (  0.00%)    519611.00 (  1.47%)
> TPut 37    461276.00 (  0.00%)    474973.00 (  2.97%)
> TPut 43    403089.00 (  0.00%)    414172.00 (  2.75%)
> 
>               3.12.0      3.12.0
>              vanillaacctupdates
> User         5169.64     5184.14
> System        100.45       80.02
> Elapsed       252.75      251.85
> 
> Performance is similar but note the reduction in system CPU time.  While
> this showed a performance gain, it will not be universal but at least
> it'll be behaving as documented.  The vmstats are obviously different but
> here is an obvious interpretation of them from mmtests.
> 
>                                 3.12.0      3.12.0
>                                vanillaacctupdates
> NUMA page range updates        1408326    11043064
> NUMA huge PMD updates                0       21040
> NUMA PTE updates               1408326      291624
> 
> "NUMA page range updates" == nr_pte_updates and is the value returned to
> the NUMA pte scanner.  NUMA huge PMD updates were the number of THP
> updates which in combination can be used to calculate how many ptes were
> updated from userspace.
> 
> Signed-off-by: Mel Gorman <mgorman@...e.de>
> Reported-by: Alex Thorlton <athorlton@....com>
> Reviewed-by: Rik van Riel <riel@...hat.com>
> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
> Signed-off-by: Mel Gorman <mgorman@...e.de>
> ---
>  include/linux/vm_event_item.h | 1 +
>  mm/mprotect.c                 | 7 ++++++-
>  mm/vmstat.c                   | 1 +
>  3 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 1855f0a..c557c6d 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -39,6 +39,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>  		PAGEOUTRUN, ALLOCSTALL, PGROTATED,
>  #ifdef CONFIG_NUMA_BALANCING
>  		NUMA_PTE_UPDATES,
> +		NUMA_HUGE_PTE_UPDATES,
>  		NUMA_HINT_FAULTS,
>  		NUMA_HINT_FAULTS_LOCAL,
>  		NUMA_PAGE_MIGRATE,
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 412ba2b..6c3f56f 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -138,6 +138,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
>  	pmd_t *pmd;
>  	unsigned long next;
>  	unsigned long pages = 0;
> +	unsigned long nr_huge_updates = 0;
>  	bool all_same_node;
>  
>  	pmd = pmd_offset(pud, addr);
> @@ -148,7 +149,8 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
>  				split_huge_page_pmd(vma, addr, pmd);
>  			else if (change_huge_pmd(vma, pmd, addr, newprot,
>  						 prot_numa)) {
> -				pages++;
> +				pages += HPAGE_PMD_NR;
> +				nr_huge_updates++;
>  				continue;
>  			}
>  			/* fall through */
> @@ -168,6 +170,9 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
>  			change_pmd_protnuma(vma->vm_mm, addr, pmd);
>  	} while (pmd++, addr = next, addr != end);
>  
> +	if (nr_huge_updates)
> +		count_vm_numa_events(NUMA_HUGE_PTE_UPDATES, nr_huge_updates);
> +
>  	return pages;
>  }
>  
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 9bb3145..5a442a7 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -812,6 +812,7 @@ const char * const vmstat_text[] = {
>  
>  #ifdef CONFIG_NUMA_BALANCING
>  	"numa_pte_updates",
> +	"numa_huge_pte_updates",
>  	"numa_hint_faults",
>  	"numa_hint_faults_local",
>  	"numa_pages_migrated",
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ