Message-Id: <20130521164154.bed705c6e117ceb76205cd65@linux-foundation.org>
Date: Tue, 21 May 2013 16:41:54 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: Tejun Heo <tj@...nel.org>,
Christoph Lameter <cl@...ux-foundation.org>,
Al Viro <viro@...iv.linux.org.uk>,
Eric Dumazet <eric.dumazet@...il.com>,
Ric Mason <ric.masonn@...il.com>,
Simon Jeons <simon.jeons@...il.com>,
Dave Hansen <dave.hansen@...el.com>,
Andi Kleen <ak@...ux.intel.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>
Subject: Re: [PATCH v2 1/2] Make the batch size of the percpu_counter configurable

On Tue, 21 May 2013 16:27:29 -0700 Tim Chen <tim.c.chen@...ux.intel.com> wrote:
> Will something like the following work if we get rid of the percpu
> counter changes and use __percpu_counter_add(..., batch)? In
> benchmark with a lot of memory changes via brk, this makes quite
> a difference when we go to a bigger batch size.
That looks pretty close.
> Tim
>
> Change batch size for memory accounting to be proportional to memory available.
>
> Currently the per cpu counter's batch size for memory accounting is
> configured as twice the number of cpus in the system. However,
> for systems with very large memory, it is more appropriate to make it
> proportional to the memory size per cpu in the system.
>
> For example, for an x86_64 system with 64 cpus and 128 GB of memory,
> the batch size is only 2*64 pages (0.5 MB). So any memory accounting
> changes of more than 0.5MB will overflow the per cpu counter into
> the global counter. Instead, for the new scheme, the batch size
> is configured to be 0.4% of the memory/cpu = 8MB (128 GB/64 /256),
> which is more in line with the memory size.
>
> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
> ---
> include/linux/mman.h | 5 +++++
> mm/mmap.c | 14 ++++++++++++++
> mm/nommu.c | 14 ++++++++++++++
> 3 files changed, 33 insertions(+)
>
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 9aa863d..11d5ce9 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -10,12 +10,17 @@
> extern int sysctl_overcommit_memory;
> extern int sysctl_overcommit_ratio;
> extern struct percpu_counter vm_committed_as;
> +extern int vm_committed_as_batch;
>
> unsigned long vm_memory_committed(void);
>
> static inline void vm_acct_memory(long pages)
> {
> +#ifdef CONFIG_SMP
> + __percpu_counter_add(&vm_committed_as, pages, vm_committed_as_batch);
> +#else
> percpu_counter_add(&vm_committed_as, pages);
> +#endif
> }
I think we could use __percpu_counter_add() unconditionally here and
just do
#ifdef CONFIG_SMP
int vm_committed_as_batch;
#else
#define vm_committed_as_batch 0
#endif
The EXPORT_SYMBOL(vm_committed_as_batch) is unneeded.
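The inline then collapses to (untested):

static inline void vm_acct_memory(long pages)
{
	__percpu_counter_add(&vm_committed_as, pages, vm_committed_as_batch);
}

__percpu_counter_add() ignores the batch argument on !SMP builds (it falls
back to a plain counter update), so the dummy #define costs nothing there.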
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3145,11 +3145,25 @@ void mm_drop_all_locks(struct mm_struct *mm)
> /*
> * initialise the VMA slab
> */
> +
> +int vm_committed_as_batch;
> +EXPORT_SYMBOL(vm_committed_as_batch);
> +
> +static int mm_compute_batch(void)
> +{
> + int nr = num_present_cpus();
> + int batch = max(32, nr*2);
> +
> + /* batch size set to 0.4% of (total memory/#cpus) */
> + return max((int) (totalram_pages/nr) / 256, batch);
> +}
Change this to do the assignment to vm_committed_as_batch, then put this
code inside #ifdef CONFIG_SMP and do:
#else /* CONFIG_SMP */
static inline void mm_compute_batch(void)
{
}
#endif
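ie, something like this (a sketch, not tested):

#ifdef CONFIG_SMP
static void mm_compute_batch(void)
{
	int nr = num_present_cpus();
	int batch = max(32, nr*2);

	/* batch size set to 0.4% of (total memory/#cpus) */
	vm_committed_as_batch = max((int) (totalram_pages/nr) / 256, batch);
}
#else /* CONFIG_SMP */
static inline void mm_compute_batch(void)
{
}
#endif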
> void __init mmap_init(void)
> {
> int ret;
>
> ret = percpu_counter_init(&vm_committed_as, 0);
> + vm_committed_as_batch = mm_compute_batch();
This becomes just
mm_compute_batch();
> VM_BUG_ON(ret);
> }
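so mmap_init() ends up looking like:

void __init mmap_init(void)
{
	int ret;

	ret = percpu_counter_init(&vm_committed_as, 0);
	mm_compute_batch();
	VM_BUG_ON(ret);
}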
>
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 298884d..1b7008a 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -527,11 +527,25 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
> /*
> * initialise the VMA and region record slabs
> */
> +
> +int vm_committed_as_batch;
> +EXPORT_SYMBOL(vm_committed_as_batch);
> +
> +static int mm_compute_batch(void)
> +{
> + int nr = num_present_cpus();
> + int batch = max(32, nr*2);
> +
> + /* batch size set to 0.4% of (total memory/#cpus) */
> + return max((int) (totalram_pages/nr) / 256, batch);
> +}
> +
> void __init mmap_init(void)
> {
> int ret;
>
> ret = percpu_counter_init(&vm_committed_as, 0);
> + vm_committed_as_batch = mm_compute_batch();
> VM_BUG_ON(ret);
> vm_region_jar = KMEM_CACHE(vm_region, SLAB_PANIC);
I'm not sure that CONFIG_MMU=n && CONFIG_SMP=y even exists. Perhaps it
does. But there's no point in ruling out that option here.
The nommu code becomes identical to the mmu code, so we should put it in
a shared file. I suppose mmap.c would be as good a place as any.
We could make mm_compute_batch() __init and call it from mm_init().
But really it should be __meminit and there should be a memory-hotplug
notifier handler which adjusts vm_committed_as_batch's value.
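Something along these lines, with mm_compute_batch() marked __meminit (a
sketch - the callback name is invented and the details are unchecked):

static int __meminit mm_compute_batch_notifier(struct notifier_block *self,
				unsigned long action, void *arg)
{
	switch (action) {
	case MEM_ONLINE:
	case MEM_OFFLINE:
		/* the amount of present memory has changed: redo the sizing */
		mm_compute_batch();
		break;
	}
	return NOTIFY_OK;
}

registered via hotplug_memory_notifier(mm_compute_batch_notifier, 0) from
an __init function, so the batch size tracks memory hotplug as well as the
boot-time memory size.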