linux-kernel - Re: [RFC PATCH 3/3] mm: increase scalability of global memory commitment accounting


Message-ID: <56BC9281.6090505@virtuozzo.com>
Date:	Thu, 11 Feb 2016 16:54:09 +0300
From:	Andrey Ryabinin <aryabinin@...tuozzo.com>
To:	Tim Chen <tim.c.chen@...ux.intel.com>,
	Andrew Morton <akpm@...ux-foundation.org>
CC:	<linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
	Andi Kleen <ak@...ux.intel.com>,
	Mel Gorman <mgorman@...hsingularity.net>,
	Vladimir Davydov <vdavydov@...tuozzo.com>,
	Konstantin Khlebnikov <koct9i@...il.com>
Subject: Re: [RFC PATCH 3/3] mm: increase scalability of global memory
 commitment accounting



On 02/11/2016 03:24 AM, Tim Chen wrote:
> On Wed, 2016-02-10 at 13:28 -0800, Andrew Morton wrote:
> 
>>
>> If a process is unmapping 4MB then it's pretty crazy for us to be
>> hitting the percpu_counter 32 separate times for that single operation.
>>
>> Is there some way in which we can batch up the modifications within the
>> caller and update the counter less frequently?  Perhaps even in a
>> single hit?
> 
> I think the problem is the batch size is too small and we overflow
> the local counter into the global counter for 4M allocations.
> The reason for the small batch size was because we use
> percpu_counter_read_positive in __vm_enough_memory and it is not precise
> and the error could grow with large batch size.
> 
> Let's switch to the precise __percpu_counter_compare that is 
> unaffected by batch size.  It will do precise comparison and only add up
> the local per cpu counters when the global count is not precise
> enough.  
> 

I'm not certain about this. Calling for_each_online_cpu() under a spinlock looks somewhat doubtful.
And if we are close to the limit, we will be hitting the slow path all the time.
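
To make the concern concrete, here is a minimal userspace sketch of a batched per-CPU counter and a compare that falls back to a full sum only when the approximate global value is within batch * ncpus of the limit. It is not the kernel implementation: the sketch_* names, SKETCH_NCPUS and the slow_sums field are hypothetical, and locking is omitted entirely (the kernel version walks the per-CPU slots under the counter's spinlock, which is the part I find doubtful).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_NCPUS 4	/* hypothetical fixed CPU count for the sketch */

struct sketch_counter {
	int64_t count;			/* approximate global value */
	int32_t local[SKETCH_NCPUS];	/* per-CPU deltas not yet folded in */
	unsigned long slow_sums;	/* how often the slow path ran (illustration only) */
};

/* Add to the local slot; fold into the global count once |local| >= batch. */
static void sketch_add(struct sketch_counter *c, int cpu, int64_t amount,
		       int32_t batch)
{
	int64_t v;

	assert(cpu >= 0 && cpu < SKETCH_NCPUS);
	v = c->local[cpu] + amount;
	if (v >= batch || v <= -batch) {
		c->count += v;		/* the real code would take a lock here */
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = (int32_t)v;
	}
}

/* Slow path: walk every per-CPU slot to get the precise value. */
static int64_t sketch_sum(struct sketch_counter *c)
{
	int64_t sum = c->count;
	int cpu;

	for (cpu = 0; cpu < SKETCH_NCPUS; cpu++)
		sum += c->local[cpu];
	c->slow_sums++;
	return sum;
}

/* Returns 1, 0 or -1 if the counter is above, equal to or below rhs. */
static int sketch_compare(struct sketch_counter *c, int64_t rhs, int32_t batch)
{
	int64_t count = c->count;

	/* Far enough from rhs: the per-CPU error cannot change the answer. */
	if (llabs(count - rhs) > (int64_t)batch * SKETCH_NCPUS)
		return count > rhs ? 1 : -1;

	/* Too close to call: every such compare pays for a full sum. */
	count = sketch_sum(c);
	if (count > rhs)
		return 1;
	if (count < rhs)
		return -1;
	return 0;
}
```

With a larger batch the window in which sketch_compare() must take the slow path grows proportionally, so a workload sitting near the commit limit would sum all per-CPU slots on almost every check.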


> So maybe something like the following patch with a relaxed batch size.
> I have not tested this patch much other than compiling and booting
> the kernel.  I wonder if this works for Andrey. We could relax the batch
> size further, but that will mean that we will incur the overhead
> of summing the per cpu counters earlier when the global count gets close
> to the allowed limit.
> 
> Thanks.
> 
> Tim
>