Message-ID: <339bd1b0-681c-61fa-210b-59f1542431e2@redhat.com>
Date:   Mon, 12 Apr 2021 15:20:48 -0400
From:   Waiman Long <llong@...hat.com>
To:     Roman Gushchin <guro@...com>, Waiman Long <llong@...hat.com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...nel.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Tejun Heo <tj@...nel.org>, Christoph Lameter <cl@...ux.com>,
        Pekka Enberg <penberg@...nel.org>,
        David Rientjes <rientjes@...gle.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Vlastimil Babka <vbabka@...e.cz>, linux-kernel@...r.kernel.org,
        cgroups@...r.kernel.org, linux-mm@...ck.org,
        Shakeel Butt <shakeelb@...gle.com>,
        Muchun Song <songmuchun@...edance.com>,
        Alex Shi <alex.shi@...ux.alibaba.com>,
        Chris Down <chris@...isdown.name>,
        Yafang Shao <laoar.shao@...il.com>,
        Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
        Wei Yang <richard.weiyang@...il.com>,
        Masayoshi Mizuma <msys.mizuma@...il.com>
Subject: Re: [PATCH 0/5] mm/memcg: Reduce kmemcache memory accounting overhead

On 4/12/21 1:47 PM, Roman Gushchin wrote:
> On Mon, Apr 12, 2021 at 10:03:13AM -0400, Waiman Long wrote:
>> On 4/9/21 9:51 PM, Roman Gushchin wrote:
>>> On Fri, Apr 09, 2021 at 07:18:37PM -0400, Waiman Long wrote:
>>>> With the recent introduction of the new slab memory controller, we
>>>> eliminate the need for having separate kmemcaches for each memory
>>>> cgroup and reduce overall kernel memory usage. However, we also add
>>>> additional memory accounting overhead to each call of kmem_cache_alloc()
>>>> and kmem_cache_free().
>>>>
>>>> Workloads that perform a lot of kmemcache allocations and
>>>> de-allocations may experience a performance regression, as illustrated
>>>> in [1].
>>>>
>>>> With a simple kernel module that performs a repeated loop of 100,000,000
>>>> kmem_cache_alloc() and kmem_cache_free() calls of a 64-byte object at
>>>> module init, the execution times to load the kernel module with and
>>>> without memory accounting were:
>>>>
>>>>     with accounting = 6.798s
>>>>     w/o  accounting = 1.758s
>>>>
>>>> That is an increase of 5.04s (287%). With this patchset applied, the
>>>> execution time became 4.254s. So the memory accounting overhead is now
>>>> 2.496s, which is a 50% reduction.
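
For illustration, a minimal benchmark module along these lines might look
like the sketch below; the cache name, loop count, and use of SLAB_ACCOUNT
here are assumptions, since the original module is not shown in this
thread. Load time can then be compared with and without accounting, e.g.
via "time insmod bench.ko" (the unaccounted case could drop SLAB_ACCOUNT
or boot with cgroup.memory=nokmem).

    #include <linux/module.h>
    #include <linux/slab.h>

    static struct kmem_cache *bench_cache;

    static int __init bench_init(void)
    {
            long i;

            /* SLAB_ACCOUNT opts this cache into memcg accounting */
            bench_cache = kmem_cache_create("bench_64", 64, 0,
                                            SLAB_ACCOUNT, NULL);
            if (!bench_cache)
                    return -ENOMEM;

            for (i = 0; i < 100000000L; i++) {
                    void *obj = kmem_cache_alloc(bench_cache, GFP_KERNEL);

                    if (!obj)
                            break;
                    kmem_cache_free(bench_cache, obj);
            }
            return 0;
    }

    static void __exit bench_exit(void)
    {
            kmem_cache_destroy(bench_cache);
    }

    module_init(bench_init);
    module_exit(bench_exit);
    MODULE_LICENSE("GPL");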
>>> Hi Waiman!
>>>
>>> Thank you for working on it; it's indeed very useful!
>>> A couple of questions:
>>> 1) did your config include lockdep or not?
>> The test kernel is based on a production kernel config and so lockdep isn't
>> enabled.
>>> 2) do you have a (rough) estimate of how much each change contributes
>>>      to the overall reduction?
>> I should provide a better breakdown of the effect of the individual
>> patches. I reran the benchmarking module with turbo-boosting disabled to
>> reduce run-to-run variation. The execution times were:
>>
>> Before patch: time = 10.800s (with memory accounting), 2.848s (w/o
>> accounting), overhead = 7.952s
>> After patch 2: time = 9.140s, overhead = 6.292s
>> After patch 3: time = 7.641s, overhead = 4.793s
>> After patch 5: time = 6.801s, overhead = 3.953s
> Thank you! If there is a v2, I'd include this information in the commit logs.

Yes, I am planning to send out a v2 with this information in the 
cover letter. I am just waiting a bit to see if there is more feedback.

-Longman

>
>> Patches 1 & 4 are preparatory patches that should not affect performance.
>>
>> So the memory accounting overhead was reduced by about half.

BTW, the benchmark that I used represents kind of a best-case behavior, 
as all the updates go to the percpu stocks. Real workloads will likely 
have a certain number of updates to the memcg charges and vmstats, so 
the performance benefit will be less.
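
To make the fast path concrete, here is a simplified sketch of the percpu
stock idea (trimmed for illustration; not the exact mm/memcontrol.c code):
each CPU caches a pre-charged amount for a single memcg, so repeated
charges against the same memcg stay CPU-local and skip the atomic
page_counter updates.

    /* Per-CPU cache of pre-charged pages for one memcg (simplified). */
    struct memcg_stock_pcp {
            struct mem_cgroup *cached;      /* memcg the stock belongs to */
            unsigned int nr_pages;          /* pre-charged pages left */
    };
    static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);

    /* Fast path: consume from the local stock if it matches this memcg. */
    static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
    {
            struct memcg_stock_pcp *stock;
            unsigned long flags;
            bool ret = false;

            local_irq_save(flags);
            stock = this_cpu_ptr(&memcg_stock);
            if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
                    stock->nr_pages -= nr_pages;    /* hit: no atomics */
                    ret = true;
            }
            local_irq_restore(flags);
            return ret;
    }
    /* On a miss, the slow path charges the page_counter atomically and
     * refills the stock for subsequent allocations on this CPU. */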

Cheers,
Longman

