Message-ID: <27face50-0e55-ad2b-ebb7-2fe48aee8374@kernel.dk>
Date:   Tue, 7 Sep 2021 10:14:22 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     Shakeel Butt <shakeelb@...gle.com>
Cc:     kernel test robot <oliver.sang@...el.com>,
        Vasily Averin <vvs@...tuozzo.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Alexey Dobriyan <adobriyan@...il.com>,
        Andrei Vagin <avagin@...il.com>,
        Borislav Petkov <bp@...en8.de>, Borislav Petkov <bp@...e.de>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Dmitry Safonov <0x7f454c46@...il.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
        "J. Bruce Fields" <bfields@...ldses.org>,
        Jeff Layton <jlayton@...nel.org>,
        Jiri Slaby <jirislaby@...nel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Kirill Tkhai <ktkhai@...tuozzo.com>,
        Michal Hocko <mhocko@...nel.org>,
        Oleg Nesterov <oleg@...hat.com>, Roman Gushchin <guro@...com>,
        Serge Hallyn <serge@...lyn.com>, Tejun Heo <tj@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Yutian Yang <nglaive@...il.com>,
        Zefan Li <lizefan.x@...edance.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        kernel test robot <lkp@...el.com>,
        Huang Ying <ying.huang@...el.com>,
        Feng Tang <feng.tang@...el.com>,
        Xing Zhengjun <zhengjun.xing@...ux.intel.com>
Subject: Re: [memcg] 0f12156dff: will-it-scale.per_process_ops -33.6% regression

On 9/7/21 9:57 AM, Shakeel Butt wrote:
> On Tue, Sep 7, 2021 at 8:46 AM Jens Axboe <axboe@...nel.dk> wrote:
>>
>> On 9/7/21 9:07 AM, kernel test robot wrote:
>>>
>>>
>>> Greetings,
>>>
>>> FYI, we noticed a -33.6% regression of will-it-scale.per_process_ops due to commit:
>>>
>>>
>>> commit: 0f12156dff2862ac54235fc72703f18770769042 ("memcg: enable accounting for file lock caches")
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>> Are we at all worried about these? There have been a number of them
>> reported, basically for all of the accounting enablements that have
>> been done in this merge window.
>>
>> When io_uring was switched to use accounted memory, we did a bunch of
>> work to ameliorate the inevitable slowdowns that happen if you do
>> repeated allocs and/or frees and have memcg accounting enabled.
>>
> 
> I think these are important and we should aim to continuously improve
> performance with memcg accounting. I would like to know more about the
> io_uring work done to improve memcg accounting. Maybe we can
> generalize it to others as well.

It's pretty basic and may not be applicable to all cases: we simply hang
on to our allocations for longer periods and reuse them. Hence, instead
of always going through alloc+free for each "unit", units are recycled
and reused until they are no longer needed.
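A minimal user-space sketch of the recycling approach described above, assuming a single-threaded context. `accounted_alloc()` and `accounted_free()` are hypothetical stand-ins that only count how often an accounted allocator would be hit; `struct req`, `req_get()`, `req_put()` and `req_cache_destroy()` are illustrative names, not io_uring's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an accounted allocator: every call counts
 * as one memcg charge or uncharge operation. */
static unsigned long accounted_ops;

static void *accounted_alloc(size_t size)
{
	accounted_ops++;		/* would be a charge */
	return malloc(size);
}

static void accounted_free(void *p)
{
	accounted_ops++;		/* would be an uncharge */
	free(p);
}

struct req {
	struct req *next;		/* free-list link while cached */
	char payload[64];
};

/* Hypothetical per-context cache: freed requests are parked on a list
 * and handed out again instead of going back to the allocator. */
struct req_cache {
	struct req *free_list;
};

static struct req *req_get(struct req_cache *c)
{
	struct req *r = c->free_list;

	if (r) {
		c->free_list = r->next;	/* recycled: no accounting */
		return r;
	}
	return accounted_alloc(sizeof(*r));
}

static void req_put(struct req_cache *c, struct req *r)
{
	assert(r != NULL);
	r->next = c->free_list;		/* keep it around for reuse */
	c->free_list = r;
}

/* Release everything once the cache is no longer needed. */
static void req_cache_destroy(struct req_cache *c)
{
	while (c->free_list) {
		struct req *r = c->free_list;

		c->free_list = r->next;
		accounted_free(r);
	}
}
```

With this, a loop that gets and puts one request a thousand times hits the accounted allocator once, plus once more when the cache is destroyed. The trade-off is that cached objects stay charged while they sit idle.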

Now, this is more efficient in general for us, as we can have a very high
rate of requests (and hence of allocs+frees). I suspect most use cases
would benefit from simply having a cache in front of memcg slabs, but
that seems like solving the issue at the wrong layer. IMHO it'd be
better to have the memcg accounting done in batches, e.g. have some
notion of deferred frees. If someone allocates before the deferred frees
are accounted, the allocation and the pending free cancel out, which
saves two pieces of accounting.
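The deferred-free idea can be modeled roughly as follows. This is a hypothetical, single-context sketch, not how memcg is implemented: `struct memcg_model` stands in for a cgroup's charge counter, frees only add to `pending_free`, and a later allocation consumes that pending amount before touching the counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a cgroup charge counter; every update to it
 * counts as one accounting operation. */
struct memcg_model {
	long charged;			/* bytes charged to the cgroup */
	unsigned long ops;		/* number of accounting updates */
};

/* Hypothetical per-context state for frees not yet uncharged. */
struct deferred_acct {
	struct memcg_model *mc;
	long pending_free;		/* bytes freed but still charged */
};

static void mc_update(struct memcg_model *mc, long delta)
{
	mc->charged += delta;
	mc->ops++;
}

/* Charge an allocation, netting it against deferred frees first. */
static void acct_alloc(struct deferred_acct *d, long size)
{
	long reuse;

	assert(size > 0);
	reuse = size < d->pending_free ? size : d->pending_free;
	d->pending_free -= reuse;	/* cancels a pending uncharge */
	if (size > reuse)
		mc_update(d->mc, size - reuse);
}

/* Defer the uncharge instead of doing it right away. */
static void acct_free(struct deferred_acct *d, long size)
{
	assert(size > 0);
	d->pending_free += size;
}

/* Settle remaining deferred frees, e.g. on exit or from a timer. */
static void acct_flush(struct deferred_acct *d)
{
	if (d->pending_free) {
		mc_update(d->mc, -d->pending_free);
		d->pending_free = 0;
	}
}
```

A free followed by an allocation of the same size then costs no counter updates at all; the price is that the counter overstates usage by up to `pending_free` bytes until `acct_flush()` runs.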

It is of course possible that a lot of these regressions simply come
from accounting the alloc, in which case accounting in batches might
help there too. It all depends on how much slack is acceptable for memcg.
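Batching the charges themselves, with an explicit bound on the slack, might look like the sketch below. `BATCH_BYTES`, `struct charge_stock` and the helper functions are hypothetical and this is not the kernel's memcg code. The shared counter is only updated when a local reserve of pre-charged bytes runs out or grows too large, so the cgroup's usage may be overstated by up to `2 * BATCH_BYTES` per context.

```c
#include <assert.h>

#define BATCH_BYTES 4096L		/* hypothetical slack per context */

/* Hypothetical shared counter; each update is the costly operation. */
struct memcg_counter {
	long usage;			/* bytes charged to the cgroup */
	unsigned long updates;		/* shared-counter updates performed */
};

/* Hypothetical local reserve of bytes pre-charged but not handed out. */
struct charge_stock {
	struct memcg_counter *mc;
	long reserved;
};

static void counter_add(struct memcg_counter *mc, long delta)
{
	mc->usage += delta;
	mc->updates++;
}

/* Charge size bytes; touch the shared counter only on reserve miss. */
static void stock_charge(struct charge_stock *s, long size)
{
	assert(size > 0);
	if (s->reserved < size) {
		long need = size - s->reserved;
		long batch = need > BATCH_BYTES ? need : BATCH_BYTES;

		counter_add(s->mc, batch);
		s->reserved += batch;
	}
	s->reserved -= size;
}

/* Return bytes to the reserve; push the excess back when it grows. */
static void stock_uncharge(struct charge_stock *s, long size)
{
	assert(size > 0);
	s->reserved += size;
	if (s->reserved > 2 * BATCH_BYTES) {
		counter_add(s->mc, -(s->reserved - BATCH_BYTES));
		s->reserved = BATCH_BYTES;
	}
}

/* Give back everything still held locally. */
static void stock_drain(struct charge_stock *s)
{
	if (s->reserved) {
		counter_add(s->mc, -s->reserved);
		s->reserved = 0;
	}
}
```

For example, 64 charges of 64 bytes each fit within a single 4096-byte batch and cost one counter update instead of 64. Raising `BATCH_BYTES` trades accuracy for fewer updates, which is exactly the "how much slack is acceptable" question.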

-- 
Jens Axboe