Date:   Mon, 13 Jan 2020 15:31:25 +0000
From:   Roman Gushchin <guro@...com>
To:     Bharata B Rao <bharata@...ux.ibm.com>
CC:     "mhocko@...nel.org" <mhocko@...nel.org>,
        "hannes@...xchg.org" <hannes@...xchg.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>,
        "shakeelb@...gle.com" <shakeelb@...gle.com>,
        "vdavydov.dev@...il.com" <vdavydov.dev@...il.com>,
        "longman@...hat.com" <longman@...hat.com>
Subject: Re: [PATCH 00/16] The new slab memory controller

On Mon, Jan 13, 2020 at 02:17:10PM +0530, Bharata B Rao wrote:
> On Tue, Dec 10, 2019 at 06:05:20PM +0000, Roman Gushchin wrote:
> > > 
> > > With slab patches
> > > # docker stats --no-stream
> > > CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT   MEM %               NET I/O             BLOCK I/O           PIDS
> > > 24bc99d94d91        sleek               0.00%               1MiB / 25MiB        4.00%               1.81kB / 0B         0B / 0B             0
> > > 
> > > Without slab patches
> > > # docker stats --no-stream
> > > CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT   MEM %               NET I/O             BLOCK I/O           PIDS
> > > 52382f8aaa13        sleek               0.00%               8.688MiB / 25MiB    34.75%              1.53kB / 0B         0B / 0B             0
> > > 
> > > So that's an improvement in MEM USAGE from 8.688MiB down to 1MiB. Note that
> > > this docker container isn't doing anything useful, so the numbers
> > > aren't representative of any real workload.
> > 
> > Cool, that's great!
> > 
> > Small containers are where the relative win is the biggest. Of course, it
> > will decrease with the size of the containers, but that's expected.
> > 
> > If you get any additional numbers, please share them. It's really
> > interesting, especially if you have larger-than-4k pages.
> 
> I ran a couple of workloads contained within a memory cgroup and measured
> memory.kmem.usage_in_bytes and memory.usage_in_bytes with and without
> this patchset on a PowerPC host. I see a significant reduction in
> memory.kmem.usage_in_bytes and some reduction in memory.usage_in_bytes.
> Before posting the numbers, I would like to get the following clarified:
> 
> In the original case, the memory cgroup is charged (including kmem charging)
> when a new slab page is allocated. In your patch, the subpage charging is
> done in the slab_pre_alloc_hook routine. However, in this case I couldn't
> find where exactly the kmem counters are being charged/updated. Hence I
> wanted to make sure that the reduction in memory.kmem.usage_in_bytes that
> I am seeing is indeed real, and not because kmem accounting was missed out
> for slab usage.
> 
> Also, I see all non-root allocations coming from a single set of
> kmem_caches. I guess the <kmemcache_name>-memcg caches don't yet show up in
> /proc/slabinfo, and their stats aren't accumulated into /proc/slabinfo either?

Hello Bharata!

First, I'd look at the global slab counters in /proc/meminfo (or vmstat).
They reflect the total system-wide number of pages used by all slab
memory, and they are accurate.
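
For a quick check (field names as on a stock kernel; SReclaimable and
SUnreclaim add up to Slab, and /proc/vmstat reports the same data as page
counts):

  # grep -E '^(Slab|SReclaimable|SUnreclaim):' /proc/meminfo
  # grep -E '^nr_slab_' /proc/vmstat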

As for the cgroup-level counters, they are not perfect in the version which
I posted. In general, on cgroup v1 kernel memory is accounted twice: as a part
of the total memory (memory.usage_in_bytes) and as a separate value
(memory.kmem.usage_in_bytes). The version of the slab controller which you
tested doesn't support the second one. Also, it doesn't include the space used
by the accounting metadata (one pointer per object, i.e. 8 bytes per object on
a 64-bit machine) in the accounting. But apart from some margin (~10% of the
difference), the difference in memory.usage_in_bytes values is meaningful.
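
For example, with the v1 hierarchy mounted in the usual place ("<cgroup>"
below is a placeholder for your cgroup's path):

  # cat /sys/fs/cgroup/memory/<cgroup>/memory.usage_in_bytes
  # cat /sys/fs/cgroup/memory/<cgroup>/memory.kmem.usage_in_bytes

The second file is the one this version doesn't update. On a stock cgroup v1
kernel, memory.kmem.slabinfo in the same directory lists the per-memcg caches.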

The next version, which I'm working on right now (and hope to post in a week
or so), will address these issues.

Thanks!
