Date:   Sun, 21 Jun 2020 16:34:29 -0700
From:   Roman Gushchin <guro@...com>
To:     Qian Cai <cai@....pw>
CC:     Andrew Morton <akpm@...ux-foundation.org>,
        Christoph Lameter <cl@...ux.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...nel.org>,
        Shakeel Butt <shakeelb@...gle.com>, <linux-mm@...ck.org>,
        Vlastimil Babka <vbabka@...e.cz>, <kernel-team@...com>,
        <linux-kernel@...r.kernel.org>, <catalin.marinas@....com>
Subject: Re: [PATCH v6 00/19] The new cgroup slab memory controller

On Sun, Jun 21, 2020 at 06:57:52PM -0400, Qian Cai wrote:
> On Mon, Jun 08, 2020 at 04:06:35PM -0700, Roman Gushchin wrote:
> > This is v6 of the slab cgroup controller rework.
> > 
> > The patchset moves the accounting from the page level to the object
> > level. It allows sharing slab pages between memory cgroups.
> > This leads to a significant win in slab utilization (up to 45%)
> > and a corresponding drop in the total kernel memory footprint.
> > The reduced number of unmovable slab pages should also have a positive
> > effect on memory fragmentation.
> > 
> > The patchset makes the slab accounting code simpler: there is no longer
> > any need for the complicated dynamic creation and destruction of per-cgroup
> > slab caches; all memory cgroups use a global set of shared slab caches.
> > The lifetime of slab caches is no longer tied to the lifetime
> > of memory cgroups.
> > 
> > The more precise accounting does require more CPU, but in practice
> > the difference seems to be negligible. We've been using the new slab
> > controller in Facebook production for several months with different
> > workloads and haven't seen any noticeable regressions. What we've seen
> > were memory savings on the order of 1 GB per host (it varied heavily depending
> > on the actual workload, size of RAM, number of CPUs, memory pressure, etc.).
> > 
> > The third version of the patchset added yet another step towards
> > the simplification of the code: sharing of slab caches between
> > accounted and non-accounted allocations. It comes with significant
> > upsides (most noticeably, a complete elimination of dynamic slab cache
> > creation) but not without some regression risk, so this change sits
> > on top of the patchset and is not completely merged in. In the unlikely
> > event of a noticeable performance regression it can be reverted separately.
> 
> Reverting this series and its dependency [1], i.e.,
> 
> git revert --no-edit 05923a2ccacd..07666ee77fb4
> 
> on top of next-20200621 fixed an issue where kmemleak could report
> thousands of leaks like the one below, using this .config (if it matters):
> 
> https://github.com/cailca/linux-mm/blob/master/x86.config
> 
> unreferenced object 0xffff888ff2bf6200 (size 512):

Hi Qian!

My wild guess is that kmemleak is getting confused by the modification of the
lowest bit of the page->mem_cgroup/obj_cgroups pointer:

struct page {
	...
	union {
		struct mem_cgroup *mem_cgroup;
		struct obj_cgroup **obj_cgroups;
	};
	...
}

We're using the lowest bit to distinguish between a "normal" mem_cgroup
pointer and a vector of obj_cgroup pointers.
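
Roughly, the accessors look something like this (a simplified sketch; the
helper names are illustrative, not necessarily the exact ones in the patchset):

	static inline bool page_has_obj_cgroups(struct page *page)
	{
		/* bit 0 set means the union holds an obj_cgroup vector */
		return (unsigned long)page->obj_cgroups & 0x1UL;
	}

	static inline struct obj_cgroup **page_obj_cgroups(struct page *page)
	{
		/* mask off the tag bit to recover the real vector address */
		return (struct obj_cgroup **)
			((unsigned long)page->obj_cgroups & ~0x1UL);
	}

So the value actually stored in struct page is never the raw address returned
by the allocator.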

This pointer to the obj_cgroup vector is saved only here, so if we're modifying
the address, I guess that's what makes kmemleak think there is a leak.
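
If that's the case, one way to silence the false positive would be to tell
kmemleak explicitly about the vector when it's allocated, along these lines
(just a sketch; the function name and call site are assumptions on my side,
not the actual patchset code):

	#include <linux/kmemleak.h>

	/* hypothetical allocation-side helper, for illustration only */
	static int memcg_alloc_page_obj_cgroups(struct page *page,
						unsigned int objects, gfp_t gfp)
	{
		struct obj_cgroup **vec;

		vec = kcalloc(objects, sizeof(struct obj_cgroup *), gfp);
		if (!vec)
			return -ENOMEM;

		/*
		 * The only stored reference has the low bit set, so mark the
		 * vector as not leaked to suppress the false positive.
		 */
		kmemleak_not_leak(vec);

		page->obj_cgroups = (struct obj_cgroup **)((unsigned long)vec | 0x1UL);
		return 0;
	}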

Or do you have a real leak?

Thanks!
