linux-kernel - Re: [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs


Message-ID: <52EF92DA.1060607@parallels.com>
Date:	Mon, 3 Feb 2014 17:00:10 +0400
From:	Vladimir Davydov <vdavydov@...allels.com>
To:	David Rientjes <rientjes@...gle.com>
CC:	<akpm@...ux-foundation.org>, <mhocko@...e.cz>,
	<penberg@...nel.org>, <cl@...ux.com>, <glommer@...il.com>,
	<linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
	<devel@...nvz.org>
Subject: Re: [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs

On 02/03/2014 03:04 PM, David Rientjes wrote:
> On Mon, 3 Feb 2014, Vladimir Davydov wrote:
>
>> AFAIU, cgroup identifiers dumped on oom (cgroup paths, currently) and
>> memcg slab cache names serve different purposes.
> Sure, you may dump the name for a number of legitimate reasons, but the 
> problem still exists that it's difficult to determine what memcg is being 
> referenced without a flat hierarchy and unique memcg names for all 
> children.
>
>> The point is oom is
>> a perfectly normal situation for the kernel, and info dumped to dmesg is
>> for admin to find out the cause of the problem (a greedy user or
>> cgroup).
> Hmm, so if we hand out top-level memcgs to individual jobs or users, like 
> our userspace does, and they are able to configure their child memcgs as 
> they wish, and then they or the admin finds in the kernel log that a 
> memory hog was killed from the memcg with the perfectly anonymous memcg 
> name of "memcg", how do we determine what job or user triggered that kill?  
> User id is not going to be conclusive in a production environment with 
> shared user accounts.
>
>> On the other hand, slab cache names are dumped to dmesg only on
>> extraordinary situations - like bugs in slab implementation, or double
>> free, or detected memory leaks - where we usually do not need the name
>> of the memcg that triggered the problem, because the bug is likely to be
>> in the kernel subsys using the cache.
> There's certainly overlap here since slab leaks triggered by a particular 
> workload, perhaps by usage of a particular syscall, can occur and cause 
> oom killing but the problem remains that neither the memcg name nor the 
> slab cache name may be conclusive to determine what job or user triggered 
> the issue.  That's why we make strict demands that memcg names are always 
> unique and encode several key values to identify the user and job and we 
> don't rely on the parent.
>
> I can also see the huge maintenance burden it would be to keep around a 
> mapping of kmem ids to {user, job} pairs just in case we later identify a 
> problem and in 99% of the cases would be just wasted storage.
>
>> Plus, the names are exported to
>> sysfs in case of slub, again for debugging purposes, AFAIK. So IMO the
>> use cases for oom vs slab names are completely different - information
>> vs debugging - and I want to export kmem.id only for the ability of
>> debugging kmemcg and slab subsystems.
>>
> Eeek, I'm not sure I agree.  I've often found that reproducing rare slab 
> issues is very difficult without knowledge of the workload so that I can 
> reproduce it.  Whereas X is a very large number of machines and we see 
> this issue on 0.0001% of X machines, I would be required to enable this 
> "debugging" aid unconditionally to ever be able to map the stored kmem id 
> back to a user and job, that mapping would be extremely costly to 
> maintain, and we've gained nothing if we had already demanded that 
> userspace identify their memcg names with unique identifiers regardless of 
> where they are in the hierarchy.

I see your point, and it sounds quite reasonable to me. So I guess I'll
drop the patch removing the cgroup name part from slab cache names
(patch 2) and resend.

Thanks.