linux-kernel - Re: [PATCH RFC 1/4] mm, slab: move memcg charging to post-alloc hook

Message-ID: <bd05d62d-9f46-46b5-b444-6c4814526459@suse.cz>
Date: Wed, 13 Mar 2024 11:55:04 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Roman Gushchin <roman.gushchin@...ux.dev>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
 Josh Poimboeuf <jpoimboe@...nel.org>, Jeff Layton <jlayton@...nel.org>,
 Chuck Lever <chuck.lever@...cle.com>, Kees Cook <kees@...nel.org>,
 Christoph Lameter <cl@...ux.com>, Pekka Enberg <penberg@...nel.org>,
 David Rientjes <rientjes@...gle.com>, Joonsoo Kim <iamjoonsoo.kim@....com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Hyeonggon Yoo <42.hyeyoo@...il.com>, Johannes Weiner <hannes@...xchg.org>,
 Michal Hocko <mhocko@...nel.org>, Shakeel Butt <shakeelb@...gle.com>,
 Muchun Song <muchun.song@...ux.dev>, Alexander Viro
 <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>,
 Jan Kara <jack@...e.cz>, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 cgroups@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH RFC 1/4] mm, slab: move memcg charging to post-alloc hook

On 3/12/24 19:52, Roman Gushchin wrote:
> On Fri, Mar 01, 2024 at 06:07:08PM +0100, Vlastimil Babka wrote:
>> The MEMCG_KMEM integration with slab currently relies on two hooks
>> during allocation. memcg_slab_pre_alloc_hook() determines the objcg and
>> charges it, and memcg_slab_post_alloc_hook() assigns the objcg pointer
>> to the allocated object(s).
>>
>> As Linus pointed out, this is unnecessarily complex. Failing to charge
>> due to memcg limits should be rare, so we can optimistically allocate
>> the object(s) and do the charging together with assigning the objcg
>> pointer in a single post_alloc hook. In the rare case the charging
>> fails, we can free the object(s) back.
>>
>> This simplifies the code (no need to pass around the objcg pointer) and
>> potentially allows separating charging from allocation in cases where
>> the allocation is commonly freed immediately, so that the memcg
>> handling overhead can be saved.
>>
>> Suggested-by: Linus Torvalds <torvalds@...ux-foundation.org>
>> Link: https://lore.kernel.org/all/CAHk-=whYOOdM7jWy5jdrAm8LxcgCMFyk2bt8fYYvZzM4U-zAQA@mail.gmail.com/
>> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
> 
> Nice cleanup, Vlastimil!
> Couple of small nits below, but otherwise, please, add my
> 
> Reviewed-by: Roman Gushchin <roman.gushchin@...ux.dev>

Thanks!
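
For reference, the allocate-first, charge-afterwards flow from the quoted
commit message can be modelled in user space roughly as below. This is only
a sketch under stated assumptions: struct toy_cgroup, toy_charge(),
toy_post_alloc_hook() and toy_alloc_bulk() are hypothetical names, malloc()
stands in for the slab allocator, and none of this is the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an objcg: a byte limit and a running total. */
struct toy_cgroup {
	size_t limit;
	size_t charged;		/* invariant: charged <= limit */
};

/* Charge @bytes to @cg; fail if that would exceed the limit. */
static bool toy_charge(struct toy_cgroup *cg, size_t bytes)
{
	if (bytes > cg->limit - cg->charged)
		return false;
	cg->charged += bytes;
	return true;
}

/*
 * Single post-alloc hook: charge for all objects at once. If charging
 * fails (assumed to be rare), free the objects back and report failure.
 */
static bool toy_post_alloc_hook(struct toy_cgroup *cg, size_t obj_size,
				size_t count, void **p)
{
	size_t i;

	if (!cg)
		return true;	/* nothing to account */

	if (toy_charge(cg, count * obj_size))
		return true;

	for (i = 0; i < count; i++) {
		free(p[i]);
		p[i] = NULL;
	}
	return false;
}

/* Allocate optimistically, then charge. Returns the number of objects. */
static size_t toy_alloc_bulk(struct toy_cgroup *cg, size_t obj_size,
			     size_t count, void **p)
{
	size_t i;

	for (i = 0; i < count; i++) {
		p[i] = malloc(obj_size);
		if (!p[i]) {
			/* Abort if the whole request can't be filled. */
			while (i--) {
				free(p[i]);
				p[i] = NULL;
			}
			return 0;
		}
	}

	return toy_post_alloc_hook(cg, obj_size, count, p) ? count : 0;
}
```

The point of the model is that no cgroup pointer has to be carried from a
pre-alloc step to a post-alloc step; the cost is one extra free path for the
case where the charge fails.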

>>  	/*
>>  	 * The obtained objcg pointer is safe to use within the current scope,
>>  	 * defined by current task or set_active_memcg() pair.
>>  	 * obj_cgroup_get() is used to get a permanent reference.
>>  	 */
>> -	struct obj_cgroup *objcg = current_obj_cgroup();
>> +	objcg = current_obj_cgroup();
>>  	if (!objcg)
>>  		return true;
>>  
>> +	/*
>> +	 * slab_alloc_node() avoids the NULL check, so we might be called with a
>> +	 * single NULL object. kmem_cache_alloc_bulk() aborts if it can't fill
>> +	 * the whole requested size.
>> +	 * return success as there's nothing to free back
>> +	 */
>> +	if (unlikely(*p == NULL))
>> +		return true;
> 
> Probably better to move this check up? current_obj_cgroup() != NULL check is more
> expensive.

It probably doesn't matter in practice, but my thinking was that
*p == NULL is so rare (it means the object allocation failed) that it
shouldn't matter if we called current_obj_cgroup() uselessly when it happens.
OTOH current_obj_cgroup() returning NULL is not that rare (?), so it
could be useful to skip the *p check in those cases?
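
To make the trade-off concrete, here is a small user-space sketch of the two
orderings. Everything in it is hypothetical (struct toy_objcg, the counters,
and both needs_charge_*() helpers); it only counts how often each check runs
and says nothing about their real relative cost in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct obj_cgroup. */
struct toy_objcg {
	int id;
};

/* Hypothetical counters: how often each check was evaluated. */
static unsigned long objcg_lookups, null_checks;

/* Models current_obj_cgroup(): may return NULL for the current task. */
static struct toy_objcg *toy_current_objcg(struct toy_objcg *task_objcg)
{
	objcg_lookups++;
	return task_objcg;
}

/* Order as in the patch: look up the objcg first, then test *p. */
static bool needs_charge_objcg_first(struct toy_objcg *task_objcg, void **p)
{
	if (!toy_current_objcg(task_objcg))
		return false;	/* no objcg: *p is never looked at */

	null_checks++;
	if (*p == NULL)
		return false;	/* allocation failed, nothing to charge */

	return true;
}

/* Suggested alternative: test *p first, then look up the objcg. */
static bool needs_charge_object_first(struct toy_objcg *task_objcg, void **p)
{
	null_checks++;
	if (*p == NULL)
		return false;	/* allocation failed: lookup is skipped */

	if (!toy_current_objcg(task_objcg))
		return false;

	return true;
}
```

Both helpers return the same answer for every input. They differ only in
which check is skipped on the early-return paths, so the better order
depends on which of the two NULL cases is more common.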

>> +
>> +	flags &= gfp_allowed_mask;
>> +
>>  	if (lru) {
>>  		int ret;
>>  		struct mem_cgroup *memcg;
>> @@ -1926,71 +1939,51 @@ static bool __memcg_slab_pre_alloc_hook(struct kmem_cache *s,
>>  			return false;
>>  	}
>>  
>> -	if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s)))
>> +	if (obj_cgroup_charge(objcg, flags, size * obj_full_size(s)))
>>  		return false;
>>  
>> -	*objcgp = objcg;
>> +	for (i = 0; i < size; i++) {
>> +		slab = virt_to_slab(p[i]);
> 
> Not specific to this change, but I wonder if it makes sense to introduce virt_to_slab()
> variant without any extra checks for this and similar cases, where we know for sure
> that p resides on a slab page. What do you think?