linux-kernel - Re: [PATCH 1/3] mm/slub: fix the race between validate_slab and slab

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <faf416b9-f46c-8534-7fb7-557c046a564d@suse.cz>
Date:   Fri, 17 Jun 2022 11:40:51 +0200
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Christoph Lameter <cl@...two.de>,
        Rongwei Wang <rongwei.wang@...ux.alibaba.com>
Cc:     David Rientjes <rientjes@...gle.com>, songmuchun@...edance.com,
        Hyeonggon Yoo <42.hyeyoo@...il.com>, akpm@...ux-foundation.org,
        roman.gushchin@...ux.dev, iamjoonsoo.kim@....com,
        penberg@...nel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] mm/slub: fix the race between validate_slab and
 slab_free

On 6/8/22 14:23, Christoph Lameter wrote:
> On Wed, 8 Jun 2022, Rongwei Wang wrote:
> 
>> If available, I think document the issue and warn this incorrect behavior is
>> OK. But it still prints a large amount of confusing messages, and disturbs us?
> 
> Correct it would be great if you could fix this in a way that does not
> impact performance.
> 
>> > are current operations on the slab being validated.
>> And I am trying to fix it in following way. In a short, these changes only
>> works under the slub debug mode, and not affects the normal mode (I'm not
>> sure). It looks not elegant enough. And if all approve of this way, I can
>> submit the next version.
> 
> 
>>
>> Anyway, thanks for your time:).
>> -wrw
>>
>> @@ -3304,7 +3300,7 @@ static void __slab_free(struct kmem_cache *s,
> struct
>> slab *slab,
>>
>>  {
>>         void *prior;
>> -       int was_frozen;
>> +       int was_frozen, to_take_off = 0;
>>         struct slab new;
> 
> to_take_off has the role of !n ? Why is that needed?
> 
>> -       do {
>> -               if (unlikely(n)) {
>> +               spin_lock_irqsave(&n->list_lock, flags);
>> +               ret = free_debug_processing(s, slab, head, tail, cnt, addr);
> 
> Ok so the idea is to take the lock only if kmem_cache_debug. That looks
> ok. But it still adds a number of new branches etc to the free loop.

It also further complicates the already tricky code. I wonder if we should
make more benefit from the fact that for kmem_cache_debug() caches we don't
leave any slabs on percpu or percpu partial lists, and also in
free_debug_processing() we aready take both list_lock and slab_lock. If we
just did the freeing immediately there under those locks, we would be
protected against other freeing cpus by that list_lock and don't need the
double cmpxchg tricks.

What about against allocating cpus? More tricky as those will currently end
up privatizing the freelist via get_partial(), only to deactivate it again,
so our list_lock+slab_lock in freeing path would not protect in the
meanwhile. But the allocation is currently very inefficient for debug
caches, as in get_partial() it will take the list_lock to take the slab from
partial list and then in most cases again in deactivate_slab() to return it.

If instead the allocation path for kmem_cache_debug() cache would take a
single object from the partial list (not whole freelist) under list_lock, it
would be ultimately more efficient, and protect against freeing using
list_lock. Sounds like an idea worth trying to me?
And of course we would stop creating the 'validate' sysfs files for
non-debug caches.