Message-ID: <8fa5bba6-b48c-9cb5-2051-2d986a9d653b@suse.cz>
Date:   Wed, 15 Jun 2022 09:18:10 +0200
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Jann Horn <jannh@...gle.com>
Cc:     Christoph Lameter <cl@...ux.com>,
        Pekka Enberg <penberg@...nel.org>,
        David Rientjes <rientjes@...gle.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Hyeonggon Yoo <42.hyeyoo@...il.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/slub: add missing TID updates on slab deactivation

On 6/14/22 17:54, Jann Horn wrote:
> On Tue, Jun 14, 2022 at 10:23 AM Vlastimil Babka <vbabka@...e.cz> wrote:
> 
>> >               stat(s, DEACTIVATE_BYPASS);
>> >               goto new_slab;
>> > @@ -2968,6 +2969,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>> >       freelist = c->freelist;
>> >       c->slab = NULL;
>> >       c->freelist = NULL;
>>
>> Previously these were part of deactivate_slab(), which does that at the very
>> end, but also without bumping tid.
>> I just wonder if it's necessary too, because IIUC the scenario you described
>> relies on the missing bump above. This alone doesn't cause the c->slab vs
>> c->freelist mismatch?
> 
> It's a different scenario, but at least in the current version, the
> ALLOC_NODE_MISMATCH case jumps straight to the deactivate_slab label,
> which takes the local_lock, grabs the old c->freelist, NULLs out
> ->slab and ->freelist, then drops the local_lock again. If the
> c->freelist was non-NULL, then this will prevent concurrent cmpxchg
> success; but there is no reason why c->freelist has to be non-NULL
> here. So if c->freelist is already NULL, we basically just take the
> local_lock, set c->slab to NULL, and drop the local_lock. And IIUC the

Ah, right. Thanks for the explanation.

> local_lock is the only protection we have here against concurrency,
> since the slub_get_cpu_ptr() in __slab_alloc() only disables
> migration?

On PREEMPT_RT it disables migration, but on !PREEMPT_RT it's a plain
get_cpu_ptr() that does preempt_disable(). That's an implementation
detail, though: disabling migration would be sufficient on !PREEMPT_RT
too, but right now it's cheaper to disable preemption.

> So again a concurrent fastpath free should be able to set
> c->freelist to non-NULL after c->slab has been set to NULL.
> 
> So I think this TID bump is also necessary for correctness in the
> current version.

OK.
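
To make the interleaving discussed above concrete, here is a minimal
single-threaded userspace sketch. It is not the kernel code: struct
cpu_slab_sim, fastpath_begin(), fastpath_commit() and deactivate() are
hypothetical names, the cmpxchg on (freelist, tid) is modelled as a plain
compare-and-store, and the TID is advanced by 1 rather than by the
kernel's step. It only illustrates why a deactivation that finds
c->freelist already NULL needs the TID bump to make a stale fastpath free
fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the per-CPU slab state (assumption, not the kernel struct). */
struct cpu_slab_sim {
	void *freelist;
	unsigned long tid;
	void *slab;
};

/* What a fastpath free has read before it attempts its cmpxchg. */
struct snapshot {
	void *freelist;
	unsigned long tid;
};

/* Fastpath free, part 1: sample freelist and tid (the slab matched at this point). */
static struct snapshot fastpath_begin(const struct cpu_slab_sim *c)
{
	struct snapshot snap = { c->freelist, c->tid };
	return snap;
}

/*
 * Fastpath free, part 2: modelled double-word cmpxchg. It succeeds only if
 * both freelist and tid still equal the sampled values.
 */
static bool fastpath_commit(struct cpu_slab_sim *c, struct snapshot snap,
			    void *object)
{
	if (c->freelist != snap.freelist || c->tid != snap.tid)
		return false;
	c->freelist = object;
	c->tid = snap.tid + 1;
	return true;
}

/* Slowpath deactivation: detach the slab, optionally bumping the tid. */
static void deactivate(struct cpu_slab_sim *c, bool bump_tid)
{
	c->slab = NULL;
	c->freelist = NULL;
	if (bump_tid)
		c->tid += 1;
}

/*
 * Replay the interleaving: the free fastpath samples the state, the
 * deactivation runs in between, then the fastpath tries to commit.
 * Returns true if we end up with slab == NULL but freelist != NULL.
 */
static bool race_leaves_mismatch(bool bump_tid)
{
	int slab_storage, object_storage;
	struct cpu_slab_sim c = { NULL, 0, &slab_storage }; /* freelist already NULL */
	struct snapshot snap = fastpath_begin(&c);

	deactivate(&c, bump_tid);
	fastpath_commit(&c, snap, &object_storage);
	return c.slab == NULL && c.freelist != NULL;
}
```

In this model race_leaves_mismatch(false) reports the c->slab vs
c->freelist mismatch, while race_leaves_mismatch(true) does not, because
the stale commit no longer matches the tid.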

> And looking back at older kernels, back to at least 4.9, the
> ALLOC_NODE_MISMATCH case looks similarly broken - except that again,
> as you pointed out, we don't have the fine-grained locking, so it only
> becomes racy if we hit new_slab_objects() -> new_slab() ->
> allocate_slab() and then either we do local_irq_enable() or the
> allocation fails.
> 
>> Thanks. Applying to slab/for-5.19-rc3/fixes branch.
> 
> Thanks!
