Message-ID: <29723aaa-5e28-51d3-7f87-9edf0f7b9c33@linux.alibaba.com>
Date: Wed, 8 Jun 2022 11:04:56 +0800
From: Rongwei Wang <rongwei.wang@...ux.alibaba.com>
To: Christoph Lameter <cl@...two.de>
Cc: David Rientjes <rientjes@...gle.com>, songmuchun@...edance.com,
Hyeonggon Yoo <42.hyeyoo@...il.com>, akpm@...ux-foundation.org,
vbabka@...e.cz, roman.gushchin@...ux.dev, iamjoonsoo.kim@....com,
penberg@...nel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] mm/slub: fix the race between validate_slab and
slab_free
On 6/7/22 8:14 PM, Christoph Lameter wrote:
> On Fri, 3 Jun 2022, Rongwei Wang wrote:
>
>> Recently, I have also been looking for other ways to solve this. The case
>> provided by Muchun is useful (thanks, Muchun!). Indeed, it seems that using
>> n->list_lock here is unwise. Actually, I'm not sure whether you agree that
>> such a race exists. If everyone agrees that it does, then the next question
>> may be: do we want to solve this problem? Or, as David said, would it be better
>> to deprecate the validate attribute directly? I have no idea about it and hope
>> to rely on your experience.
>>
>> In fact, I mainly want to collect your views on whether or how to fix this bug
>> here. Thanks!
>
>
> Well validate_slab() is rarely used and should not cause the hot paths to
> incur performance penalties. Fix it in the validation logic somehow? Or
> document the issue and warn that validation may not be correct if there
> are current operations on the slab being validated.
If that is acceptable, I think documenting the issue and warning about
the incorrect behavior is OK. But validation would still print a large
number of confusing messages, which would continue to disturb us?
I am also trying to fix it in the following way. In short, these changes
only take effect in slub debug mode and should not affect the normal
mode (though I'm not sure). It does not look elegant enough. If everyone
approves of this approach, I can submit the next version.
Anyway, thanks for your time :).
-wrw
@@ -3304,7 +3300,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
{
void *prior;
- int was_frozen;
+ int was_frozen, to_take_off = 0;
struct slab new;
unsigned long counters;
struct kmem_cache_node *n = NULL;
@@ -3315,14 +3311,23 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
if (kfence_free(head))
return;
- if (kmem_cache_debug(s) &&
- !free_debug_processing(s, slab, head, tail, cnt, addr))
- return;
+ n = get_node(s, slab_nid(slab));
+ if (kmem_cache_debug(s)) {
+ int ret;
- do {
- if (unlikely(n)) {
+ spin_lock_irqsave(&n->list_lock, flags);
+ ret = free_debug_processing(s, slab, head, tail, cnt, addr);
+ if (!ret) {
spin_unlock_irqrestore(&n->list_lock, flags);
- n = NULL;
+ return;
+ }
+ }
+
+ do {
+ if (unlikely(to_take_off)) {
+ if (!kmem_cache_debug(s))
+ spin_unlock_irqrestore(&n->list_lock, flags);
+ to_take_off = 0;
}
prior = slab->freelist;
counters = slab->counters;
@@ -3343,8 +3348,6 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
new.frozen = 1;
} else { /* Needs to be taken off a list */
-
- n = get_node(s, slab_nid(slab));
/*
* Speculatively acquire the list_lock.
* If the cmpxchg does not succeed then we may
@@ -3353,8 +3356,10 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
* Otherwise the list_lock will synchronize with
* other processors updating the list of slabs.
*/
- spin_lock_irqsave(&n->list_lock, flags);
+ if (!kmem_cache_debug(s))
+ spin_lock_irqsave(&n->list_lock, flags);
+ to_take_off = 1;
}
}
@@ -3363,8 +3368,9 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
head, new.counters,
"__slab_free"));
- if (likely(!n)) {
-
+ if (likely(!to_take_off)) {
+ if (kmem_cache_debug(s))
+ spin_unlock_irqrestore(&n->list_lock, flags);
if (likely(was_frozen)) {
/*
* The list lock was not taken therefore no list
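
The locking order proposed in the diff above can be modeled in userspace
roughly as follows. This is only a single-threaded sketch under stated
assumptions, not kernel code: every name ending in _sketch is hypothetical,
a plain "held" flag stands in for n->list_lock, the "debug" field stands in
for kmem_cache_debug(s), and the cmpxchg retry loop is left out. It only
shows that the lock is taken up front in debug mode and speculatively in
normal mode, and that it is never taken twice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kmem_cache_node and its list_lock. */
struct node_sketch {
	bool held;              /* models n->list_lock being held */
	int lock_acquisitions;  /* how many times the lock was taken */
};

/* Hypothetical stand-in for struct kmem_cache. */
struct cache_sketch {
	bool debug;             /* models kmem_cache_debug(s) */
	struct node_sketch node;
};

static void cache_init_sketch(struct cache_sketch *s, bool debug)
{
	s->debug = debug;
	s->node.held = false;
	s->node.lock_acquisitions = 0;
}

static void node_lock_sketch(struct node_sketch *n)
{
	assert(!n->held);       /* taking the lock twice would deadlock */
	n->held = true;
	n->lock_acquisitions++;
}

static void node_unlock_sketch(struct node_sketch *n)
{
	assert(n->held);
	n->held = false;
}

/*
 * Models the proposed __slab_free() flow.
 * object_ok:         result of the debug consistency checks
 *                    (free_debug_processing() in the diff).
 * needs_list_update: models the "needs to be taken off a list" branch.
 * Returns true if the object was freed.
 */
static bool slab_free_sketch(struct cache_sketch *s, bool object_ok,
			     bool needs_list_update)
{
	struct node_sketch *n = &s->node;

	if (s->debug) {
		/* Debug mode: take the lock before the checks. */
		node_lock_sketch(n);
		if (!object_ok) {
			node_unlock_sketch(n);
			return false;
		}
	}

	/* The freelist update (cmpxchg loop in the kernel) would go here. */

	/* Normal mode: take the lock only when a list must be updated. */
	if (needs_list_update && !s->debug)
		node_lock_sketch(n);

	/* List manipulation would go here, under the lock if one is held. */

	if (n->held)
		node_unlock_sketch(n);
	return true;
}
```

In this model a debug cache always pays for exactly one lock acquisition per
free, while a non-debug cache pays for one only when a list update is needed,
which matches the intent that the normal mode is not affected.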