netdev - Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e7cbca4d-9b34-46f8-961a-9f8ddc92be21@suse.cz>
Date: Mon, 17 Jun 2024 23:34:04 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: paulmck@...nel.org
Cc: "Jason A. Donenfeld" <Jason@...c4.com>,
 "Uladzislau Rezki (Sony)" <urezki@...il.com>,
 Jakub Kicinski <kuba@...nel.org>, Julia Lawall <Julia.Lawall@...ia.fr>,
 linux-block@...r.kernel.org, kernel-janitors@...r.kernel.org,
 bridge@...ts.linux.dev, linux-trace-kernel@...r.kernel.org,
 Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, kvm@...r.kernel.org,
 linuxppc-dev@...ts.ozlabs.org, "Naveen N. Rao" <naveen.n.rao@...ux.ibm.com>,
 Christophe Leroy <christophe.leroy@...roup.eu>,
 Nicholas Piggin <npiggin@...il.com>, netdev@...r.kernel.org,
 wireguard@...ts.zx2c4.com, linux-kernel@...r.kernel.org,
 ecryptfs@...r.kernel.org, Neil Brown <neilb@...e.de>,
 Olga Kornievskaia <kolga@...app.com>, Dai Ngo <Dai.Ngo@...cle.com>,
 Tom Talpey <tom@...pey.com>, linux-nfs@...r.kernel.org,
 linux-can@...r.kernel.org, Lai Jiangshan <jiangshanlai@...il.com>,
 netfilter-devel@...r.kernel.org, coreteam@...filter.org,
 kasan-dev <kasan-dev@...glegroups.com>
Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple
 kmem_cache_free callback

On 6/17/24 8:54 PM, Paul E. McKenney wrote:
> On Mon, Jun 17, 2024 at 07:23:36PM +0200, Vlastimil Babka wrote:
>> On 6/17/24 6:12 PM, Paul E. McKenney wrote:
>>> On Mon, Jun 17, 2024 at 05:10:50PM +0200, Vlastimil Babka wrote:
>>>> On 6/13/24 2:22 PM, Jason A. Donenfeld wrote:
>>>>> On Wed, Jun 12, 2024 at 08:38:02PM -0700, Paul E. McKenney wrote:
>>>>>> o	Make the current kmem_cache_destroy() asynchronously wait for
>>>>>> 	all memory to be returned, then complete the destruction.
>>>>>> 	(This gets rid of a valuable debugging technique because
>>>>>> 	in normal use, it is a bug to attempt to destroy a kmem_cache
>>>>>> 	that has objects still allocated.)
>>>>
>>>> This seems like the best option to me. As Jason already said, the debugging
>>>> technique is not affected significantly, if the warning just occurs
>>>> asynchronously later. The module can be already unloaded at that point, as
>>>> the leak is never checked programatically anyway to control further
>>>> execution, it's just a splat in dmesg.
>>>
>>> Works for me!
>>
>> Great. So this is how a prototype could look like, hopefully? The kunit test
>> does generate the splat for me, which should be because the rcu_barrier() in
>> the implementation (marked to be replaced with the real thing) is really
>> insufficient. Note the test itself passes as this kind of error isn't wired
>> up properly.
> 
> ;-) ;-) ;-)

Yeah yeah, I just used the kunit module as a convenient way add the code
that should see if there's the splat :)

> Some might want confirmation that their cleanup efforts succeeded,
> but if so, I will let them make that known.

It could be just the kunit test that could want that, but I don't see
how it could wrap and inspect the result of the async handling and
suppress the splats for intentionally triggered errors as many of the
other tests do.

>> Another thing to resolve is the marked comment about kasan_shutdown() with
>> potential kfree_rcu()'s in flight.
> 
> Could that simply move to the worker function?  (Hey, had to ask!)

I think I had a reason why not, but I guess it could move. It would just
mean that if any objects are quarantined, we'll go for the async freeing
even though those could be flushed immediately. Guess that's not too bad.

>> Also you need CONFIG_SLUB_DEBUG enabled otherwise node_nr_slabs() is a no-op
>> and it might fail to notice the pending slabs. This will need to change.
> 
> Agreed.
> 
> Looks generally good.  A few questions below, to be taken with a
> grain of salt.

Thanks!

>> +static void kmem_cache_kfree_rcu_destroy_workfn(struct work_struct *work)
>> +{
>> +	struct kmem_cache *s;
>> +	int err = -EBUSY;
>> +	bool rcu_set;
>> +
>> +	s = container_of(work, struct kmem_cache, async_destroy_work);
>> +
>> +	// XXX use the real kmem_cache_free_barrier() or similar thing here
>> +	rcu_barrier();

Note here's the barrier.

>> +	cpus_read_lock();
>> +	mutex_lock(&slab_mutex);
>> +
>> +	rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;
>> +
>> +	err = shutdown_cache(s, true);
> 
> This is currently the only call to shutdown_cache()?  So there is to be
> a way for the caller to have some influence over the value of that bool?

Not the only caller, there's still the initial attempt in
kmem_cache_destroy() itself below.

> 
>> +	WARN(err, "kmem_cache_destroy %s: Slab cache still has objects",
>> +	     s->name);
> 
> Don't we want to have some sort of delay here?  Or is this the
> 21-second delay and/or kfree_rcu_barrier() mentioned before?

Yes this is after the barrier. The first immediate attempt to shutdown
doesn't warn.

>> +	mutex_unlock(&slab_mutex);
>> +	cpus_read_unlock();
>> +	if (!err && !rcu_set)
>> +		kmem_cache_release(s);
>> +}
>> +
>>  void kmem_cache_destroy(struct kmem_cache *s)
>>  {
>>  	int err = -EBUSY;
>> @@ -494,9 +527,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
>>  	if (s->refcount)
>>  		goto out_unlock;
>>  
>> -	err = shutdown_cache(s);
>> -	WARN(err, "%s %s: Slab cache still has objects when called from %pS",
>> -	     __func__, s->name, (void *)_RET_IP_);
>> +	err = shutdown_cache(s, false);
>> +	if (err)
>> +		schedule_work(&s->async_destroy_work);

And here's the initial attempt that used to warn but now doesn't and
instead schedules the async one.

>>  out_unlock:
>>  	mutex_unlock(&slab_mutex);
>>  	cpus_read_unlock();
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 1617d8014ecd..4d435b3d2b5f 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -5342,7 +5342,8 @@ static void list_slab_objects(struct kmem_cache *s, struct slab *slab,
>>   * This is called from __kmem_cache_shutdown(). We must take list_lock
>>   * because sysfs file might still access partial list after the shutdowning.
>>   */
>> -static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n)
>> +static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n,
>> +			 bool warn_inuse)
>>  {
>>  	LIST_HEAD(discard);
>>  	struct slab *slab, *h;
>> @@ -5353,7 +5354,7 @@ static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n)
>>  		if (!slab->inuse) {
>>  			remove_partial(n, slab);
>>  			list_add(&slab->slab_list, &discard);
>> -		} else {
>> +		} else if (warn_inuse) {
>>  			list_slab_objects(s, slab,
>>  			  "Objects remaining in %s on __kmem_cache_shutdown()");
>>  		}
>> @@ -5378,7 +5379,7 @@ bool __kmem_cache_empty(struct kmem_cache *s)
>>  /*
>>   * Release all resources used by a slab cache.
>>   */
>> -int __kmem_cache_shutdown(struct kmem_cache *s)
>> +int __kmem_cache_shutdown(struct kmem_cache *s, bool warn_inuse)
>>  {
>>  	int node;
>>  	struct kmem_cache_node *n;
>> @@ -5386,7 +5387,7 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
>>  	flush_all_cpus_locked(s);
>>  	/* Attempt to free all objects */
>>  	for_each_kmem_cache_node(s, node, n) {
>> -		free_partial(s, n);
>> +		free_partial(s, n, warn_inuse);
>>  		if (n->nr_partial || node_nr_slabs(n))
>>  			return 1;
>>  	}
>>