lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <48D2875E.3090300@colorfullife.com>
Date:	Thu, 18 Sep 2008 18:52:46 +0200
From:	Manfred Spraul <manfred@...orfullife.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
CC:	Lai Jiangshan <laijs@...fujitsu.com>, Ingo Molnar <mingo@...e.hu>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Dipankar Sarma <dipankar@...ibm.com>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [RFC PATCH] rcu: introduce kfree_rcu()

Andrew Morton wrote:
> On Thu, 18 Sep 2008 12:18:28 +0800 Lai Jiangshan <laijs@...fujitsu.com> wrote:
>
>   
>> sometimes a rcu callback is just calling kfree() to free a struct's memory
>> (we say this callback is a trivial callback.).
>> this patch introduce kfree_rcu() to do these things directly, easily.
>>
>> There are 4 reasons that we need kfree_rcu():
>>
>> 1) unloadable modules:
>>    a module(rcu callback is defined in this module) using rcu must
>>    call rcu_barrier() when unload. rcu_barrier() will increase
>>    the system's overhead(the more cpus the worse) and
>>    rcu_barrier() is very time-consuming. if all rcu callback defined
>>    in this module are trivial callback, we can just call kfree_rcu()
>>    instead, save a rcu_barrier() when unload.
>>
>> 2) duplicate code:
>>    all trivial callback are duplicate code though the structs to be freed
>>    are different. it's just a container_of() and a kfree().
>>    There are about 50% callbacks are trivial callbacks for call_rcu() in
>>    current kernel code.
>>
>> 3) cache:
>>    the instructions of trivial callback is not in the cache supposedly.
>>    calling a trivial callback will let to cache missing very likely.
>>    the more trivial callback the more cache missing. OK, this is
>>    not a problem now or in a few days: Only less than 1% trivial callback
>>    are called in running kernel.
>>
>> 4) future:
>>    the number of user of rcu is increasing. new code for rcu is
>>    trivial callback very likely. it means more modules using rcu
>>    and more duplicate code(may come to 90% of callbacks is trivial
>>    callbacks) and more cache missing.
>>
>> Implementation:
>>    there were a lot of ideas came out when i implemented kfree_rcu().
>>    I chose the simplest one as this patch shows. but these implementation
>>    may cannot be used for to free a struct larger than 16KBytes.
>>
>> kfree_rcu_bh()? kfree_rcu_sched()?
>>    these two are not need current. call_rcu_bh() & call_rcu_sched()
>>    are hardly be called(and hardly be called for trivial callback).
>>
>> vfree_rcu()?
>>    No, vfree() is not atomic function, will not be called in softirq.
>>
>>     
>
> This is all rather mysterious.
>
>   
>> ---
>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>> index e8b4039..04c654f 100644
>> --- a/include/linux/rcupdate.h
>> +++ b/include/linux/rcupdate.h
>> @@ -253,4 +253,25 @@ extern void rcu_barrier_sched(void);
>>  extern void rcu_init(void);
>>  extern int rcu_needs_cpu(int cpu);
>>  
>> +#define __KFREE_RCU_MAX_OFFSET 4095
>> +#define KFREE_RCU_MAX_OFFSET (sizeof(void *) * __KFREE_RCU_MAX_OFFSET)
>> +
>> +#define __rcu_reclaim(head) \
>> +do { \
>> +	unsigned long __offset = (unsigned long)head->func; \
>> +	if (__offset <= __KFREE_RCU_MAX_OFFSET) \
>> +		kfree((void *)head - sizeof(void *) * __offset); \
>> +	else \
>> +		head->func(head); \
>> +} while(0)
>>     
>
> All the above could do with some comments explaining what it does.
>   
__rcu_reclaim either treats head->func as an offset for kfree or as a 
function pointer.


>>  #endif /* __LINUX_RCUPDATE_H */
>> diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
>> index aad93cd..5a14190 100644
>> --- a/kernel/rcuclassic.c
>> +++ b/kernel/rcuclassic.c
>> @@ -232,7 +232,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>>  	while (list) {
>>  		next = list->next;
>>  		prefetch(next);
>> -		list->func(list);
>> +		__rcu_reclaim(list);
>>     
Here it's used:
the softirq that is called after the grace period calls kfree directly 
instead of calling a wrapper function around kfree.

>>  		list = next;
>>  		if (++count >= rdp->blimit)
>>  			break;
>> diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
>> index 467d594..aa9b56a 100644
>> --- a/kernel/rcupdate.c
>> +++ b/kernel/rcupdate.c
>> @@ -162,6 +162,18 @@ void rcu_barrier_sched(void)
>>  }
>>  EXPORT_SYMBOL_GPL(rcu_barrier_sched);
>>  
>> +void kfree_rcu(const void *ptr, struct rcu_head *head)
>> +{
>> +	unsigned long offset;
>> +	typedef void (*rcu_callback)(struct rcu_head *);
>> +
>> +	offset = (void *)head - (void *)ptr;
>>     
What about offset_of? the computation is known at compile time.
>> +	BUG_ON(offset > KFREE_RCU_MAX_OFFSET);
>> +
>>     
I'd try to make that a compile time error. Is that possible? perhaps 
with some __builtin_constant_p (head-ptr) or something like that. Or 
with offset_of.

>> +	call_rcu(head, (rcu_callback)(offset / sizeof(void *)));
>>     
>
> How can this work?  We take the difference between two pointers, divide
> that by 4 or 8, then treat the resulting number as the address of an
> RCU callback function.
>
> I think I'm missing something here.
>
>   
__rcu_reclaim() knows that function pointers < 4096 are actually offsets 
for kfree.


I like the idea:
- the call to list->func() is probably very difficult to predict for a 
branch target predictor.
- it's just a waste not to call kfree directly.
- I'm not sure about the implementation.

--
    Manfred
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ