linux-kernel - Re: [PATCH, RFC, tip/core/rcu] scalable classic RCU implementation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Sun, 24 Aug 2008 10:08:44 +0200
From:	Manfred Spraul <manfred@...orfullife.com>
To:	paulmck@...ux.vnet.ibm.com
CC:	linux-kernel@...r.kernel.org, cl@...ux-foundation.org,
	mingo@...e.hu, akpm@...ux-foundation.org, dipankar@...ibm.com,
	josht@...ux.vnet.ibm.com, schamp@....com, niv@...ibm.com,
	dvhltc@...ibm.com, ego@...ibm.com, laijs@...fujitsu.com,
	rostedt@...dmis.org
Subject: Re: [PATCH, RFC, tip/core/rcu] scalable classic RCU implementation

Paul E. McKenney wrote:
> +/*
> + * Definition for node within the RCU grace-period-detection hierarchy.
> + */
> +struct rcu_node {
> +	spinlock_t lock;
> +	unsigned long	qsmask;	/* CPUs or groups that need to switch in      */
> +				/*  order for current grace period to proceed.*/
> +	unsigned long	qsmaskinit;
> +				/* Per-GP initialization for qsmask.	      */
>   
I'm not sure if a bitmap is the right storage. If I understand the code 
correctly, it contains two information:
1) If the bitmap is clear, then all cpus have completed whatever they 
need to do.
A counter is more efficient than a bitmap. Especially: It would allow to 
choose the optimal fan-out, independent from 32/64 bits.
2) The information if the current cpu must do something to complete the 
current period.non
This is a local information, usually (always?) only the current cpu 
needs to know if it must do something.
But this doesn't need to be stored in a shared structure, the 
information could be stored in a per-cpu structure.

> +	/*
> +	 * Extract the list of ready callbacks, disabling to prevent
> +	 * races with call_rcu() from interrupt handlers.
> +	 */
> +	local_irq_save(flags);
> +	list = rdp->nxtlist;
> +	rdp->nxtlist = *rdp->nxttail[RCU_DONE_TAIL];
> +	*rdp->nxttail[RCU_DONE_TAIL] = NULL;
> +	tail = rdp->nxttail[RCU_DONE_TAIL];
> +	for (count = RCU_NEXT_SIZE - 1; count >= 0; count--)
> +		if (rdp->nxttail[count] == rdp->nxttail[RCU_DONE_TAIL])
> +			rdp->nxttail[count] = &rdp->nxtlist;
> +	local_irq_restore(flags);
>   
Do you have a description of the events between call_rcu() and the rcu 
callback?
Is the following description correct?

__call_rcu() queues in RCU_NEXT_TAIL.
In the middle of the current grace period: rcu_check_quiescent_state() 
calls rcu_next_callbacks_are_ready().
Entry now in RCU_NEXT_READY_TAIL
** 0.5 cycles: wait until all cpus have completed the current cycle.
rcu_process_gp_end() moves from NEXT_READY_TAIL to WAIT_TAIL

** full grace period
rcu_process_gp_end() moves from WAIT_TAIL to DONE_TAIL
rcu_do_batch() finds the entries in DONE_TAIL and calls the callback.


> +/*
> + * Do softirq processing for the current CPU.
> + */
>  static void rcu_process_callbacks(struct softirq_action *unused)
>  {
>  	/*
>  	 * Memory references from any prior RCU read-side critical sections
> -	 * executed by the interrupted code must be see before any RCU
> +	 * executed by the interrupted code must be seen before any RCU
>  	 * grace-period manupulations below.
>  	 */
>  
>  	smp_mb(); /* See above block comment. */
>   
Why this mb()? There was a grace period between the last read side 
critical section that might have accessed the pointer.
The rcu internal code already does spin_lock()+spin_unlock(). Isn't that 
sufficient?

>  
> -	__rcu_process_callbacks(&rcu_ctrlblk, &__get_cpu_var(rcu_data));
> -	__rcu_process_callbacks(&rcu_bh_ctrlblk, &__get_cpu_var(rcu_bh_data));
> +	__rcu_process_callbacks(&rcu_state, &__get_cpu_var(rcu_data));
> +	__rcu_process_callbacks(&rcu_bh_state, &__get_cpu_var(rcu_bh_data));
>   
Have you considered merging RCU_DONE_TAIL for rcu_data and rcu_bh_data?

> +
> +/**
> + * call_rcu - Queue an RCU callback for invocation after a grace period.
> + * @head: structure to be used for queueing the RCU updates.
> + * @func: actual update function to be invoked after the grace period
> + *
> + * The update function will be invoked some time after a full grace
> + * period elapses, in other words after all currently executing RCU
> + * read-side critical sections have completed.  RCU read-side critical
> + * sections are delimited by rcu_read_lock() and rcu_read_unlock(),
> + * and may be nested.
> + */
>   
The docbook entry is duplicated: They are in include/linux/rcupdate.h 
and here.
What about removing one of them?
I would go one step further:
Even add call_rcu_sched() into rcupdate.h. Add a Kconfig bool 
"RCU_NEEDS_SCHED" and automatically define either the extern or the #define.

--
    Manfred
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/