linux-kernel - Re: [RFC][PATCH] rcu classic: new algorithm for callbacks-processing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080623091349.GK22569@linux.vnet.ibm.com>
Date:	Mon, 23 Jun 2008 02:13:49 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Lai Jiangshan <laijs@...fujitsu.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	torvalds@...ux-foundation.org,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC][PATCH] rcu classic: new algorithm for callbacks-processing

On Mon, Jun 23, 2008 at 11:25:38AM +0800, Lai Jiangshan wrote:
> 
> I apologize for for so later response. I do not stop this works.
> But some problems occurred when i tested. (Actually, i wanted to reply
> you after all are fixed, my fault!)

No problem -- as long as things are progressing, I am happy!  ;-)

> Paul E. McKenney wrote:
> > On Tue, Jun 03, 2008 at 11:46:11AM +0800, Lai Jiangshan wrote:
> >> The code/algorithm of the implement of current callbacks-processing
> >> is very efficient and technical. But when I studied it and I found
> >> a disadvantage:
> >>
> >> In multi-CPU systems, when a new RCU callback is being
> >> queued(call_rcu[_bh]), this callback will be invoked after the grace
> >> period for the batch with batch number = rcp->cur+2 has completed
> >> very very likely in current implement. Actually, this callback can be
> >> invoked after the grace period for the batch with
> >> batch number = rcp->cur+1 has completed. The delay of invocation means
> >> that latency of synchronize_rcu() is extended. But more important thing
> >> is that the callbacks usually free memory, and these works are delayed
> >> too! it's necessary for reclaimer to free memory as soon as
> >> possible when left memory is few.
> > 
> > Speeding up the RCU grace periods would indeed be a very good thing!
> > 
> >> A very simple way can solve this problem:
> >> a field(struct rcu_head::batch) is added to record the batch number for
> >> the RCU callback. And when a new RCU callback is being queued, we
> >> determine the batch number for this callback(head->batch = rcp->cur+1)
> >> and we move this callback to rdp->donelist if we find
> >> that head->batch <= rcp->completed when we process callbacks.
> >> This simple way reduces the wait time for invocation a lot. (about
> >> 2.5Grace Period -> 1.5Grace Period in average in multi-CPU systems)
> >>
> >> This is my algorithm. But I do not add any field for struct rcu_head
> >> in my implement. We just need to memorize the last 2 batches and
> >> their batch number, because these 2 batches include all entries that
> >> for whom the grace period hasn't completed. So we use a special
> >> linked-list rather than add a field.
> >> Please see the comment of struct rcu_data.
> > 
> > Maintaining the single list with multiple pointers into it certainly
> > does seem to simplify the list processing, as does extracting the common
> > code from call_rcu() and call_rcu_bh().  Just out of curiosity, why
> > did you keep donelist as a separate list instead of an additional pointer
> > into the mxtlist?
>
> 	donelist is only accessed in softirq(do not need irq disabled),
> but nxtlist is not. i didn't want to modify rcu_do_batch().

OK, we can always handle this separately if it makes sense.

> >> rcutourture was tested successfully(x86_64/4cpu i386/2cpu i386/1cpu).
> > 
> > Of course, RCU implementations need careful inspection, testing and
> > validation.  Running rcutorture is a good first step, but unfortunately
> > only a first step.  So I need to ask you the following questions:
> > 
> > 1.	How long did you run rcutorture?
>
> 	2 hours at the first time i run rcutorture , but no hotplug nor
> test_no_idle_hz argument. How long would be appropriate?

I would suggest 24 hours for a core change like this.

> > 2.	Do you have access to weak-memory-order machines on which
> > 	to do rcutorture testing?  (If not, I expect that we can
> > 	motivate testing elsewhere.)
>
> 	I can't access to weak-memory-order machines. Could you please
> test it after all my test are OK?

I would be very happy to!  Let me know when you are ready, and send
me a patch against some published version (e.g., one of the ones that
is run by test.kernel.org).

> > 3.	Did you run CPU hotplug while running rcutorture?  Doing so
> > 	is extremely important, as RCU interacts with CPU hotplug.
>
> 	failed with the following script is run at the same time,
> i hasn't found out the reason:
> #!/bin/sh
> 
> # 4cpus
> 
> cpu1=1
> cpu2=1
> cpu3=1
> while ((1))
> do
> 	no=$(($RANDOM % 3 + 1))
> 	if ((!cpu$no))
> 	then
> 		echo 1 > /sys/devices/system/cpu/cpu$no/online
> 		((cpu$no=1))
> 	else
> 		echo 0 > /sys/devices/system/cpu/cpu$no/online
> 		((cpu$no=0))
> 	fi
> 	echo 1 $cpu1 $cpu2 $cpu3
> 	sleep 2
> done

You tried this without your changes and it passed, correct?

Never forget to test the base kernel.  ;-)

> > 4.	Did you use the rcutorture test_no_idle_hz and shuffle_interval
> > 	arguments to test out RCU's interaction with CONFIG_NO_HZ?
> > 	(This requires running a CONFIG_NO_HZ kernel.)
>
> 	test is OK with test_no_idle_hz=1 shuffle_interval=5.

Very good!!!  For how long?

> It seems my patch changes nothing about NO_HZ.

Agreed, but changes can have unanticipated side-effects, so it is
always good to check.

> > 5.	One concern I have is the removal of a few memory barriers.
> > 	Could you please tell me why it is safe to remove these?
>
> 	Yes, it is safe, it's may delay the processing a little
> when read the old/error values for rcp->cur/rcp->next_pending.
> I had fixed it. But it may still delay the processing when old value
> for rcp->completed is read in rcu_pending().

Would you please write down why you believe it is safe?  It is not that
I doubt your ability, it is just that RCU implementations can be a bit
tricky at times.  It will help me check your code if I fully understand
your thinking.

> > Could you please run any additional combinations of tests that you
> > are able to given the hardware you have access to?
>
> 	Yes, i will test and i want more advice.

And, as noted earlier, I will be happy to fill in testing on weak-memory
machines once it is working well on the machines you have access to.

> > And thank you very much for all your work in simplifying and speeding
> > up RCU grace-period detection!  There may be some additional work
> > required, but this patch does look promising!
> > 
> > 							Thanx, Paul
> > 
> 
> 	How can I test to find out whether a patch of rcu
> advances system's performance?

By running a number of benchmarks.  You might want to check boot-up
and shutdown speed as well.

> 	I didn't changed any code for batch's grace period. I just
> insert callbacks into the right batch to speeding up their grace periods
> in SMP.
> 
> 	And I think broadcasting when a new batch is started will
> speed up batch's grace period.

It might well, and speeding up the grace period would be a good thing.
However, it will be necessary to test overhead -- and we might need some
help from the SGI guys for their very large machines.  (I have some
moderately large ones I can get access to from time to time, but the SGI
machines are the biggest I am aware of.)

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/