Message-ID: <1306252364.3026.63.camel@edumazet-laptop>
Date: Tue, 24 May 2011 17:52:44 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: paulmck@...ux.vnet.ibm.com
Cc: David Miller <davem@...emloft.net>, netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH] net: use synchronize_rcu_expedited()
On Tuesday, 24 May 2011 at 08:44 -0700, Paul E. McKenney wrote:
> On Tue, May 24, 2011 at 11:07:32AM +0200, Eric Dumazet wrote:
> > synchronize_rcu() is very slow in various situations (HZ=100,
> > CONFIG_NO_HZ=y, CONFIG_PREEMPT=n)
> >
> > Extract from my (mostly idle) 8 core machine :
> >
> > synchronize_rcu() in 99985 us
> > synchronize_rcu() in 79982 us
> > synchronize_rcu() in 87612 us
> > synchronize_rcu() in 79827 us
> > synchronize_rcu() in 109860 us
> > synchronize_rcu() in 98039 us
> > synchronize_rcu() in 89841 us
> > synchronize_rcu() in 79842 us
> > synchronize_rcu() in 80151 us
> > synchronize_rcu() in 119833 us
> > synchronize_rcu() in 99858 us
> > synchronize_rcu() in 73999 us
> > synchronize_rcu() in 79855 us
> > synchronize_rcu() in 79853 us
> >
> >
> > When we hold the RTNL mutex, we would rather spend some CPU cycles than
> > block other processes waiting for this mutex for too long.
> >
> > We also want to setup/dismantle network features as fast as possible at
> > boot/shutdown time.
> >
> > This patch makes synchronize_net() call the expedited version if RTNL is
> > locked.
> >
> > synchronize_rcu_expedited() typical delay is about 20 us on my machine.
> >
> > synchronize_rcu_expedited() in 18 us
> > synchronize_rcu_expedited() in 18 us
> > synchronize_rcu_expedited() in 18 us
> > synchronize_rcu_expedited() in 18 us
> > synchronize_rcu_expedited() in 20 us
> > synchronize_rcu_expedited() in 16 us
> > synchronize_rcu_expedited() in 20 us
> > synchronize_rcu_expedited() in 18 us
> > synchronize_rcu_expedited() in 18 us
>
> Cool!!!
>
> Just out of curiosity, how many CPUs does your system have?
16 (2x4x2) [ processor.max_cstate=1 ]

I am now trying to optimize rcu_barrier(). Do you have an idea for
getting an expedited version of it as well?

In the following trace we can see three groups of callbacks, spaced one
jiffy apart (10 ms at HZ=100).

Maybe we could avoid posting a call_rcu() callback on a CPU that has no
pending RCU work?
[ 835.189996] cpu0 synchronize_rcu_expedited() in 30 us
-> begin rcu_barrier() immediately
[ 835.259702] cpu15 rcu_barrier_callback()
[ 835.259705] cpu14 rcu_barrier_callback()
[ 835.259708] cpu7 rcu_barrier_callback()
[ 835.259711] cpu12 rcu_barrier_callback()
[ 835.259714] cpu8 rcu_barrier_callback()
[ 835.259716] cpu1 rcu_barrier_callback()
[ 835.259719] cpu0 rcu_barrier_callback()
[ 835.269691] cpu13 rcu_barrier_callback()
[ 835.269695] cpu11 rcu_barrier_callback()
[ 835.269698] cpu5 rcu_barrier_callback()
[ 835.269700] cpu6 rcu_barrier_callback()
[ 835.269702] cpu10 rcu_barrier_callback()
[ 835.269705] cpu3 rcu_barrier_callback()
[ 835.269707] cpu2 rcu_barrier_callback()
[ 835.279687] cpu4 rcu_barrier_callback()
[ 835.279689] cpu9 rcu_barrier_callback()
[ 835.279744] cpu0 rcu_barrier() in 89499 us
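
The spacing of the three groups in the trace above can be checked with a little arithmetic. The sketch below hard-codes the first timestamp of each group, converted to microseconds, and compares the gaps with the jiffy length; the helper names `jiffy_us()` and `group_gap_us()` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * First rcu_barrier_callback() timestamp of each group in the trace,
 * in microseconds (835.259702 s, 835.269691 s, 835.279687 s).
 */
static const long long group_start_us[] = {
	835259702LL, 835269691LL, 835279687LL,
};

/* Length of one jiffy in microseconds for a given HZ value. */
static long long jiffy_us(int hz)
{
	assert(hz > 0);
	return 1000000LL / hz;
}

/* Gap in microseconds between group i and group i + 1. */
static long long group_gap_us(size_t i)
{
	assert(i + 1 < sizeof(group_start_us) / sizeof(group_start_us[0]));
	return group_start_us[i + 1] - group_start_us[i];
}
```

The two gaps come out at 9989 us and 9996 us, against a jiffy of 10000 us at HZ=100, which matches the observation that the groups are spaced by one jiffy.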
Thanks