[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090515111241.GC7745@hmsreliant.think-freely.org>
Date: Fri, 15 May 2009 07:12:41 -0400
From: Neil Horman <nhorman@...driver.com>
To: Eric Dumazet <dada1@...mosbay.com>
Cc: Jiri Pirko <jpirko@...hat.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
netdev@...r.kernel.org, davem@...emloft.net
Subject: Re: [PATCH] dropmon: add ability to detect when hardware
dropsrxpackets
On Fri, May 15, 2009 at 09:37:28AM +0200, Eric Dumazet wrote:
> Jiri Pirko a écrit :
> > Thu, May 14, 2009 at 07:29:54PM CEST, nhorman@...driver.com wrote:
> >> On Thu, May 14, 2009 at 02:44:08PM +0200, Jiri Pirko wrote:
> >>>> +
> >>>> + /*
> >>>> + * Clean the device list
> >>>> + */
> >>>> + list_for_each_entry_rcu(new_stat, &hw_stats_list, list) {
> >>> ^^^^^^^^^^^^^^^^^^^^^^^
> >>> This is meaningless here. Use list_for_each_entry_rcu only under rcu_read_lock.
> >>> Also it would be good to use list_for_each_entry_safe here since you're
> >>> modifying the list.
> >>>
> >> The definition of list_for_each_entry_rcu specifically says its safe against
> >> list-mutation primitives, so its fine. Although you are correct, in that its
> >> safety is dependent on the protection of rcu_read_lock(), so I'll add that in.
> >
> > You are right that list_for_each_entry_rcu is safe against list-mutation
> > primitives. But there's no need for this on the update side when you hold a writer
> > spinlock. Here I think it's better (and also less confusing) to use ordinary
> > list_for_each_entry which in this case must be list_for_each_entry_safe.
>
> Absolutely.
>
> RCU is tricky, and must be well understood. Using the right verbs is really needed
> to help everybody read the code and be able to understand it quickly and maintain it
> if needed.
>
> In this particular case, the list_for_each_entry_rcu() is a litle bit more
> expensive than regular list_for_each_entry(), as it includes additionnal barriers.
>
> Just reading code like this :
>
> + spin_lock(&trace_state_lock);
> + rcu_read_lock();
> + list_for_each_entry_rcu(new_stat, &hw_stats_list, list) {
> + if (new_stat->dev == NULL) {
> + list_del_rcu(&new_stat->list);
> + call_rcu(&new_stat->rcu, free_dm_hw_stat);
> + }
> + }
> + rcu_read_unlock();
> + spin_unlock(&trace_state_lock);
>
> We *know* something is wrong, as rcu_read_lock() is supposed to guard a readside,
> so having both rcu_read_lock() and list_del_rcu() is certainly wrong, we dont have
> to actually understand what is really done by the algorithm.
>
> So following is much better : (but maybe not correct... I let you
> check if list_for_each_entry_safe() would not be better here, since
> you delete elements in a list while iterating in it)
>
> Maybe you can add a break; after call_rcu() if you know only one element
> can match the "if (new_stat->dev == NULL) " condition, and use normal list_for_each_entry()
>
> + spin_lock(&trace_state_lock);
> + list_for_each_entry(new_stat, &hw_stats_list, list) {
> + if (new_stat->dev == NULL) {
> + list_del_rcu(&new_stat->list);
> + call_rcu(&new_stat->rcu, free_dm_hw_stat);
> + }
> + }
> + spin_unlock(&trace_state_lock);
>
> Even if the 'wrong' version was not buggy (no crashes or corruption), the 'right' one
> is a commonly used construct in kernel that most dev are able to read without asking
> themselves "is is correct or not ? Dont we have something strange here ?"
>
> Neil, you need to read more code playing with RCU to get familiar with it, we all did same
> errors in the past :)
>
Thanks :)
So help me understand something here then. The trace_state_lock has no
intention of protecting the actual read side of this list (the the napi trace
hook that this patch adds). My understanding of list_for_each_entry_rcu and
list_del_rcu, was that even with a call list_del_rcu, the list remained
unchanged unti a quiescent point was passed by all cpus (which is detected via
the counter in rcu_read_[un]lock. How is it insufficient to use
rcu_read_[un]lock here to implement that protection? I agree it looks
counterintuitive, but it makes sense to me from a function standpoint.
Further to that point, given the example that you have above of what you think
_should_ work, is exactly what I have in the set_all_monitor_trace function
currently (saving for the conversion of list_for_each_entry_rcu to
list_for_each_entry_safe and the removal of the read_lock). If those changes
improve performance, then I can get behind them, but I don't see what problem
they avoid if I don't use them. Can you clarify that?
Regards
Neil
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists