lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090515111241.GC7745@hmsreliant.think-freely.org>
Date:	Fri, 15 May 2009 07:12:41 -0400
From:	Neil Horman <nhorman@...driver.com>
To:	Eric Dumazet <dada1@...mosbay.com>
Cc:	Jiri Pirko <jpirko@...hat.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	netdev@...r.kernel.org, davem@...emloft.net
Subject: Re: [PATCH] dropmon: add ability to detect when hardware
	dropsrxpackets

On Fri, May 15, 2009 at 09:37:28AM +0200, Eric Dumazet wrote:
> Jiri Pirko a écrit :
> > Thu, May 14, 2009 at 07:29:54PM CEST, nhorman@...driver.com wrote:
> >> On Thu, May 14, 2009 at 02:44:08PM +0200, Jiri Pirko wrote:
> >>>> +
> >>>> +		/*
> >>>> +		 * Clean the device list
> >>>> +		 */
> >>>> +		list_for_each_entry_rcu(new_stat, &hw_stats_list, list) {
> >>> 		^^^^^^^^^^^^^^^^^^^^^^^
> >>> This is meaningless here. Use list_for_each_entry_rcu only under rcu_read_lock.
> >>> Also it would be good to use list_for_each_entry_safe here since you're
> >>> modifying the list.
> >>>
> >> The definition of list_for_each_entry_rcu specifically says its safe against
> >> list-mutation primitives, so its fine.  Although you are correct, in that its
> >> safety is dependent on the protection of rcu_read_lock(), so I'll add that in.
> > 
> > You are right that list_for_each_entry_rcu is safe against list-mutation
> > primitives. But there's no need for this on the update side when you hold a writer
> > spinlock. Here I think it's better (and also less confusing) to use ordinary
> > list_for_each_entry which in this case must be list_for_each_entry_safe.
> 
> Absolutely.
> 
> RCU is tricky, and must be well understood. Using the right verbs is really needed
> to help everybody read the code and be able to understand it quickly and maintain it
> if needed.
> 
> In this particular case, the list_for_each_entry_rcu() is a litle bit more
> expensive than regular list_for_each_entry(), as it includes additionnal barriers.
> 
> Just reading code like this :
> 
> +	spin_lock(&trace_state_lock);
> +		rcu_read_lock();
> +		list_for_each_entry_rcu(new_stat, &hw_stats_list, list) {
> +			if (new_stat->dev == NULL) {
> +				list_del_rcu(&new_stat->list);
> +				call_rcu(&new_stat->rcu, free_dm_hw_stat);
> +			}
> +		}
> +		rcu_read_unlock();
> +	spin_unlock(&trace_state_lock);
> 
> We *know* something is wrong, as rcu_read_lock() is supposed to guard a readside,
> so having both rcu_read_lock() and list_del_rcu() is certainly wrong, we dont have 
> to actually understand what is really done by the algorithm.
> 
> So following is much better : (but maybe not correct... I let you 
> check if list_for_each_entry_safe() would not be better here, since
> you delete elements in a list while iterating in it)
> 
> Maybe you can add a break; after call_rcu() if you know only one element
> can match the "if (new_stat->dev == NULL) " condition, and use normal list_for_each_entry()
> 
> +	spin_lock(&trace_state_lock);
> +		list_for_each_entry(new_stat, &hw_stats_list, list) {
> +			if (new_stat->dev == NULL) {
> +				list_del_rcu(&new_stat->list);
> +				call_rcu(&new_stat->rcu, free_dm_hw_stat);
> +			}
> +		}
> +	spin_unlock(&trace_state_lock);
> 
> Even if the 'wrong' version was not buggy (no crashes or corruption), the 'right' one
> is a commonly used construct in kernel that most dev are able to read without asking 
> themselves "is is correct or not ? Dont we have something strange here ?"
> 
> Neil, you need to read more code playing with RCU to get familiar with it, we all did same
> errors in the past :)
> 

Thanks :)
So help me understand something here then.  The trace_state_lock has no
intention of protecting the actual read side of this list (the the napi trace
hook that this patch adds).  My understanding of list_for_each_entry_rcu and
list_del_rcu, was that even with a call list_del_rcu, the list remained
unchanged unti a quiescent point was passed by all cpus (which is detected via
the counter in rcu_read_[un]lock.  How is it insufficient to use
rcu_read_[un]lock here to implement that protection?  I agree it looks
counterintuitive, but it makes sense to me from a function standpoint.

Further to that point, given the example that you have above of what you think
_should_ work, is exactly what I have in the set_all_monitor_trace function
currently (saving for the conversion of list_for_each_entry_rcu to
list_for_each_entry_safe and the removal of the read_lock).  If those changes
improve performance, then I can get behind them, but I don't see what problem
they avoid if I don't use them.  Can you clarify that?

Regards
Neil

> 
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ