Message-ID: <5787C834.9060609@hpe.com>
Date:	Thu, 14 Jul 2016 13:13:24 -0400
From:	Waiman Long <waiman.long@....com>
To:	Tejun Heo <tj@...nel.org>
CC:	Alexander Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.com>,
	Jeff Layton <jlayton@...chiereds.net>,
	"J. Bruce Fields" <bfields@...ldses.org>,
	Christoph Lameter <cl@...ux-foundation.org>,
	<linux-fsdevel@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Andi Kleen <andi@...stfloor.org>,
	Dave Chinner <dchinner@...hat.com>,
	Boqun Feng <boqun.feng@...il.com>,
	Scott J Norton <scott.norton@....com>,
	Douglas Hatch <doug.hatch@....com>
Subject: Re: [PATCH v2 1/7] lib/dlock-list: Distributed and lock-protected
 lists

On 07/14/2016 07:50 AM, Tejun Heo wrote:
> Hello,
>
> On Wed, Jul 13, 2016 at 10:54:19PM -0400, Waiman Long wrote:
>> On 07/13/2016 12:08 PM, Tejun Heo wrote:
>>> On Mon, Jul 11, 2016 at 01:32:06PM -0400, Waiman Long wrote:
>>> ...
>>>> A new header file include/linux/dlock-list.h will be added with the
>>> Heh, I think percpu_list was the better name but suppose I'm too late.
>> Given the fact that we may change the level of lock granularity in the
>> future, it is too late to change it back to percpu_list.
> It's more about the consistency.  We have other data structures which
> are in a similar situation.  If we end up doing Nth CPU for dlist,
> we'll probably end up doing the same thing for other percpu data
> structures.

I see.

I got a comment about the percpu-list name from Christoph Lameter a 
while ago. His argument was that since deletion can happen from any 
CPU, it is not really per-cpu like the other percpu structures. That 
prompted me to change the name to the current form. I am fine with 
either name, but I would like to keep the current one unless there is 
a compelling rationale for switching back.


>>> Why do we need two variants of this?  We need a state variable to walk
>>> the list anyway.  Why not make dlock_list_iterate() safe against
>>> removal and get rid of the second variant?  Also, dlock_list_next()
>>> probably is a better name.
>> The two variants are the equivalent of list_for_each_entry() and
>> list_for_each_entry_safe(). There are scenarios where
>> list_for_each_entry_safe() isn't really safe when the next pointer of the
>> current node is modified, for example. In this particular use case, the safe
>> version isn't needed as all the list_for_each_entry_safe() call sites were
>> modified to use list_for_each_entry() instead. I keep the
>> dlock_list_iterate_safe() for future use cases that may need the safe
>> version.
> I don't think it makes sense to worry about the cases where the next
> entry to iterate may be removed by the iterator.  What I'm trying to
> say is just make the iteration always safe and don't worry about the
> distinction.  For list_for_each_entry(), it makes the difference of
> requiring and not requiring a separate state variable.  Here, we need
> it anyway.

A lot of the functions that need to iterate the list release the lock 
in the middle, do some work, reacquire the lock, and move on to the 
next entry. So it is entirely possible that new entries will be 
inserted between the current entry and the next one while the lock is 
released. Using the safe version would skip those newly added entries, 
which is a change in behavior for the current code. That is my main 
concern about making iteration deletion-safe by default.


>
>>>> +#ifdef CONFIG_DEBUG_SPINLOCK
>>>> +#define DLOCK_LIST_WARN_ON(x)	WARN_ON(x)
>>>> +#else
>>>> +#define DLOCK_LIST_WARN_ON(x)
>>>> +#endif
>>> I'd just use WARN_ON_ONCE() without the CONFIG guard.
>> I'd do so if it is used in a real function. However, it is used in the
>> dlock_list_iterate() macro which will duplicate the WARN_ON call on all the
>> call sites. That is why I am hesitant to do that.
> They're used in inline functions but not macros.  Just uninline the
> functions?

Yes, by uninlining those functions, I can eliminate that macro.

>>>> +/*
>>>> + * Next per-cpu list entry
>>>> + */
>>>> +#define dlock_list_next_entry(pos, member) list_next_entry(pos, member.list)
>>> Why does this need to be exposed?
>> It was used in one of the modified files. I can change it to an inline
>> function. Or do you mean making it a real function and put it in the .c
>> file?
> If it's used, never mind.
>
>>>> +static __always_inline bool
>>>> +__dlock_list_next_cpu(struct dlock_list_head *head,
>>>> +		      struct dlock_list_state *state)
>>>> +{
>>> ...
>>>> +static inline bool dlock_list_iterate_safe(struct dlock_list_head *head,
>>>> +					   struct dlock_list_state *state)
>>>> +{
>>> Inlining these doesn't make sense to me.
>> They are inlined for performance reasons. I can make them real functions if
>> you think it is better.
> The functions aren't that small and inlining functions which aren't
> very small tend to become more expensive pretty quickly with multiple
> users.  I'd just make them proper functions.

Yes, will do so.

>>>> diff --git a/lib/Makefile b/lib/Makefile
>>>> index 499fb35..92e8c38 100644
>>>> --- a/lib/Makefile
>>>> +++ b/lib/Makefile
>>>> +/*
>>>> + * The dlock list lock needs its own class to avoid warning and stack
>>>> + * trace when lockdep is enabled.
>>>> + */
>>> Can you please elaborate on this?
>> A kernel warning was emitted, and I think something got disabled as a
>> result, when the kernel booted up with lockdep enabled. I think it is
>> because the locks were dynamically allocated. It would not be a problem
>> if they were statically allocated.
> Ah, okay.  Can you update the comment to include the above
> information.  It doesn't have to be long.  Just mention that this is
> the lockdep key for the dlock data structure, which is dynamically
> allocated.

I will update the comment.
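For reference, the usual kernel pattern here is a single static struct 
lock_class_key shared by all the dynamically allocated per-cpu locks. 
This is a sketch, not the actual patch; the init-function name and the 
structure layout are assumptions:

```c
/*
 * Sketch only: give the dynamically allocated per-cpu spinlocks one
 * static lockdep class, so lockdep does not treat each allocation as
 * a separate, key-less lock.
 */
static struct lock_class_key dlock_list_key;

void init_dlock_list_head(struct dlock_list_head __percpu *head)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		struct dlock_list_head *h = per_cpu_ptr(head, cpu);

		INIT_LIST_HEAD(&h->list);
		spin_lock_init(&h->lock);
		lockdep_set_class(&h->lock, &dlock_list_key);
	}
}
```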

>>>> +void dlock_list_add(struct dlock_list_node *node, struct dlock_list_head *head)
>>>> +{
>>>> +	struct dlock_list_head *myhead;
>>>> +
>>>> +	/*
>>>> +	 * Disable preemption to make sure that the CPU won't get changed.
>>>> +	 */
>>>> +	myhead = get_cpu_ptr(head);
>>>> +	spin_lock(&myhead->lock);
>>>> +	node->lockptr = &myhead->lock;
>>>> +	list_add(&node->list, &myhead->list);
>>>> +	spin_unlock(&myhead->lock);
>>>> +	put_cpu_ptr(head);
>>>> +}
>>> I wonder whether it'd be better to use irqsafe operations.  lists tend
>>> to be often used from irq contexts.
>> The current use case only need to use the regular lock functions. You are
>> right that future use cases may require an irqsafe version of locks. I can
>> either modify the code now to allow lock type selection at init time, for
>> example, or defer it as a future enhancement when the need arises. What do
>> you think?
> The bulk of performance gain of dlist would come from being per-cpu
> and I don't think it's likely that we'd see any noticeable difference
> between irq and preempt safe operations.  Given that what's being
> implemented is really low level operations, I'd suggest going with
> irqsafe from the get-go.
>
>>>> +void dlock_list_del(struct dlock_list_node *node)
>>>> +{
>>>> +	spinlock_t *lock = READ_ONCE(node->lockptr);
>>>> +
>>>> +	if (unlikely(!lock)) {
>>>> +		WARN(1, "dlock_list_del: node 0x%lx has no associated lock\n",
>>>> +			(unsigned long)node);
>>>> +		return;
>>>> +	}
>>>> +
>>>> +	spin_lock(lock);
>>>> +	if (likely(lock == node->lockptr)) {
>>>> +		list_del_init(&node->list);
>>>> +		node->lockptr = NULL;
>>>> +	} else {
>>>> +		/*
>>>> +		 * This path should never be executed.
>>>> +		 */
>>> What if it races against someone else removing and adding back?
>>> Shouldn't it retry on those cases?
>> If it is racing with another thread removing and adding back, retrying
>> the deletion may not be the right thing to do. It really depends on what
>> kind of deletion the callers are doing. I will clarify in the comment
>> that further changes may be needed in the future if callers do more
>> complicated deletions that require more elaborate exception handling.
> I haven't really thought about it but it could be that in the racing
> cases the order doesn't matter and just skipping the operation is
> fine.  I'm not sure triggering warning is the right answer tho.
>
> Thanks.

I don't think it is normal to have concurrent deletions of the same 
entry. Most likely it is a bug if that happens. Having the warning 
message in the kernel log will help catch those errors.
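The race the recheck guards against can be shown with a small 
single-threaded userspace simulation (hypothetical names, no kernel 
APIs): lockptr is sampled before the lock is taken, so by the time the 
lock is held a racing thread may have deleted the node and re-added it 
under a different per-cpu lock, and deletion is only performed if 
lockptr still names the lock we hold.

```c
#include <stddef.h>

/* Stand-in for a per-cpu spinlock; here just an identity token. */
struct fake_lock { int id; };

struct dnode {
	struct fake_lock *lockptr;
	int on_list;
};

/*
 * Returns 1 if the node was deleted, 0 if the recheck failed -- the
 * path where the real code would emit its warning.
 */
int try_del(struct dnode *node, struct fake_lock *held)
{
	if (node->lockptr == held) {	/* recheck under the lock */
		node->on_list = 0;
		node->lockptr = NULL;
		return 1;
	}
	return 0;
}

/*
 * Simulate the race: the snapshot names lock_a, but before "we"
 * acquire lock_a a racing deleter+adder has moved the node under
 * lock_b, so the recheck fails and nothing is deleted.
 */
int demo_lost_race(void)
{
	struct fake_lock lock_a = { 0 }, lock_b = { 1 };
	struct dnode n = { &lock_a, 1 };
	struct fake_lock *snap = n.lockptr;	/* READ_ONCE() */

	n.lockptr = &lock_b;			/* racing re-add */
	return try_del(&n, snap);
}

/* Uncontended case: the recheck passes and deletion happens. */
int demo_plain_del(void)
{
	struct fake_lock lock_a = { 0 };
	struct dnode n = { &lock_a, 1 };

	return try_del(&n, n.lockptr);
}
```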

Cheers,
Longman
