linux-kernel - Re: [PATCH -mm 1/2] irq_work, Use llist in irq

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110901130038.GA23048@Krystal>
Date:	Thu, 1 Sep 2011 09:00:38 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Huang Ying <ying.huang@...el.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [PATCH -mm 1/2] irq_work, Use llist in irq_work

* Mathieu Desnoyers (mathieu.desnoyers@...icios.com) wrote:
> * Huang Ying (ying.huang@...el.com) wrote:
> > On 09/01/2011 03:58 PM, Peter Zijlstra wrote:
> > > On Thu, 2011-09-01 at 11:20 +0800, Huang Ying wrote:
> > >> On 09/01/2011 09:46 AM, Huang Ying wrote:
> > >>>>> -static void __irq_work_queue(struct irq_work *entry)
> > >>>>> +static void __irq_work_queue(struct irq_work *work)
> > >>>>>  {
> > >>>>> -	struct irq_work *next;
> > >>>>> +	struct irq_work_list *irq_work_list;
> > >>>>>  
> > >>>>> -	preempt_disable();
> > >>>>> +	irq_work_list = &get_cpu_var(irq_work_lists);
> > >>>>>  
> > >>>>> -	do {
> > >>>>> -		next = __this_cpu_read(irq_work_list);
> > >>>>> -		/* Can assign non-atomic because we keep the flags set. */
> > >>>>> -		entry->next = next_flags(next, IRQ_WORK_FLAGS);
> > >>>>> -	} while (this_cpu_cmpxchg(irq_work_list, next, entry) != next);
> > >>>>> +	llist_add(&work->llnode, &irq_work_list->llist);
> > >>>>>  
> > >>>>>  	/* The list was empty, raise self-interrupt to start processing. */
> > >>>>> -	if (!irq_work_next(entry))
> > >>>>> +	if (!test_and_set_bit(LIST_NONEMPTY_BIT, &irq_work_list->flags))
> > >>>>>  		arch_irq_work_raise();
> > >>>>
> > >>>> So why can't you simply test work->llnode->next?
> > >>>
> > >>> Yes.  That is better.  Even if there may be a small race window, it is
> > >>> not a big issue to raise one extra self interrupt seldom.
> > >>
> > >> Remember something about this.  I didn't test work->llnode->next here
> > >> because I didn't want expose the implementation details like that here.
> > >>  How about make llist_add() return whether list is empty before adding?
> > >>  Because it will be an inline function, that should be optimized out if
> > >> the caller do not need the information.
> 
> Yes, that would make sense.
> 
> something like
> 
> static inline
> struct llist_node *llist_add(struct llist_node *new, struct llist_head *head)
> {
> 	struct llist_node *entry, *old_entry;
> 
> #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
> 	BUG_ON(in_nmi());
> #endif
> 
> 	entry = head->first;
> 	do {
> 		old_entry = entry;
> 		new->next = entry;
> 		cpu_relax();
> 	} while ((entry = cmpxchg(&head->first, old_entry, new)) != old_entry);
>         return old_entry;
> }
> 
> Where llist_add returns NULL if the list was initially empty, else
> returns non-null. This return value usage should be documented in the
> function header, of course.
> 
> BUT, big warning here: this whole thing _should_ be renamed as a
> "lockless stack" rather than a "lockless list". Really. I said it in the
> past, and here I see another example showing why we should. The only
> reason why we can get away this easily with knowing atomically if the
> structure was empty prior to the insertion is because this "list"
> hehaves like a stack (LIFO). I foresee we'll need to add "lockless
> queues" at some point (FIFO), which has the ability to enqueue/dequeue
> concurrently without sharing the same cache-lines when the queue is
> non-empty. Within that kind of queue structure, knowing if the queue was
> empty prior to insertion would become a bottleneck, so I would not
> advise to make that part of _that_ API, which would require to add a new
> "llqueue" API. And at that point, the "llist" vs "llqueue" semantic
> would become really confusing. Maybe "llstack" would be more appropriate
> here instead of llist ? Or llfifo ? The API we can provide with a

I meant lllifo here, sorry. (3 'l's in a row is kind of messy though).

Thanks,

Mathieu

> lock-less structure does not only depend on how the elements are
> organized, but also on the operations allowed on the structure. So the
> API should reflect that. So I'm saying this warning again: if we don't
> fix that now, I think we're heading into a lock-free structure
> namespacing trainwreck that will limit our ability to add other
> lock-free operations due vague naming that does not take the operations
> allowed on the structure into consideration, combined with API
> constraints permitted by a specific given behavior (e.g. FIFO semantic)
> that tend to define these lock-free APIs.
> 
> I've been working on lock-free structures myself in Userspace RCU: I
> have lock-free stack and queue, wait-free stack, wait-free queue, and
> (work in progress), RCU red-black tree (not lock-free), and lock-free
> RCU expandable hash table. If we plan namespacing of lock-free data
> structure correctly, I think there will be room for very interesting
> performance and scalability improvements in the future.
> 
> 
> > > 
> > > You could also use llist_empty() although that brings me to that
> > > ACCESS_ONCE thing in there, what's the point?
> > 
> > Something as follow with llist_empty() seems not work.
> > 
> >   empty = llist_empty(irq_work_list);
> >   llist_add(&work->llnode, irq_work_list);
> >   if (empty)
> >           arch_irq_work_raise();
> > 
> > Because irq_work IRQ handler or timer IRQ handler may be executed just
> > before "llist_add(&work->llnode, irq_work_list)", so that, although
> > "empty == false", arch_irq_work_raise() still should be executed.
> > 
> > Can you tell me how to that with llist_empty()?
> > 
> > 
> > For ACCESS_ONCE, Mathiew suggest me to add it,
> > 
> > Hi, Mathiew,
> > 
> > Can you explain why ACCESS_ONCE should be used here?
> 
> It was mainly to force the compiler to re-fetch the head->first value
> from memory in busy-waiting loops. So even though the right practice is
> to have a cpu_relax() in the body of the loop (which would force the
> refetch due to the compiler barrier), having the ACCESS_ONCE in
> llist_empty() should not hurt much. It also seems to be what atomic*.h
> does (volatile read), so I guess the expected behavior wrt atomic
> accesses are to behave as volatile, hence my recommendation to make this
> llist_empty a volatile access. Quoting my previous email on that topic:
> 
> "Would it make sense to do:
> 
>  return ACCESS_ONCE(head->first) == NULL;
> 
> instead ? Otherwise the compiler can choose to keep the result around in
> registers without re-reading (e.g. busy waiting loop)."
> 
> So I was not implying it was strictly required, merely wondering whether
> it should be added.
> 
> Best regards,
> 
> Mathieu
> 
> > 
> > Best Regards,
> > Huang Ying
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/