linux-kernel - Re: [PATCH RFCv2 2/8] rcu: More rcu-variants for list manipulation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1312292371.1147.178.camel@twins>
Date:	Tue, 02 Aug 2011 15:39:31 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Jan Schönherr <schnhrr@...tu-berlin.de>
Cc:	Paul Turner <pjt@...gle.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Ingo Molnar <mingo@...e.hu>,
	Dipankar Sarma <dipankar@...ibm.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFCv2 2/8] rcu: More rcu-variants for list manipulation

On Fri, 2011-07-29 at 10:41 +0200, Jan Schönherr wrote:
> Am 27.07.2011 21:10, schrieb Jan H. Schönherr:
> >  /**
> > + * list_add_tail_nobackref - add element to list without setting up
> > + *			     back references
> > + * @new: element to add
> > + * @head: list to add the element to
> > + *
> > + * @new is appended to @head.
> > + *
> > + * In contrast to list_add_tail(), this function does not maintain the
> > + * references back to @head. So, neither @new->next will be changed, nor
> > + * -- in case @new becomes the first entry of @head -- @new->pref.
> > + *
> > + * This is useful when reorganizing deleted elements of a RCU-protected
> > + * list as concurrent readers must always find their way back to the list.
> > + * When used in this context, special care must be taken in order to not
> > + * disturb concurrent readers on @head (or @new):
> > + * a) @new->next must lead back to the list
> > + * b) @new->next must not lead to any node already on @head
> > + * c) @new must be already known to existing readers on @head
> 
> This might actually race with physical deletion. Consider a list:
> 
>    HEAD --> A --> B --> C --> D --> HEAD
> 
> 1. Remove C from list, then D:
> 
>    HEAD --> A --> B --------------> HEAD
>                         C --> D ----^
> 
> 2. Attempt to physically delete D by using call_rcu() or similar.
>    This is delayed until all already running readers have finished.
> 
> 3. Let a new reader begin to traverse the list, advancing until A.
> 
> 4. Remove A.
> 
>    HEAD --------> B --------------> HEAD
>             A ----^      C --> D ----^
> 
> 5. Call list_add_tail_nobackref() for A, then C.
> 
>    HEAD --------> B --------------> HEAD
>    LIST --> A ---------> C --> D ----^
> 
> 6. Finish step 2, really deleting D.
> 
> 7. Let the new reader continue its traversal.
> 
> This reader will hit D, which is obviously bad.
> 
> 
> I think, we can avoid this by modifying the next pointers
> of the deleted elements and letting them point to HEAD.
> That is:
> 
> 5a. Call list_add_tail_nobackref() for A.
> 
>    HEAD --------> B --------------> HEAD
>                         C --> D ----^
>    LIST --> A -----------------------^
> 
> 5b. Call list_add_tail_nobackref() for C.
> 
>    HEAD --------> B --------------> HEAD
>                               D ----^
>    LIST --> A --------> C -----------^
> 
> 
> 
> However, this (and also the previous version) might
> cause readers to skip elements that are on the list
> (B in the example above). Can we tolerate that?
> 
> My current guess would be: no.
> 
> Thinking more about that, we already have that case:
> 
> 
>    HEAD --> A --> B --> C --> HEAD
> 
> 1. Let a reader traverse until A.
> 
> 2. Remove A.
> 
>    HEAD --------> B --> C --> HEAD
>             A ----^
> 
> 3. Add A after B.
> 
>    HEAD --------> B     C --> HEAD
>                   +> A -^
> 
> 4. Let reader continue traversal, skipping B.
> 
> (Similarly, if we had removed C and re-added it between A and B,
> we could have traversed B twice.)
> 
> So either the presented solution in this patch series
> is valid, or -- when we are not allowed to re-add
> elements which might still have readers -- the current
> handling of CFS leaf runqueues has an additional problem
> besides mixing up the order sometimes...
> 
> Paul? [Both of you, actually. :)]

*head still reels a little*

Right so traditional wisdom dictates not re-using entries that might
possibly still be in used. I remember us having had a lot of 'fun' with
that in kernel/smp.c..

See commit 6dc19899958e420a931274b94019e267e2396d3e and the follow ups
from Milton Miller.

Our use-case makes this hard because we need to re-insert the entry as
soon as a task wakes up again.. ho humm..

I don't think we really mind missing an entry or visiting a few twice,
as long as there's a solid termination condition for each iteration. Its
load-balancing after all, if we mess up, we'll fix up next time
round ;-)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/