Message-ID: <1575627084926.26450@mentor.com>
Date: Fri, 6 Dec 2019 10:11:25 +0000
From: "Schmid, Carsten" <Carsten_Schmid@...tor.com>
To: Davidlohr Bueso <dave@...olabs.net>,
Peter Zijlstra <peterz@...radead.org>
CC: "mingo@...hat.com" <mingo@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"walken@...gle.com" <walken@...gle.com>
Subject: Re: Crash in fair scheduler
> From: Davidlohr Bueso [mailto:dave@...olabs.net]
> Sent: Thursday, 5 December 2019 18:41
>
> Yeah I had never seen this either, and would expect the world to fall
> apart if leftmost is buggy (much less a one-time occurrence), but the
> following certainly raises a red flag:
>
> &cfs_rq->tasks_timeline->rb_leftmost
> tasks_timeline = {
> rb_root = {
> rb_node = 0xffff99a9502e0d10
> },
> rb_leftmost = 0x0
> },
>
Meanwhile I have been digging a bit deeper into the kernel dump.
I can see that this rb_root holds a tree of two nodes:
crash> p -x *(struct rb_node *)0xffff99a9502e0d10
$7 = {
__rb_parent_color = 0xffff99a9502e0d10, <- points to SELF
rb_right = 0xffff99a9502e0d10, <- points to self
rb_left = 0xffff99a9502e1990 <- and we have a node left
}
The rb_left node:
crash> p -x *(struct rb_node *)0xffff99a9502e1990
$6 = {
__rb_parent_color = 0xffff99a9502e0d11, <- points to the rb_root node (bit 0 is color)
rb_right = 0x0, <- no leaf
rb_left = 0x0 <- no leaf
}
I'm currently trying to work out which se (scheduling entity)
these nodes belong to.
Anyway, cfs_rq->tasks_timeline.rb_leftmost should point to 0xffff99a9502e1990
as far as I understand the rb_tree, right?
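To make that expectation concrete, here is a minimal user-space sketch of the
"walk left from the root" rule. It is not kernel code: struct rb_node below is
a local stand-in with the same three fields as in the dump, and leftmost_of()
is a hypothetical helper name chosen for illustration.

```c
#include <stddef.h>

/* Local stand-in for the kernel's struct rb_node (same fields as in the dump). */
struct rb_node {
	unsigned long __rb_parent_color;	/* parent pointer, bit 0 is the color */
	struct rb_node *rb_right;
	struct rb_node *rb_left;
};

/* Hypothetical helper: follow rb_left from the root until there is no left child. */
static struct rb_node *leftmost_of(struct rb_node *root)
{
	struct rb_node *n = root;

	if (!n)
		return NULL;		/* empty tree has no leftmost node */
	while (n->rb_left)
		n = n->rb_left;
	return n;
}
```

Applied to the two nodes above, the walk starts at 0xffff99a9502e0d10, takes
rb_left once, and stops at 0xffff99a9502e1990 because that node has no left
child.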
> >
> >I suppose one approach is to add code to both __enqueue_entity() and
> >__dequeue_entity() that compares ->rb_leftmost to the result of
> >rb_first(). That'd incur some overhead but it'd double check the logic.
>
> We could benefit from improved debugging in rbtrees, not only the cached
> flavor. Perhaps we can start with the following -- this would at least
> let us know if the case where the tree is non-empty and leftmost is nil
> was hit, whether in the scheduler or another user...
>
> Thanks,
> Davidlohr
>
That's what I will do too: add some debugging code.
I'll add it to the project I'm working on here, not upstream, and try
to log as much debug data as possible if a similar case occurs again.
But as the rb_tree is used extensively, I need to be careful where
I add debug code because of the performance impact.
Your approach of a configurable rb_tree debug option
might help me here, yes; I would have taken a similar approach.
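Roughly, what I have in mind is a check like the following. This is only a
user-space sketch under my own assumptions, not the proposed patch: the
structs are local stand-ins for the kernel ones, and RB_DEBUG_SKETCH and
rb_cached_leftmost_ok() are made-up names for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the kernel structures; not the real headers. */
struct rb_node {
	unsigned long __rb_parent_color;
	struct rb_node *rb_right;
	struct rb_node *rb_left;
};
struct rb_root { struct rb_node *rb_node; };
struct rb_root_cached {
	struct rb_root rb_root;
	struct rb_node *rb_leftmost;
};

/* Hypothetical switch; set to 0 to compile the check away. */
#ifndef RB_DEBUG_SKETCH
#define RB_DEBUG_SKETCH 1
#endif

/* Returns true if the cached leftmost matches a fresh walk from the root. */
static bool rb_cached_leftmost_ok(const struct rb_root_cached *t)
{
#if RB_DEBUG_SKETCH
	const struct rb_node *n = t->rb_root.rb_node;

	while (n && n->rb_left)
		n = n->rb_left;
	return n == t->rb_leftmost;	/* both NULL for an empty tree */
#else
	(void)t;
	return true;			/* check disabled, no overhead */
#endif
}
```

With the values from the dump (non-NULL rb_node, rb_leftmost = 0x0) such a
check returns false. The walk costs one pass down the left spine per call,
which is why I would want it switchable.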
Thanks,
Carsten