[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191203140153.GP2844@hirez.programming.kicks-ass.net>
Date: Tue, 3 Dec 2019 15:01:53 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: "Schmid, Carsten" <Carsten_Schmid@...tor.com>
Cc: "mingo@...hat.com" <mingo@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
walken@...gle.com, dave@...olabs.net
Subject: Re: Crash in fair scheduler
On Tue, Dec 03, 2019 at 10:51:46AM +0000, Schmid, Carsten wrote:
> > > struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
> > > {
> > > struct rb_node *left = rb_first_cached(&cfs_rq->tasks_timeline);
> > >
> > > if (!left)
> > > return NULL; <<<<<<<<<< the case
> > >
> > > return rb_entry(left, struct sched_entity, run_node);
> > > }
> >
> > This the problem, for some reason the rbtree code got that rb_leftmost
> > thing wrecked.
> >
> Any known issue on rbtree code regarding this?
I don't recall ever having seen this before. :/ Adding Davidlohr and
Michel who've poked at the rbtree code 'recently'.
> > > Is this a corner case nobody thought of or do we have cfs_rq data that is
> > unexpected in it's content?
> >
> > No, the rbtree is corrupt. Your tree has a single node (which matches
> > with nr_running), but for some reason it thinks rb_leftmost is NULL.
> > This is wrong, if the tree is non-empty, it must have a leftmost
> > element.
> Is there a chance to find the left-most element in the core dump?
If there is only one entry in the tree, then that must also be the
leftmost entry. See your own later question :-)
> Maybe i can dig deeper to find the root c ause then.
> Does any of the structs/data in this context point to some memory
> where i can continue to search?
There are only two places where rb_leftmost are updated,
rb_insert_color_cached() and rb_erase_cached() (the scheduler does not
use rb_replace_nod_cached).
We can 'forget' to set leftmost on insertion if @leftmost is somehow
false, and we can eroneously clear leftmost on erase if rb_next()
malfunctions.
No clues on which of those two cases happened.
> Where should rb_leftmost point to if only one node is in the tree?
> To the node itself?
Exatly.
I suppose one approach is to add code to both __enqueue_entity() and
__dequeue_entity() that compares ->rb_leftmost to the result of
rb_first(). That'd incur some overhead but it'd double check the logic.
Powered by blists - more mailing lists