Message-ID: <20191203103046.GJ2827@hirez.programming.kicks-ass.net>
Date:   Tue, 3 Dec 2019 11:30:46 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     "Schmid, Carsten" <Carsten_Schmid@...tor.com>
Cc:     "mingo@...hat.com" <mingo@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Crash in fair scheduler

On Tue, Dec 03, 2019 at 09:11:14AM +0000, Schmid, Carsten wrote:
> Hi maintainers of the fair scheduler,
> 
> we had a crash in the fair scheduler and analysis shows that this could happen again.
> Happened on 4.14.86 (LTS series) but failing code path still exists in 5.4-rc2 (and 4.14.147 too).

Please try to reproduce this with Linus' latest git tree. I have no idea
what is, or is not, in those stable trees.

> crash> * cfs_rq ffff99a96dda9800
> struct cfs_rq {
>   load = {  weight = 1048576,  inv_weight = 0  }, 
>   nr_running = 1, 
>   h_nr_running = 1, 
>   exec_clock = 0, 
>   min_vruntime = 190894920101, 
>   tasks_timeline = {  rb_root = {    rb_node = 0xffff99a9502e0d10  },   rb_leftmost = 0x0  }, 
>   curr = 0x0, 
>   next = 0x0, 
>   last = 0x0, 
>   skip = 0x0, 


> &cfs_rq->tasks_timeline->rb_leftmost
>   tasks_timeline = {
>     rb_root = {
>       rb_node = 0xffff99a9502e0d10
>     }, 
>     rb_leftmost = 0x0
>   }, 

> include/linux/rbtree.h:91:#define rb_first_cached(root) (root)->rb_leftmost

> struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
> {
> 	struct rb_node *left = rb_first_cached(&cfs_rq->tasks_timeline);
> 
> 	if (!left)
> 		return NULL; <<<<<<<<<< the case
> 
> 	return rb_entry(left, struct sched_entity, run_node);
> }

This is the problem: for some reason the rbtree code got that rb_leftmost
thing wrecked.

> Is this a corner case nobody thought of or do we have cfs_rq data that is unexpected in its content?

No, the rbtree is corrupt. Your tree has a single node (which matches
with nr_running), but for some reason it thinks rb_leftmost is NULL.
This is wrong: if the tree is non-empty, it must have a leftmost
element.
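
To make the invariant concrete, here is a minimal user-space sketch of a
consistency check. The struct and function names (rb_node_sketch,
rb_root_cached_sketch, walk_leftmost, rb_cached_consistent) are
hypothetical stand-ins invented for illustration; they are simplified and
are not the kernel's definitions or an existing kernel helper. The idea is
only that the cached leftmost pointer must equal the node reached by
following left links from the root, and must be NULL exactly when the tree
is empty.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for illustration; NOT the kernel's definitions. */
struct rb_node_sketch {
	struct rb_node_sketch *rb_left;
	struct rb_node_sketch *rb_right;
};

struct rb_root_cached_sketch {
	struct rb_node_sketch *rb_node;     /* root of the tree */
	struct rb_node_sketch *rb_leftmost; /* cached leftmost node */
};

/* Follow left links from the root to find the real leftmost node. */
static struct rb_node_sketch *
walk_leftmost(const struct rb_root_cached_sketch *root)
{
	struct rb_node_sketch *n = root->rb_node;

	if (!n)
		return NULL; /* empty tree: no leftmost node */
	while (n->rb_left)
		n = n->rb_left;
	return n;
}

/* True if the cached pointer agrees with the tree's actual leftmost node. */
static bool rb_cached_consistent(const struct rb_root_cached_sketch *root)
{
	return root->rb_leftmost == walk_leftmost(root);
}
```

With a single-node tree whose cached leftmost pointer is NULL, as in the
dump above, a check of this shape would report the tree as inconsistent.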

Can you reproduce at will? If so, can you please try the latest kernel,
and/or share the reproducer?
