Message-ID: <1575627084926.26450@mentor.com>
Date:   Fri, 6 Dec 2019 10:11:25 +0000
From:   "Schmid, Carsten" <Carsten_Schmid@...tor.com>
To:     Davidlohr Bueso <dave@...olabs.net>,
        Peter Zijlstra <peterz@...radead.org>
CC:     "mingo@...hat.com" <mingo@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "walken@...gle.com" <walken@...gle.com>
Subject: Re: Crash in fair scheduler

> From: Davidlohr Bueso [mailto:dave@...olabs.net]
> Sent: Thursday, December 5, 2019 18:41
> 
> Yeah I had never seen this either, and would expect the world to fall
> apart if leftmost is buggy (much less a one-time occurrence), but the
> following certainly raises a red flag:
> 
>     &cfs_rq->tasks_timeline->rb_leftmost
>   tasks_timeline = {
>     rb_root = {
>       rb_node = 0xffff99a9502e0d10
>     },
>     rb_leftmost = 0x0
>   },
> 
Meanwhile I am diving a bit deeper into the kernel dump.
I can see that this rb_root holds a tree with two nodes:
crash> p -x *(struct rb_node *)0xffff99a9502e0d10
$7 = {
  __rb_parent_color = 0xffff99a9502e0d10, <- points to SELF
  rb_right = 0xffff99a9502e0d10, <- points to self
  rb_left = 0xffff99a9502e1990 <- and we have a left child
}

The rb_left node:
crash> p -x *(struct rb_node *)0xffff99a9502e1990
$6 = {
  __rb_parent_color = 0xffff99a9502e0d11, <- points to the rb_root node (bit 0 is color)
  rb_right = 0x0, <- no right child
  rb_left = 0x0 <- no left child
}

I'm currently trying to work out which se (scheduling entity)
each of these nodes belongs to.
Anyway, cfs_rq->tasks_timeline.rb_leftmost should point to 0xffff99a9502e1990,
as far as I understand the rb_tree, right?
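
For reference, rb_first() simply follows rb_left pointers down from the
root, so for the two nodes above it would return 0xffff99a9502e1990,
which makes the cached rb_leftmost of 0x0 look inconsistent. From memory,
the walk in lib/rbtree.c is essentially:

struct rb_node *rb_first(const struct rb_root *root)
{
        struct rb_node *n;

        n = root->rb_node;      /* the corrupted root node above */
        if (!n)
                return NULL;
        while (n->rb_left)      /* 0xffff99a9502e0d10 -> 0xffff99a9502e1990 */
                n = n->rb_left;
        return n;               /* the expected leftmost */
}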

> >
> >I suppose one approach is to add code to both __enqueue_entity() and
> >__dequeue_entity() that compares ->rb_leftmost to the result of
> >rb_first(). That'd incur some overhead but it'd double check the logic.
> 
> We could benefit from improved debugging in rbtrees, not only the cached
> flavor. Perhaps we can start with the following -- this would at least
> let us know if the case where the tree is non-empty and leftmost is nil
> was hit, whether in the scheduler or another user...
> 
> Thanks,
> Davidlohr
> 
That's what I will do too: add some debugging code.
I'll add that to the project I'm on here, not upstream, and try
to log as much debug data as possible if a similar case occurs again.
But as the rb_tree is used extensively, I need to be careful where
to add debug code due to the performance impact.
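
As a rough sketch of the check I have in mind (the helper name and the
WARN_ONCE wording are mine, just for illustration, not a real patch),
called from __enqueue_entity()/__dequeue_entity():

static inline void check_leftmost(struct cfs_rq *cfs_rq)
{
        struct rb_root_cached *tl = &cfs_rq->tasks_timeline;

        /* A non-empty tree must have a cached leftmost, and it
         * must match an actual walk down the left spine. */
        WARN_ONCE(tl->rb_root.rb_node &&
                  tl->rb_leftmost != rb_first(&tl->rb_root),
                  "cfs_rq %p: cached leftmost %p != rb_first %p\n",
                  cfs_rq, tl->rb_leftmost, rb_first(&tl->rb_root));
}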

The configurable rb_tree debugging you suggest might help me
here, yes; I would have taken a similar approach.
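
To keep the overhead out of production builds, I would gate the check
behind a config option, roughly like this (the option name is made up):

#ifdef CONFIG_SCHED_DEBUG_RBTREE        /* hypothetical option */
#define check_leftmost_dbg(cfs_rq)      check_leftmost(cfs_rq)
#else
#define check_leftmost_dbg(cfs_rq)      do { } while (0)
#endif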

Thanks,
Carsten
