Message-ID: <alpine.DEB.2.21.1901301423130.5537@nanos.tec.linutronix.de>
Date:   Wed, 30 Jan 2019 14:25:20 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Heiko Carstens <heiko.carstens@...ibm.com>
cc:     Sebastian Sewior <bigeasy@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Martin Schwidefsky <schwidefsky@...ibm.com>,
        LKML <linux-kernel@...r.kernel.org>, linux-s390@...r.kernel.org,
        Stefan Liebler <stli@...ux.ibm.com>
Subject: Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggered

On Wed, 30 Jan 2019, Heiko Carstens wrote:
> On Wed, Jan 30, 2019 at 01:15:18PM +0100, Thomas Gleixner wrote:
> > On Wed, 30 Jan 2019, Heiko Carstens wrote:
> > > On Tue, Jan 29, 2019 at 06:16:53PM +0100, Sebastian Sewior wrote:
> > > >  	if (unlikely(p->flags & PF_KTHREAD)) {
> > > >  		put_task_struct(p);
> > > 
> > > Last lines of the trace with your additional patch (full log attached):
> > > 
> > >            <...>-50539 [003] ....  2376.398223: sys_futex -> 0x0
> > >            <...>-50539 [003] ....  2376.398223: sys_futex(uaddr: 3ffb7700208, op: 6, val: 1, utime: 0, uaddr2: 3, val3: 0)
> > >            <...>-50539 [003] ....  2376.398225: attach_to_pi_owner: Missing pid 50734
> > >            <...>-50539 [003] ....  2376.398226: handle_exit_race: uval2 vs uval 8000c62e vs 8000c62e (-1)
> > 
> > So the user space value is: 8000c62e. FUTEX_WAITER bit is set and the owner
> > of the futex is PID 50734, which exited a long time ago:
> > 
> >            <...>-50734 [000] ....  2376.394936: sched_process_exit: comm=ld64.so.1 pid=50734 prio=120
> > 
> > But at least from the kernel view 50734 has released it last:
> > 
> >            <...>-50734 [000] ....  2376.394930: sys_futex(uaddr: 3ffb7700208, op: 7, val: 3ff00000007, utime: 3ffb3ef8910, uaddr2: 3ffb3ef8910, val3: 3ffc0afe987)
> >            <...>-50539 [003] ....  2376.398223: sys_futex(uaddr: 3ffb7700208, op: 6, val: 1, utime: 0, uaddr2: 3, val3: 0)
> > 
> > Now, if it would have acquired it in userspace again before exiting, then
> > the robust list exit code should have set the OWNER_DIED bit as well, but
> > that's not set....
> > 
> > debug patch for the robust list exit handling below.
> 
> Last lines of trace below (full log attached):

SNIP...

It's the same picture as last time and the only occurrence of the futex in
question in the context of the dead task is:

           <...>-56956 [007] ....   658.804018: sys_futex(uaddr: 3ff9e880050, op: 7, val: 3ff00000007, utime: 3ff9b078910, uaddr2: 3ff9b078910, val3: 3ffea67e3f7)

The robust list exit of that task does not contain the user space address 3ff9e880050.
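
For anyone following the traces, here is a minimal sketch of how the
numbers quoted above decode. It assumes the PI futex word layout and
opcode numbers from <linux/futex.h>, repeated locally so the snippet
builds without kernel headers. The struct and helper names are made up
for illustration and are not kernel code.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit layout of a PI/robust futex word, mirroring <linux/futex.h>. */
#define FUTEX_WAITERS    0x80000000u
#define FUTEX_OWNER_DIED 0x40000000u
#define FUTEX_TID_MASK   0x3fffffffu

/* Opcodes as they appear in the "op:" field of the sys_futex trace. */
#define FUTEX_LOCK_PI    6
#define FUTEX_UNLOCK_PI  7

/* Hypothetical helper type: decoded view of the user space value. */
struct futex_word {
	int waiters;      /* FUTEX_WAITERS bit set */
	int owner_died;   /* FUTEX_OWNER_DIED bit set */
	uint32_t tid;     /* TID of the owner recorded in user space */
};

/* Split a raw futex value into its flag bits and owner TID. */
static struct futex_word decode_futex_word(uint32_t uval)
{
	struct futex_word w;

	w.waiters = (uval & FUTEX_WAITERS) != 0;
	w.owner_died = (uval & FUTEX_OWNER_DIED) != 0;
	w.tid = uval & FUTEX_TID_MASK;
	return w;
}

/* Map the two opcodes seen in the trace to their names. */
static const char *futex_op_name(int op)
{
	switch (op) {
	case FUTEX_LOCK_PI:
		return "FUTEX_LOCK_PI";
	case FUTEX_UNLOCK_PI:
		return "FUTEX_UNLOCK_PI";
	default:
		return "other";
	}
}

/* Print the decoded fields in one line, e.g. for comparing log entries. */
static void print_futex_word(uint32_t uval)
{
	struct futex_word w = decode_futex_word(uval);

	printf("uval=%08x waiters=%d owner_died=%d tid=%u\n",
	       (unsigned int)uval, w.waiters, w.owner_died,
	       (unsigned int)w.tid);
}
```

Fed the value from the earlier trace, decode_futex_word(0x8000c62e)
gives waiters=1, owner_died=0 and tid=50734, which matches the reading
above: the waiter bit is set, the owner is the exited PID 50734, and
OWNER_DIED is not set. Likewise op 7 is FUTEX_UNLOCK_PI and op 6 is
FUTEX_LOCK_PI.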

Confused and of course the problem does not reproduce on x86. Sigh.

I'll think about it some more.

Thanks,

	tglx