linux-kernel - Re: process 'stuck' at exit.

Message-ID: <alpine.DEB.2.02.1312102157230.28330@ionos.tec.linutronix.de>
Date:	Tue, 10 Dec 2013 22:17:22 +0100 (CET)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
cc:	Dave Jones <davej@...hat.com>,
	Darren Hart <dvhart@...ux.intel.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Mel Gorman <mgorman@...e.de>, Oleg Nesterov <oleg@...hat.com>
Subject: Re: process 'stuck' at exit.

On Tue, 10 Dec 2013, Linus Torvalds wrote:

> On Tue, Dec 10, 2013 at 12:33 PM, Thomas Gleixner <tglx@...utronix.de> wrote:
> >
> > The -EAGAIN is when the user value changed, simplified:
> 
> No it's not.
> 
> Thomas, stop this crap already. Look at the f*cking code carefully
> instead of just dismissing cases.
> 
> The worrisome EAGAIN case is
> 
>   futex_requeue
>     futex_proxy_trylock_atomic
>       futex_lock_pi_atomic
>         lookup_pi_state:
>           ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
> 
> and now futex_requeue() will do "goto repeat" for that EAGAIN case.
> 
> So, Christ, Thomas, you have now *twice* dismissed a real concern with
> totally bogus "that can never happen" by explaining some totally
> unrelated *simple* case rather than the much more complex case.
> 
> So please. Really. Truly look at the code and think about it, or shut
> the f*ck up. No more of this shit where you glance at the code, find
> some simple case, and say "that can't happen", and dismiss the
> bug-report.

Well, I spent a fricking long time working on that code and finding
the most absurd bugs in it, and I'm well aware of the exit issue.

> So far Dave's bug-reports have generally pretty much universally shown
> real bugs. Being dismissive about it is not helpful, quite the
> reverse.

I know and I used that information more than once to carefully track
down the real reason. I never dismissed a single report.

> Maybe the loop I'm pointing at cannot happen, but *your* explanation
> for why it couldn't happen was pure and utter garbage, and was clearly
> because you hadn't even bothered to look at all the cases.

I might have been sloppy in not really explaining why I think that
the requeue_pi exit case is not likely.

To make that loop happen it requires the following:

1) An actual user of requeue_pi, which can only be the fuzzer as glibc
   does not support it. But Dave's last fuzzed syscall was something
   else.

2) The report said that the last fuzzing test was already done and
   the fuzzer app was about to exit.

   That involves futex_requeue from deep inside the glibc thread
   library. And that is NOT using REQUEUE_PI.

3) So even IF it did use requeue_pi, this would require that the
   outer PI lock is already held by some other task which is already
   in the process of exiting. IOW, this lock would be held by
   something which did not release it before exit, already set
   PF_EXITING in exit_signals(), and then got stuck forever before
   reaching: tsk->flags |= PF_EXITPIDONE;
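
The conditions above can be illustrated with a small user-space model
in C. This is only a sketch, not the kernel code: the struct
task_model, the functions lookup_owner_model() and requeue_model(),
and the parameters exit_done_after and max_tries are hypothetical
names made up for illustration, and the flag values are assumptions.
The only part taken from the thread is the quoted expression that
returns -ESRCH or -EAGAIN, applied here to an owner that has already
set PF_EXITING. The real futex_requeue() retries without a cap; the
model caps the retries so that it terminates.

```c
#include <assert.h>
#include <errno.h>

/* Flag names mirror the kernel's; the bit values here are illustrative. */
#define PF_EXITING    0x4u
#define PF_EXITPIDONE 0x8u

/* Hypothetical stand-in for the task that owns the PI lock. */
struct task_model {
	unsigned int flags;
};

/*
 * Models only the exit check quoted from lookup_pi_state():
 * an exiting owner yields -EAGAIN until PF_EXITPIDONE is set,
 * after which the lookup fails with -ESRCH.
 */
static int lookup_owner_model(const struct task_model *p)
{
	if (p->flags & PF_EXITING)
		return (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
	return 0;
}

/*
 * Models the "goto repeat" on -EAGAIN. The owner sets PF_EXITPIDONE
 * on attempt number exit_done_after (a negative value means it never
 * gets there). Returns the number of lookups needed, or -1 if every
 * one of max_tries lookups returned -EAGAIN.
 */
static int requeue_model(struct task_model *owner, int exit_done_after,
			 int max_tries, int *result)
{
	assert(owner && result);

	for (int tries = 1; tries <= max_tries; tries++) {
		if (tries == exit_done_after)
			owner->flags |= PF_EXITPIDONE;

		*result = lookup_owner_model(owner);
		if (*result != -EAGAIN)
			return tries;
	}
	return -1; /* owner stuck between PF_EXITING and PF_EXITPIDONE */
}
```

Calling requeue_model() with an owner whose flags are PF_EXITING and a
negative exit_done_after models the stuck case from 3): every lookup
returns -EAGAIN and the loop only ends because of the cap. With an
owner that reaches PF_EXITPIDONE the loop ends with -ESRCH instead.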

I still think that it is highly unlikely, but to make sure, I already
asked Dave, before reading your rant, to fire up the tracer, so we
know for sure where the hell the thing is looping.

Thanks,

	tglx
