linux-kernel - Re: [RFC,PATCH 5/5] exec: RT sub-thread can livelock and monopolize CPU on exec

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070820150727.GA304@tv-sign.ru>
Date:	Mon, 20 Aug 2007 19:07:27 +0400
From:	Oleg Nesterov <oleg@...sign.ru>
To:	Roland McGrath <roland@...hat.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Gautham R Shenoy <ego@...ibm.com>,
	Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC,PATCH 5/5] exec: RT sub-thread can livelock and monopolize CPU on exec

On 08/19, Roland McGrath wrote:
>
> I had in mind unifying this need with what's now done by the notify_count
> check that's in __exit_signal.  Aside from those BUG_ON's, I'm not sure
> de_thread really cares whether the other non-leader threads are finished
> self reaping, or only if they are dead.  Adding some field to signal_struct
> for this new mechanism is certainly fine if it goes along with removing a
> word or two from task_struct (notify_count, group_exit_task).  (The single
> other use overloaded on group_exit_task is for a similar need in the
> pre-coredump synchronization.  So maybe that can be done more cleanly too.)
> Any new field could be kept to a single pointer; since it's only needed
> while one given thread is blocking, it can point to something larger on its
> stack if necessary.

I seem to understand what you mean... Yes, at first glance, we can consider
the sub-thread with ->exit_state != 0 as "dead enough" for de_thread. This
allows us to simplify the logic greatly.

But: we should somehow change the ->group_leader for all sub-threads which
didn't do release_task(self) yet, nasty. (Perhaps it makes sense to move
->group_leader into signal_struct. It is not safe to use it if tsk already
did __exit_signal() anyway).

Another problem, it is not easy to define when the ->group_exit_task (or
whatever) should be notified from exit_notify(), we need another atomic_t
for that (like signal_struct.live).

In fact, it is not necessary to put this counter into signal_struct,
de_thread() can count sub-threads (say, modify zap_other_threads() to
return "int") and put this info into the structure on stack, as you
pointed out.

Imho, this definitely worth thinking. See also below.

But what do you think about this patch? Should we go with it for now,
or we should ignore the problem until we can cleanup the whole thing?
I do not claim this problem is very much important, but this yield()
is really buggy (perhaps it is always buggy).

> While we are considering cleaning up the exec synchronization, there is a
> long-standing issue it would be nice to address.  That is, the race of a
> group fatal signal with exec. (I've mentioned this before, but never come
> up with an adequate solution.)
>
> An obvious path to a fix for that is to avoid overloading SIGNAL_GROUP_EXIT
> in de_thread.  In coming up with different synchronization method we might
> find a way that cleans that up too.

Yes, yes, yes, these problems are really connected, and I also thought about
that.

But can't we first cleanup some other oddities with signal->flags? For example,
it would be nice to kill SIGNAL_STOP_DEQUEUED, but we can't because
handle_stop_signal(SIGKILL) clears ->signal->flags. And this is done because
tkill() doesn't do what __group_complete_signal() does for sig_fatal() signals
(I mean this big "if (sig_fatal(p, sig) ...").

Why? can't we split __group_complete_signal() into 2 functions, the new one
is used both by do_tkill() and __group_send_sig_info() ?

Just one example. Suppose that an application does

	signal(SIGTERM, SIG_DFL);
	sigtimedwait( {SIGTERM} );

Now,
	tkill(SIGTERM)	=> sigtimedwait() succeeds

	kill(SIGTERM)	=> application is killed UNLESS it has TIF_SIGPENDING

This looks really strange for me.

While we are here, I'd like to ask another question (sorry for the long and
somewhat off-topic post :)

Suppose that we have sub-threads T1 and T2, both do not block SIGXXX.
kill(SIGXXX) choose T1 as a target and does signal_wake_up(T1).
T1 can enter sys_sigprocmask(SIG_BLOCK, {SIGXXX}) before it sees the signal.
Now SIGXXX is delayed until T2 does recalc_sigpending().

Is it ok? This is easy to fix, for example sys_sigprocmask() can check
signal_pending() and return ERESTARTNOINTR.

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/