linux-kernel - Re: [rfc] "fair" rw spinlocks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20091210062220.GC6720@linux.vnet.ibm.com>
Date:	Wed, 9 Dec 2009 22:22:20 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...e.hu>,
	Christoph Hellwig <hch@...radead.org>,
	Nick Piggin <npiggin@...e.de>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [rfc] "fair" rw spinlocks

On Wed, Dec 09, 2009 at 04:37:09PM +0100, Oleg Nesterov wrote:
> On 12/07, Eric W. Biederman wrote:
> >
> > Oleg Nesterov <oleg@...hat.com> writes:
> >
> > > On 12/05, Eric W. Biederman wrote:
> > >>
> > >> Atomically sending signal to every member of a process group, is the
> > >> big fly in the ointment I am aware of.  Last time I looked I could
> > >> not see how to convert it rcu.
> > >
> > > I am not sure, but iirc we can do this lockless (under rcu_lock).
> > > We need to modify pid_link to use list_entry and attach_pid() should
> > > add the new task to the end. Of course we need more changes, but
> > > (again iirc) this is not too hard.
> >
> > The problem is that even adding to the end of the list, we could run
> > into a deleted entry and not see the new end of the list.
> >
> > Suppose when we start iterating the list we have:
> >
> >   A -> B -> C -> D
> >
> > Then someone deletes some of the entries while we are iterating the list.
> >
> > A ->
> >  B' -> C' -> D'
> >
> > We will continue on traversing through the deleted entries.
> >
> > Then someone adds a new entry to the end of the list.
> >
> > A-> N
> >
> > Since we are at B', C' or D' we will never see the new entry on the
> > end of the list.
> 
> Yes, but who can add the new entry?
> 
> Let's forget about setpgrp/etc for the moment, I think we have "races"
> with or without tasklist. Say, setpgrp() can add the new process to the
> already "killed" pgrp.
> 
> Then, I think the only important case is SIGKILL/SIGSTOP (or other
> signals which can't be blockes/ignored). We must kill/stop the entire
> pgrp, we must not race with fork() and miss a child.
> 
> In this case I _think_ rcu_read_lock() is enough,
> 
> 	rcu_read_lock()
> 
> 	list_for_each_entry_rcu(task, pid->tasks[PIDTYPE_PGID)
> 		group_send_sig_info(sig, task);
> 
> 	rcu_read_unlock();
> 
> except group_send_sig_info() can race with mt-exec, but this is simple
> to fix.
> 
> If we send a signal (not necessary SIGKILL) to a process P, we must see
> all childs which were forked by P, both send_signal() and copy_process()
> take the same ->siglock, we must see the result of list_add_tail_rcu().
> And, after we sent SIGKILL/SIGSTOP, it can't fork the new child.
> 
> If list_for_each_entry() does not see the exited process P, this means
> we see the result of list_del_rcu(). But this also means we must the
> the result of the previous list_add_rcu().
> 
> IOW, fork+exit means list_add_rcu() + wmb() + list_del_rcu(), if we
> don't see the new entry on list, we must see the new one, right?
> 
> (I am ignoring the case when list_for_each_entry_rcu() sees a process
>  P but lock_task_sighand(P) fails, I think this is the same as if we
>  we missed P)
> 
> Now suppose a signal is blocked/ignored or has a handler. In this case
> we can miss a child, but I think this is OK, we can pretend the new
> child was forked after kill_pgrp() completes. Say, this child C was
> forked by some process P. We can miss C only if it was forked after
> we already sent the signal to P.
> 
> However. I do not pretend the reasoning above is "complete", and
> perhaps I missed something else.

My main concern would be "fork storms", where each CPU in a large
system is spawning children in a pgrp that some other CPU is attempting
to kill.  The CPUs spawning children might be able to keep ahead of
the single CPU, so that the pgrp never is completely killed.

Enlisting the aid of the CPUs doing the spawning (e.g., by having them
consult a list of signals being sent) prevents this fork-storm scenario.

							Thanx, Paul

> > Additionally we have the other possibility that if a child is forking
> > we send the signal to the parent after the child forks away but before
> > the child joins whichever list we are walking, and we complete our
> > traversal without seeing the child.
> 
> Not sure I understand... But afaics this case is covered above.
> ->siglock should serialize this, copy_process() does attach_pid()
> under this lock.
> 
> Oleg.
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/