Message-ID: <20150203200916.GA10545@redhat.com>
Date: Tue, 3 Feb 2015 21:09:16 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Darren Hart <darren@...art.com>,
Thomas Gleixner <tglx@...utronix.de>,
Jerome Marchand <jmarchan@...hat.com>,
Larry Woodman <lwoodman@...hat.com>,
Mateusz Guzik <mguzik@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/1] futex: check PF_KTHREAD rather than !p->mm to
filter out kthreads
Peter,
I am getting more confused when I re-read your email today ;) see below.
Btw, do you agree with 1/1? Can you ack/nack it?
On 02/02, Peter Zijlstra wrote:
>
> On Mon, Feb 02, 2015 at 03:05:15PM +0100, Oleg Nesterov wrote:
>
> > And another question. Let's forget about this ->mm check. I simply cannot
> > understand this
> >
> > ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN
> >
> > I must have missed something but this looks buggy, I do not see any
> > preemption point in this "retry" loop. Suppose that max_cpus=1 and rt_task()
> > preempts the non-rt PF_EXITING owner. Looks like futex_lock_pi() can spin
> > forever in this case? (OK, ignoring RT throttling).
>
> So yes, I do like your proposal of putting PF_EXITPIDONE under the
> ->pi_lock section that handles exit_pi_state_list().
Probably I was not clear... Let me try again just in case.
I believe that the whole "spin waiting for PF_EXITING -> PF_EXITPIDONE
transition" idea is simply wrong. See the test-case I sent.
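A minimal userspace sketch of the scenario above, not kernel code. It
assumes a single CPU and an rt waiter that never yields to the non-rt
PF_EXITING owner. The names current_check_model() and retry_loop_model()
and the flag values are hypothetical, chosen only for illustration; the
retry bound exists so that the model terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag values only; the kernel's real values differ. */
#define PF_EXITING    0x1u
#define PF_EXITPIDONE 0x2u

/* Mirrors the check quoted in this thread. */
static int current_check_model(unsigned int owner_flags)
{
	if (owner_flags & PF_EXITING)
		return (owner_flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
	return 0;
}

/*
 * Hypothetical model of the "retry" loop. Returns the number of retries
 * taken before the check stops returning -EAGAIN, or -1 if max_retries
 * is exhausted, which stands in for spinning forever.
 */
static int retry_loop_model(bool owner_can_run, int max_retries)
{
	unsigned int owner_flags = PF_EXITING;
	int retries = 0;

	assert(max_retries >= 0);

	while (current_check_model(owner_flags) == -EAGAIN) {
		if (retries == max_retries)
			return -1;
		retries++;
		/*
		 * Assumption: the owner reaches PF_EXITPIDONE only if the
		 * scheduler lets it run. An rt waiter on the only CPU
		 * never gives a non-rt owner that chance.
		 */
		if (owner_can_run)
			owner_flags |= PF_EXITPIDONE;
	}
	return retries;
}
```

With owner_can_run == true the loop ends after one retry; with false it
only ends because of the artificial bound.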
I think that attach_to_pi_owner() should never check PF_EXITING and never
return -EAGAIN. It should either proceed and add pi_state to the list or
return -ESRCH if exit_pi_state_list() was called.
Do you agree?
Perhaps we can set PF_EXITPIDONE lockless and avoid the unconditional
lock(pi_lock) but this is minor.
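The proposal above can be sketched in userspace roughly as follows. This
is only a model under stated assumptions, not a patch: struct task_model,
the *_model() helpers, the toy spinlock and the flag values are all
hypothetical, and a counter stands in for the pi_state list. The point it
tries to show is that PF_EXITPIDONE is set under the same lock that
drains the list, so the attach side needs no PF_EXITING check and no
-EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative flag values only; the kernel's real values differ. */
#define PF_EXITING    0x1u
#define PF_EXITPIDONE 0x2u

struct task_model {
	atomic_flag pi_lock;	/* stands in for tsk->pi_lock */
	unsigned int flags;
	int nr_pi_states;	/* stands in for the pi_state list */
};

static void pi_lock(struct task_model *t)
{
	while (atomic_flag_test_and_set_explicit(&t->pi_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void pi_unlock(struct task_model *t)
{
	atomic_flag_clear_explicit(&t->pi_lock, memory_order_release);
}

static void task_model_init(struct task_model *t)
{
	assert(t != NULL);
	atomic_flag_clear(&t->pi_lock);
	t->flags = 0;
	t->nr_pi_states = 0;
}

/* Exit side: drain the list and mark it done in one locked section. */
static void exit_pi_state_list_model(struct task_model *t)
{
	pi_lock(t);
	t->nr_pi_states = 0;
	t->flags |= PF_EXITPIDONE;
	pi_unlock(t);
}

/* Attach side: either add a pi_state or fail with -ESRCH. */
static int attach_to_pi_owner_model(struct task_model *t)
{
	int ret = 0;

	pi_lock(t);
	if (t->flags & PF_EXITPIDONE)
		ret = -ESRCH;		/* the list was already drained */
	else
		t->nr_pi_states++;	/* PF_EXITING alone is not checked */
	pi_unlock(t);
	return ret;
}
```

In this model an attach that races with exit either lands before the
drain, and is cleaned up by it, or sees PF_EXITPIDONE and gets -ESRCH.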
The main problem is that I fail to understand why this logic was added
in the first place... To avoid the race with exit_robust_list()? I do
not see why this is needed...
> As for the recursive fault; I think the safer option is to set
> EXITPIDONE and not register more PI states, as opposed to allowing more
> and more states to be added. Yes we'll leak whatever currently is there,
> but no point in allowing it to get worse.
Not sure I understand... If you mean recursive do_exit() then yes, I think
that we should simply set EXITPIDONE lockless in a best-effort manner, this
is what the current code does. Just the comment should be updated in any
case imo.
But mostly I was confused by the pseudo-code below. Heh, because I thought
that it describes the changes in kernel/futex.c you think we should make.
Now that I have finally realized that it outlines the current code, I am a
bit less confused ;)
Oleg.
> do_exit()
> {
> exit_signals(tsk); /* sets PF_EXITING */
>
> smp_mb();
> raw_spin_unlock_wait(&tsk->pi_lock);
>
> exit_mm() {
> mm_release() {
> exit_pi_state_list();
> }
> }
>
> tsk->flags |= PF_EXITPIDONE;
> }
>
> vs
>
> futex_lock_pi()
> {
> retry:
> ...
>
> ret = futex_lock_pi_atomic() {
> attach_to_pi_owner() {
> raw_spin_lock(&tsk->pi_lock);
> if (PF_EXITING) {
> ret = PF_EXITPIDONE ? -ESRCH : -EAGAIN;
> raw_spin_unlock(&tsk->pi_lock);
> return ret;
> }
> }
> }
> if (ret) {
> switch(ret) {
> ...
>
> case -EAGAIN:
> ...
> cond_resched();
> goto retry;
> }
> }
> }
>
> vs
>
> futex_requeue()
> {
> retry:
> ...
>
> ret = futex_proxy_trylock_atomic() {
> ret = futex_lock_pi_atomic() {
> attach_to_pi_owner() {
> raw_spin_lock(&tsk->pi_lock);
> if (PF_EXITING) {
> ret = PF_EXITPIDONE ? -ESRCH : -EAGAIN;
> raw_spin_unlock(&tsk->pi_lock);
> return ret;
> }
> }
> }
> }
>
> if (ret > 0) {
> ret = lookup_pi_state() {
> attach_to_pi_owner() {
> raw_spin_lock(&tsk->pi_lock);
> if (PF_EXITING) {
> ret = PF_EXITPIDONE ? -ESRCH : -EAGAIN;
> raw_spin_unlock(&tsk->pi_lock);
> return ret;
> }
> }
> }
> }
>
> ...
> switch(ret) {
> ...
> case -EAGAIN:
> ...
> cond_resched();
> goto retry;
> }
> }
>
> vs
>
>