[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CA+55aFzavWN3gGSRtd3Lp6mzY4ndg1af4DhXy1QOCy4HQ8CA9w@mail.gmail.com>
Date: Mon, 30 Jan 2012 08:27:11 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Oleg Nesterov <oleg@...hat.com>
Cc: mingo@...hat.com, hpa@...or.com, linux-kernel@...r.kernel.org,
a.p.zijlstra@...llo.nl, y-goto@...fujitsu.com,
akpm@...ux-foundation.org, tglx@...utronix.de, mingo@...e.hu,
linux-tip-commits@...r.kernel.org
Subject: Re: [tip:sched/core] sched: Fix ancient race in do_exit()
On Sun, Jan 29, 2012 at 10:59 AM, Oleg Nesterov <oleg@...hat.com> wrote:
>
> If we remove set_task_state() from the main waiting loop we can never race
> with __rwsem_do_wake()->try_to_wake_up() seeing us in UNINTERRUPTIBLE state.
> rwsem_down_failed_common() simply can't return until UNINTERRUPTIBLE->RUNNING
> transition is finished (__rwsem_do_wake does wakeup first).
>
> And since we do not play with current->state after spin_unlock(), it is
> fine to "race" with waiter->task clearing, just we can do the unnecessary
> but harmless schedule() in TASK_RUNNING.
So the more I think about this code, the more nervous I get.
I did a version that got the sem->wait_lock if it hit the race case
("task" still non-NULL after the schedule), and I could convince
myself that it was race-free and safe. But I couldn't actually really
convince myself that it was better than the old code, because I worry
that we'd actually hit that case much too often (ie another CPU woke
us up, but hasn't cleared "task" yet, so spin on the spinlock etc).
So I do hate the bug in rwsem, but I'm now willing to entertain the
possibility that the bug actually means that unlocking of an rwsem is
"nicely decoupled" from the process that got the semaphore, and may be
a performance advantage.
To be honest, I don't *really* believe that, but changing the rwsem.c
code at this point just scares me enough that without lots and lots of
people looking at it and actually doing some performance analysis, I'm
willing to just say "screw it, I'll take Yasunori Goto's solution".
I'm definitely still not a huge fan of that patch, but I'm not a huge
fan of the alternative either. So Ack on the patch, and I'll just try
to think about this all some more.
Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists