Message-ID: <20130304073908.GA982@kroah.com>
Date: Mon, 4 Mar 2013 15:39:08 +0800
From: Greg KH <gregkh@...uxfoundation.org>
To: Russ Dill <russ.dill@...il.com>
Cc: Al Viro <viro@...iv.linux.org.uk>,
linux-kernel <linux-kernel@...r.kernel.org>,
Nick Kossifidis <mickflemm@...il.com>,
Theodore Ts'o <tytso@....edu>
Subject: Re: fasync race in fs/fcntl.c

On Sun, Mar 03, 2013 at 10:16:10PM -0800, Russ Dill wrote:
> On Sat, Mar 2, 2013 at 4:09 PM, Russ Dill <russ.dill@...il.com> wrote:
> > On Sat, Mar 2, 2013 at 11:49 AM, Al Viro <viro@...iv.linux.org.uk> wrote:
> >> On Sat, Mar 02, 2013 at 03:00:28AM -0800, Russ Dill wrote:
> >>> I'm seeing a race in fs/fcntl.c. I'm not sure exactly how the race is
> >>> occurring, but the following is my best guess. A kernel log is
> >>> attached.
> >>
> >> [snip the analysis - it's a different lock anyway]
> >>
> >> The traces below are essentially sys_execve() getting to get_random_bytes(),
> >> to kill_fasync(), to send_sigio(), which spins on tasklist_lock.
> >>
> >> Could you rebuild it with lockdep enabled and try to reproduce that?
> >> I very much doubt that this execve() is a part of deadlock - it's
> >> getting caught on one, but it shouldn't be holding any locks that
> >> nest inside tasklist_lock at that point, so even if it hadn't been there,
> >> the process holding tasklist_lock probably wouldn't have progressed any
> >> further...
> >
> > ok, I did screw up the analysis quite badly, luckily, lockdep got it right away.
> >
>
> So lockdep gives some clues, but seems a bit confused, so here's what happened.
>
> mix_pool_bytes /* takes nonblocking_pool.lock */
> add_device_randomness
> posix_cpu_timers_exit
> __exit_signal
> release_task /* takes write lock on tasklist_lock */
> do_exit
> __module_put_and_exit
> cryptomgr_test
>
> send_sigio /* takes read lock on tasklist_lock */
> kill_fasync_rcu
> kill_fasync
> account /* takes nonblocking_pool.lock */
> extract_entropy
> get_random_bytes
> create_elf_tables
> load_elf_binary
> load_elf_library
> search_binary_handler
>
> This would mark the culprit as 613370549 'random: Mix cputime from
> each thread that exits to the pool'. So long as I'm not as crazy on
> this analysis as the last one, may I suggest a revert of this commit
> for 3.8.3?
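
The two chains quoted above take the same pair of locks in opposite
order. A minimal userspace sketch of that ordering check follows; it is
not kernel code and not how lockdep is implemented. The enum names and
the record_order()/check_traces() helpers are hypothetical and exist
only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock IDs; the names mirror the locks in the quoted traces. */
enum lock_id { TASKLIST_LOCK, NONBLOCKING_POOL_LOCK, NR_LOCKS };

/* order_seen[a][b] is true once some path took lock b while holding lock a. */
bool order_seen[NR_LOCKS][NR_LOCKS];

/*
 * Record that `inner` was acquired while `outer` was held.
 * Returns true if the opposite order was already recorded, meaning the
 * two paths together form an AB-BA inversion.
 */
bool record_order(enum lock_id outer, enum lock_id inner)
{
    assert(outer < NR_LOCKS && inner < NR_LOCKS);
    order_seen[outer][inner] = true;
    return order_seen[inner][outer];
}

/* Replay the two acquisition orders from the quoted call chains. */
bool check_traces(void)
{
    bool inverted = false;

    /* Exit path: release_task() holds tasklist_lock (write),
     * then mix_pool_bytes() takes nonblocking_pool.lock. */
    inverted |= record_order(TASKLIST_LOCK, NONBLOCKING_POOL_LOCK);

    /* Exec path: account() holds nonblocking_pool.lock,
     * then send_sigio() takes tasklist_lock (read). */
    inverted |= record_order(NONBLOCKING_POOL_LOCK, TASKLIST_LOCK);

    return inverted;
}
```

check_traces() returns true here because the second path records the
reverse of an order already seen from the first one.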

I'll revert it, but shouldn't we fix this properly upstream in Linus's
tree as well? I'd rather take the fix than a revert so that we don't
have a problem that no one remembers to fix until 3.9-final is out.

thanks,

greg k-h