Message-ID: <AANLkTikRJ3QBRJ_VsoMG4Q+HMs=a9C8XkJ31dtdKrfpO@mail.gmail.com>
Date: Sun, 21 Nov 2010 09:42:57 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Pekka Enberg <penberg@...nel.org>
Cc: oleg@...hat.com, LKML <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: [PROBLEM] WARNING: at kernel/exit.c:910 do_exit

On Sun, Nov 21, 2010 at 7:35 AM, Pekka Enberg <penberg@...nel.org> wrote:
>
> The following warning triggered on me while I was browsing the web:
>
> http://twitpic.com/38vxxg
>
> [ Click on the "Rotate photo" button for landscape version. ]
>
> It's
>
> WARN_ON(atomic_read(&tsk->fs_excl));
>
> in do_exit(). There was a prior oops in __pipe_free_info() called in
> sys_recvmsg() paths that unfortunately scrolled away.

That WARN_ON() is almost certainly due to the previous oops.

The previous oops may have scrolled away, but you can see the
call-chain, since it's part of the later oops. Except the photo is
hard to read ;)

In fact, you can see that there have been _two_ oopses before that. The
"free_pipe_info()" oops comes from the "do_exit()" path of the _first_
oops.
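
For context, the way an oops ends up in do_exit() at all is roughly
this (paraphrased from arch/x86/kernel/dumpstack.c and kernel/exit.c
of this era - a sketch, not the exact code):

  /* oops_end(): a fatal oops in process context never returns to the
   * faulting code - it kills the current task through do_exit() */
  void oops_end(unsigned long flags, struct pt_regs *regs, int signr)
  {
          ...
          if (in_interrupt())
                  panic("Fatal exception in interrupt");
          if (panic_on_oops)
                  panic("Fatal exception");
          do_exit(signr);
  }

  /* do_exit(), near the top - kernel/exit.c:910 in this kernel */
  NORET_TYPE void do_exit(long code)
  {
          struct task_struct *tsk = current;
          ...
          WARN_ON(atomic_read(&tsk->fs_excl));
          ...
          /* then the usual exit cleanup: exit_files() etc, which is
           * how we end up in free_pipe_info() - the second oops */
  }

So a task that dies in the middle of a syscall runs all the exit-time
sanity checks and cleanup with whatever half-finished state the oops
left behind, and the fs_excl warning is just collateral damage from
that.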

So the original oops seems to be around here:

  (*probably* oopsed in __scm_destroy)
  (the fd_install on the stack is likely from scm_detach_fds calling
   it before calling __scm_destroy - just a stale pointer remaining on
   the stack)

    scm_detach_fds
    unix_stream_recvmsg
    sock_recvmsg
    __sys_recvmsg
    sys_recvmsg

which means that this is almost certainly in networking. Then, when
that oops caused us to die, do_exit() tried to clean up the state, and
_that_ caused us to oops again (now in free_pipe_info). That second
oops is the partial one you see. And then the _third_ oops is the one
you actually caught.
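
For reference, the piece that ties fd_install() and __scm_destroy()
together is the tail of scm_detach_fds() in net/core/scm.c - roughly
(paraphrased and simplified, so don't trust the details blindly):

  void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
  {
          ...
          for (i = 0; i < fdmax; i++, cmfptr++) {
                  ...
                  /* hand the passed file over to the receiver */
                  get_file(fp[i]);
                  fd_install(new_fd, fp[i]);
          }
          ...
          /* drop the references still held in scm->fp */
          __scm_destroy(scm);
  }

  /* __scm_destroy(), with the recursion-avoidance list handling
   * stripped out: */
  void __scm_destroy(struct scm_cookie *scm)
  {
          struct scm_fp_list *fpl = scm->fp;
          ...
          for (i = fpl->count - 1; i >= 0; i--)
                  fput(fpl->fp[i]);
          kfree(fpl);
  }

so if "scm" or the fp list it points to is already bogus by the time
__scm_destroy() runs, that fput() loop is where things blow up.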

The free_pipe_info() oops in turn must be because we passed in an
invalid "inode" pointer. It's almost certainly the "inode->i_pipe"
dereference, so inode was NULL or something. I don't see why that
would happen, but with a previous oops it's not necessarily clear that
there _is_ a reason.
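
free_pipe_info() itself is tiny - from memory it's just this
(fs/pipe.c):

  void free_pipe_info(struct inode *inode)
  {
          __free_pipe_info(inode->i_pipe);
          inode->i_pipe = NULL;
  }

so the very first thing it does is dereference the inode, and a NULL
or garbage inode pointer faults right there.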

And who knows? It may be that the networking oops was due to some
other earlier problem that isn't part of this particular callchain and
that has long since scrolled away. I don't see any unix domain changes
since -rc1.

Linus