Message-ID: <FB8A4655DFD2B34DB16AE06DDDD6C0E231A71CE3@SJEXCHMB12.corp.ad.broadcom.com>
Date: Wed, 5 Nov 2014 19:16:09 +0000
From: "Charley (Hao Chuan) Chu" <charley.chu@...adcom.com>
To: Cong Wang <cwang@...pensource.com>,
Daniel Borkmann <borkmann@...earbox.net>
CC: netdev <netdev@...r.kernel.org>
Subject: RE: Kernel Oops in __inet_twsk_kill()
Thanks Daniel and Cong,
The problem has been fixed. It was introduced by a third-party patch that decrements the refcnt of the timewait socket.
Charley
-----Original Message-----
From: Cong Wang [mailto:cwang@...pensource.com]
Sent: Wednesday, November 05, 2014 10:00 AM
To: Daniel Borkmann
Cc: Charley (Hao Chuan) Chu; netdev
Subject: Re: Kernel Oops in __inet_twsk_kill()
On Wed, Nov 5, 2014 at 8:00 AM, Daniel Borkmann <borkmann@...earbox.net> wrote:
> [ moving to netdev ]
>
> -------- Original Message --------
> Subject: Kernel Oops in __inet_twsk_kill()
> Date: Tue, 4 Nov 2014 23:47:18 +0000
> From: Charley (Hao Chuan) Chu <charley.chu@...adcom.com>
> To: linux-kernel@...r.kernel.org <linux-kernel@...r.kernel.org>
>
> We have a situation on our system where the network interface is brought up
> and down every few seconds. Eventually this brings down the system: the kernel
> crashes on a BUG_ON in __inet_twsk_kill(). The debug messages show the
> following call flow.
>
> 1) A time-wait socket is created by tcp_time_wait() when the socket enters
>    the "TIME_WAIT" state.
>      inet_twsk_alloc()     - refcnt = 0
>      inet_twsk_hashdance() - refcnt = 3
>      inet_twsk_schedule()  - refcnt = 4
>      inet_twsk_put()       - refcnt = 3
> 2) tcp_v4_timewait_ack() is called when a sync is received.
>      inet_twsk_put()       - refcnt = 2  <== where we think the problem is
>    Occasionally a second sync is received, so inet_twsk_put() is called
>    twice - refcnt = 1
> 3) twdr_do_twkill_work() is called when the timer expires.
>      call __inet_twsk_kill() - BUG_ON!!! as refcnt = 2 (supposed to be 3).
>      call inet_twsk_put()
>
> In the normal case, the call flow only has step 1 and step 3. Our
> understanding is that the time-wait socket has three references: ehash, bhash
> and the timer death row. In step 2, none of them is touched. Can anyone here
> explain to us why inet_twsk_put() is called in tcp_v4_timewait_ack()?
>
It has been there for a rather long time, but that doesn't mean it is
correct. Its caller calls inet_twsk_put() on the error path, so it smells
wrong to call it on the non-error path as well. But I haven't looked into this.
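
For readers following the thread, the refcnt arithmetic in the quoted call flow can be replayed with a small userspace toy. This is only a sketch of the counting described in the report, not kernel code: every toy_* name is hypothetical, and the expected value of 3 at kill time is taken from the report ("supposed to be 3") rather than from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a time-wait socket; only the counter is modelled. */
struct toy_twsk {
    int refcnt;
};

/* Count the report expects when the death-row timer fires:
 * one reference each for ehash, bhash and the timer death row. */
enum { TOY_EXPECTED_AT_KILL = 3 };

/* Each helper mirrors one line of the quoted call flow (hypothetical names). */
static void toy_alloc(struct toy_twsk *tw)     { tw->refcnt = 0; }
static void toy_hashdance(struct toy_twsk *tw) { tw->refcnt += 3; }
static void toy_schedule(struct toy_twsk *tw)  { tw->refcnt += 1; }

static void toy_put(struct toy_twsk *tw)
{
    assert(tw->refcnt > 0);   /* dropping below zero would be a toy bug */
    tw->refcnt -= 1;
}

/* True when the counter matches what the report says the kill path expects. */
static bool toy_kill_check(const struct toy_twsk *tw)
{
    return tw->refcnt == TOY_EXPECTED_AT_KILL;
}

/* Replay step 1, then apply extra_puts additional puts (step 2 in the
 * report), and return the refcnt seen when step 3 would run. */
static int toy_replay(int extra_puts)
{
    struct toy_twsk tw;

    toy_alloc(&tw);
    toy_hashdance(&tw);
    toy_schedule(&tw);
    toy_put(&tw);
    for (int i = 0; i < extra_puts; i++)
        toy_put(&tw);
    return tw.refcnt;
}
```

Calling toy_replay(0) models the normal case (steps 1 and 3 only), while toy_replay(1) and toy_replay(2) model one or two extra puts from step 2. Any unbalanced put leaves the counter below the value the report expects at kill time, which is consistent with the eventual finding that a third-party patch was dropping a reference.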