lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 11 Jun 2008 18:13:09 +0300 (EEST)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	David Miller <davem@...emloft.net>
cc:	mingo@...e.hu, mcmanus@...ksong.com, peterz@...radead.org,
	LKML <linux-kernel@...r.kernel.org>,
	Netdev <netdev@...r.kernel.org>, rjw@...k.pl,
	Andrew Morton <akpm@...ux-foundation.org>, johnpol@....mipt.ru
Subject: Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections,
 v2.6.26-rc3+

On Tue, 10 Jun 2008, David Miller wrote:

> From: "Ilpo_Järvinen" <ilpo.jarvinen@...sinki.fi>
> Date: Wed, 4 Jun 2008 02:22:16 +0300 (EEST)
> 
> > [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk
> > 
> > It seems that replacement of DA code also moved parts outside
> > of appropriate locking. The Ingo's problem seems to come from
> > the fact that two flows could now race in
> > (inet_csk_)reqsk_queue_add corrupting the queue. ...This can
> > leave dangling socks around which won't resolve themselves
> > without stimuli from outside (e.g., external RST would help
> > I think).
> > 
> > Then some details I'm not too sure of:
> > I guess we want to put listen_sk->sk_state checking under the
> > lock as well. I've not evaluated if ->sk_data_ready too
> > requires locking but assumed it does.
> > 
> > I'm by no means familiar with all locking variants, requirements,
> > etc.
> > 
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
> 
> I took a close look at this, it seems this patch adds
> an ABBA deadlock.  But I might be wrong.
> 
> Normally the locking order is:
> 
> 	listen_sk --> some_child_sk
> 
> And this can be seen by the code paths that flow down
> to tcp_child_process() in net/ipv4/tcp_minisocks.c (via
> tcp_v4_do_rcv() for example).
> 
> However here in this patch we will lock in the:
> 
> 	child_sk --> listen_sk
> 
> order.
>
> Unless... these defer accepted sockets live outside of the
> normal socket collection found by tcp_v{4,6}_hnd_req().

All these collection/queues make the analysis a bit complex :-), but
I think I finally got a hold of it anyway and my analysis agrees with 
yours. To me it seems that when DAed TCP sk moves to established, there's 
no longer a connection from listen_sk to child_sk, we keep them alive by 
incrementing those refcounts and the child sk is responsible about the 
binding between them (until the DA gets resolution). But the child_sk 
locking during the creation is also a suspect, but only after 
tcp_rcv_established() part can run for the child sock.

Am I right that after tcp_v4_syn_recv_sock() it is possible to get 
tcp_rcv_established() to execute for the created child? It's called 
by this path:
  tcp_v4_hnd_req ->
    tcp_check_req -> 
      tcp_v4_syn_recv_sock (inet_csk(sk)->icsk_af_ops->syn_recv_sock)
...and the listen_sk part acquires that child's lock only after that in 
tcp_v4_hnd_req(). ...Allowing it to deadlock during that short window!

...But from that point onward nothing in listen_sk needs to lock the 
child. Then this connection from listen_sk to child_sk comes back once 
we've reinserted the DAed child_sk back to the queue (in that racy part I 
was trying to fix) but at that point of time we won't ever need to lock in 
child_sk -> listen_sk order again because we've already passed that 
point.

...But like you, I don't understand all of it that well... Btw, your 
sk_callback_lock notes explained some mystery part for me as I came
across with it too while looking around... :-)

> If that is the case, that ought to make this locking order OK
> but I fear that lockdep will likely complain because it has
> no way to see this distinction.

I definately agree that the locks are taken in different order no matter 
what (I actually referring to that already in the first mail about the 
deadlock) so probably lockdep is not going to be too friendly :-), whether 
it could cause a real deadlock, I was not that sure at that point of time.

> If we cannot find a simple and easy way to deal with this locking
> problem, I am reverting the defer-accept changes entirely.

To avoid all problem, I was already thinking of another approach (though 
my time constraints hardly allows finishing it any time soon):

Because DA code resides quite late on the TCP path it would be quite easy
to do some preparatory work, drop child sk's lock and re-acquire both 
locks in the usual order (listen->child) but that would require handling 
correctly reset & other things that could intervene (luckily that pesky 
userspace isn't there yet to mess around so not that many things can 
happen :-)). But ipv6 seems to do some additional processing after 
tcp_rcv_established() which I'd expect to choke if child sk's lock was 
dropped there for a moment, while ipv4 part seems quite doable. I don't 
even know what exactly are the requirements for that ipv6 part (see the 
stuff after ipv6_pktoptions label in tcp_v6_do_rcv). ...I tried to do
this but came across those two things mentioned above.

IMHO, changing locking this late in the release cycle would be quite
risky anyway... And we would also be fiddling with TCP state machine.

> It's not the end of the world if this feature has to be deferred to 
> 2.6.27

Agreed. Especially in the light of the another issue that has been 
raised.

> and this bug has been known for several weeks already.

...That's partially because Ingo didn't even test my fix on the receiver 
which got stuck but used 2.6.25 and got some other bug which looked alike 
but couldn't be the same because these DA problems weren't in 2.6.25. What 
could I've done for that :-).

-- 
 i.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ