Date: Fri, 6 Jun 2014 16:24:39 +0000
From: David Laight <David.Laight@...LAB.COM>
To: David Laight <David.Laight@...LAB.COM>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: SCTP seems to lose its socket state.
From: David Laight
> From: David Laight
> > I've been looking at an ethernet trace from one of our customers.
> > They seem to have got an SCTP socket into a rather confused state.
> >
> > There seem to be a significant number of transmit ethernet frames
> > that don't reach the far end.
> > This shouldn't cause a real problem, but we end up with the following
> > (trace taken on the Linux system):
> >
> > 39964 0.304473 -> SCTP INIT
> > 39965 0.292669 <- SCTP INIT (I think this has an invalid checksum)
> > 39968 0.467935 <- SCTP INIT
> > 39969 0.000093 -> SCTP INIT_ACK
> > 39970 0.003947 <- SCTP COOKIE_ECHO
> > 39971 0.000072 -> SCTP COOKIE_ACK
> > 39972 0.000337 -> M3UA ASPUP
> > 39979 0.809659 <- SCTP COOKIE_ECHO
> > 39980 0.000058 -> SCTP COOKIE_ACK
> > shutdown() called here - seems to be ignored
> > 39983 0.949471 <- SCTP COOKIE_ECHO
> > 39984 0.000053 -> SCTP COOKIE_ACK
> > 39986 0.730072 -> M3UA ASPUP Same TSN as above
> > 40002 0.270589 -> M3UA ASPUP Same TSN as above
> > 40008 3.689088 <- SCTP HEARTBEAT
> > 40009 0.000027 -> SCTP HEARTBEAT_ACK
> > 40014 0.261152 <- SCTP HEARTBEAT
> > 40015 0.000033 -> SCTP HEARTBEAT_ACK
> > 40026 0.123048 <- SCTP HEARTBEAT
> > 40027 0.000030 -> SCTP HEARTBEAT_ACK
> > 40036 1.615048 -> M3UA ASPUP Same TSN as above
> >
> > There are no signs of any SACKs for the ASPUP, even though I think it has
> > the correct TSN (the same value as in the INIT_ACK).
> > No signs of any shutdowns or aborts from either system.
> >
> > As seems to be typical for M3UA the source and destination ports are
> > the same. No additional IP addresses appear in the INIT (etc) messages.
>
> I think I've reproduced this on a 3.14.0 kernel.
>
> System A: Bind to port 1234, connect to B:1234.
> If the connect fails, retry 10 seconds later.
> When the connection completes send some data.
> Disconnect if the reflected data isn't received within 2 seconds.
> System B: Bind to port 1234, connect to A:1234.
> If the connect fails, retry 10 seconds later.
> Reflect any received data.
>
> Initially the INIT chunks generate ABORTs (no listener) so both
> programs just retry every 10 seconds.
>
> On B run:
> iptables -A INPUT -p sctp --chunk-types any INIT -j DROP
> iptables -A INPUT -p sctp --chunk-types any DATA -j DROP
> The first allows the connection to complete.
> The second stops B acking the data.
> The data is resent on timeout, and the systems exchange HBs.
>
> I'd expect a SHUTDOWN or ABORT to be sent reasonably quickly.
> But the systems just exchange HBs for over 5 minutes.
> (I'm seeing an ABORT because B gives up waiting for the message.)
Seems I wasn't waiting long enough.
A does eventually send an ABORT after about 7 minutes.
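
A rough sanity check on the 7 minute figure. This is only a back-of-envelope
sketch: it assumes the ABORT is driven by the association retransmission
limit with exponential RTO back-off, and it assumes the usual Linux sysctl
defaults (rto_min 1s, rto_initial 3s, rto_max 60s, association_max_retrans
10), none of which were checked on the test systems. time_until_abort() is a
hypothetical helper, and it ignores any effect the HEARTBEAT exchange has on
the error counters.

```python
def time_until_abort(rto_start, rto_max=60.0, max_retrans=10):
    """Sum the retransmission timeouts until the retry limit is exceeded.

    Assumes the RTO doubles after every expiry, capped at rto_max, and that
    the association is aborted on expiry number max_retrans + 1.
    """
    total = 0.0
    rto = rto_start
    for _ in range(max_retrans + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


if __name__ == "__main__":
    # Assumed starting points: rto_min (1s) and rto_initial (3s).
    for start in (1.0, 3.0):
        secs = time_until_abort(start)
        print(f"RTO starting at {start:.0f}s: {secs:.0f}s before giving up")
```

Under those assumptions the total comes to 363 or 453 seconds, i.e. roughly
6 to 7.5 minutes, which is at least in the same range as the delay seen
before A's ABORT.
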
In the customer's trace the remote (B) system has silently given up
and then resends an INIT well before the 7 minutes have elapsed.
I'm not sure how to go about reproducing that (without major
kernel hacking).
Maybe I can suppress the ABORT from B.
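
For anyone wanting to try this, the two test programs described in the
quoted text could be sketched roughly as below. This is not the program that
was actually run: the function names, the b"ping" payload, the 1 second send
interval and the use of SO_REUSEADDR are all assumptions made for
illustration, and it needs a kernel with SCTP support.

```python
import socket
import time

PORT = 1234           # port used in the description
RETRY_DELAY = 10.0    # seconds between connect attempts
REPLY_TIMEOUT = 2.0   # seconds A waits for its data to come back
# IPPROTO_SCTP is 132; fall back to the number if this Python lacks the name.
SCTP = getattr(socket, "IPPROTO_SCTP", 132)


def connect_peer(local_ip, remote_ip, port=PORT):
    """Bind to the port locally, then connect to the same port on the peer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, SCTP)
    try:
        # Assumption: allow quick rebinding of the port between attempts.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((local_ip, port))
        sock.connect((remote_ip, port))
    except OSError:
        sock.close()
        raise
    return sock


def send_and_check(sock, payload, timeout=REPLY_TIMEOUT):
    """Send payload; True if the same bytes come back within timeout."""
    sock.sendall(payload)
    sock.settimeout(timeout)
    try:
        return sock.recv(len(payload)) == payload
    except socket.timeout:
        return False


def reflect_once(sock):
    """Echo one received buffer back to the sender; return what was echoed."""
    data = sock.recv(4096)
    if data:
        sock.sendall(data)
    return data


def run_a(local_ip, remote_ip):
    """System A: connect, send, disconnect if nothing is reflected in time."""
    while True:
        try:
            sock = connect_peer(local_ip, remote_ip)
        except OSError:
            time.sleep(RETRY_DELAY)
            continue
        with sock:
            try:
                while send_and_check(sock, b"ping"):
                    time.sleep(1.0)  # assumed send interval
            except OSError:
                pass  # association aborted; fall through and reconnect


def run_b(local_ip, remote_ip):
    """System B: connect, then reflect whatever arrives."""
    while True:
        try:
            sock = connect_peer(local_ip, remote_ip)
        except OSError:
            time.sleep(RETRY_DELAY)
            continue
        with sock:
            try:
                while reflect_once(sock):
                    pass
            except OSError:
                pass  # association aborted; fall through and reconnect
```

Run run_a(A, B) on system A and run_b(B, A) on system B, with A and B being
the two IP addresses, then add the iptables rules on B.
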
David