lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 16 Apr 2018 22:28:11 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Florian Westphal <fw@...len.de>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
        Michal Kubecek <mkubecek@...e.cz>, netdev@...r.kernel.org
Subject: Re: tcp hang when socket fills up ?



On 04/16/2018 08:52 PM, Dominique Martinet wrote:
> Thank you for the hints. Out of order reply.
> 
> Florian Westphal wrote on Mon, Apr 16, 2018:
>> echo 1 > /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal
>>
>> which stops conntrack from marking packets with out-of-window
>> acks as invalid.
> 
> That worked, the connection stays established with this set.
> 
> 
>> echo 6 > /proc/sys/net/netfilter/nf_conntrack_log_invalid
> 
> Took me a while to figure how to log that properly (why do distros not
> ship ulogd?!) but that is helpful.
> 
> I have attached the full logs to this mail as text attachments in case I
> missed something in my excerpts
> 
> 
> Specifically, there are many outbound invalid packets, but as Marcelo
> pointed out there is no conntrack netfilter for outbound packets...
> The first invalid incoming packet is that one (json output seemed to be
> the most convenient as it has tcp seq/ackseq and the oob.prefix message,
> feel free to ask for some other format)
> 
> {"timestamp": "2018-04-17T10:29:14.956485", "dvc": "Netfilter",
> "raw.pktlen": 52, "raw.pktcount": 1, "oob.prefix": "nf_ct_tcp: ACK is
> over the upper bound (ACKed data not seen yet) ", "oob.time.sec":
> 1523928554, "oob.time.usec": 956485, "oob.mark": 0, "oob.hook": 0,
> "oob.family": 2, "oob.protocol": 2048, "raw.label": 0, "ip.protocol": 6,
> "ip.tos": 0, "ip.ttl": 52, "ip.totlen": 52, "ip.ihl": 5, "ip.csum":
> 50909, "ip.id": 30328, "ip.fragoff": 16384, "src_port": 30280,
> "dest_port": 29543, "tcp.seq": 4048786673, "tcp.ackseq": 3024392506,
> "tcp.window": 781, "tcp.offset": 0, "tcp.reserved": 0, "tcp.urg": 0,
> "tcp.ack": 1, "tcp.psh": 0, "tcp.rst": 0, "tcp.syn": 0, "tcp.fin": 0,
> "tcp.res1": 0, "tcp.res2": 0, "tcp.csum": 17752, "oob.in": "",
> "oob.out": "", "src_ip": "client", "dest_ip": "server"}
> 
> which is the first ack that isn't seen alright (sent packet, blocked
> ack, replays):
> 10:29:14.926115 IP server.29543 > client.30280: Flags [.], seq 3024391132:3024392506, ack 4048786673, win 307, options [nop,nop,TS val 681965809 ecr 1634528339], length 1374
> ...
> 10:29:14.956485 IP client.30280 > server.29543: Flags [.], ack 3024392506, win 781, options [nop,nop,TS val 1634528369 ecr 681965809], length 0
> ...
> 10:29:15.255489 IP server.29543 > client.30280: Flags [.], seq 3024391132:3024392506, ack 4048786673, win 307, options [nop,nop,TS val 681966138 ecr 1634528369], length 1374
> 10:29:15.281581 IP client.30280 > server.29543: Flags [.], ack 3024436474, win 1444, options [nop,nop,TS val 1634528697 ecr 681965904,nop,nop,sack 1 {3024391132:3024392506}], length 0
> 10:29:15.555466 IP client.30280 > server.29543: Flags [P.], seq 4048786673:4048786709, ack 3024436474, win 1444, options [nop,nop,TS val 1634528869 ecr 681965904], length 36
> 10:29:15.719510 IP server.29543 > client.30280: Flags [.], seq 3024391132:3024392506, ack 4048786673, win 307, options [nop,nop,TS val 681966602 ecr 1634528369], length 1374
> 
> So now I guess the question is why netfilter think this is over the
> upper bound.
> ..And surely enough the answer is within the invalid outbound packets
> that aren't considered in the conntrack state machine, the first of whih
> is this one:
> {"timestamp": "2018-04-17T10:29:14.926143", "dvc": "Netfilter",
> "raw.pktlen": 2800, "raw.pktcount": 1, "oob.prefix": "nf_ct_tcp: SEQ is
> over the upper bound (over the window of the receiver)", "oob.time.sec":
> 1523928554, "oob.time.usec": 926143, "oob.mark": 0, "oob.hook": 0,
> "oob.family": 2, "oob.protocol": 2048, "oob.uid": 1000, "oob.gid": 100,
> "raw.label": 0, "ip.protocol": 6, "ip.tos": 8, "ip.ttl": 64,
> "ip.totlen": 2800, "ip.ihl": 5, "ip.csum": 47600, "ip.id": 27809,
> "ip.fragoff": 16384, "src_port": 29543, "dest_port": 30280, "tcp.seq":
> 3024391132, "tcp.ackseq": 4048786673, "tcp.window": 307, "tcp.offset":
> 0, "tcp.reserved": 0, "tcp.urg": 0, "tcp.ack": 1, "tcp.psh": 0,
> "tcp.rst": 0, "tcp.syn": 0, "tcp.fin": 0, "tcp.res1": 0, "tcp.res2": 0,
> "tcp.csum": 5201, "oob.in": "", "oob.out": "", "src_ip": "server",
> "dest_ip": "client"}
> 
> which sequence matches the start seq of the first packet we don't ack,
> so that's why conntrack doesn't recongnize that as a valid ack.
> 
> 
> Here's some context before that packet:
> 10:29:14.921305 IP server.29543 > client.30280: Flags [.], seq 3024378314:3024379688, ack 4048786673, win 307, options [nop,nop,TS val 681965804 ecr 1634528336], length 1374
> 10:29:14.921311 IP server.29543 > client.30280: Flags [.], seq 3024379688:3024381062, ack 4048786673, win 307, options [nop,nop,TS val 681965804 ecr 1634528336], length 1374
> 10:29:14.922166 IP client.30280 > server.29543: Flags [.], ack 3024365948, win 329, options [nop,nop,TS val 1634528336 ecr 681965778], length 0
> 10:29:14.922178 IP server.29543 > client.30280: Flags [.], seq 3024381062:3024382436, ack 4048786673, win 307, options [nop,nop,TS val 681965805 ecr 1634528336], length 1374
> 10:29:14.922182 IP server.29543 > client.30280: Flags [.], seq 3024382436:3024383810, ack 4048786673, win 307, options [nop,nop,TS val 681965805 ecr 1634528336], length 1374
> 10:29:14.923848 IP client.30280 > server.29543: Flags [.], ack 3024367322, win 352, options [nop,nop,TS val 1634528338 ecr 681965778], length 0
> 10:29:14.923860 IP server.29543 > client.30280: Flags [.], seq 3024383810:3024385184, ack 4048786673, win 307, options [nop,nop,TS val 681965807 ecr 1634528338], length 1374
> 10:29:14.923863 IP server.29543 > client.30280: Flags [.], seq 3024385184:3024386558, ack 4048786673, win 307, options [nop,nop,TS val 681965807 ecr 1634528338], length 1374
> 10:29:14.924175 IP client.30280 > server.29543: Flags [.], ack 3024368696, win 375, options [nop,nop,TS val 1634528338 ecr 681965778], length 0
> 10:29:14.924187 IP server.29543 > client.30280: Flags [.], seq 3024386558:3024387932, ack 4048786673, win 307, options [nop,nop,TS val 681965807 ecr 1634528338], length 1374
> 10:29:14.924190 IP server.29543 > client.30280: Flags [.], seq 3024387932:3024389306, ack 4048786673, win 307, options [nop,nop,TS val 681965807 ecr 1634528338], length 1374
> 10:29:14.925410 IP client.30280 > server.29543: Flags [.], ack 3024370070, win 397, options [nop,nop,TS val 1634528339 ecr 681965779], length 0
> 10:29:14.925422 IP server.29543 > client.30280: Flags [.], seq 3024389306:3024390680, ack 4048786673, win 307, options [nop,nop,TS val 681965808 ecr 1634528339], length 1374
> 10:29:14.925425 IP server.29543 > client.30280: Flags [P.], seq 3024390680:3024391132, ack 4048786673, win 307, options [nop,nop,TS val 681965808 ecr 1634528339], length 452
> 10:29:14.926030 IP client.30280 > server.29543: Flags [.], ack 3024371444, win 420, options [nop,nop,TS val 1634528339 ecr 681965779], length 0
> 10:29:14.926115 IP server.29543 > client.30280: Flags [.], seq 3024391132:3024392506, ack 4048786673, win 307, options [nop,nop,TS val 681965809 ecr 1634528339], length 1374
> 
> But the way I see it, (3024392506-3024371444) = 21064 is smaller than
> 420*64 = 26880.
> Even looking at the previous (ack 3024370070, win 397) we get 22436 and
> 25408 which still work, same with ack 3024368696, win 375 (23810 <
> 24000)
> So that means conntrack is looking at an ack at least as old as ack
> 3024367322, win 352?...
> Is there any way to confirm that? In the code it looks like we match
> ! before(seq, sender->td_maxend + 1)
> 
> I turned pr_debug on in tcp_in_window() for another try and it's a bit
> mangled because the information on multiple lines and the function is
> called in parallel but it looks like I do have some seq > maxend +1
> 
> Although it's weird, the maxend was set WAY earlier apparently?
> Apr 17 11:13:14 res=1 sender end=1913287798 maxend=1913316998 maxwin=29312 receiver end=505004284 maxend=505033596 maxwin=29200
> then window decreased drastically e.g. previous ack just before refusal:
> Apr 17 11:13:53 seq=1913292311 ack=505007789+(0) sack=505007789+(0) win=284 end=1913292311
> Apr 17 11:13:53 sender end=1913292311 maxend=1913331607 maxwin=284 scale=0 receiver end=505020155 maxend=505033596 maxwin=39296 scale=7

scale=0 is suspect.

Really if conntrack does not see SYN SYNACK packets, it should not make any window check,
since windows can be scaled up to 14 :/

> 
> first res=0 (refused) packet:
> Apr 17 11:13:53 odin kernel: seq=505042139 ack=1913292311+(0) ack=1913292311+(0) win=307 end=505044887
> Apr 17 11:13:53 res=0 sender end=505033895 maxend=505033596 maxwin=39296 receiver end=1913292311 maxend=1913331607 maxwin=674
> 
> 
>> (Earlier email implies this is related to timestamps, but unfortunately
>>  to best of my knowledge conntrack doesn't look at tcp timestamps).
> 
> tcp_timestamp introduces other changes in behaviour so it might still be
> relateed.
> 
> 
> 
> Recap of attachments:
>  - new tcpdump output of a failed attempt
>  - matching invalid packet dumped in json format
>  - logs of tcp_in_window function unfortunately of a different attempt
> 
> Happy to provide new/different logs if it would be useful. Once again
> thanks for the extra eyes and useful commands.
> 

Powered by blists - more mailing lists