Date:   Wed, 24 Apr 2019 08:47:27 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Bruno Prémont <bonbons@...ophe.eu>
Cc:     richard.purdie@...uxfoundation.org,
        Neal Cardwell <ncardwell@...gle.com>,
        Yuchung Cheng <ycheng@...gle.com>,
        "David S. Miller" <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        Alexander Kanavin <alex.kanavin@...il.com>,
        Bruce Ashfield <bruce.ashfield@...il.com>
Subject: Re: [PATCH net-next 2/3] tcp: implement coalescing on backlog queue

On Wed, Apr 24, 2019 at 7:51 AM Bruno Prémont <bonbons@...ophe.eu> wrote:
>
> Hi Eric,
>
> I'm seeing issues with this patch as well, not as regularly as
> Richard, but still in roughly one out of 30-50 TCP sessions.
>
> In my case I have a virtual machine (on VMWare) with this patch where
> NGINX as reverse proxy misses part (end) of payload from its upstream
> and times out on the upstream connection (while according to tcpdump all
> packets including upstream's FIN were sent and the upstream did get
> ACKs from the VM).
>
> From what browsers get from NGINX, it feels as if at some point reading
> from the socket or waiting for data using select() never returned data
> that had arrived, as more than just the EOF is missing.
>
> The upstream is a hardware machine in the same subnet.
>
> My VM is using VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
> as network adapter which lists the following features:
>

Hi Bruno.

I suspect an EPOLLIN notification is being lost by the application.

The fact that the TCP backlog contains 1 instead of 2+ packets should
not change stack behavior;
the packet should eventually land in the socket receive queue.

Are you using epoll() in edge-triggered mode? You mention select(), but
select() is a rather old and inefficient API.

Could you watch/report the output of "ss -temoi" for the frozen TCP flow?

This might give us a clue about packets being dropped, say because the
accumulated packet became too big.



> rx-checksumming: on
> tx-checksumming: on
>         tx-checksum-ipv4: off [fixed]
>         tx-checksum-ip-generic: on
>         tx-checksum-ipv6: off [fixed]
>         tx-checksum-fcoe-crc: off [fixed]
>         tx-checksum-sctp: off [fixed]
> scatter-gather: on
>         tx-scatter-gather: on
>         tx-scatter-gather-fraglist: off [fixed]
> tcp-segmentation-offload: on
>         tx-tcp-segmentation: on
>         tx-tcp-ecn-segmentation: off [fixed]
>         tx-tcp-mangleid-segmentation: off
>         tx-tcp6-segmentation: on
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: on
> rx-vlan-offload: on
> tx-vlan-offload: on
> ntuple-filters: off [fixed]
> receive-hashing: off [fixed]
> highdma: on
> rx-vlan-filter: on [fixed]
> vlan-challenged: off [fixed]
> tx-lockless: off [fixed]
> netns-local: off [fixed]
> tx-gso-robust: off [fixed]
> tx-fcoe-segmentation: off [fixed]
> tx-gre-segmentation: off [fixed]
> tx-gre-csum-segmentation: off [fixed]
> tx-ipxip4-segmentation: off [fixed]
> tx-ipxip6-segmentation: off [fixed]
> tx-udp_tnl-segmentation: off [fixed]
> tx-udp_tnl-csum-segmentation: off [fixed]
> tx-gso-partial: off [fixed]
> tx-sctp-segmentation: off [fixed]
> tx-esp-segmentation: off [fixed]
> tx-udp-segmentation: off [fixed]
> fcoe-mtu: off [fixed]
> tx-nocache-copy: off
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off [fixed]
> tx-vlan-stag-hw-insert: off [fixed]
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
> l2-fwd-offload: off [fixed]
> hw-tc-offload: off [fixed]
> esp-hw-offload: off [fixed]
> esp-tx-csum-hw-offload: off [fixed]
> rx-udp_tunnel-port-offload: off [fixed]
> tls-hw-tx-offload: off [fixed]
> tls-hw-rx-offload: off [fixed]
> rx-gro-hw: off [fixed]
> tls-hw-record: off [fixed]
>
>
> I can reproduce the issue with kernels 5.0.x and as recent as 5.1-rc6.
>
> Cheers,
> Bruno
>
> On Sunday, April 7, 2019 11:28:30 PM CEST, richard.purdie@...uxfoundation.org wrote:
> > Hi,
> >
> > I've been chasing down why a python test from the python3 testsuite
> > started failing and it seems to point to this kernel change in the
> > networking stack.
> >
> > In kernels beyond commit 4f693b55c3d2d2239b8a0094b518a1e533cf75d5 the
> > test hangs about 90% of the time (I've reproduced with 5.1-rc3, 5.0.7,
> > 5.0-rc1 but not 4.18, 4.19 or 4.20). The reproducer is:
> >
> > $ python3 -m test test_httplib -v
> > == CPython 3.7.2 (default, Apr 5 2019, 15:17:15) [GCC 8.3.0]
> > == Linux-5.0.0-yocto-standard-x86_64-with-glibc2.2.5 little-endian
> > == cwd: /var/volatile/tmp/test_python_288
> > == CPU count: 1
> > == encodings: locale=UTF-8, FS=utf-8
> > [...]
> > test_response_fileno (test.test_httplib.BasicTest) ...
> >
> > and it hangs in test_response_fileno.
> >
> > The test in question comes from Lib/test/test_httplib.py in the python
> > source tree and the code is:
> >
> >     def test_response_fileno(self):
> >         # Make sure fd returned by fileno is valid.
> >         serv = socket.socket(
> >             socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
> >         self.addCleanup(serv.close)
> >         serv.bind((HOST, 0))
> >         serv.listen()
> >
> >         result = None
> >         def run_server():
> >             [conn, address] = serv.accept()
> >             with conn, conn.makefile("rb") as reader:
> >                 # Read the request header until a blank line
> >                 while True:
> >                     line = reader.readline()
> >                     if not line.rstrip(b"\r\n"):
> >                         break
> >                 conn.sendall(b"HTTP/1.1 200 Connection established\r\n\r\n")
> >                 nonlocal result
> >                 result = reader.read()
> >
> >         thread = threading.Thread(target=run_server)
> >         thread.start()
> >         self.addCleanup(thread.join, float(1))
> >         conn = client.HTTPConnection(*serv.getsockname())
> >         conn.request("CONNECT", "dummy:1234")
> >         response = conn.getresponse()
> >         try:
> >             self.assertEqual(response.status, client.OK)
> >             s = socket.socket(fileno=response.fileno())
> >             try:
> >                 s.sendall(b"proxied data\n")
> >             finally:
> >                 s.detach()
> >         finally:
> >             response.close()
> >             conn.close()
> >         thread.join()
> >         self.assertEqual(result, b"proxied data\n")
> >
> > I was hoping someone with more understanding of the networking stack
> > could look at this and tell whether it's a bug in the python test, the
> > kernel change or otherwise give a pointer to where the problem might
> > be? I'll freely admit this is not an area I know much about.
> >
> > Cheers,
> >
> > Richard
> >
> >
> >
