Date:   Sat, 6 Aug 2022 07:24:35 -0400
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Jiri Slaby <jirislaby@...nel.org>
Cc:     Wei Wang <weiwan@...gle.com>, David Miller <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org,
        Soheil Hassas Yeganeh <soheil@...gle.com>,
        Yuchung Cheng <ycheng@...gle.com>,
        LemmyHuang <hlm3280@....com>, stable <stable@...r.kernel.org>
Subject: Re: [PATCH net v2] Revert "tcp: change pingpong threshold to 3"

On Sat, Aug 6, 2022 at 6:02 AM Jiri Slaby <jirislaby@...nel.org> wrote:
>
> On 21. 07. 22, 22:44, Wei Wang wrote:
> > This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c.
> >
> > This to-be-reverted commit was meant to apply a stricter rule for the
> > stack to enter pingpong mode. However, the condition used to check for
> > interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is
> > jiffy based and might be too coarse, which delays the stack entering
> > pingpong mode.
> > We revert this patch so that we no longer use the above condition to
> > determine interactive session, and also reduce pingpong threshold to 1.
> >
> > Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3")
> > Reported-by: LemmyHuang <hlm3280@....com>
> > Suggested-by: Neal Cardwell <ncardwell@...gle.com>
> > Signed-off-by: Wei Wang <weiwan@...gle.com>
>
>
> This breaks python-eventlet [1] (and was backported to stable trees):
> ________________ TestHttpd.test_018b_http_10_keepalive_framing _________________
>
> self = <tests.wsgi_test.TestHttpd testMethod=test_018b_http_10_keepalive_framing>
>
>      def test_018b_http_10_keepalive_framing(self):
>          # verify that if an http/1.0 client sends connection: keep-alive
>          # that we don't mangle the request framing if the app doesn't read the request
>          def app(environ, start_response):
>              resp_body = {
>                  '/1': b'first response',
>                  '/2': b'second response',
>                  '/3': b'third response',
>              }.get(environ['PATH_INFO'])
>              if resp_body is None:
>                  resp_body = 'Unexpected path: ' + environ['PATH_INFO']
>                  if six.PY3:
>                      resp_body = resp_body.encode('latin1')
>              # Never look at wsgi.input!
>              start_response('200 OK', [('Content-type', 'text/plain')])
>              return [resp_body]
>
>          self.site.application = app
>          sock = eventlet.connect(self.server_addr)
>          req_body = b'GET /tricksy HTTP/1.1\r\n'
>          body_len = str(len(req_body)).encode('ascii')
>
>          sock.sendall(b'PUT /1 HTTP/1.0\r\nHost: localhost\r\nConnection: keep-alive\r\n'
>                       b'Content-Length: ' + body_len + b'\r\n\r\n' + req_body)
>          result1 = read_http(sock)
>          self.assertEqual(b'first response', result1.body)
>          self.assertEqual(result1.headers_original.get('Connection'), 'keep-alive')
>
>          sock.sendall(b'PUT /2 HTTP/1.0\r\nHost: localhost\r\nConnection: keep-alive\r\n'
>                       b'Content-Length: ' + body_len + b'\r\nExpect: 100-continue\r\n\r\n')
>          # Client may have a short timeout waiting on that 100 Continue
>          # and basically immediately send its body
>          sock.sendall(req_body)
>          result2 = read_http(sock)
>          self.assertEqual(b'second response', result2.body)
>          self.assertEqual(result2.headers_original.get('Connection'), 'close')
>
>  >       sock.sendall(b'PUT /3 HTTP/1.0\r\nHost: localhost\r\nConnection: close\r\n\r\n')
>
> tests/wsgi_test.py:648:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _ _ _ _
> eventlet/greenio/base.py:407: in sendall
>      tail = self.send(data, flags)
> eventlet/greenio/base.py:401: in send
>      return self._send_loop(self.fd.send, data, flags)
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _ _ _ _
>
> self = <eventlet.greenio.base.GreenSocket object at 0x7f5f2f73c9a0>
> send_method = <built-in method send of socket object at 0x7f5f2f73d520>
> data = b'PUT /3 HTTP/1.0\r\nHost: localhost\r\nConnection: close\r\n\r\n'
> args = (0,), _timeout_exc = timeout('timed out'), eno = 32
>
>      def _send_loop(self, send_method, data, *args):
>          if self.act_non_blocking:
>              return send_method(data, *args)
>
>          _timeout_exc = socket_timeout('timed out')
>          while True:
>              try:
>  >               return send_method(data, *args)
> E               BrokenPipeError: [Errno 32] Broken pipe
>
> eventlet/greenio/base.py:388: BrokenPipeError
> ====================
>
> Reverting this revert on the top of 5.19 solves the issue.
>
> Any ideas?

Interesting. This revert should restore the delayed ACK behavior the
kernel had for many years before Linux 5.1 (May 2019), which
introduced the commit it is reverting:

  4a41f453bedfd tcp: change pingpong threshold to 3

It sounds like perhaps this test you mention has an implicit
dependence on the timing of delayed ACKs.

A few questions:

(1) What are the timeout values in this test? If any implicit or
explicit timeout is shorter than the typical 40ms Linux TCP delayed
ACK timer, that could be the problem. If you make sure all timeouts
are at least, say, 300ms, that should remove the dependencies on
delayed ACK behavior (and make the test more portable).

(2) Does this test use the TCP_NODELAY socket option to disable
Nagle's algorithm? Presumably it should, given that it's a network app
that cares about latency. Omitting the TCP_NODELAY socket option can
cause request/response traffic to depend on delayed ACK behavior.

(3) If (1) and (2) do not fix the test, would you be able to provide
binary .pcap traces of the behavior with the test (a) passing and (b)
failing? For example:
   sudo tcpdump -i any -w /tmp/trace.pcap -s 100 port 80 &
   # run test
   sudo killall tcpdump
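
For (1) and (2), a minimal sketch of what I have in mind, using the
plain Python standard library socket module. The helper name and the
300ms floor constant are hypothetical, chosen only for illustration;
I have not checked how the eventlet test suite actually creates its
sockets, so treat this as an assumption about where the options would
be set rather than a patch:

```python
import socket

# Hypothetical floor, taken from the "at least 300ms" suggestion in (1).
MIN_TIMEOUT_S = 0.3


def make_test_socket(timeout_s=1.0):
    """Create a TCP socket with Nagle disabled and a generous timeout."""
    if timeout_s < MIN_TIMEOUT_S:
        # Anything close to the ~40ms delayed ACK timer risks a flaky test.
        raise ValueError("timeout %.3fs is below %.1fs" % (timeout_s, MIN_TIMEOUT_S))
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Suggestion (2): TCP_NODELAY disables Nagle's algorithm, so small
    # writes are not held back waiting for the peer's ACK.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Suggestion (1): keep the timeout well above the delayed ACK timer.
    sock.settimeout(timeout_s)
    return sock
```

The socket would then be connected to the test server as usual, e.g.
sock.connect(server_addr), before sending the requests.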

thanks,
neal
