Message-ID: <20170827204725.GA8625@amazon.com>
Date:   Sun, 27 Aug 2017 20:47:25 +0000
From:   Vallish Vaidyeshwara <vallish@...zon.com>
To:     David Miller <davem@...emloft.net>
CC:     <shuah@...nel.org>, <richardcochran@...il.com>,
        <xiyou.wangcong@...il.com>, <netdev@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <eduval@...zon.com>,
        <anchalag@...zon.com>
Subject: Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

On Tue, Aug 22, 2017 at 09:30:30PM -0700, David Miller wrote:
> From: Vallish Vaidyeshwara <vallish@...zon.com>
> Date: Wed, 23 Aug 2017 00:10:25 +0000
> 
> > I am submitting 2 patch series to enable hires timer to timeout
> > datagram sockets (AF_UNIX & AF_INET domain) and test code to test
> > timeout accuracy on these sockets.
> 
> This is not reasonable.
> 
> If you want high resolution events with real guarantees, please use
> the kernel interfaces which provide this as explained to you as
> feedback by other reviewers.
> 
> I'm not applying this, sorry.

Hello David,

I respect the decision not to upstream this patch series; however, I
wanted to provide additional details. The issue is not that the
application wants high resolution events with real guarantees; the
issue is a regression in system call behavior:

1) Change in system call behavior:
strace from a 4.4 test run waiting for 180 seconds on a datagram socket:
10:25:48.239685 setsockopt(3, SOL_SOCKET, SO_RCVTIMEO, "\264\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
10:25:48.239755 recvmsg(3, 0x7ffd0a3beec0, 0) = -1 EAGAIN (Resource temporarily unavailable)
10:28:48.236989 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0

strace from a 4.9 test run waiting for 180 seconds on a datagram socket, which times out after close to 195 seconds:
setsockopt(3, SOL_SOCKET, SO_RCVTIMEO, "\264\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0 <0.000028>
recvmsg(3, 0x7ffd6a2c4380, 0)           = -1 EAGAIN (Resource temporarily unavailable) <194.852000>
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 <0.000018>
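
The numbers in the two traces can be checked with a minimal Python sketch. It
is not part of the test code in the patch series. It assumes the setsockopt
payload is a 64-bit little-endian struct timeval, and it treats the gap between
the recvmsg and fstat timestamps in the 4.4 trace as the wait time. All
variable names are illustrative.

```python
import struct
from datetime import datetime

# setsockopt() payload from the traces: one \264 byte (octal 264 = 0xB4 = 180)
# followed by 15 zero bytes, 16 bytes in total.
payload = b"\264" + b"\0" * 15

# Assumption: 64-bit little-endian struct timeval (two 8-byte fields).
tv_sec, tv_usec = struct.unpack("<qq", payload)

# 4.4 trace: wall-clock stamps at entry to recvmsg() and to the next syscall.
fmt = "%H:%M:%S.%f"
start = datetime.strptime("10:25:48.239755", fmt)
end = datetime.strptime("10:28:48.236989", fmt)
elapsed_44 = (end - start).total_seconds()

# 4.9 trace: time spent inside recvmsg() as reported by strace.
elapsed_49 = 194.852000

# How far past the requested timeout the 4.9 run returned.
overshoot_49 = elapsed_49 - tv_sec
overshoot_pct = 100.0 * overshoot_49 / tv_sec

print(f"requested timeout: {tv_sec} s + {tv_usec} us")
print(f"4.4 wait: {elapsed_44:.6f} s")
print(f"4.9 wait: {elapsed_49:.6f} s ({overshoot_49:.3f} s late, {overshoot_pct:.2f}%)")
```

The payload decodes to the 180 second timeout named in the text, and the 4.9
run returns roughly 15 seconds after it.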

This change in system call behavior is causing our application to regress
on the 4.9 kernel. There are events that need to run on timeouts, and on
the 4.9 kernel those timeouts now fire late: close to 195 seconds instead
of 180 seconds in the test run shown above.
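
A minimal sketch of how such a timeout can be measured from user space, in
Python rather than the C test code from the patch series. It assumes a 64-bit
Linux layout for struct timeval (two native longs) and uses an AF_UNIX datagram
socket pair on which nothing is ever sent. The function name
measure_rcvtimeo is hypothetical.

```python
import errno
import socket
import struct
import time


def measure_rcvtimeo(timeout_s: float) -> float:
    """Return how long a blocking recv() waits before SO_RCVTIMEO fires."""
    sec = int(timeout_s)
    usec = round((timeout_s - sec) * 1_000_000)
    # Assumption: struct timeval is two native longs (true on 64-bit Linux).
    timeval = struct.pack("ll", sec, usec)

    # Connected datagram pair; the peer never sends, so recv() must time out.
    reader, writer = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with reader, writer:
        reader.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, timeval)
        start = time.monotonic()
        try:
            reader.recv(1)
        except OSError as exc:
            # The kernel reports an expired SO_RCVTIMEO as EAGAIN/EWOULDBLOCK.
            if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
                raise
        return time.monotonic() - start


if __name__ == "__main__":
    requested = 2.0
    waited = measure_rcvtimeo(requested)
    print(f"requested {requested:.3f} s, waited {waited:.3f} s")
```

Running the same script with a 180 second timeout on both kernels would show
whether the wait matches the traces above.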

2) Comparison with MacOS:
I ran the same test on OS X El Capitan version 10.11.6 and the behavior is
consistent with the Linux 4.4 kernel. I have not tested the program on
other operating systems such as HP-UX, AIX, or Solaris, but I would guess
that, where they implement SO_RCVTIMEO, the behavior would not differ from
the Linux 4.4 kernel.
  
3) Standards Specification:
The Open Group standard does not say how quickly SO_RCVTIMEO needs to respond
to timeouts. However, the standard for the select system call does mention that
timeouts need to respond quickly. It would be good to restore the SO_RCVTIMEO
behavior of the 4.4 kernel and have SO_RCVTIMEO be consistent with the select
timeout.
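
For the select side of that comparison, a minimal Python sketch that times how
long select waits on an idle datagram socket. It is an illustration only, not
taken from the patch series, and the function name measure_select is
hypothetical.

```python
import select
import socket
import time


def measure_select(timeout_s: float) -> tuple[float, bool]:
    """Return (seconds waited, whether the socket became readable)."""
    # Connected datagram pair; the peer never sends, so select() must time out.
    reader, writer = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with reader, writer:
        start = time.monotonic()
        readable, _, _ = select.select([reader], [], [], timeout_s)
        elapsed = time.monotonic() - start
    return elapsed, bool(readable)


if __name__ == "__main__":
    requested = 2.0
    waited, ready = measure_select(requested)
    print(f"requested {requested:.3f} s, waited {waited:.3f} s, readable={ready}")
```

Timing the same interval through SO_RCVTIMEO on the same kernel would give the
side-by-side figures for the consistency argued for here.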

4) Changing application code:
Any change to application code to accommodate this change in system call
behavior breaks application migration between the 4.4 and 4.9 kernels.
Moreover, changing application code is not feasible in all cases, for example
when the source code is not available (third-party vendor).

Thanks.
-Vallish
