lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <303c8600e4964d1593b038239779ba4b@AcuMS.aculab.com>
Date:   Thu, 27 Feb 2020 13:19:15 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     netdev <netdev@...r.kernel.org>
Subject: sys_sendto() spinning for 1.6ms

I'm looking into unexpected delays in some real time (RTP audio) processing.
Mostly they are a few 100 usecs in the softint code.

However I've just triggered in sendto() on a raw IPV4 socket taking 1.6ms.
Most of the sends take less than 32us.
The process isn't sleeping and there are no interrupts.
Any idea where it might be spinning?

I'm running ftrace monitoring scheduler events, system calls and
interrupts (hard and soft) but can't turn on anything other than
selective function trace (the trace cost is far too high).

The ftrace at the time of the sendto() is (I'm only running 4 cpu):
    pid-28219 [002] .... 1979891.159912: sys_sendto(fd: 394, buff: 7ffbcb2e2940, len: c8, flags: 0, addr: 7ffbccdc9490, addr_len: 10)
    pid-28217 [003] .... 1979891.159912: sys_recvfrom(fd: 23c, ubuf: 7ffbcb1db100, size: 748, flags: 0, addr: 7ffbcddcb5b0, addr_len: 7ffbcddcb5ac)
    pid-28218 [001] .... 1979891.159912: sys_futex(uaddr: 113d8e0, op: 80, val: 2, utime: 0, uaddr2: 113d8e0, val3: 6e3a)
    pid-28216 [000] .... 1979891.159912: sys_futex(uaddr: 113d8e0, op: 81, val: 1, utime: 7ffbce5cc920, uaddr2: 113d8e0, val3: 6e38)
    pid-28217 [003] .... 1979891.159912: sys_recvfrom -> 0xac

Cpu 2 trace is:
    pid-28219 [002] .... 1979891.159910: sys_futex(uaddr: 113d8e0, op: 81, val: 1, utime: 7ffbccdc9920, uaddr2: 113d8e0, val3: 6e3b)
    pid-28219 [002] .... 1979891.159911: sys_futex -> 0x0
    pid-28219 [002] .... 1979891.159912: sys_sendto(fd: 394, buff: 7ffbcb2e2940, len: c8, flags: 0, addr: 7ffbccdc9490, addr_len: 10)
    pid-28219 [002] .... 1979891.161647: sys_sendto -> 0xc8
    pid-28219 [002] .... 1979891.161648: sys_pread64(fd: 3ae, buf: 7ffbccdc9420, count: 1, pos: 0)
    pid-28219 [002] .... 1979891.161650: sys_pread64 -> 0x1
    pid-28219 [002] .... 1979891.161651: sys_pwrite64(fd: 3ae, buf: d495e8, count: 1, pos: 0)

The previous send on fd 394 was much earlier and that socket is only used
by that process, and only for sends (to avoid sleeping on the socket lock).

It can't be an problem with the socket send buffer - because that would sleep.

The other 3 cpu carry on processing (the futex calls return immediately).
In fact they all sleep 1.4ms before the sento() finishes.

At the moment the trace is a sample of 1 - so I don't know if the concurrent
sento() and recvfrom() are significant.
They will both be the same remote IP, and might be the same port.

Ideas before I start trying to bisect the call graph?

A rough histogram of the time (ns) for the sendto() call.
0k: 71866866
48k: 99213
80k: 3583
112k: 371
144k: 245
176k: 694
208k: 382
240k: 214
272k: 20
304k: 13
336k: 15
368k: 13
400k: 11
432k: 14
464k: 12
496k: 7
528k: 10
560k: 2
592k: 1
624k: 2
656k: 1
688k: 0 (twice)
752k: 1
784k: 0 (twice)
848k: 1
880k: 1
912k: 0 (24 times)
1680k: 1

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ