lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3c44d27e-27d9-e9c4-04b3-17c0366e60d9@gmail.com>
Date:   Fri, 25 Aug 2017 16:18:21 -0700
From:   Florian Fainelli <f.fainelli@...il.com>
To:     Eric Dumazet <eric.dumazet@...il.com>
Cc:     netdev@...r.kernel.org, edumazet@...il.com, pabeni@...hat.com,
        willemb@...gle.com, davem@...emloft.net
Subject: Re: UDP sockets oddities

On 08/23/2017 07:23 PM, Florian Fainelli wrote:
> On 08/23/2017 05:43 PM, Eric Dumazet wrote:
>> On Wed, 2017-08-23 at 17:03 -0700, Florian Fainelli wrote:
>>
>>> Attached is the perf report --stdio of:
>>>
>>> # perf record -a -g -e skb:kfree_skb iperf -c 192.168.1.23 -b 800M -t 60
>>> WARNING: option -b implies udp testing
>>> ------------------------------------------------------------
>>> Client connecting to 192.168.1.23, UDP port 5001
>>> Sending 1470 byte datagrams
>>> UDP buffer size:  160 KByte (default)
>>> ------------------------------------------------------------
>>> [  4] local 192.168.1.66 port 36169 connected with 192.168.1.23 port 5001
>>> [ ID] Interval       Transfer     Bandwidth
>>> [  4]  0.0-60.0 sec   685 MBytes  95.8 Mbits/sec
>>> [  4] Sent 488633 datagrams
>>> [  4] Server Report:
>>> [  4]  0.0-74.4 sec   685 MBytes  77.2 Mbits/sec   0.187 ms  300/488632
>>> (0.061%)
>>> [  4]  0.0-74.4 sec  58 datagrams received out-of-order
>>>
>>> It measured 488644 events which is greater than the number of packets
>>> transmitted by iperf but I only count 8 non-IP packets being sent
>>> (protocl=2054 -> ETH_P_ARP) so I am not sure what the other 4 were and
>>> why there are not accounted for.
>>>
>>> Almost all events came from net_tx_action() except 2 that came from
>>> net/core/neighbour.c::neigh_probe() and 65 that came from
>>> arch/arm/include/asm/irqflags.h::arch_local_irq_save() ?!?
>>
>> Oh you have too much noise and need this fix :
>>
>> diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
>> index dc3052751bc1..fcfa8d991850 100644
>> --- a/drivers/net/ethernet/broadcom/bcmsysport.c
>> +++ b/drivers/net/ethernet/broadcom/bcmsysport.c
>> @@ -597,7 +597,7 @@ static int bcm_sysport_set_coalesce(struct net_device *dev,
>>  
>>  static void bcm_sysport_free_cb(struct bcm_sysport_cb *cb)
>>  {
>> -	dev_kfree_skb_any(cb->skb);
>> +	dev_consume_skb_any(cb->skb);
>>  	cb->skb = NULL;
>>  	dma_unmap_addr_set(cb, dma_addr, 0);
>>  }
>>
> 
> Yay, now I am getting some sensible information, thanks!
> 
> iperf reports 143 packets lost, and perf report gets me 146 entries for
> kfree_skb() looking like that:
> #
>      2.74%     2.74%  skbaddr=0xee5f30c0 protocol=2048 location=0xc0855cdc
>             |
>             ---0x2d
>                write
>                0xe9fc2368
>                kfree_skb
> 
> 
> What is annoying is that 0xc0855cdc is resolved to
> arch/arm/include/asm/irqflags.h::arch_local_irq_save() which does not
> really help telling me where exactly in the stack the drop is happening,
> although now I know it is somewhere down the path from write(2)... of
> course it is :)
> 
> Now what is disturbing too is that iperf does  have its write() system
> call get an error returned, write happily returned the number of bytes
> written even though the perf trace indicates there was packet drops down
> the road..

Because perf was not able to track down exactly where the location was
with builtin_return_address(0) and always pointed to
arch_local_irq_save(), I added a WARN() in kfree_skb(), and of course
this made it impossible to see packet loss under the same conditions
anymore. After increasing dramatically wmem_{default,max} I could start
seeing significant drop made largely worse by the printks, so not very
conclusive either.

I took another approach and recorded net_dev_queue() events, and I can
see only 976704 (gphy + eth0 devices) / 2 = 488352 dev_queue_events()
were recorded. The total sent by iperf was 488898 and we lost 558
packets according to the server 488352 + 558 > 488898 which makes sense
because neigh_probe() and others still run. This still confirms there is
a drop occurring between UDP socket sendmsg and the SKB enqueuing...

So my take away is that the CPU has a bursty write(2) behavior, and the
larger the socket write buffer size, the more datagrams you can burst
into and the more of these SKBs can be flat out being dropped when the
transmit queue is congested. Adding a WARN() is enough of a slow down
for the CPU that we are not able to reliably reproduce this burst.

Eric, are there areas of the stack where we are allowed to drop packets,
not propagate that back to write(2) and also not increment any counter
either, or maybe I am not looking where I should...

Thanks!
-- 
Florian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ