lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 2 May 2024 10:10:03 +0000
From: <Parthiban.Veerasooran@...rochip.com>
To: <ramon.nordin.rodriguez@...roamp.se>, <andrew@...n.ch>
CC: <davem@...emloft.net>, <edumazet@...gle.com>, <kuba@...nel.org>,
	<pabeni@...hat.com>, <horms@...nel.org>, <saeedm@...dia.com>,
	<anthony.l.nguyen@...el.com>, <netdev@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <corbet@....net>,
	<linux-doc@...r.kernel.org>, <robh+dt@...nel.org>,
	<krzysztof.kozlowski+dt@...aro.org>, <conor+dt@...nel.org>,
	<devicetree@...r.kernel.org>, <Horatiu.Vultur@...rochip.com>,
	<ruanjinjie@...wei.com>, <Steen.Hegelund@...rochip.com>,
	<vladimir.oltean@....com>, <UNGLinuxDriver@...rochip.com>,
	<Thorsten.Kummermehr@...rochip.com>, <Pier.Beruto@...emi.com>,
	<Selvamani.Rajagopal@...emi.com>, <Nicolas.Ferre@...rochip.com>,
	<benjamin.bigler@...nformulastudent.ch>
Subject: Re: [PATCH net-next v4 05/12] net: ethernet: oa_tc6: implement error
 interrupts unmasking

Hi Ramon,

On 01/05/24 11:59 pm, Ramón Nordin Rodriguez wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
>>> n  |  name     |  min  |  avg  |  max  |  rx dropped  |  samples
>>> 1  |  no mod   |  827K |  846K |  891K |      945     |     5
>>> 2  |  no log   |  711K |  726K |  744K |      562     |     5
>>> 3  |  less irq |  815K |  833K |  846K |      N/A     |     5
>>> 4  |  no irq   |  914K |  924K |  931K |      N/A     |     5
>>> 5  |  simple   |  857K |  868K |  879K |      615     |     5
>>
>> That is odd.
>>
>> Side question: What CONFIG_HZ= do you have? 100, 250, 1000?  Try
>> 1000. I've seen problems where the driver wants to sleep for a short
>> time, but the CONFIG_HZ value limits how short a time it can actually
>> sleep. It ends up sleeping much longer than it wants.
>>
> 
> I have been doing my best to abuse the link some more. In brief tweaking
> CONFIG_HZ has some but limited effect.
> Saturating the link with the rx buffer interrupt enabled breaks the driver.
> Saturating the link with the rx buffer interrupt disabled has poor
> performance.
> 
> The following scenario has been tested. Both ends of the link run:
> * server.py
> * client.py
> 
> One end is an arm64 quad core running at 1.2GHz with the lan8650 macphy.
> The other end is an amd 3950x running the lan8670 usb eval board.
> Both systems should be fast enough that running python should not be a
> limiting factor.
> 
> -- The test code --
> server.py
> #!/bin/env python3
> import socket
> 
> def serve(sock: socket.socket):
>      while True:
>          client, addr = sock.accept()
>          print(f'connection from: {addr}')
>          while len(client.recv(2048)) > 0:
>              pass
>          print('client disconnected')
>          client.close()
> 
> if __name__ == '__main__':
>      sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>      sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
>      sock.bind(('0.0.0.0', 4040))
>      sock.listen(1)
>      serve(sock)
>      print("something went wrong")
> 
> client.py
> #!/bin/env python3
> import socket
> import sys
> 
> if __name__ == '__main__':
>      sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>      sock.connect((sys.argv[1], 4040))
> 
>      while True:
>          sock.sendall(b'0'*2048)
> 
> -- test runs --
> run 1 - all interrupts enabled
> Time to failure:
> 1 min or less
> 
> Kernel output:
> [   94.361312] sched: RT throttling activated
> 
> top output:
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>   145 root     -51   0       0      0      0 R  95.5   0.0   1:11.22 oa-tc6-spi-thread
> 
> link stats:
> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
>      link/ether 32:c2:7e:22:93:99 brd ff:ff:ff:ff:ff:ff
>      RX:  bytes packets errors dropped  missed   mcast
>         3371902    7186      0      48       0       0
>      RX errors:  length    crc   frame    fifo overrun
>                       0      0       0       0       0
>      TX:  bytes packets errors dropped carrier collsns
>        10341438    8071      0       0       0       0
>      TX errors: aborted   fifo  window heartbt transns
>                       0      0       0       0       1
> state:
> Completly borked, can't ping in or out, bringing the interface down then up
> has no effect.
> There is no SPI clock and no interrupts generated by the mac-phy.
> The worker thread seems to have live locked.
> 
> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
> run 2 - RX_BUFFER_OVERLOW interrupt disabled
> 
> state:
> Runs just fine but the oa-tc6-spi-thread is consuming 10-20% cpu
> Ping times have increased from 1-2ms to 8-35ms
> 
> 
> -- additional notes --
> When tweaking CONFIG_HZ I do get some changes in behaviour, the cpu
> consumption stays stable at 20%+-2 with CONFIG_HZ=250, when increased to
> CONFIG_HZ=1000 it jumps up and down between 10-20%.
> 
> I don't have access to a logic analyzer but my old oscilloscope is
> almost reliable. I could confirm that the spi clock is indeed running at
> the expected 25MHz, but I could observe some gaps of up to 320µs so
> that's 8k spi cycles spent doing something else.
> These gaps were observed on the SPI clock and the macphy interrupt was
> active for the same ammount of time(though this was measured independently
> and not on the same trigger).
> I've been drinking way to much coffe, so soldering is not gonna happen
> today (shaky hands), but if it helps I can solder wires to attach both
> probes to confirm that the gap in the SPI clock happens at the same time
> or not as the interrupt is active.
> 
> I'd be keen on hearing what Microchips plans to address. If tracking
> down performance issues is a priority I'll probably not spend any time
> on it, if not then I'll definetly dig into it more.
I tried to reproduce the issue in my setup with your above applications.
But surprisingly I couldn't reproduce the issue you are facing.

One end is Raspberry Pi 4 with lan8651 MAC-PHY and the other end is 
Raspberry Pi 4 with EVB-LAN8670-USB Stick.

lan8651 MAC-PHY side:
---------------------

pi@...pberrypi:~/lan865x/v4_mainline $ python3 server.py
connection from: ('192.168.5.101', 46608)

Top output:
-----------
top - 14:28:14 up  1:14,  4 users,  load average: 0.68, 0.67, 0.63
Tasks: 201 total,   1 running, 200 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  1.1 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.1 si, 
0.0 st
MiB Mem :   7810.0 total,   7110.5 free,    392.3 used,    432.4 
buff/cache
MiB Swap:    100.0 total,    100.0 free,      0.0 used.   7417.7 avail Mem

     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ 
COMMAND 

    7219 root     -51   0       0      0      0 S   2.7   0.0   1:13.11 
oa-tc6-spi-thread 

   35307 pi        20   0   16280   9216   5248 S   0.7   0.1   0:16.53 
python3 

   88831 pi        20   0   11728   4864   2816 R   0.3   0.1   0:00.22 
top 

   89819 vnc       20   0    2320    384    384 S   0.3   0.0   0:00.04 
sh 

       1 root      20   0  168652  11580   8436 S   0.0   0.1   0:03.85 
systemd 

       2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 
kthreadd 

       3 root      20   0       0      0      0 S   0.0   0.0   0:00.00 
pool_workqueue_release 

       4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 
kworker/R-rcu_g 

       5 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 
kworker/R-rcu_p 

       6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 
kworker/R-slub_ 

       7 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 
kworker/R-netns 

      11 root      20   0       0      0      0 I   0.0   0.0   0:00.03 
kworker/u8:0-ext4-rsv-conversion 

      12 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 
kworker/R-mm_pe

ifconfig output:
----------------
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 192.168.5.100  netmask 255.255.255.0  broadcast 0.0.0.0
         ether 04:05:06:01:02:03  txqueuelen 1000  (Ethernet)
         RX packets 1589879  bytes 2391045582 (2.2 GiB)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 1022419  bytes 71954905 (68.6 MiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
         device interrupt 55

dmesg output:
-------------
pi@...pberrypi:~/lan865x/v4_mainline $ dmesg
[  234.019968] LAN865X Rev.B0 Internal Phy spi0.0:00: attached PHY 
driver (mii_bus:phy_addr=spi0.0:00, irq=POLL)
[  234.069387] lan865x spi0.0 eth1: Link is Up - 10Mbps/Half - flow 
control off

EVB-LAN8670-USB stick side:
---------------------------
pi@...rpi-tp2:~/microchip/t1s-usb $ python3 client.py 192.168.5.100

ifconfig output:
----------------
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 192.168.5.101  netmask 255.255.255.0  broadcast 192.168.5.255
         ether 00:1e:c0:d1:ca:39  txqueuelen 1000  (Ethernet)
         RX packets 1034606  bytes 58330335 (55.6 MiB)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 1722400  bytes 2628199197 (2.4 GiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Let me know if you want to try anything more.

Best regards,
Parthiban V
> 
> Let me know if anything is unclear or if I can help out with anything
> specific.
> 
> R

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ