Message-ID: <ZjKJ93uPjSgoMOM7@builder>
Date: Wed, 1 May 2024 20:29:11 +0200
From: Ramón Nordin Rodriguez <ramon.nordin.rodriguez@...roamp.se>
To: Andrew Lunn <andrew@...n.ch>
Cc: Parthiban Veerasooran <Parthiban.Veerasooran@...rochip.com>,
davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, horms@...nel.org, saeedm@...dia.com,
anthony.l.nguyen@...el.com, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, corbet@....net,
linux-doc@...r.kernel.org, robh+dt@...nel.org,
krzysztof.kozlowski+dt@...aro.org, conor+dt@...nel.org,
devicetree@...r.kernel.org, horatiu.vultur@...rochip.com,
ruanjinjie@...wei.com, steen.hegelund@...rochip.com,
vladimir.oltean@....com, UNGLinuxDriver@...rochip.com,
Thorsten.Kummermehr@...rochip.com, Pier.Beruto@...emi.com,
Selvamani.Rajagopal@...emi.com, Nicolas.Ferre@...rochip.com,
benjamin.bigler@...nformulastudent.ch
Subject: Re: [PATCH net-next v4 05/12] net: ethernet: oa_tc6: implement error
interrupts unmasking
> > n | name | min | avg | max | rx dropped | samples
> > 1 | no mod | 827K | 846K | 891K | 945 | 5
> > 2 | no log | 711K | 726K | 744K | 562 | 5
> > 3 | less irq | 815K | 833K | 846K | N/A | 5
> > 4 | no irq | 914K | 924K | 931K | N/A | 5
> > 5 | simple | 857K | 868K | 879K | 615 | 5
>
> That is odd.
>
> Side question: What CONFIG_HZ= do you have? 100, 250, 1000? Try
> 1000. I've seen problems where the driver wants to sleep for a short
> time, but the CONFIG_HZ value limits how short a time it can actually
> sleep. It ends up sleeping much longer than it wants.
>
I have been doing my best to abuse the link some more. In brief, tweaking
CONFIG_HZ has some, but limited, effect.
Saturating the link with the rx buffer interrupt enabled breaks the driver.
Saturating the link with the rx buffer interrupt disabled gives poor
performance.
The following scenario has been tested. Both ends of the link run:
* server.py
* client.py
One end is an arm64 quad core running at 1.2GHz with the lan8650 macphy.
The other end is an amd 3950x running the lan8670 usb eval board.
Both systems should be fast enough that running python should not be a
limiting factor.
-- The test code --
server.py
#!/bin/env python3
import socket

def serve(sock: socket.socket):
    while True:
        client, addr = sock.accept()
        print(f'connection from: {addr}')
        while len(client.recv(2048)) > 0:
            pass
        print('client disconnected')
        client.close()

if __name__ == '__main__':
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(('0.0.0.0', 4040))
    sock.listen(1)
    serve(sock)
    print("something went wrong")

client.py
#!/bin/env python3
import socket
import sys

if __name__ == '__main__':
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((sys.argv[1], 4040))
    while True:
        sock.sendall(b'0'*2048)

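The scripts above only generate and sink traffic, they do not report any
numbers. If a rough receive rate is wanted from the same setup, a minimal
sketch could look like the following. drain() and rate_bits_per_s() are
hypothetical helpers, not part of the scripts used for these runs, and the
throughput figures quoted earlier in the thread were not produced with them.

```python
import socket
import time

def drain(conn: socket.socket, bufsize: int = 2048):
    """Read until the peer closes; return (bytes received, elapsed seconds)."""
    total = 0
    start = time.monotonic()
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:          # empty read means the peer closed the socket
            break
        total += len(chunk)
    return total, time.monotonic() - start

def rate_bits_per_s(total_bytes: int, elapsed_s: float) -> float:
    """Average rate in bits per second; 0.0 if no time has elapsed."""
    return total_bytes * 8 / elapsed_s if elapsed_s > 0 else 0.0
```

In server.py the inner recv loop could be swapped for a call to
drain(client), printing rate_bits_per_s() on disconnect.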
-- test runs --
run 1 - all interrupts enabled
Time to failure:
1 min or less
Kernel output:
[ 94.361312] sched: RT throttling activated
top output:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
145 root -51 0 0 0 0 R 95.5 0.0 1:11.22 oa-tc6-spi-thread
link stats:
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 32:c2:7e:22:93:99 brd ff:ff:ff:ff:ff:ff
    RX:  bytes  packets  errors  dropped  missed  mcast
       3371902     7186       0       48       0      0
    RX errors:  length  crc  frame  fifo  overrun
                     0    0      0     0        0
    TX:  bytes  packets  errors  dropped  carrier  collsns
      10341438     8071       0        0        0        0
    TX errors:  aborted  fifo  window  heartbt  transns
                      0     0       0        0        1
state:
Completely borked, can't ping in or out, bringing the interface down then up
has no effect.
There is no SPI clock and no interrupts generated by the mac-phy.
The worker thread seems to have livelocked.
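For reference, the 95.5% CPU shown for oa-tc6-spi-thread sits right at the
kernel's default RT throttling budget (sched_rt_runtime_us=950000 out of
sched_rt_period_us=1000000), which fits the "RT throttling activated"
message. A small sketch to check the budget on a given system follows. It
assumes the test board still uses the defaults, which I have not verified;
rt_budget() and read_sysctl() are illustrative helpers.

```python
from pathlib import Path

def rt_budget(runtime_us: int = 950_000, period_us: int = 1_000_000) -> float:
    """Fraction of each period that RT tasks may run before being throttled."""
    if runtime_us < 0:         # -1 means RT throttling is disabled
        return 1.0
    return runtime_us / period_us

def read_sysctl(name: str, default: int) -> int:
    """Read an integer from /proc/sys/kernel, falling back to the default."""
    try:
        return int((Path("/proc/sys/kernel") / name).read_text())
    except (OSError, ValueError):
        return default

runtime = read_sysctl("sched_rt_runtime_us", 950_000)
period = read_sysctl("sched_rt_period_us", 1_000_000)
print(f"RT tasks may use {rt_budget(runtime, period):.1%} of each period")
```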
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
run 2 - RX_BUFFER_OVERFLOW interrupt disabled
state:
Runs just fine but the oa-tc6-spi-thread is consuming 10-20% cpu
Ping times have increased from 1-2ms to 8-35ms
-- additional notes --
When tweaking CONFIG_HZ I do get some changes in behaviour. The cpu
consumption stays stable at 20% ±2 with CONFIG_HZ=250; when increased to
CONFIG_HZ=1000 it jumps up and down between 10-20%.
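To put the CONFIG_HZ values in perspective: a jiffy-based sleep cannot be
shorter than one scheduler tick, so the tick period is a lower bound on how
short such a sleep can be. A quick worked example follows; tick_ms() is just
an illustrative helper, and whether the driver actually hits a jiffy-based
sleep on this path is an assumption I have not checked.

```python
def tick_ms(config_hz: int) -> float:
    """Length of one scheduler tick (jiffy) in milliseconds."""
    return 1000.0 / config_hz

for hz in (100, 250, 1000):
    print(f"CONFIG_HZ={hz}: one jiffy = {tick_ms(hz):g} ms")
```

Going from 250 to 1000 shortens the tick from 4 ms to 1 ms.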
I don't have access to a logic analyzer but my old oscilloscope is
almost reliable. I could confirm that the spi clock is indeed running at
the expected 25MHz, but I could observe some gaps of up to 320µs, so
that's 8k spi cycles spent doing something else.
These gaps were observed on the SPI clock, and the macphy interrupt was
active for the same amount of time (though this was measured independently
and not on the same trigger).
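The 8k figure is just clock rate times gap length. As a sanity check, with
the byte count assuming one bit per clock on a single-lane SPI bus (the
helper name is made up for illustration):

```python
SPI_CLOCK_HZ = 25_000_000   # 25 MHz, as seen on the scope
GAP_SECONDS = 320e-6        # longest observed gap, 320 µs

def cycles_in_gap(clock_hz: float, gap_s: float) -> int:
    """Number of SPI clock cycles that would have fit in an idle gap."""
    return round(clock_hz * gap_s)

idle_cycles = cycles_in_gap(SPI_CLOCK_HZ, GAP_SECONDS)
idle_bytes = idle_cycles // 8   # assumes one bit transferred per clock
print(idle_cycles, idle_bytes)
```

So each worst-case gap costs about 8000 clocks, or roughly 1000 bytes that
could have been clocked over the bus.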
I've been drinking way too much coffee, so soldering is not gonna happen
today (shaky hands), but if it helps I can solder wires to attach both
probes and confirm whether or not the gap in the SPI clock coincides with
the interrupt being active.
I'd be keen on hearing what Microchip plans to address. If tracking
down performance issues is a priority for them I'll probably not spend any
time on it, if not then I'll definitely dig into it more.
Let me know if anything is unclear or if I can help out with anything
specific.
R