Message-ID: <fb948d50-8b7a-bbff-861c-efd03b3d687a@denx.de>
Date: Fri, 27 Mar 2020 20:25:18 +0100
From: Marek Vasut <marex@...x.de>
To: Lukas Wunner <lukas@...ner.de>
Cc: netdev@...r.kernel.org, "David S . Miller" <davem@...emloft.net>,
Petr Stetiar <ynezz@...e.cz>,
YueHaibing <yuehaibing@...wei.com>, Andrew Lunn <andrew@...n.ch>
Subject: Re: [PATCH V2 00/14] net: ks8851: Unify KS8851 SPI and MLL drivers
On 3/27/20 7:18 PM, Marek Vasut wrote:
[...]
>> The performance degradation with this series is as follows:
>>
>> Latency (ping) without this series:
>> rtt min/avg/max/mdev = 0.982/1.776/3.756/0.027 ms, ipg/ewma 2.001/1.761 ms
>> With this series:
>> rtt min/avg/max/mdev = 1.084/1.811/3.546/0.040 ms, ipg/ewma 2.020/1.814 ms
>>
>> Throughput (scp) without this series:
>> Transferred: sent 369780976, received 66088 bytes, in 202.0 seconds
>> Bytes per second: sent 1830943.5, received 327.2
>> With this series:
>> Transferred: sent 369693896, received 67588 bytes, in 210.5 seconds
>> Bytes per second: sent 1755952.6, received 321.0
>
> Maybe an iperf measurement would be better here?
>
>> SPI clock is 25 MHz. The chip would allow up to 40 MHz, but the board
>> layout limits that.
>>
>> I suspect the performance regression is not only caused by the
>> suboptimal 16-byte instead of 8-byte accesses (and 2x16-byte instead
>> of 32-byte accesses), but also because the accessor functions cannot
>> be inlined. It would be better if they were included from a header
>> file as static inlines. The performance regression would then likely
>> disappear.
>
> I did another measurement today and found out that while RX on the old
> KS8851-MLL driver runs at ~50 Mbit/s, TX runs at ~80 Mbit/s. With this
> new driver, RX still runs at ~50 Mbit/s, but TX also drops to ~50 Mbit/s.
> That's really bad. Any ideas on how to debug/profile this?
So the schedule_work() in start_xmit is the problem I have. If I hack it
up to do what ks8851-mll does -- basically write the packet into the
hardware and wait until it's transmitted -- then I get my 75 Mbit/s back.
I think we should implement something like NAPI here, but for TX:
basically buffer up a few packets and then write them to the hardware
in bulk. There has to be something like that in the network stack, no?