Message-ID: <fb948d50-8b7a-bbff-861c-efd03b3d687a@denx.de>
Date: Fri, 27 Mar 2020 20:25:18 +0100
From: Marek Vasut <marex@...x.de>
To: Lukas Wunner <lukas@...ner.de>
Cc: netdev@...r.kernel.org, "David S . Miller" <davem@...emloft.net>,
Petr Stetiar <ynezz@...e.cz>,
YueHaibing <yuehaibing@...wei.com>, Andrew Lunn <andrew@...n.ch>
Subject: Re: [PATCH V2 00/14] net: ks8851: Unify KS8851 SPI and MLL drivers
On 3/27/20 7:18 PM, Marek Vasut wrote:
[...]
>> The performance degradation with this series is as follows:
>>
>> Latency (ping) without this series:
>> rtt min/avg/max/mdev = 0.982/1.776/3.756/0.027 ms, ipg/ewma 2.001/1.761 ms
>> With this series:
>> rtt min/avg/max/mdev = 1.084/1.811/3.546/0.040 ms, ipg/ewma 2.020/1.814 ms
>>
>> Throughput (scp) without this series:
>> Transferred: sent 369780976, received 66088 bytes, in 202.0 seconds
>> Bytes per second: sent 1830943.5, received 327.2
>> With this series:
>> Transferred: sent 369693896, received 67588 bytes, in 210.5 seconds
>> Bytes per second: sent 1755952.6, received 321.0
>
> Maybe an iperf measurement would be better here?
>
>> SPI clock is 25 MHz. The chip would allow up to 40 MHz, but the board
>> layout limits that.
>>
>> I suspect the performance regression is not only caused by the
>> suboptimal 16-byte instead of 8-byte accesses (and 2x16-byte instead
>> of 32-byte accesses), but also because the accessor functions cannot
>> be inlined. It would be better if they were included from a header
>> file as static inlines. The performance regression would then likely
>> disappear.
>
> I did another measurement today and found out that while RX on the old
> KS8851-MLL driver runs at ~50 Mbit/s, TX runs at ~80 Mbit/s. With this
> new driver, RX still runs at ~50 Mbit/s, but TX also drops to ~50 Mbit/s.
> That's really bad. Any ideas on how to debug/profile this?
So the schedule_work() in start_xmit is the problem I have. If I hack it
up to do what ks8851-mll does -- basically write the packet into the
hardware and wait until it's transmitted -- then I get my 75 Mbit/s back.
I think we should implement something like NAPI here, but for TX:
basically buffer up a few packets and then write them to the hardware
in bulk. There has to be something like that in the network stack, no?