lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sun, 17 May 2020 14:36:59 +0200
From:   Marek Vasut <marex@...x.de>
To:     Lukas Wunner <lukas@...ner.de>, David Miller <davem@...emloft.net>
Cc:     netdev@...r.kernel.org, ynezz@...e.cz, yuehaibing@...wei.com
Subject: Re: [PATCH V6 00/20] net: ks8851: Unify KS8851 SPI and MLL drivers

On 5/17/20 9:13 AM, Lukas Wunner wrote:
> On Sat, May 16, 2020 at 07:02:25PM -0700, David Miller wrote:
>>> The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same pieces
>>> of silicon, except the former has an SPI interface, while the later has a
>>> parallel bus interface. Thus far, Linux has two separate drivers for each
>>> and they are diverging considerably.
>>>
>>> This series unifies them into a single driver with small SPI and parallel
>>> bus specific parts. The approach here is to first separate out the SPI
>>> specific parts into a separate file, then add parallel bus accessors in
>>> another separate file and then finally remove the old parallel bus driver.
>>> The reason for replacing the old parallel bus driver is because the SPI
>>> bus driver is much higher quality.
>>
>> What strikes me in these changes is all of the new indirect jumps in
>> the fast paths of TX and RX packet processing.  It's just too much for
>> my eyes. :-)
>>
>> Especially in the presence of Spectre mitigations, these costs are
>> quite non-trivial.
>>
>> Seriously, I would recommend that instead of having these small
>> indirect helpers, just inline the differences into two instances of
>> the RX interrupt and the TX handler.
> 
> I agree.

I do not.

> However in terms of performance there's a bigger problem:
> 
> Previously ks8851.c (SPI driver) had 8-bit and 32-bit register accessors.
> The present series drops them and performs a 32-bit access as two 16-bit
> accesses and an 8-bit access as one 16-bit access because that's what
> ks8851_mll.c (16-bit parallel bus driver) does.  That has a real,
> measurable performance impact because in the case of 8-bit accesses,
> another 8 bits need to be transferred over the SPI bus, and in the case
> of 32-bit accesses, *two* SPI transfers need to be performed.
> 
> The 8-bit and 32-bit accesses happen in ks8851_rx_pkts(), i.e. in the
> RX hotpath.  I've provided numbers for the performance impact and even
> a patch to solve them but it was dismissed and not included in the
> present series:
> 
> https://lore.kernel.org/netdev/20200420140700.6632hztejwcgjwsf@wunner.de/
> 
> The reason given for the dismissal was that I had performed the measurements
> on 4.19 which is allegedly "long dead" (in Andrew Lunn's words).
> However I can assure you that performing two SPI transfers has not
> magically become as fast as performing one SPI transfer since 4.19.
> So the argument is nonsense.

I invested time and even obtained the SPI variant of the card to perform
actual comparative measurements on linux-next both on the SPI and
parallel variant with iperf, both for latency and throughput, and I do
not observe this problem.

A month ago, I even provided you a branch with all the patches and the
DT patch for RPi3 (the platform you claim to use for these tests, so I
used the same) so you can perform the same test as I did, with the same
hardware and the same software. So it should have been trivial to
reproduce the tests I did and their results.

> Nevertheless I was going to repeat the performance measurements on a
> recent kernel but haven't gotten around to that yet because the
> measurements need to be performed with CONFIG_PREEMPT_RT_FULL to
> be reliable (a vanilla kernel is too jittery), so I have to create
> a new branch with RT patches on the test machine, which is fairly
> involved and time consuming.
> 
> I think it's fair that the two drivers are unified, but the performance
> for the SPI variant shouldn't be unnecessarily diminished in the process.

Could it be that your problem is related to this huge out-of-tree patch
you use then ?

-- 
Best regards,
Marek Vasut

Powered by blists - more mailing lists