lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 12 Sep 2018 15:23:43 +0100
From:   Jose Abreu <Jose.Abreu@...opsys.com>
To:     Florian Fainelli <f.fainelli@...il.com>,
        Jose Abreu <Jose.Abreu@...opsys.com>, <netdev@...r.kernel.org>,
        Tal Gilboa <talgi@...lanox.com>
CC:     Jerome Brunet <jbrunet@...libre.com>,
        Martin Blumenstingl <martin.blumenstingl@...glemail.com>,
        "David S. Miller" <davem@...emloft.net>,
        Joao Pinto <Joao.Pinto@...opsys.com>,
        "Giuseppe Cavallaro" <peppe.cavallaro@...com>,
        Alexandre Torgue <alexandre.torgue@...com>
Subject: Re: [PATCH net-next v2 1/2] net: stmmac: Rework coalesce timer and
 fix multi-queue races

Hi Florian,

Thanks for your input.

On 10-09-2018 20:22, Florian Fainelli wrote:
> On 09/10/2018 02:14 AM, Jose Abreu wrote:
>> This follows David Miller advice and tries to fix coalesce timer in
>> multi-queue scenarios.
>>
>> We are now using per-queue coalesce values and per-queue TX timer.
>>
>> Coalesce timer default values was changed to 1ms and the coalesce frames
>> to 25.
>>
>> Tested in B2B setup between XGMAC2 and GMAC5.
> Why not revert the entire features for this merge window and work on
> getting it to work over the next weeks/merge windows?

It was already reverted but the performance drops a little bit
(not that much but I'm trying to fix it).

>
> The idea of using a timer to coalesce TX path when there is not a HW
> timer is a good idea and if this is made robust enough, you could even
> promote that as being a network stack library/feature that could be used
> by other drivers. In fact, this could be a great addition to the net DIM
> library (Tal, what do you think?)
>
> Here's a quick drive by review of things that appear wrong in the
> current driver (without your patches):
>
> - in stmmac_xmit(), in case we hit the !is_jumbo branch and we fail the
> DMA mapping, there is no timer cancellation, don't we want to abort the
> whole transmission?

I don't think this is needed because then tx pointer will not
advance and in stmmac_tx_clean we just won't perform any work.
Besides, we can have a pending timer from previous packets
running so canceling it can cause some problems.

>
> - stmmac_tx_clean() should probably use netif_lock_bh() to guard against
> the timer (soft IRQ context) and the the NAPI context (also soft IRQ)
> running in parallel on two different CPUs. This may not explain all
> problems, but these two things are fundamentally exclusive, because the
> timer is meant to emulate the interrupt after N packets, while NAPI
> executes when such a thing did actually occur

Ok, and now I'm also using __netif_tx_lock_bh(queue) to just lock
per queue instead of the whole TX.

>
> - stmmac_poll() should cancel pending timer(s) if it was able to reclaim
> packets, likewise stmmac_tx_timer() should re-enable TX interrupts if it
> reclaimed packets, since TX interrupts could have been left disabled
> from a prior NAPI run. These could be considered optimizations, since
> you could leave the TX timer running all the time, just adjust the
> deadline (based on line rate, MTU, IPG, number of fragments and their
> respective length), worst case, both NAPI and the timer clean up your TX
> ring, so you should always have room to push more packets

In next version I'm dropping the direct call to stmmac_tx_clean()
in the timer function and just scheduling NAPI instead.

Thanks and Best Regards,
Jose Miguel Abreu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ