Date:	Wed, 11 Nov 2015 11:24:23 -0500 (EST)
From:	David Miller <davem@...emloft.net>
To:	mans@...sr.com
Cc:	romieu@...zoreil.com, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org, slash.tmp@...e.fr
Subject: Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800
 Ethernet controller

From: Måns Rullgård <mans@...sr.com>
Date: Wed, 11 Nov 2015 13:04:07 +0000

> Måns Rullgård <mans@...sr.com> writes:
> 
>> David Miller <davem@...emloft.net> writes:
>>
>>> From: Måns Rullgård <mans@...sr.com>
>>> Date: Wed, 11 Nov 2015 00:40:09 +0000
>>>
>>>> When the DMA complete interrupt arrives, the next chain should be
>>>> kicked off as quickly as possible, and I don't see why that would
>>>> benefit from being done in napi context.
>>>
>>> NAPI isn't about low latency, it's about fairness and interrupt
>>> mitigation.
>>>
>>> You probably don't even realize that all of the TX SKB freeing you do
>>> in the hardware interrupt handler ends up being actually processed by a
>>> scheduled software interrupt anyways.
>>>
>>> So you are gaining almost nothing by not doing TX completion in NAPI
>>> context, whereas by doing so you would be gaining a lot including
>>> more simplified locking or even the ability to do no locking at all.
>>
>> TX completion is separate from restarting the DMA, and moving that to
>> NAPI may well be a good idea.  Should I simply napi_schedule() if the
>> hardware indicates TX is complete and do the cleanup in the NAPI poll
>> function?
> 
> I tried that, and throughput (as measured by iperf3) dropped by 2%.
> Maybe I did something wrong.

Did you fix all the locking in that change?

Since all of your TX handling runs in software interrupt context, you
can stop using IRQ locking and use BH locking driver-wide instead.

And actually, no locking is really needed for TX processing.  With
proper memory barriers and properly crafted queue state tests, you
can run completely lockless.
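
As a rough illustration of that idea, here is a minimal userspace sketch
of a single-producer, single-consumer TX ring.  It is not kernel code and
not taken from any driver: the names (tx_ring, tx_ring_xmit,
tx_ring_complete, TX_RING_SIZE) are hypothetical, and C11 acquire/release
atomics stand in for the memory barriers a real driver would use.  It
assumes exactly one transmit path and one completion path, which is what
makes the lock-free index handoff safe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TX_RING_SIZE 8u  /* hypothetical size; must be a power of two */

struct tx_ring {
    void *slots[TX_RING_SIZE]; /* stand-in for in-flight packet buffers */
    atomic_uint prod;          /* written only by the transmit path */
    atomic_uint cons;          /* written only by the completion path */
};

void tx_ring_init(struct tx_ring *r)
{
    assert((TX_RING_SIZE & (TX_RING_SIZE - 1)) == 0);
    atomic_init(&r->prod, 0);
    atomic_init(&r->cons, 0);
}

/* Queue state test: number of free slots as seen by the transmit path. */
unsigned tx_ring_avail(struct tx_ring *r)
{
    unsigned prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
    /* Acquire pairs with the release store in tx_ring_complete(). */
    unsigned cons = atomic_load_explicit(&r->cons, memory_order_acquire);
    return TX_RING_SIZE - (prod - cons);
}

/* Transmit path: queue one buffer; returns false if the ring is full. */
bool tx_ring_xmit(struct tx_ring *r, void *buf)
{
    unsigned prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
    unsigned cons = atomic_load_explicit(&r->cons, memory_order_acquire);

    if (prod - cons == TX_RING_SIZE)
        return false; /* caller would stop the queue here */

    r->slots[prod & (TX_RING_SIZE - 1)] = buf;
    /* Release publishes the slot contents before the new index. */
    atomic_store_explicit(&r->prod, prod + 1, memory_order_release);
    return true;
}

/*
 * Completion path: reclaim up to 'budget' buffers and return how many
 * were reclaimed.  'release' may be NULL if nothing needs freeing.
 */
unsigned tx_ring_complete(struct tx_ring *r, unsigned budget,
                          void (*release)(void *))
{
    unsigned cons = atomic_load_explicit(&r->cons, memory_order_relaxed);
    /* Acquire pairs with the release store in tx_ring_xmit(). */
    unsigned prod = atomic_load_explicit(&r->prod, memory_order_acquire);
    unsigned done = 0;

    while (cons != prod && done < budget) {
        void *buf = r->slots[cons & (TX_RING_SIZE - 1)];
        if (release)
            release(buf);
        cons++;
        done++;
    }
    /* Release makes the freed slots visible to the transmit path. */
    atomic_store_explicit(&r->cons, cons, memory_order_release);
    return done;
}
```

Each index has exactly one writer, so neither side needs a lock; the
only shared state is the pair of counters, and the free-running unsigned
arithmetic keeps "full" and "empty" distinguishable without a spare slot.
A real driver would additionally have to stop and wake the queue around
the full test, which this sketch leaves out.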

Again, look at example drivers.  I know, for example, that
drivers/net/ethernet/broadcom/tg3.c runs TX lockless.  You'll
see that tg3_tx() takes no locks at all.