netdev - Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

Message-Id: <20151111.112423.1751428739163066569.davem@davemloft.net>
Date:	Wed, 11 Nov 2015 11:24:23 -0500 (EST)
From:	David Miller <davem@...emloft.net>
To:	mans@...sr.com
Cc:	romieu@...zoreil.com, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org, slash.tmp@...e.fr
Subject: Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800
 Ethernet controller

From: Måns Rullgård <mans@...sr.com>
Date: Wed, 11 Nov 2015 13:04:07 +0000

> Måns Rullgård <mans@...sr.com> writes:
> 
>> David Miller <davem@...emloft.net> writes:
>>
>>> From: Måns Rullgård <mans@...sr.com>
>>> Date: Wed, 11 Nov 2015 00:40:09 +0000
>>>
>>>> When the DMA complete interrupt arrives, the next chain should be
>>>> kicked off as quickly as possible, and I don't see why that would
>>>> benefit from being done in napi context.
>>>
>>> NAPI isn't about low latency, it's about fairness and interrupt
>>> mitigation.
>>>
>>> You probably don't even realize that all of the TX SKB freeing you do
>>> in the hardware interrupt handler end up being actually processed by a
>>> scheduled software interrupt anyways.
>>>
>>> So you are gaining almost nothing by not doing TX completion in NAPI
>>> context, whereas by doing so you would be gaining a lot including
>>> more simplified locking or even the ability to do no locking at all.
>>
>> TX completion is separate from restarting the DMA, and moving that to
>> NAPI may well be a good idea.  Should I simply napi_schedule() if the
>> hardware indicates TX is complete and do the cleanup in the NAPI poll
>> function?
> 
> I tried that, and throughput (as measured by iperf3) dropped by 2%.
> Maybe I did something wrong.

Did you fix all the locking in that change?

Since all of your TX handling runs in software interrupt context, you
can stop using IRQ locking and use BH locking driver-wide instead.

And actually, no locking is really needed for TX processing.  With
proper memory barriers and properly crafted queue state tests, you
can run completely lockless.

Again, look at example drivers.  I know, for example, that
drivers/net/ethernet/broadcom/tg3.c runs TX lockless.  You'll
see that tg3_tx() takes no locks at all.
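
[Illustrative sketch, not from the original message.]  The "memory
barriers plus queue state tests" idea can be modelled in userspace
with C11 atomics.  This is only a rough model under stated
assumptions: one transmit path and one completion path, each the sole
writer of its own ring index.  Every name here (tx_ring, tx_xmit,
tx_complete, RING_SIZE, WAKE_THRESH) is hypothetical, and the code is
not taken from tg3.c or from the NB8800 driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE   8u  /* hypothetical ring size, power of two */
#define WAKE_THRESH 2u  /* hypothetical: free slots needed to restart */

struct tx_ring {
	void *slot[RING_SIZE];
	atomic_uint prod;     /* written only by the transmit path */
	atomic_uint cons;     /* written only by the completion path */
	atomic_bool stopped;  /* models a stopped TX queue */
};

static void tx_init(struct tx_ring *r)
{
	for (size_t i = 0; i < RING_SIZE; i++)
		r->slot[i] = NULL;
	atomic_init(&r->prod, 0);
	atomic_init(&r->cons, 0);
	atomic_init(&r->stopped, false);
}

/* Free slots; unsigned subtraction stays correct when indices wrap. */
static unsigned tx_avail(struct tx_ring *r)
{
	unsigned prod = atomic_load_explicit(&r->prod, memory_order_acquire);
	unsigned cons = atomic_load_explicit(&r->cons, memory_order_acquire);

	return RING_SIZE - (prod - cons);
}

/* Transmit path: queue one buffer, return false if the ring is full. */
static bool tx_xmit(struct tx_ring *r, void *buf)
{
	unsigned prod;

	assert(buf != NULL);  /* NULL marks a reclaimed slot */
	if (tx_avail(r) == 0)
		return false;

	prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
	r->slot[prod % RING_SIZE] = buf;
	/* Publish the slot contents before the new producer index. */
	atomic_store_explicit(&r->prod, prod + 1, memory_order_release);

	if (tx_avail(r) == 0) {
		atomic_store(&r->stopped, true);
		/* Full barrier, then re-test: the completion path may
		 * have freed slots between the test and the stop. */
		atomic_thread_fence(memory_order_seq_cst);
		if (tx_avail(r) >= WAKE_THRESH)
			atomic_store(&r->stopped, false);
	}
	return true;
}

/* Completion path: reclaim up to budget slots, return how many. */
static unsigned tx_complete(struct tx_ring *r, unsigned budget)
{
	unsigned cons = atomic_load_explicit(&r->cons, memory_order_relaxed);
	unsigned prod = atomic_load_explicit(&r->prod, memory_order_acquire);
	unsigned done = 0;

	while (cons != prod && done < budget) {
		r->slot[cons % RING_SIZE] = NULL;  /* stand-in for freeing */
		cons++;
		done++;
	}
	/* Publish the new consumer index before testing "stopped". */
	atomic_store_explicit(&r->cons, cons, memory_order_release);
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load(&r->stopped) && tx_avail(r) >= WAKE_THRESH)
		atomic_store(&r->stopped, false);
	return done;
}
```

Neither path takes a lock.  The paired full barriers are meant to
ensure that either the transmit path sees the freed slots on its
re-test, or the completion path sees the stopped flag and restarts the
queue.  A real driver would use the kernel's own barrier and netif
queue helpers rather than C11 atomics.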