Message-ID: <4A13CC98.60506@cosmosbay.com>
Date:	Wed, 20 May 2009 11:25:44 +0200
From:	Eric Dumazet <dada1@...mosbay.com>
To:	David Miller <davem@...emloft.net>
CC:	ajitk@...verengines.com, netdev@...r.kernel.org
Subject: Re: [net-next-2.6 PATCH][be2net] remove napi in the tx path and do
 tx completion processing in interrupt context

David Miller wrote:
> From: Ajit Khaparde <ajitk@...verengines.com>
> Date: Tue, 19 May 2009 17:40:58 +0530
> 
>> This patch will remove napi in tx path and do Tx completion
>> processing in interrupt context.  This makes Tx completion
>> processing simpler without loss of performance.
>>
>> Signed-off-by: Ajit Khaparde <ajitk@...verengines.com>
> 
> This is different from how every other NAPI driver does this.
> 
> You should have a single NAPI context, that handles both TX and RX
> processing.  Except, that for TX processing, no work budget
> adjustments are made.  You simply unconditionally process all pending
> TX work without accounting it into the POLL call budget.
> 
> I have no idea why this driver tried to split the RX and TX
> work like this; it accomplishes nothing but adding overhead.
> Simply add the TX completion code to the RX poll handler
> and that's all you need to do.  Also, make sure to run TX
> polling work before RX polling work, this makes fresh SKBs
> available for responses generated by RX packet processing.
> 
> I bet this is why you really saw performance problems, rather than
> something to do with running it directly in interrupt context.  There
> should be zero gain from that if you do the TX poll work properly in
> the RX poll handler.  When you free TX packets in hardware interrupt
> context using dev_kfree_skb_any(), that just schedules a software
> interrupt to do the actual SKB free, which just adds more overhead for
> TX processing work.  You aren't avoiding software IRQ work by doing TX
> processing in the hardware interrupt handler; in fact, you are
> theoretically doing more.
> 
> So the only conclusion I can come to is that what is important is
> doing the TX completion work before the RX packets get processed in
> the NAPI poll handler, and you accomplish that more efficiently and
> more properly by simply moving the TX completion work to the top of
> the RX poll handler code.
> 

Thanks, David, for this analysis.
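
For reference, the ordering you describe would look roughly like this in a
driver's poll handler (just a sketch, the foo_* names are made up and not
be2net code):

static int foo_poll(struct napi_struct *napi, int budget)
{
        struct foo_adapter *adapter = container_of(napi, struct foo_adapter, napi);
        int rx_done;

        /* TX completions first, and not charged to the budget: this frees
         * skbs before RX processing generates replies that need fresh ones.
         */
        foo_clean_tx_ring(adapter);

        /* RX work is bounded by the budget as usual. */
        rx_done = foo_clean_rx_ring(adapter, budget);

        if (rx_done < budget) {
                napi_complete(napi);
                foo_enable_interrupts(adapter);
        }
        return rx_done;
}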

I would like to point out a scalability problem we currently have with
non-multiqueue devices and multi-core hosts, with the scheme you described/advocated.

(This has nothing to do with the be2net patch; please forgive me for jumping in.)

When a lot of network traffic is handled by one device, we enter a
ksoftirqd/NAPI mode, where one CPU is almost dedicated to handling
both TX completions and RX completions, while the other CPUs
run application code (and some parts of the TCP/UDP stack).

That's really expensive because of the many cache line ping-pongs that occur.

In that case, it would make sense to transfer most of the TX completion work
to the other CPUs (the CPUs that actually issued the xmits): skb freeing of course,
and the sock_wfree() callbacks...

So maybe some NIC device drivers could let their ndo_start_xmit()
do some cleanup work on previously sent skbs. If done correctly,
we could lower the number of cache line ping-pongs.
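
Something like this, for example (again only a sketch with made-up foo_*
helpers, assuming the driver can cheaply look at its TX completion ring from
the xmit path):

static int foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct foo_adapter *adapter = netdev_priv(dev);

        /* Reclaim skbs the hardware has already finished with, so the
         * freeing (and sock_wfree() callbacks) runs on the CPU that
         * queued them.
         */
        foo_reclaim_completed_tx(adapter);

        /* Usual path: map the buffer, post the descriptor, ring the doorbell. */
        if (foo_queue_skb(adapter, skb))
                return NETDEV_TX_BUSY;

        return NETDEV_TX_OK;
}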

This would give some breathing room to the CPU that would then only take care
of RX completions, and probably give better throughput. Some machines out there
want to transmit a lot of frames while receiving few...


There is also a minor latency problem with the current scheme:
taking care of TX completions takes some time and delays RX handling,
increasing latencies for incoming traffic.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
