Message-ID: <20200713153017.07caaf73@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Mon, 13 Jul 2020 15:30:17 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Claudiu Manoil <claudiu.manoil@....com>
Cc: "David S . Miller" <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: [PATCH net-next 6/6] enetc: Add adaptive interrupt coalescing
On Mon, 13 Jul 2020 15:56:10 +0300 Claudiu Manoil wrote:
> Use the generic dynamic interrupt moderation (dim)
> framework to implement adaptive interrupt coalescing
> in ENETC. With the per-packet interrupt scheme, a high
> interrupt rate has been noted for moderate traffic flows,
> leading to high CPU utilization. The 'dim' scheme
> implemented by this patch addresses the issue,
> improving CPU utilization while keeping coalescing
> time thresholds minimal to preserve good latency.
>
> Below are some measurement results from before and after
> this patch (and related dependencies), on a system with
> 2 ARM Cortex-A72 CPUs @ 1.3 GHz (32 KB L1 data cache),
> using netperf over a 1 Gbit link (at maximum throughput):
>
> 1) 1 Rx TCP flow, both Rx and Tx processed by the same NAPI
> thread on the same CPU:
>          CPU utilization       int rate (ints/sec)
> Before:  50%-60% (over 50%)    92k
> After:   just under 50%        35k
> Comment: Small CPU utilization improvement for a single
>          Rx TCP flow (i.e. netperf -t TCP_MAERTS) on a
>          single CPU.
>
> 2) 1 Rx TCP flow, Rx processing on CPU0, Tx on CPU1:
>          Total CPU utilization   Total int rate (ints/sec)
> Before:  60%-70%                 85k CPU0 + 42k CPU1
> After:   15%                     3.5k CPU0 + 3.5k CPU1
> Comment: Huge improvement in total CPU utilization
>          correlated with a huge decrease in interrupt rate.
>
> 3) 4 Rx TCP flows + 4 Tx TCP flows (+ pings to check the latency):
>          Total CPU utilization   Total int rate (ints/sec)
> Before:  ~80% (spikes to 90%)    ~100k
> After:   60% (steadier)          ~10k
> Comment: Significant improvement for this load test, while
>          the ping latency was not impacted.
>
> Signed-off-by: Claudiu Manoil <claudiu.manoil@....com>
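For anyone following along: hooking an Rx ring into the generic dim
framework (include/linux/dim.h) boils down to roughly the sketch
below. This is only an illustration of the framework's API, not the
actual driver code; the enetc_* names and the ITR register write are
placeholders:

#include <linux/dim.h>
#include <linux/types.h>
#include <linux/workqueue.h>

struct enetc_rx_ring {
	struct dim dim;		/* per-ring DIM state */
	u64 packets;		/* SW counters updated in NAPI poll */
	u64 bytes;
	u16 events;		/* interrupt event counter */
};

/* Run by the DIM core's work struct once it settles on a new
 * moderation profile; the driver applies it to the hardware.
 */
static void enetc_rx_dim_work(struct work_struct *w)
{
	struct dim *dim = container_of(w, struct dim, work);
	struct enetc_rx_ring *ring =
		container_of(dim, struct enetc_rx_ring, dim);
	struct dim_cq_moder moder =
		net_dim_get_rx_moderation(dim->mode, dim->profile_ix);

	/* write moder.usec / moder.pkts to the ring's ITR registers */

	dim->state = DIM_START_MEASURE;
}

/* Ring init must also do:
 *   ring->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 *   INIT_WORK(&ring->dim.work, enetc_rx_dim_work);
 */

/* At the end of each NAPI poll, feed fresh counters to the DIM core,
 * which schedules the work above whenever the profile should change.
 */
static void enetc_rx_net_dim(struct enetc_rx_ring *ring)
{
	struct dim_sample sample;

	dim_update_sample(ring->events, ring->packets, ring->bytes,
			  &sample);
	net_dim(&ring->dim, sample);
}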
Does it really make sense to implement DIM for TX?
For TX, the only thing we care about is that no queue in the system
underflows, so the calculation is simply timeout = queue len / speed.
The only open question is which queue in the system is the smallest
(TX ring, TSQ, etc.), but IMHO there's little point in the extra
work of calculating the thresholds dynamically. On real-life
workloads, the scheduler overhead that the async work structs
introduce causes measurable regressions.
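To illustrate, a static version of that calculation could look like
the sketch below. The helper name, the TSQ constant, and the /2
safety margin are made up for illustration, nothing here is from the
patch:

#include <linux/kernel.h>
#include <linux/math64.h>
#include <linux/time64.h>

/* assumed per-socket TCP Small Queues budget, illustration only */
#define EXAMPLE_TSQ_BYTES	(128 * 1024)

/* timeout = queue len / speed: never hold TX completions longer
 * than the time the link needs to drain the smallest queue that
 * feeds it.
 */
static u32 example_tx_coal_usecs(u32 tx_ring_bytes, u64 link_speed_bps)
{
	u32 queue_bytes = min_t(u32, tx_ring_bytes, EXAMPLE_TSQ_BYTES);

	/* usecs to transmit queue_bytes at link rate, halved as a
	 * safety margin so the queue never fully drains
	 */
	return div64_u64((u64)queue_bytes * 8 * USEC_PER_SEC,
			 link_speed_bps) / 2;
}

/* e.g. a 128 KB TSQ budget at 1 Gbit/s drains in ~1048 us, giving
 * a ~524 us timeout
 */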
That's just to share my experience; it's up to you to decide whether
to keep the TX-side DIM or not :)