Message-ID: <1462911770.5333.11.camel@redhat.com>
Date:	Tue, 10 May 2016 22:22:50 +0200
From:	Paolo Abeni <pabeni@...hat.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jiri Pirko <jiri@...lanox.com>,
	Daniel Borkmann <daniel@...earbox.net>,
	Alexei Starovoitov <ast@...mgrid.com>,
	Alexander Duyck <aduyck@...antis.com>,
	Tom Herbert <tom@...bertland.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>, Rik van Riel <riel@...hat.com>,
	Hannes Frederic Sowa <hannes@...essinduktion.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/2] net: threadable napi poll loop

On Tue, 2016-05-10 at 09:08 -0700, Eric Dumazet wrote:
> On Tue, 2016-05-10 at 18:03 +0200, Paolo Abeni wrote:
> 
> > If a single core host is under network flood, i.e. ksoftirqd is
> > scheduled and it eventually (after processing ~640 packets) will let the
> > user space process run. The latter will execute a syscall to receive a
> > packet, which will have to disable/enable bh at least once and that will
> > cause the processing of another ~640 packets. To receive a single packet
> > in user space, the kernel has to process more than one thousand packets.
> 
> Looks you found the bug then. Have you tried to fix it ?

The core functionality is implemented in ~100 lines of code; is that
the kind of bloat that concerns you?

That could probably be improved by removing some code duplication, e.g.
by factoring out the code that napi_thread_wait() shares with
irq_wait_for_interrupt(), and possibly the code that
napi_threaded_poll() shares with net_rx_action().

If the additional test inside napi_schedule() is really a concern, it
can be guarded with a static_key.

The design of ksoftirqd and local_bh_enable() is the root of the
problem; both need to be changed to solve it.

We actually experimented with several different options.

Limiting the amount of work performed by local_bh_enable() somewhat
mitigates the issue, but it adds yet another kernel parameter that is
difficult to tune.
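
To make the arithmetic behind this option concrete, here is a toy
user-space model in C of the single-core flood scenario quoted above.
It is only a sketch, not kernel code: the struct, the function names
and the 64-packet limit used in the note below are made up for
illustration, and only the ~640 figure comes from this thread.

```c
#include <stdio.h>

/*
 * Toy model of one "round" on a single-core host under flood.
 * All names are hypothetical; nothing here exists in the kernel.
 */
struct flood_model {
	/* packets ksoftirqd handles before user space gets to run */
	unsigned int ksoftirqd_batch;
	/* packets handled when the receive syscall re-enables bh */
	unsigned int bh_enable_batch;
};

/* Packets the kernel processes for each packet user space receives. */
unsigned int packets_per_delivery(const struct flood_model *m)
{
	return m->ksoftirqd_batch + m->bh_enable_batch;
}

/*
 * Packets that reach user space out of 'incoming' flood packets,
 * assuming exactly one receive syscall per round.
 */
unsigned long delivered_packets(const struct flood_model *m,
				unsigned long incoming)
{
	unsigned int per_round = packets_per_delivery(m);

	if (per_round == 0)
		return 0;
	return incoming / per_round;
}

/* Print the model's figures for one configuration. */
void print_model(const struct flood_model *m, unsigned long incoming)
{
	printf("ksoftirqd=%u bh_enable=%u -> %u processed per delivery, "
	       "%lu of %lu delivered\n",
	       m->ksoftirqd_batch, m->bh_enable_batch,
	       packets_per_delivery(m),
	       delivered_packets(m, incoming), incoming);
}
```

With both batches at 640, the model gives 1280 packets processed per
delivered packet, which matches the "more than one thousand" figure
above. Capping the local_bh_enable() batch at a hypothetical 64 brings
that down to 704, so the cost is reduced but still dominated by the
ksoftirqd batch, and the cap itself is the extra tunable.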

Running the softirq loop exclusively inside ksoftirqd would solve the
issue, but this is a very invasive approach that affects all other
subsystems.

The above can be restricted to net_rx_action() only (i.e. always
running net_rx_action() in ksoftirqd context). The related patch isn't
really much simpler than this one and would add at least the same
number of additional tests in the fast path.

Running the napi loop in a thread that can be migrated gives an
additional benefit in the hypervisor/VM scenario, which can't be
achieved otherwise.

Would you consider the threaded irq alternative more viable?

Cheers,

Paolo
