Message-ID: <20061130103240.GA25733@elte.hu>
Date: Thu, 30 Nov 2006 11:32:40 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Evgeniy Polyakov <johnpol@....mipt.ru>
Cc: Nick Piggin <nickpiggin@...oo.com.au>,
David Miller <davem@...emloft.net>, wenji@...l.gov,
akpm@...l.org, netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [patch 1/4] - Potential performance bottleneck for Linux TCP
* Evgeniy Polyakov <johnpol@....mipt.ru> wrote:
> > David's line of thinking for a solution sounds better to me. This
> > patch does not prevent the process from being preempted (for
> > potentially a long time), by any means.
>
> It steals timeslices from other processes to complete the tcp_recvmsg()
> task, and only when it does so for too long will it be preempted.
> Processing the backlog queue on behalf of need_resched() will break
> fairness too - the processing itself can take a lot of time, so the
> process can be scheduled away during that part as well.
correct - it's just the wrong thing to do. The '10% performance win'
that was measured was against _9 other tasks that contended for the same
CPU resource_. I.e. it's /not/ an absolute 'performance win' AFAICS,
it's a simple shift in CPU cycles away from the other 9 tasks and
towards the task that does TCP receive.
Note that even without the change the TCP receiving task is already
getting a disproportionate share of cycles due to softirq processing!
Under a load of 10.0 the receiver went from 500 mbit/s to 74 mbit/s, while
its 'fair' share would be 50 mbit/s. So the TCP receiver /already/ has an
unfair advantage. The patch only deepens that unfairness.
The solution is really simple and needs no kernel change at all: if you
want the TCP receiver to get a larger share of timeslices then either
renice it to -20 or renice the other tasks to +19.
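(Purely as an illustration of that suggestion - this is not from the patch
under discussion - a minimal userspace sketch that renices the calling
process to -20 via setpriority(2); -20 is just the value used above and
needs root/CAP_SYS_NICE:)

  #include <stdio.h>
  #include <sys/resource.h>

  int main(void)
  {
          /* who == 0 means "the calling process"; a nice value of -20
           * requires root (CAP_SYS_NICE). */
          if (setpriority(PRIO_PROCESS, 0, -20) == -1) {
                  perror("setpriority");
                  return 1;
          }
          printf("new nice value: %d\n", getpriority(PRIO_PROCESS, 0));
          return 0;
  }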
The other disadvantage, even ignoring that it's the wrong thing to do,
is the crudeness of preempt_disable() that i mentioned in the other
post:
---------->
independently of the issue at hand, in general the explicit use of
preempt_disable() in non-infrastructure code is quite a heavy tool. Its
effects are heavy and global: it disables /all/ preemption (even on
PREEMPT_RT). Furthermore, when preempt_disable() is used for per-CPU
data structures then [unlike with, for example, a spin-lock] the connection
between the 'data' and the 'lock' is not explicit - causing all kinds of
grief when trying to convert such code to a different preemption model
(such as PREEMPT_RT :-)
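(To make the per-CPU case concrete - a made-up example, not code from any
actual patch: a per-CPU counter bumped under an open-coded
preempt_disable(), where nothing in the source ties the disabled-preemption
'lock' to the pkt_stats data it protects:)

  #include <linux/percpu.h>
  #include <linux/preempt.h>

  struct pkt_stats {
          unsigned long packets;
  };
  static DEFINE_PER_CPU(struct pkt_stats, pkt_stats);

  static void count_packet(void)
  {
          preempt_disable();                    /* the 'lock' ...   */
          __get_cpu_var(pkt_stats).packets++;   /* ... and the data */
          preempt_enable();
  }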
So my plan is to remove all "open-coded" use of preempt_disable() [and
raw use of local_irq_save/restore] from the kernel and replace it with
some facility that connects data and lock. (Note that this will not
result in any actual changes on the instruction level, because internally
every such facility still maps to preempt_disable() on non-PREEMPT_RT
kernels - so on those kernels such code will generate exactly the same
instructions as before.)
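(Again using the made-up pkt_stats example from above: get_cpu_var() /
put_cpu_var() is one existing accessor of that kind - it names the per-CPU
object it pins, and on non-PREEMPT_RT kernels it still expands to
preempt_disable() plus the per-CPU dereference, so the generated code is
unchanged:)

  static void count_packet(void)
  {
          get_cpu_var(pkt_stats).packets++;   /* data and 'lock' named together */
          put_cpu_var(pkt_stats);
  }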
Ingo