Message-ID: <45664160.6060504@cosmosbay.com>
Date: Fri, 24 Nov 2006 01:48:32 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: Ulrich Drepper <drepper@...hat.com>
CC: Jeff Garzik <jeff@...zik.org>,
Evgeniy Polyakov <johnpol@....mipt.ru>,
David Miller <davem@...emloft.net>,
Andrew Morton <akpm@...l.org>, netdev <netdev@...r.kernel.org>,
Zach Brown <zach.brown@...cle.com>,
Christoph Hellwig <hch@...radead.org>,
Chase Venters <chase.venters@...entec.com>,
Johann Borck <johann.borck@...sedata.com>,
linux-kernel@...r.kernel.org
Subject: Re: [take25 1/6] kevent: Description.
Ulrich Drepper wrote:
>
> You create worker threads to handle the work for the entire program. Look
> at something like a web server. When creating several queues, how do
> you distribute all the connections to the different queues? To ensure
> every connection is handled as quickly as possible you stuff them all in
> the same queue and then have all threads use this one queue. Whenever an
> event is posted a thread is woken. _One_ thread. If two events are
> posted, two threads are woken. In this situation we have a few atomic
> ops at userlevel to make sure that the two threads don't pick the same
> event but that's all there is wrt "fighting".
>
> The alternative is the sorry state we have now. In nscd, for instance,
> we have one single thread waiting for incoming connections and it then
> has to wake up a worker thread to handle the processing. This is done
> because we cannot "park" all threads in the accept() call since when a
> new connection is announced _all_ the threads are woken. With the new
> event handling this wouldn't be the case, one thread only is woken and
> we don't have to wake worker threads. All threads can be worker threads.
Having one specialized thread handle the distribution of work to worker
threads is better most of the time. This thread can itself be a worker thread
(to avoid context switches), but it can decide to wake up 'slave threads' if
it believes it has to (for example, if it notices that a *lot* of requests
are pending).
This is because under moderate load it's better to have a single CPU running
80% of the time, keeping its cache hot, than to 'distribute' the work across
four CPUs that are each used 25% of the time, with lots of cache line
ping-pong and poor cache reuse.
If you let 'kevent'/'dumb kernel dispatcher'/'futex'/'whatever' decide to
wake up one thread for each new event, you *may* get lower performance,
because of higher system overhead (system meaning the scheduler and kernel
internals, but also bus traffic).
Only the application writer has a clue about the average utilization of its
worker threads, and only he can decide to dynamically adjust parameters if
needed to handle load spikes.
SMP machines are nice, but for many workloads it's better to avoid spreading
a working set across several CPUs that fight for common resources (memory).
Back to 'kevent':
-----------------
I think that having a syscall to commit events should not be mandatory. A
syscall is needed only to wait for new events when the ring is empty. And
even then, maybe we don't need yet another new syscall to perform the wait:
we already have nice synchronization primitives (futex, for example).
The user program should be able to update a 'uidx' in user space (using
atomic ops only if multi-threaded), and when the ring buffer is empty
(uidx == kidx) it could just use the futex infrastructure and call
FUTEX_WAIT(&kidx, current value = uidx).
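
Something like this minimal consumer-side sketch (the mapping layout, the
'uidx'/'kidx' names and the helper are illustrative, not an existing kevent
ABI; the raw syscall is used because glibc has no futex wrapper):

#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

static int futex_wait(volatile uint32_t *addr, uint32_t expected)
{
        /* sleeps only if *addr still equals 'expected', so a wakeup
         * between the test and the sleep cannot be missed */
        return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

static void wait_for_events(volatile uint32_t *kidx, volatile uint32_t *uidx)
{
        uint32_t u = *uidx;

        while (*kidx == u)              /* ring empty: block until the */
                futex_wait(kidx, u);    /* kernel bumps kidx; EINTR or */
                                        /* EAGAIN just mean "recheck"  */
}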
I think I already gave my opinion on the ring buffer, but let me rephrase it:
One part should be read/write for the application (so it can update uidx)
(or the user application just gives the kernel, at init time, the address of
a futex in its VM space).
One part could be read-only for the application (though it could be
read/write: we don't care if the user application is stupid): the kernel
writes its kidx (or a copy of it) and the events there.
For best performance, uidx and kidx should be on different cache lines
(basic isolation of producer and consumer).
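
For example (an illustrative layout only, since the real ABI is exactly what
this thread is trying to settle; 'struct ukevent' is the event structure from
the patchset, RING_SIZE a made-up constant):

#define RING_SIZE 256   /* arbitrary, power of two */

struct event_ring {
        /* written by the application (consumer) */
        volatile uint32_t uidx __attribute__((aligned(64)));
        /* written by the kernel (producer), read-only for the application */
        volatile uint32_t kidx __attribute__((aligned(64)));
        /* the events themselves, written by the kernel */
        struct ukevent events[RING_SIZE] __attribute__((aligned(64)));
};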
When the kernel wants to queue a new event in the ring buffer, it can:
- See if the user program has consumed some events since the last invocation
  (the kernel fetches uidx and compares it with its own uidx value: no
  syscall needed).
- Check that a slot is available in the ring buffer.
- Copy the event into the ring buffer, perform a memory barrier, then
  increment kidx.
- Call futex_wake(&kidx, 1 thread).
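
In user-space flavoured C, using the event_ring layout sketched above, those
steps would look roughly like this; a real in-kernel version would use
smp_wmb() and the kernel futex wakeup path instead of the GCC builtin and the
raw syscall:

static int futex_wake(volatile uint32_t *addr, int nr)
{
        return syscall(SYS_futex, addr, FUTEX_WAKE, nr, NULL, NULL, 0);
}

static int queue_event(struct event_ring *r, const struct ukevent *ev)
{
        uint32_t k = r->kidx;
        uint32_t u = r->uidx;           /* fetch uidx: no syscall needed */

        if (k - u >= RING_SIZE)         /* no free slot in the ring */
                return -1;

        r->events[k % RING_SIZE] = *ev; /* copy the event */
        __sync_synchronize();           /* make it visible before ...   */
        r->kidx = k + 1;                /* ... publishing the new kidx  */

        futex_wake(&r->kidx, 1);        /* wake at most one waiter */
        return 0;
}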
The user application is free to have one thread/process or several
threads/processes waiting for new events (or even no thread at all :) ).
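
With several consumer threads on one ring, the "few atomic ops at userlevel"
Ulrich mentioned boil down to claiming a slot with a compare-and-swap on
uidx; again a sketch under the layout assumed above, not a proposed ABI:

static int claim_event(struct event_ring *r, struct ukevent *out)
{
        for (;;) {
                uint32_t u = r->uidx;
                uint32_t k = r->kidx;

                if (u == k)
                        return 0;       /* ring currently empty */

                __sync_synchronize();   /* read indices before the data */
                *out = r->events[u % RING_SIZE];

                /* claim the slot; on failure another thread won the
                 * race, so reload the indices and retry */
                if (__sync_bool_compare_and_swap(&r->uidx, u, u + 1))
                        return 1;
        }
}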
Eric