Message-ID: <465D4DF1.5040500@gmail.com>
Date: Wed, 30 May 2007 19:12:01 +0900
From: Tejun Heo <htejun@...il.com>
To: Willy Tarreau <w@....eu>
CC: davids@...master.com, linux-kernel@...r.kernel.org
Subject: Re: epoll,threading
Hello,
Willy Tarreau wrote:
> In my experience, it's not much the context switch by itself which
> causes performance degradation, but the fact that with threads, you
> have to put mutexes everywhere. And frankly, walking a list with
> locks everywhere is quite slower than doing it in one run at a rate
> of 3 or 4 cycles per entry. Also, local storage in function returns
> is not possible anymore, and some functions even need to malloc()
> instead of returning statically allocated data. I believe this is the
> reason for openssl being twice as slow when compiled thread-safe than
> in native mode.
>
> So in fact, converting a threaded program to a pure async model
> should not improve it much because of the initial architectural
> design. But a program written from scratch to be purely async should
> perform better simply because it has less operations to perform. And
> there's no magics here : less cycles spend synchronizing and locking
> = more cycles available for the real job.
The thing is that the synchronization overhead is something you'll have
to pay anyway to support multiple processors. Actually, supporting
multiple processors in an async program is beyond painful. Either you
have to restrict all locking to busy locks or introduce a new state for
each possibly blocking synchronization point, and what happens if they
have to nest? You kind of end up with a stackable state thingie - an
extremely restricted stack.
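
To make the "new state per blocking point" problem concrete, here is a minimal sketch in Python. It is not taken from any real server: AsyncLock, Task, run and the state names are all hypothetical. A task takes lock A and then lock B nested inside it, and because it may never block, each acquire becomes a state it can be resumed in, kept on an explicit stack.

```python
from collections import deque


class AsyncLock:
    """Hypothetical non-blocking lock: a failed acquire parks the caller."""

    def __init__(self):
        self.owner = None
        self.waiters = deque()

    def acquire(self, task):
        """Return True if taken now; otherwise park the task and return False."""
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)
        return False

    def release(self):
        """Hand the lock to the next parked task (if any) and return that task."""
        self.owner = self.waiters.popleft() if self.waiters else None
        return self.owner


class Task:
    """Takes lock a, then lock b nested inside it, as an explicit state machine."""

    def __init__(self, name, a, b, log):
        self.name, self.a, self.b, self.log = name, a, b, log
        self.stack = ["WANT_A"]  # explicit stack of resume states

    def step(self):
        """Run until finished or until a lock would block; return tasks to wake."""
        woken = []
        while self.stack:
            state = self.stack[-1]
            if state == "WANT_A":
                self.stack[-1] = "HOLD_A"  # state to resume in once a is ours
                if not self.a.acquire(self):
                    return woken
            elif state == "HOLD_A":
                self.stack.append("WANT_B")  # nested synchronization point
            elif state == "WANT_B":
                self.stack[-1] = "HOLD_B"  # state to resume in once b is ours
                if not self.b.acquire(self):
                    return woken
            elif state == "HOLD_B":
                self.log.append(self.name)  # the actual work, under both locks
                self.stack.pop()
                self.stack[-1] = "DROP_A"
                nxt = self.b.release()
                if nxt is not None:
                    woken.append(nxt)
            elif state == "DROP_A":
                self.stack.pop()
                nxt = self.a.release()
                if nxt is not None:
                    woken.append(nxt)
        return woken


def run(tasks):
    """Single-threaded event loop: step ready tasks, queue whatever they wake."""
    ready = deque(tasks)
    while ready:
        ready.extend(ready.popleft().step())
```

With two locks this is already five states and a hand-rolled stack. A thread would express the same thing as two nested lock/unlock pairs and let the real stack remember where it was.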
If you're really serious about performance and scalability, you just
have to support multiple processors, and if you do it right the
performance overhead shouldn't be too high. Common servers will soon
have 8 cores on two physical processors - paying some overhead for
synchronization is a pretty good deal for scalability.
>> In my experience with web caches, epoll or similar for idle clients
>> and thread per active client scaled and performed pretty well - it
>> needed more memory but the performance wasn't worse than
>> asynchronous design and doing complex server in async model is a
>> lot of pain.
>
> It's true that an async model is a lot of pain. But it's always where
> I got the best performance. For instance, with epoll(), I can achieve
> 20000 HTTP reqs/s with 40000 concurrent sessions. The best
> performance I have observed from threaded competitors was an order of
> magnitude below on either value (sometimes both).
Well, it all depends on how you do it, but an order of magnitude
performance difference sounds like too much to me. Memory-wise,
scalability can be worse by orders of magnitude. You need to restrict
per-thread stack size and use epoll for idle clients, if you wanna
scale. Workers + async monitoring of idle clients scale pretty well.
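
A rough sketch of the "workers + async monitoring of idle clients" arrangement, in Python for brevity. IdleMonitor, echo_upper and the 256 KiB stack size are assumptions made for illustration, not anything from the web cache mentioned above. Python's selectors.DefaultSelector uses epoll where the platform provides it, and threading.stack_size() stands in for restricting the per-thread stack.

```python
import queue
import selectors
import socket
import threading

# Assumption: 256 KiB of stack is enough for these small handlers.
# Must be set before the worker threads are created.
threading.stack_size(256 * 1024)


class IdleMonitor:
    """Parks idle connections in a selector; readable ones go to blocking workers."""

    def __init__(self, handler, num_workers=4):
        self.handler = handler
        self.sel = selectors.DefaultSelector()  # epoll-backed on Linux
        self.idle = queue.Queue()   # connections waiting to be (re)registered
        self.ready = queue.Queue()  # connections with data, for the workers
        for _ in range(num_workers):
            threading.Thread(target=self._work, daemon=True).start()

    def park(self, conn: socket.socket):
        """Mark a connection idle; safe to call from any thread."""
        self.idle.put(conn)

    def poll_once(self, timeout=1.0):
        """One pass of the monitor loop; only this thread touches the selector."""
        while True:
            try:
                conn = self.idle.get_nowait()
            except queue.Empty:
                break
            self.sel.register(conn, selectors.EVENT_READ)
        for key, _events in self.sel.select(timeout):
            # Client became active: stop watching it and give it to a worker.
            self.sel.unregister(key.fileobj)
            self.ready.put(key.fileobj)

    def _work(self):
        """Worker thread: plain blocking code, one active client at a time."""
        while True:
            conn = self.ready.get()
            try:
                if self.handler(conn):
                    self.park(conn)  # client went idle again
                else:
                    conn.close()
            finally:
                self.ready.task_done()


def echo_upper(conn: socket.socket) -> bool:
    """Hypothetical handler: upper-case one chunk; False means the peer closed."""
    data = conn.recv(4096)
    if not data:
        return False
    conn.sendall(data.upper())
    return True
```

A server would call poll_once() in a loop from one thread. Idle clients then cost one registered descriptor each, and only active ones occupy a thread and its stack. In this sketch a re-parked connection is picked up on the next poll_once() pass rather than immediately, which keeps all selector access on a single thread.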
> However, I agree that few uses really require to spend time writing
> and debugging async programs.
Yeap, also there are several things which are just too painful in an
async server - e.g. adding coordination with another server (virus scan,
sharing cached data), implementing a pluggable extension framework for
third parties (and what happens if they should be able to stack!), and
maintaining the damn thing while trying to add a few features. :-)
IMHO, a complex pure async server doesn't really make sense anymore.
Thanks.
--
tejun