linux-kernel - Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

Message-ID: <20070225190414.GB6460@elte.hu>
Date:	Sun, 25 Feb 2007 20:04:15 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Evgeniy Polyakov <johnpol@....mipt.ru>
Cc:	Ulrich Drepper <drepper@...hat.com>, linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Arjan van de Ven <arjan@...radead.org>,
	Christoph Hellwig <hch@...radead.org>,
	Andrew Morton <akpm@....com.au>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Zach Brown <zach.brown@...cle.com>,
	"David S. Miller" <davem@...emloft.net>,
	Suparna Bhattacharya <suparna@...ibm.com>,
	Davide Libenzi <davidel@...ilserver.org>,
	Jens Axboe <jens.axboe@...cle.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

* Evgeniy Polyakov <johnpol@....mipt.ru> wrote:

> Kevent is a _very_ small entity and there is _no_ cost of requeueing 
> (well, there is list_add guarded by lock) - after it is done, process 
> can start real work. With rescheduling there are _too_ many things to 
> be done before we can start new work. [...]

actually, no. For example, a wakeup too is fundamentally a list_add 
guarded by a lock. Take a look at try_to_wake_up(). The rest you see 
there is just extra frills for things like load-balancing the requests 
over multiple CPUs [which i'm sure kevent users would request in the 
future too].
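
To make the "list_add guarded by a lock" point concrete, here is a toy sketch in Python. It is only an analogy, not what try_to_wake_up() actually does: the RunQueue class and its wake_up()/pick_next() methods are hypothetical names invented for this illustration, and a deque plus a set stand in for the kernel's list.

```python
import threading
from collections import deque


class RunQueue:
    """Toy run queue (hypothetical): a list of runnable tasks behind one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._runnable = deque()   # FIFO of runnable tasks
        self._queued = set()       # tasks currently on the queue

    def wake_up(self, task):
        """The core of a wakeup: add the task to the list, under the lock."""
        with self._lock:
            if task in self._queued:
                return False       # already runnable, nothing to do
            self._runnable.append(task)
            self._queued.add(task)
            return True

    def pick_next(self):
        """Remove and return the oldest runnable task, or None if idle."""
        with self._lock:
            if not self._runnable:
                return None
            task = self._runnable.popleft()
            self._queued.discard(task)
            return task
```

Requeueing a kevent and waking a task both reduce to the wake_up() step above; everything beyond it is policy layered on top.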

> [...] We have to change registers, change address space, various tlb 
> bits and so on - we have to do it, since task describes very heavy 
> entity - the whole process. [...]

but ... 'threadlets' are called thread-lets because they are not full 
processes, they are threads. There's no TLB state in that case. There's 
indeed register state associated with them, and currently there can 
certainly be quite a bit of overhead in a context switch - but not in 
register saving. We do user-space register saving not in the scheduler 
but upon /every system call/. Fundamentally a kernel thread is just its 
EIP/ESP [on x86, similar on other architectures] - which can be 
saved/restored in near zero time. All the rest is something we added for 
good /work queueing/ reasons - and those same extras should either be 
eliminated if they turn out to be not so good reasons after all, or they 
will be wanted for kevents too eventually, once it matures as a work 
queueing solution.

> I think it is _too_ heavy to have such a monster structure like 
> task(thread/process) and related overhead just to do an IO.

i think you are really, really mistaken if you believe that the fact 
that whole tasks/threads or processes can be 'monster structures' 
somehow has any relevance to scheduling/task-queueing performance and 
scalability. It does not matter how large a task's address space is - 
scheduling only relates to the minimal context that is in the CPU. And 
most of that context we save upon /every system call entry/, and restore 
it upon every system call return. If it's so expensive to manipulate, 
why can the Linux kernel do a full system call in ~150 cycles? That's 
cheaper than the access latency to a single DRAM page.

for the same reason it has no relevance that the full kevent-based 
webserver is a 'monster structure' - a single request's basic queueing 
operation is still cheap. The same is true of tasks/threads.

Really, you don't even have to know or assume anything about the 
scheduler, let's just do some elementary math here:

the reqs/sec your sendfile+kevent based webserver can do is 7900 per 
sec. Let's assume you will write further great kevent code which will 
optimize it further and it goes up to 10,100 reqs per sec (roughly 100 
usecs per request), ok? Then also try how many reschedules/sec your 
Athlon64 3500 box can do. My guess is: about a million per second (1 
usec per reschedule), perhaps a bit more.
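
One rough way to try this yourself is a pipe ping-pong between two processes, sketched below in Python. This is an assumption about how to measure, not a benchmark anyone in this thread used: the function name is made up, it is Unix-only (it relies on os.fork), and the interpreter's own overhead per read/write means the result understates what the kernel can do. Each round trip blocks and wakes both sides once.

```python
import os
import time


def pingpong_round_trips_per_sec(rounds=10000):
    """Bounce one byte between parent and child over two pipes (Unix only)."""
    p2c_r, p2c_w = os.pipe()   # parent -> child
    c2p_r, c2p_w = os.pipe()   # child -> parent
    pid = os.fork()
    if pid == 0:
        # Child: echo every byte straight back, then exit without cleanup.
        os.close(p2c_w)
        os.close(c2p_r)
        try:
            for _ in range(rounds):
                data = os.read(p2c_r, 1)
                if not data:
                    break
                os.write(c2p_w, data)
        finally:
            os._exit(0)
    # Parent: send a byte, block until it comes back, repeat.
    os.close(p2c_r)
    os.close(c2p_w)
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(p2c_w, b"x")
        os.read(c2p_r, 1)
    elapsed = time.perf_counter() - start
    os.close(p2c_w)
    os.close(c2p_r)
    os.waitpid(pid, 0)
    return rounds / elapsed
```

Calling pingpong_round_trips_per_sec() returns round trips per second; since each round trip involves at least two wakeups, doubling it gives a conservative lower bound on wakeups/sec for the box.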

Now let's assume that a threadlet based server would have to 
context-switch for /every single/ request served. That's totally 
over-estimating it, even with lots of slow clients, but let's assume it, 
to judge the worst-case impact.

So if you had to schedule once for every request served, you'd have to 
add 1 usec to your roughly 100 usecs cost per request, an increase of 
about 1%. That would bring your 10,100 requests per sec to about 10,000 
requests/sec under a threadlet model of operation. Put differently: it 
will cost you only 1% in performance to schedule once for every request. 
Or let's assume the task is totally cache-cold and you'd have to add 4 
usecs for its scheduling - that'd still only be 4%. So where is the fat?
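
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the back-of-the-envelope model used here (a fixed extra cost added to every request); the function names are made up for the illustration, and the 10,100 reqs/sec, 1 usec and 4 usecs inputs are the assumed figures from the text, not measurements.

```python
def throughput_with_overhead(base_reqs_per_sec, extra_usecs_per_req):
    """Requests/sec after adding a fixed per-request cost in microseconds."""
    base_usecs_per_req = 1_000_000 / base_reqs_per_sec
    return 1_000_000 / (base_usecs_per_req + extra_usecs_per_req)


def percent_lost(base_reqs_per_sec, extra_usecs_per_req):
    """Throughput drop, as a percentage of the base rate."""
    new_rate = throughput_with_overhead(base_reqs_per_sec, extra_usecs_per_req)
    return 100 * (base_reqs_per_sec - new_rate) / base_reqs_per_sec


if __name__ == "__main__":
    for extra in (1, 4):  # warm reschedule vs. cache-cold reschedule
        rate = throughput_with_overhead(10_100, extra)
        loss = percent_lost(10_100, extra)
        print(f"+{extra} usec/request: {rate:.0f} reqs/sec ({loss:.2f}% lost)")
```

With one reschedule per request the loss comes out at about 1%, and with the cache-cold 4 usecs figure it stays just under 4%.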

	Ingo