linux-kernel - Re: [GIT PULL] kdbus for 4.1-rc1

Message-ID: <CALCETrVCPy+UVUBxkD8qdEavfgpPArWof+2mguGjH3ZHJ-47PA@mail.gmail.com>
Date:	Tue, 28 Apr 2015 13:42:10 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	David Lang <david@...g.hm>
Cc:	Havoc Pennington <hp@...ox.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Lukasz Skalski <l.skalski@...sung.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Arnd Bergmann <arnd@...db.de>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>,
	Tom Gundersen <teg@...m.no>, Jiri Kosina <jkosina@...e.cz>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Daniel Mack <daniel@...que.org>,
	David Herrmann <dh.herrmann@...il.com>,
	Djalal Harouni <tixxdz@...ndz.org>
Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 28, 2015 at 1:34 PM, David Lang <david@...g.hm> wrote:
> On Tue, 28 Apr 2015, Havoc Pennington wrote:
>
>> On Tue, Apr 28, 2015 at 1:19 PM, David Lang <david@...g.hm> wrote:
>>>
>>> If the examples that are being used to show the performance advantage of
>>> kdbus vs normal dbus are doing the wrong thing, then we need to get some
>>> other examples available to people who don't live and breathe dbus that
>>> 'do things right' so that the kernel developers can see what you think is
>>> the
>>> real problem and how kdbus addresses it.
>>>
>>> So far, this 'wrong' example is the only thing that's been posted to show
>>> the performance advantage of kdbus.
>>
>>
>> I'm hopeful someone will do that.
>>
>> fwiw, I would be suspicious of a broken benchmark if it didn't show:
>>
>> * the bus daemon means an extra read/parse and marshal/write per
>> message, so 4 vs. 2
>> * the existence of the bus daemon therefore makes a message
>> send/receive take roughly twice as long
>>
>> https://lwn.net/Articles/580194/ has a bit more elaboration about
>> number of copies, validations, and context switches in each case.
>>
>> From what I can tell, the core performance claim for kdbus is that for
>> a userspace daemon to be a routing intermediary, it has to receive and
>> re-send messages. If the baseline performance of IPC is the cost to
>> send once and receive once, adding the daemon means there's twice as
>> much to do (1 more receive, 1 more send). However fast you make
>> send/receive, the daemon always means there are twice as many
>> send/receives as there would be with no daemon.
>
>
> there are twice as many context switches, nobody disputes that, the question
> is if it matters.
>
> It doesn't matter if the message router is in kernel space or user space, it
> still needs to read/parse, marshal/write the data, so you aren't saving that
> time due to it being in the kernel.
>
>> If that isn't what a benchmark shows, then there's a mystery to
>> explain... (one disruption to the ratio of course could be if the
>> clients use a much faster or slower dbus lib than the daemon)
>>
>> As noted many times, of course this 2x penalty for the daemon was a
>> conscious tradeoff - kdbus is trying to escape the tradeoff in order
>> to extend usage of dbus to more use cases. Given the tradeoff,
>> _existing_ uses of dbus seem to prefer the performance hit to the loss
>> of useful semantics, but potential new users would like to or need to
>> have both.
>
>
> If there is a 2x performance improvement for being in the kernel, but a 100x
> performance improvement from fixing the userspace code, the effort should be
> spent on the userspace code, not on moving things to kernel space.

I would guess that, if we compared a highly optimized userspace
implementation to a kernel implementation, we'd see less than 2x
difference.  After all, a userspace daemon doesn't really need to
unmarshal and re-marshal anything except headers.  For large messages,
we could use splice and avoid a couple of copies, too.
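
A rough sketch of the header-only idea, in Python for brevity.  The
frame layout is made up for illustration (it is not the D-Bus wire
format), and route() and make_frame() are hypothetical helpers, not
part of any existing daemon:

```python
import struct

# Hypothetical frame layout, for illustration only (NOT the D-Bus wire format):
# 4-byte big-endian destination id, 4-byte big-endian body length, then the body.
HEADER = struct.Struct(">II")


def make_frame(dest, body):
    """Build a frame: fixed-size header followed by an opaque body."""
    return HEADER.pack(dest, len(body)) + body


def route(frame, routes):
    """Queue the frame for its destination, parsing only the header."""
    view = memoryview(frame)
    dest, body_len = HEADER.unpack_from(view, 0)
    if len(view) != HEADER.size + body_len:
        raise ValueError("frame length does not match its header")
    # The body is never unmarshalled; the frame is forwarded as an opaque view.
    routes[dest].append(view)
    return dest
```

The router only touches the first 8 bytes; everything after that is
passed along untouched, so routing cost does not depend on how complex
the body is.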

If the scheduler became a bottleneck, it could be interesting to add
something like a send-and-poll primitive.  I suspect that some
workloads currently do unnecessary context switches with only standard
POSIX primitives.  If A sends a message to B, then there's a brief
window in which both A and B are runnable.  Ideally we wouldn't
context switch until A calls poll or epoll_wait, but I don't know how
well that works in practice.
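
For concreteness, this is the two-step pattern I mean, sketched in
Python with only standard primitives.  send_then_wait() is a
hypothetical helper; a send-and-poll primitive would fold the two steps
into one call, and no such call exists today as far as I know:

```python
import select
import socket


def send_then_wait(sock, payload, timeout_ms=1000):
    """Send a request, then block in poll() until a reply is readable."""
    # Step 1: send.  Once this returns, the receiver is runnable, and so are we.
    sock.sendall(payload)
    # Step 2: poll.  Only here does the sender actually go to sleep.
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    if not poller.poll(timeout_ms):
        raise TimeoutError("no reply within timeout")
    return sock.recv(4096)
```

The window between step 1 and step 2 is where both sides are runnable
and the scheduler has to guess who should run.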

There's more room for generic improvements than just that.  At LSF/MM
we were talking about more scalable epoll variants that would allow a
multithreaded daemon to be woken up on the core that received incoming
data.  That would allow an efficient multi-queue dbus with fewer
migrations and IPIs.

At some point, I'd like to implement PCID support on x86 (this is a
low priority for me, and someone may beat me to it), which would let us
skip expensive TLB flushes while context switching.  I have no idea
whether ARM can do something similar.

--Andy