Date:	Wed, 25 Nov 2015 11:26:21 -0500
From:	Sowmini Varadhan <sowmini.varadhan@...cle.com>
To:	Florian Westphal <fw@...len.de>
Cc:	David Miller <davem@...emloft.net>, tom@...bertland.com,
	hannes@...essinduktion.org, netdev@...r.kernel.org,
	kernel-team@...com, davejwatson@...com,
	alexei.starovoitov@...il.com
Subject: Re: [PATCH net-next 0/6] kcm: Kernel Connection Multiplexor (KCM)

On (11/24/15 17:25), Florian Westphal wrote:
> Its a well-written document, but I don't see how moving the burden of
> locking a single logical tcp connection (to prevent threads from
> reading a partial record) from userspace to kernel is an improvement.
> 
> If you really have 100 threads and must use a single tcp connection
> to multiplex some arbitrarily complex record-format in atomic fashion,
> then your requirements suck.

In the interest of providing some context from the rds-tcp use-case
here (without drifting into hyperbole): RDS-TCP, like KCM,
provides a dgram-over-stream socket with SEQPACKET semantics
and an upper-bounded record size, per POSIX SEQPACKET semantics.
The major difference from KCM is that it does not use BPF;
instead it has its own protocol header for each datagram.
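To make the dgram-over-stream idea concrete, here is a minimal sketch of framing datagrams onto a TCP byte stream with a per-datagram header. This is illustrative only: the real RDS wire header carries more fields than a bare length prefix, and all names here are made up for the example.

```python
import socket
import struct

# Minimal sketch of datagram-over-stream framing, in the spirit of
# RDS-TCP's per-datagram protocol header. A plain 4-byte length prefix
# stands in for the real (richer) RDS header; names are illustrative.

def send_dgram(sock: socket.socket, payload: bytes) -> None:
    # Prefix each datagram with its length so the receiver can recover
    # record boundaries from the byte stream.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    # TCP may deliver fewer bytes than requested; loop until we have n.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("stream closed mid-record")
        buf += chunk
    return buf

def recv_dgram(sock: socket.socket) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

The point of the framing is that record boundaries survive TCP's stream semantics, which is exactly what SEQPACKET promises the application.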

There seems to be some misconception in this thread that this model
is about allowing applications to be "lazy" and do a 1:1 mapping
between threads and streams; that's not the case for RDS.

In the case of cluster apps, we have DB apps that want a single
dgram socket to talk to multiple peers (i.e., a star network, with the
node in the center of the star wanting to have dgram sockets to everyone
else; the scale is more than a mere 100 threads).
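The fan-out pattern of that star topology can be sketched with plain UDP (not RDS itself); the hypothetical `fan_out` helper below just shows that one unconnected datagram socket, and hence one fd, can address any number of peers. What UDP does not provide here is the reliable, ordered, congestion-managed delivery discussed next.

```python
import socket

# Illustrative sketch (plain UDP, not RDS): the fan-out a single
# datagram socket gives the central node of a star network. One socket
# reaches every peer; per-peer reliability and congestion management
# are what rds-tcp layers on top.

def fan_out(hub: socket.socket, payload: bytes, peers) -> None:
    # An unconnected dgram socket can sendto() any number of peers.
    for addr in peers:
        hub.sendto(payload, addr)
```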

If that central node wants reliable, ordered, congestion-managed
delivery, it would have to use UDP plus a bunch of its own code for
sequence numbers, retransmit, etc. They are doing that today, but don't
want to reinvent TCP's congestion avoidance. (In fact, in the absence
of congestion, one complaint is that UDP latency is 2x-3x better than
rds-tcp for the 512-byte request / 8K response pattern that is typical
for DB workloads; I'm still investigating.)

From the TCP standpoint of rds-tcp, we have a many-one mapping:
multiple RDS sockets funneling into a single TCP connection, sharing
a single congestion state machine.
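That many-one funneling can be sketched by tagging each record with an endpoint id, so several logical endpoints share one TCP connection and therefore one congestion state machine. The 8-byte header (id + length) and the `StreamMux` name are invented for this illustration; they are not the RDS wire format.

```python
import socket
import struct
import threading

# Hypothetical sketch of many-one funneling: several logical endpoints
# share one TCP connection by tagging each record with an endpoint id.
# The 8-byte (id, length) header is illustrative, not the RDS format.

class StreamMux:
    def __init__(self, sock: socket.socket):
        self.sock = sock
        # Serialize whole records so concurrent senders never interleave
        # partial records on the shared stream.
        self.lock = threading.Lock()

    def send(self, endpoint_id: int, payload: bytes) -> None:
        hdr = struct.pack("!II", endpoint_id, len(payload))
        with self.lock:
            self.sock.sendall(hdr + payload)

def recv_record(sock: socket.socket):
    def recv_exact(n):
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("stream closed mid-record")
            buf += chunk
        return buf
    endpoint_id, length = struct.unpack("!II", recv_exact(8))
    return endpoint_id, recv_exact(length)
```

The lock is the userspace analogue of the locking burden Florian mentions; doing the demultiplexing in the kernel is precisely what KCM and rds-tcp trade against this.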

I don't know if this is a "poorly designed application"; I'm sure
it's not perfect, but we have a ton of Oracle clustering software that's
already doing this with IB, so extending it with rds-tcp made
sense for us at this point.

--Sowmini
--
