[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1448307337.1628829.447962729.60B03DCC@webmail.messagingengine.com>
Date: Mon, 23 Nov 2015 20:35:37 +0100
From: Hannes Frederic Sowa <hannes@...essinduktion.org>
To: Tom Herbert <tom@...bertland.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Kernel Team <kernel-team@...com>, davewatson@...com,
Alexei Starovoitov <alexei.starovoitov@...il.com>
Subject: Re: [PATCH net-next 0/6] kcm: Kernel Connection Multiplexor (KCM)
Hello Tom,
On Mon, Nov 23, 2015, at 18:33, Tom Herbert wrote:
> > For me this still looks a little bit like messages could be delimited by
> > TCP PSH flag, where we might need to have some more fine grained control
> > over and besides that just adding better fanout semantics to TCP, no?
> >
> The TCP PSH flag is not defined for message delineation (neither is
> urgent pointer). We can't change that (many people have tried to add
> message semantics to TCP protocol but have always failed miserably).
> The fact is TCP is always going to be a stream based protocol. Period!
> :-) It is up to the application to interpret the stream and extract
> messages. Even if we could somehow apply the PSH bit to "help" in
> message delineation, we would need to change senders to use the PSH
> bit in that fashion for it to be of benefit to receivers.
I see TCP PSH flags as an optimization and I agree it is hard to
properly make use of them in the internet. But in a datacenter where
everything is under control, this could be done?
Anyway, decoding arbitrary messages in the kernel with maybe huge
lengths could result in starvation problems if you adhere to the socket
receive buffer limits at all time. So I wonder if forward progress
guarantee can be achieved here agnostic of the eBPF program? I really
see this becoming a problem as soon as people use it for privilege
separation. Will there be central error handling?
Also, would a TCP option make sense here to add instead of using the TCP
PSH flag? Not sure, yet...
> > Do kcm sockets still allow streaming unlimited amounts of data? E.g. if
> > you want to pass a data stream attached to a rpc message? I think not
> > allowing streaming is a major shortcoming then (even though this will
> > induce head of line blocking).
> >
> RPC messages can be of arbitrary size and with SOCK_SEQPACKET,
> messages can be sent or received in multiple calls. No HOL blocking
> since message are constructed on KCM sockets before starting to send
> on TCP sockets. Socket buffer limits are respected. KCM does not
> enforce a maximum message size, if an applications does have a maximum
> then that can be checked in the BPF code.
I was referring to the receivers end HOL blocking, the same as in user
space TCP, where one data stream (or huge message) keeps the byte stream
busy so no other datagrams in there can be delivered. For low latency I
would actually use multiple streams or switch to UDP with user space
based retry.
I think this problem more and more comes down to improve epoll interface
with somewhat better CPU steered wake-up capabilities to make it more
agnostic. Some programs e.g. want also be woken up if a HTTP header is
received completely, SO_RCVLOWAT was made for this, FreeBSD has
accept_filter for this kind.
You want to use this in thrift which is mainly Java based and reuse the
existing NIO infrastructure?
> >> Future support:
> >>
> >> - Integration with TLS (TLS-in-kernel is a separate initiative).
> >
> > This is interesting:
> >
> > Regarding the last week's discussion about better OOB support in TCP
> > e.g. for SOCKET_DESTROY, do you already have a plan to handle TLS alerts
> > and do CHANGE_CIPHER on the socket synchronously?
> >
> Dave should be posting the basic TLS-in-the-kenel patches shortly,
> those will be a better context for discussion.
Thanks, I am looking at them right now. :)
Thanks,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists