Message-ID: <20151124222242.GD23215@breakpoint.cc>
Date: Tue, 24 Nov 2015 23:22:42 +0100
From: Florian Westphal <fw@...len.de>
To: Tom Herbert <tom@...bertland.com>
Cc: Florian Westphal <fw@...len.de>,
David Miller <davem@...emloft.net>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Kernel Team <kernel-team@...com>, davejwatson@...com,
Alexei Starovoitov <alexei.starovoitov@...il.com>
Subject: Re: [PATCH net-next 0/6] kcm: Kernel Connection Multiplexor (KCM)
Tom Herbert <tom@...bertland.com> wrote:
> On Tue, Nov 24, 2015 at 12:55 PM, Florian Westphal <fw@...len.de> wrote:
> > Why anyone would invest such a huge amount of work in making this
> > kernel-based framing for single-stream tcp record (de)mux rather than
> > improving the userspace protocol to use UDP or SCTP or at least
> > one tcp connection per worker is beyond me.
> >
> From the /0 patch:
>
> Q: Why not use an existing message-oriented protocol such as RUDP,
> DCCP, SCTP, RDS, and others?
>
> A: Because that would entail using a completely new transport protocol.
That's why I wrote 'or at least one tcp connection per worker'.
> > For TX side, why is writev not good enough?
>
> writev on a TCP stream does not guarantee atomicity of the operation.
Are you talking about short writes?
> It writes atomically without user space needing to implement locking when
> a socket is shared amongst threads.
Yes, I get that point, but I maintain that KCM is a strange workaround
for bad userspace design.
1 tcp connection per thread -> no userspace sockfd lock needed
Sender side can use writev, sendmsg, sendmmsg, etc to avoid sending
sub-record sized frames.
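As a rough illustration of the per-thread alternative: if each worker thread owns its own TCP connection, it can frame and send a whole record with a single writev() and loop on short writes, with no userspace lock around the socket. This is a minimal sketch under stated assumptions (a 4-byte big-endian length prefix and blocking file descriptors); `send_record` and `read_full` are hypothetical helper names, not part of any existing API or of the KCM patch set.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: send one record as a 4-byte big-endian length
 * header plus payload in one writev() call (assumed framing). A stream
 * socket may accept only part of the data (a short write), so the loop
 * advances through the iovec array until everything is sent. The fd is
 * assumed to be blocking and owned by one thread, so no lock is taken.
 */
static int send_record(int fd, const void *payload, uint32_t len)
{
    uint32_t hdr = htonl(len);
    struct iovec iov[2] = {
        { .iov_base = &hdr, .iov_len = sizeof(hdr) },
        { .iov_base = (void *)payload, .iov_len = len },
    };
    struct iovec *cur = iov;
    int cnt = 2;

    while (cnt > 0) {
        ssize_t n = writev(fd, cur, cnt);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;
        }
        /* Skip iovecs that were sent completely ... */
        while (cnt > 0 && (size_t)n >= cur->iov_len) {
            n -= (ssize_t)cur->iov_len;
            cur++;
            cnt--;
        }
        /* ... and trim the one that was only partly sent. */
        if (cnt > 0) {
            cur->iov_base = (char *)cur->iov_base + n;
            cur->iov_len -= (size_t)n;
        }
    }
    return 0;
}

/* Hypothetical receiver counterpart: read exactly len bytes. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;          /* peer closed mid-record */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Because each thread has its own descriptor, two threads can never interleave partial records on the same byte stream, which is the property KCM otherwise provides in the kernel.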
Is user space really so bad that, instead of fixing it, it's simpler to
work around it with even more kernel bloat?
Since userspace has to be adjusted for KCM anyway, I find that hard
to believe.
I don't know if the 'dynamic RCVLOWAT' that you want is needed
(you say 'yes'; Eric's reply seems to indicate it's not, at least assuming
a sane/friendly peer that doesn't intentionally xmit byte-by-byte).
But assuming there would really be a benefit, maybe a RCVLOWAT2 could
be added? Of course we could only make it a hint and would have to
make a blocking read return with less data than desired when the tcp rmem
limit is hit. But at least we'd avoid the 'unbounded allocation of large
amount of kernel memory' thing that we have with the current proposal.
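A rough userspace approximation of the 'dynamic RCVLOWAT' idea can be built from the existing SO_RCVLOWAT socket option: after reading a record header, raise the low-water mark to the payload size, then still loop, because the value is only a hint. This is a sketch under assumptions (4-byte big-endian length prefix, blocking stream socket); it is not the RCVLOWAT2 floated above, whose semantics were never defined, and `recv_record`/`read_exact` are hypothetical names.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes, looping over short reads. */
static int read_exact(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;          /* EOF in the middle of a record */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Receive one length-prefixed record into buf (capacity cap).
 * SO_RCVLOWAT is raised to the payload size so a blocking read ideally
 * wakes once the whole payload is queued. It is only a hint: reads may
 * still return less (signals, buffer limits), so read_exact() loops,
 * and setsockopt() failures are deliberately ignored.
 */
static ssize_t recv_record(int fd, void *buf, size_t cap)
{
    uint32_t hdr;
    int lowat, one = 1;

    if (read_exact(fd, &hdr, sizeof(hdr)) < 0)
        return -1;

    uint32_t len = ntohl(hdr);
    if (len > cap) {
        errno = EMSGSIZE;       /* record larger than caller's buffer */
        return -1;
    }

    lowat = len == 0 ? 1 : (len > INT_MAX ? INT_MAX : (int)len);
    (void)setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat));

    int rc = read_exact(fd, buf, len);

    /* Restore the default low-water mark (1 byte) for the next header. */
    (void)setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &one, sizeof(one));

    return rc < 0 ? -1 : (ssize_t)len;
}
```

Restoring the low-water mark after each payload keeps the next 4-byte header read from waiting on a stale, larger threshold. The memory bound comes from the caller-supplied `cap` and the socket's normal receive buffer, not from any new kernel allocation.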
Thanks,
Florian