lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 18 Jun 2014 16:38:44 +0000
From:	David Laight <David.Laight@...LAB.COM>
To:	David Laight <David.Laight@...LAB.COM>,
	'Vlad Yasevich' <vyasevich@...il.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: SCTP data chunk bundling when SCTP_NODELAY is set

From: David Laight
> From: Vlad Yasevich
> ...
> > >> I suppose we could implement SCTP_CORK to do the right thing.
> > >>
> > >> I thought is possibly utilizing something like sendmmsg() and passing
> > >> an extra flag to let it be know that this is a multi-message send
> > >> that should be queued up by sctp..
> > >
> > > It would be as easy to expose the extra flag to the 'application'
> > > allowing it to use sendmsg() or sendmmsg().
> > > While sendmmsg() saves a system call, it is fairly horrid to use.
> > > (and I'm sending from a kernel driver so don't care about the
> > > system call cost!)
> > >
> > > Possibly MSG_MORE with Nagle disabled could invoke the Nagle send
> > > delay - but you'd need to know whether any chunks in the queue
> > > had MSG_MORE clear.
> >
> > That's why doing this with cork would be simpler.  The ULP can just
> > queue up a bunch of small data and if we pass nagle checks, it will be
> > flushed.  If not, uncork will flush it.
> 
> I think you need only care about the 'MSG_MORE' flag of the last data chunk.
> Any earlier data (with MSG_MORE clear) will usually have been sent (unless
> prevented by Nagle or flow control), so you probably wouldn't be able to
> send it regardless of the state of MSG_MORE on a chunk being queued.
> There is also the expectation that another send without MSG_MORE will
> happen almost immediately.
> 
> So MSG_MORE could have the same effect as corking the socket.
> Although you'd need separate bits - but uncork could clear both.
> 
> What I would like to implement (from M3UA) is to hold data for a maximum
> of (say) 5ms awaiting M3UA data chunks. To do this properly requires
> knowledge of the actual ethernet packet boundaries.
> 
> The problem is there are (at least) three cases:
> 1) This data should be sent as soon as possible.
> 2) Send this data some time soonish.
> 3) I've got another data block I'm going to give you after this one.
> 
> > I could work up a patch for you if you want.
> 
> I was thinking I might try to write one.

Actually this might work for what I'm trying to do.
(untested).

diff --git a/net/sctp/output.c b/net/sctp/output.c
index 0f4d15f..51030bc 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -691,7 +691,7 @@ static sctp_xmit_t sctp_packet_can_append_data(struct sctp_packet *packet,
 	 * if any previously transmitted data on the connection remains
 	 * unacknowledged.
 	 */
-	if (!sctp_sk(asoc->base.sk)->nodelay && sctp_packet_empty(packet) &&
+	if (sctp_sk(asoc->base.sk)->nodelay != 1 && sctp_packet_empty(packet) &&
 	    inflight && sctp_state(asoc, ESTABLISHED)) {
 		unsigned int max = transport->pathmtu - packet->overhead;
 		unsigned int len = chunk->skb->len + q->out_qlen;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index fee06b9..084b957 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1928,7 +1928,10 @@ static int sctp_sendmsg(struct kiocb *iocb, struct sock *sk,
 	}
 
 	/* Break the message into multiple chunks of maximum size. */
+	if (msg->msg_flags & MSG_MORE)
+		sp->nodelay |= 2;
 	datamsg = sctp_datamsg_from_user(asoc, sinfo, msg, msg_len);
+	sp->nodelay &= 1;
 	if (IS_ERR(datamsg)) {
 		err = PTR_ERR(datamsg);
 		goto out_free;

Ideally MSG_MORE should delay sending even if 'inflight' is false.
But that would require 'flush on timeout'.
I'd prefer that, and with a configurable timeout.
But I can implement the timeout in the 'application'.

Given the way Nagle is implemented in sctp, I could keep flipping
it on and off - but that probably has undocumented behaviour
(ie it might suddenly change).

	David



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ