[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <063D6719AE5E284EB5DD2968C1650D6D1725EB40@AcuExch.aculab.com>
Date: Wed, 18 Jun 2014 16:38:44 +0000
From: David Laight <David.Laight@...LAB.COM>
To: David Laight <David.Laight@...LAB.COM>,
'Vlad Yasevich' <vyasevich@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: SCTP data chunk bundling when SCTP_NODELAY is set
From: David Laight
> From: Vlad Yasevich
> ...
> > >> I suppose we could implement SCTP_CORK to do the right thing.
> > >>
> > >> I thought is possibly utilizing something like sendmmsg() and passing
> > >> an extra flag to let it be know that this is a multi-message send
> > >> that should be queued up by sctp..
> > >
> > > It would be as easy to expose the extra flag to the 'application'
> > > allowing it to use sendmsg() or sendmmsg().
> > > While sendmmsg() saves a system call, it is fairly horrid to use.
> > > (and I'm sending from a kernel driver so don't care about the
> > > system call cost!)
> > >
> > > Possibly MSG_MORE with Nagle disabled could invoke the Nagle send
> > > delay - but you'd need to know whether any chunks in the queue
> > > had MSG_MORE clear.
> >
> > That's why doing this with cork would be simpler. The ULP can just
> > queue up a bunch of small data and if we pass nagle checks, it will be
> > flushed. If not, uncork will flush it.
>
> I think you need only care about the 'MSG_MORE' flag of the last data chunk.
> Any earlier data (with MSG_MORE clear) will usually have been sent (unless
> prevented by Nagle or flow control), so you probably wouldn't be able to
> send it regardless of the state of MSG_MORE on a chunk being queued.
> There is also the expectation that another send without MSG_MORE will
> happen almost immediately.
>
> So MSG_MORE could have the same effect as corking the socket.
> Although you'd need separate bits - but uncork could clear both.
>
> What I would like to implement (from M3UA) is to hold data for a maximum
> of (say) 5ms awaiting M3UA data chunks. To do this properly requires
> knowledge of the actual ethernet packet boundaries.
>
> The problem is there are (at least) three cases:
> 1) This data should be sent as soon as possible.
> 2) Send this data some time soonish.
> 3) I've got another data block I'm going to give you after this one.
>
> > I could work up a patch for you if you want.
>
> I was thinking I might try to write one.
Actually this might work for what I'm trying to do.
(untested).
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 0f4d15f..51030bc 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -691,7 +691,7 @@ static sctp_xmit_t sctp_packet_can_append_data(struct sctp_packet *packet,
* if any previously transmitted data on the connection remains
* unacknowledged.
*/
- if (!sctp_sk(asoc->base.sk)->nodelay && sctp_packet_empty(packet) &&
+ if (sctp_sk(asoc->base.sk)->nodelay != 1 && sctp_packet_empty(packet) &&
inflight && sctp_state(asoc, ESTABLISHED)) {
unsigned int max = transport->pathmtu - packet->overhead;
unsigned int len = chunk->skb->len + q->out_qlen;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index fee06b9..084b957 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1928,7 +1928,10 @@ static int sctp_sendmsg(struct kiocb *iocb, struct sock *sk,
}
/* Break the message into multiple chunks of maximum size. */
+ if (msg->msg_flags & MSG_MORE)
+ sp->nodelay |= 2;
datamsg = sctp_datamsg_from_user(asoc, sinfo, msg, msg_len);
+ sp->nodelay &= 1;
if (IS_ERR(datamsg)) {
err = PTR_ERR(datamsg);
goto out_free;
Ideally MSG_MORE should delay sending even if 'inflight' is false.
But that would require 'flush on timeout'.
I'd prefer that, and with a configurable timeout.
But I can implement the timeout in the 'application'.
Given the way Nagle is implemented in sctp, I could keep flipping
it on and off - but that probably has undocumented behaviour
(ie it might suddenly change).
David
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists