netdev - Re: [PATCH net] sctp: change to save MSG

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 4 Mar 2017 01:57:50 +0800
From:   Xin Long <lucien.xin@...il.com>
To:     David Laight <David.Laight@...lab.com>
Cc:     network dev <netdev@...r.kernel.org>,
        "linux-sctp@...r.kernel.org" <linux-sctp@...r.kernel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
        Neil Horman <nhorman@...driver.com>,
        Vlad Yasevich <vyasevich@...il.com>
Subject: Re: [PATCH net] sctp: change to save MSG_MORE flag into assoc

On Sat, Mar 4, 2017 at 12:31 AM, David Laight <David.Laight@...lab.com> wrote:
> From: Xin Long
>> Sent: 03 March 2017 15:43
> ...
>> > It is much more important to get MSG_MORE working 'properly' for SCTP
>> > than for TCP. For TCP an application can always use a long send.
>
>> "long send" ?, you mean bigger data, or keeping sending?
>> I didn't get the difference between SCTP and TCP, they
>> are similar when sending data.
>
> With tcp an application can always replace two send()/write()
> calls with a single call to writev().
> For sctp two send() calls must be made in order to generate two
> data chunks.
> So it is much easier for a tcp application to generate 'full'
> ethernet packets.
okay, it should not be a important reason, and sctp might also support
it one day. :-)

>
>>
>> >
>> > ...
>> >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
>> >>        * breaks.
>> >>        */
>> >>       err = sctp_primitive_SEND(net, asoc, datamsg);
>> >> +     asoc->force_delay = 0;
>> >>       /* Did the lower layer accept the chunk? */
>> >>       if (err) {
>> >>               sctp_datamsg_free(datamsg);
>> >
>> > I don't think this is right - or needed.
>> > You only get to the above if some test has decided to send data chunks.
>> > So it just means that the NEXT time someone tries to send data all the
>> > queued data gets sent.
>
>> the NEXT time someone tries to send data with "MSG_MORE clear",
>> yes, but with "MSG_MORE set", it will still delay.
>>
>> > I'm guessing that the whole thing gets called in a loop (definitely needed
>> > for very long data chunks, or after the window is opened).
>
>> yes, if users keep sending data chunks with MSG_MORE set, no
>> data with "MSG_MORE clear" gap.
>>
>> > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE
>> > set it would expect to see a lot of full ethernet frames be sent.
>
>> right.
>
>> > With the above a frame will be sent (containing all but 1 chunk) when the
>> > amount of queued data becomes too large for an ethernet frame, and immediately
>> > followed by a second ethernet frame with 1 chunk in it.
>
>> "followed by a second ethernet frame with 1 chunk in it.", I think this's
>> what you're really worried about, right ?
>> But sctp flush data queue NOT like what you think, it's not keep traversing
>> the queue untill the queue is empty.
>> once a packet with chunks in one ethernet frame is sent, sctp_outq_flush
>> will return. it will pack chunks and send the next packet again untill some
>> other 'event' triggers it, like retransmission or data received from peer.
>> I don't think this is a problem.
>
> Erm.... that can't work.
> I think there is code to convert a large user send into multiple data chunks.
> So if the user does a 4k (say) send several large chunks get queued.
> These would need to all be sent at once.
>
> Similarly when the transmit window is received.
> So somewhere there ought to be a loop that will send more than one packet.
As far as I can see, no loop like you said, mostly, the incoming
chunk (like SACK) from peer will trigger the next flush out.
I can try to trace the path in kernel for sure tomorrow.

>
>> > Now it might be that the flag needs clearing when retransmissions are queued.
>> > OTOH they might get sent for other reasons.
>
>> Before we really overthought about MSG_MORE, no need to care about
>> retransmissions, define MSG_MORE, in my opinion, it works more for
>> *inflight is 0*, if it's not 0, we shouldn't stop other places flushing them.
>
> Eh? and when nagle disabled.
> If 'inflight' isn't 0 then most paths don't flush data.
I knew, but MSG_MORE is different thing, it should only try to work for the
current and following data.

>
>> We cannot let asoc's more_more flag work as global, it will block elsewhere
>> sending data chunks, not only sctp_sendmsg.
>
> If the connection was flow controlled off, and more 'credit' arrives and there
> is less that an ethernet frame's worth of data pending, and the last send
> said 'MSG_MORE' there is no point sending anything until the application
> does a send with MSG_MORE clear.
got you, I think you have different understanding about MSG_MORE
while this patch just try to make it work like TCP's msg_more, but what
you mentioned here is the same as TCP thing, seems you also want
to improve TCP's MSG_MORE :-)

>
> I'm not sure what causes a retransmission to send data, I suspect that 'inflight'
> can easily be non-zero at that time.
The thing that causes a retransmission to send data is that both tx and
rtx send data through sctp_outq_flush, in which it will try to send rtx queue,
then rx queue.

yes, once a packet is sent out and not yet be SACKed, "inflight" will not be
zero, so when retransmiting, "inflight" must be non-zero.

> Likely something causes a packet be generated - which then collects the data chunks.
>
>         David
>
>