netdev - Re: [PATCH net] sctp: change to save MSG

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Tue, 7 Mar 2017 11:56:41 +0800
From:   Xin Long <lucien.xin@...il.com>
To:     David Laight <David.Laight@...lab.com>
Cc:     network dev <netdev@...r.kernel.org>,
        "linux-sctp@...r.kernel.org" <linux-sctp@...r.kernel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
        Neil Horman <nhorman@...driver.com>,
        Vlad Yasevich <vyasevich@...il.com>
Subject: Re: [PATCH net] sctp: change to save MSG_MORE flag into assoc

On Sat, Mar 4, 2017 at 12:51 PM, Xin Long <lucien.xin@...il.com> wrote:
> On Sat, Mar 4, 2017 at 1:57 AM, Xin Long <lucien.xin@...il.com> wrote:
>> On Sat, Mar 4, 2017 at 12:31 AM, David Laight <David.Laight@...lab.com> wrote:
>>> From: Xin Long
>>>> Sent: 03 March 2017 15:43
>>> ...
>>>> > It is much more important to get MSG_MORE working 'properly' for SCTP
>>>> > than for TCP. For TCP an application can always use a long send.
>>>
>>>> "long send" ?, you mean bigger data, or keeping sending?
>>>> I didn't get the difference between SCTP and TCP, they
>>>> are similar when sending data.
>>>
>>> With tcp an application can always replace two send()/write()
>>> calls with a single call to writev().
>>> For sctp two send() calls must be made in order to generate two
>>> data chunks.
>>> So it is much easier for a tcp application to generate 'full'
>>> ethernet packets.
>> okay, it should not be a important reason, and sctp might also support
>> it one day. :-)
>>
>>>
>>>>
>>>> >
>>>> > ...
>>>> >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
>>>> >>        * breaks.
>>>> >>        */
>>>> >>       err = sctp_primitive_SEND(net, asoc, datamsg);
>>>> >> +     asoc->force_delay = 0;
>>>> >>       /* Did the lower layer accept the chunk? */
>>>> >>       if (err) {
>>>> >>               sctp_datamsg_free(datamsg);
>>>> >
>>>> > I don't think this is right - or needed.
>>>> > You only get to the above if some test has decided to send data chunks.
>>>> > So it just means that the NEXT time someone tries to send data all the
>>>> > queued data gets sent.
>>>
>>>> the NEXT time someone tries to send data with "MSG_MORE clear",
>>>> yes, but with "MSG_MORE set", it will still delay.
>>>>
>>>> > I'm guessing that the whole thing gets called in a loop (definitely needed
>>>> > for very long data chunks, or after the window is opened).
>>>
>>>> yes, if users keep sending data chunks with MSG_MORE set, no
>>>> data with "MSG_MORE clear" gap.
>>>>
>>>> > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE
>>>> > set it would expect to see a lot of full ethernet frames be sent.
>>>
>>>> right.
>>>
>>>> > With the above a frame will be sent (containing all but 1 chunk) when the
>>>> > amount of queued data becomes too large for an ethernet frame, and immediately
>>>> > followed by a second ethernet frame with 1 chunk in it.
>>>
>>>> "followed by a second ethernet frame with 1 chunk in it.", I think this's
>>>> what you're really worried about, right ?
>>>> But sctp flush data queue NOT like what you think, it's not keep traversing
>>>> the queue untill the queue is empty.
>>>> once a packet with chunks in one ethernet frame is sent, sctp_outq_flush
>>>> will return. it will pack chunks and send the next packet again untill some
>>>> other 'event' triggers it, like retransmission or data received from peer.
>>>> I don't think this is a problem.
>>>
>>> Erm.... that can't work.
>>> I think there is code to convert a large user send into multiple data chunks.
>>> So if the user does a 4k (say) send several large chunks get queued.
>>> These would need to all be sent at once.
>>>
>>> Similarly when the transmit window is received.
>>> So somewhere there ought to be a loop that will send more than one packet.
>> As far as I can see, no loop like you said, mostly, the incoming
>> chunk (like SACK) from peer will trigger the next flush out.
>> I can try to trace the path in kernel for sure tomorrow.
> okay, you are right, I missed sctp_packet_transmit_chunk also call
> sctp_packet_transmit to send the current packet. :)
>
> But if we keep sending data with "MSG_MORE", after one ethernet frame
> is sent, "followed by a second ethernet frame with 1 chunk in it" will NOT
> happen, as in this loop the asoc's msg_more flag is still set, and this flush
> is called by sctp_sendmsg(the function msg_more should care more).
>
> did I miss something ?
Hi, David Laight

As now sctp MSG_MORE is broken with msg->force_day, I hope we
can use this patch before you get a better one to fix it, so that users
would not be confused with the strange behaviour.

what do you say ?

>
>>
>>>
>>>> > Now it might be that the flag needs clearing when retransmissions are queued.
>>>> > OTOH they might get sent for other reasons.
>>>
>>>> Before we really overthought about MSG_MORE, no need to care about
>>>> retransmissions, define MSG_MORE, in my opinion, it works more for
>>>> *inflight is 0*, if it's not 0, we shouldn't stop other places flushing them.
>>>
>>> Eh? and when nagle disabled.
>>> If 'inflight' isn't 0 then most paths don't flush data.
>> I knew, but MSG_MORE is different thing, it should only try to work for the
>> current and following data.
>>
>>>
>>>> We cannot let asoc's more_more flag work as global, it will block elsewhere
>>>> sending data chunks, not only sctp_sendmsg.
>>>
>>> If the connection was flow controlled off, and more 'credit' arrives and there
>>> is less that an ethernet frame's worth of data pending, and the last send
>>> said 'MSG_MORE' there is no point sending anything until the application
>>> does a send with MSG_MORE clear.
>> got you, I think you have different understanding about MSG_MORE
>> while this patch just try to make it work like TCP's msg_more, but what
>> you mentioned here is the same as TCP thing, seems you also want
>> to improve TCP's MSG_MORE :-)
>>
>>>
>>> I'm not sure what causes a retransmission to send data, I suspect that 'inflight'
>>> can easily be non-zero at that time.
>> The thing that causes a retransmission to send data is that both tx and
>> rtx send data through sctp_outq_flush, in which it will try to send rtx queue,
>> then rx queue.
>>
>> yes, once a packet is sent out and not yet be SACKed, "inflight" will not be
>> zero, so when retransmiting, "inflight" must be non-zero.
>>
>>> Likely something causes a packet be generated - which then collects the data chunks.
>>>
>>>         David
>>>
>>>