Message-ID: 
 <SJ0PR19MB4415EA14FC114942FC79953587502@SJ0PR19MB4415.namprd19.prod.outlook.com>
Date: Tue, 20 Feb 2024 04:51:23 +0000
From: "Ramaiah, DharmaBhushan" <Dharma.Ramaiah@...l.com>
To: Jeremy Kerr <jk@...econstruct.com.au>,
        "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>,
        "matt@...econstruct.com.au"
	<matt@...econstruct.com.au>
CC: "Rahiman, Shinose" <Shinose.Rahiman@...l.com>
Subject: RE: MCTP - Socket Queue Behavior

Hi Jeremy,

Thanks for the reply. I have a few additional queries.


> -----Original Message-----
> From: Jeremy Kerr <jk@...econstruct.com.au>
> Sent: 20 February 2024 07:51
> To: Ramaiah, DharmaBhushan <Dharma_Ramaiah@...l.com>;
> netdev@...r.kernel.org; matt@...econstruct.com.au
> Subject: Re: MCTP - Socket Queue Behavior
>
> Hi Dharma,
>
> > The Linux implementation of MCTP uses sockets for communication with
> > MCTP-capable EPs. Socket calls can be made ASYNC by using fcntl(). I
> > have a query based on the ASYNC properties of the MCTP socket.
>
> Some of your questions aren't really specific to non-blocking sockets; it seems
> like you're assuming that the blocking send case will wait for a response before
> returning; that's not the case, as sendmsg() will complete once the outgoing
> message is queued (more on what that means below).
>
> So you can still have the following case, even when using a blocking socket:
>
>   sendmsg(message1)
>   sendmsg(message2)
>
>   recvmsg() -> reply 1
>   recvmsg() -> reply 2
>
> - as it's entirely possible to have multiple messages in flight - either
>   as queued skbs, or having been sent to the remote endpoint.
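>
> (For concreteness, a minimal sketch of that pattern against the AF_MCTP
> sockets API; the EID, network and message-type values below are
> placeholders, not anything specific to your setup:)
>
>   #include <linux/mctp.h>
>   #include <sys/socket.h>
>
>   int send_two_requests(const void *m1, size_t l1,
>                         const void *m2, size_t l2)
>   {
>       struct sockaddr_mctp addr = { 0 };
>       unsigned char rsp[1024];
>       int sd = socket(AF_MCTP, SOCK_DGRAM, 0);
>
>       addr.smctp_family = AF_MCTP;
>       addr.smctp_network = MCTP_NET_ANY;
>       addr.smctp_addr.s_addr = 8;       /* placeholder remote EID */
>       addr.smctp_type = 1;              /* placeholder message type */
>       addr.smctp_tag = MCTP_TAG_OWNER;  /* kernel allocates a tag per message */
>
>       /* both calls return once the message is queued for transmit,
>        * not when a reply arrives */
>       sendto(sd, m1, l1, 0, (struct sockaddr *)&addr, sizeof(addr));
>       sendto(sd, m2, l2, 0, (struct sockaddr *)&addr, sizeof(addr));
>
>       /* replies are routed back to this socket by tag */
>       recvfrom(sd, rsp, sizeof(rsp), 0, NULL, NULL);
>       recvfrom(sd, rsp, sizeof(rsp), 0, NULL, NULL);
>       return 0;
>   }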
>
> > 1. Does the kernel internally maintain a queue for the ASYNC requests?
>
> There is no difference between blocking or non-blocking mode in the queueing
> implementation. There is no MCTP-protocol-specific queue for sent messages.
>
> (the blocking/nonblocking mode may affect how we wait to allocate a skb, but
> it doesn't sound like that's what you're asking here)
>
> However, once a message is packetised (possibly being fragmented into
> multiple packets), those *packets* may be queued to the device by the
> netdev core. The transport device driver may have its own queues as well.
>
> In the case where you have multiple concurrent sendmsg() calls (typically
> through separate threads, and either on one or multiple sockets), it may be
> possible for packets belonging to two messages to be interleaved on the wire.
> That scenario is well-supported by the MCTP protocol through the packet tag
> mechanism.
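>
> (For reference, a rough sketch of the per-packet transport header that
> makes that demultiplexing work; the field layout follows DSP0236, with
> the SOM/EOM/sequence/TO/tag bits packed into the last byte. This is
> illustrative only, not a definition to code against:)
>
>   struct mctp_hdr {
>       __u8 ver;            /* header version */
>       __u8 dest;           /* destination EID */
>       __u8 src;            /* source EID */
>       __u8 flags_seq_tag;  /* SOM | EOM | pkt seq | TO | 3-bit tag */
>   };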
>
> > a. If so, what is the queue depth (can one send multiple requests
> > without waiting for the response
>
> The device queue depth depends on a few things, but has no impact on
> ordering of requests to responses. It's certainly possible to have multiple
> requests in flight at any one time: just call sendmsg() multiple times, even in
> blocking mode.
>
> (the practical limit for pending messages is 8, limited by the 3-bit MCTP
> tag space available for any (local EID, remote EID) pair)
>
> > and expect reply in order of requests)?
>
> We have no control over reply ordering. It's entirely possible that replies are
> sent out of sequence by the remote endpoint:
>
>   local application          remote endpoint
>
>   sendmsg(message 1)
>   sendmsg(message 2)
>                              receives message 1
>                              receives message 2
>                              sends a reply 2 to message 2
>                              sends a reply 1 to message 1
>   recvmsg() -> reply 2
>   recvmsg() -> reply 1
>

Based on the above explanation, I understand that sendto() allocates the skb (subject to the blocking/non-blocking mode), mctp_i2c_tx_thread dequeues the skb and transmits the message, and concurrent sendto() calls can interleave messages on the wire under different message tags. My query here is regarding the bus lock.

1. Is the bus lock taken for the entire duration of sendto() and recvfrom() (as indicated in one of the previous threads)? Assume a case where we have two EPs (x and y) on I2C bus #1, and these EPs are on different segments. In this case, shouldn't the bus be locked for the entire duration until we receive the reply? Otherwise the remote EP might drop the packet when the MUX is switched.

                 Local application                       remote endpoint

           Userspace           Kernel space

  sendmsg(msg1) <EP x, I2C bus 1, seg 1>
  sendmsg(msg2) <EP y, I2C bus 1, seg 2>

                               lock(bus)
                               send(msg1)
                                                         receive(msg1)
                                                         sendreply(msg1)
                               unlock(bus)
  recvmsg() -> reply 1
                               lock(bus)
                               send(msg2)
                                                         receive(msg2)
                                                         sendreply(msg2)
                               unlock(bus)
  recvmsg() -> reply 2

Also, MCTP today provides no mechanism for a remote EP to advertise whether it can handle more than one request at a time; the ability to handle multiple outstanding messages is purely a device capability. In these cases, shouldn't the kernel provide a way to keep the bus locked until the response is obtained?
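
(For illustration: if the kernel does not hold the bus across the full
exchange, a hypothetical userspace workaround would be to serialise each
request/response pair per bus; bus1_lock, sd and addr here are
hypothetical names:)

  /* serialise the whole exchange in userspace, assuming the kernel
   * releases the I2C bus lock once the request is transmitted */
  pthread_mutex_lock(&bus1_lock);
  sendto(sd, req, req_len, 0, (struct sockaddr *)&addr, sizeof(addr));
  recvfrom(sd, rsp, sizeof(rsp), 0, NULL, NULL);
  pthread_mutex_unlock(&bus1_lock);
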
Please let me know if I am missing something.

> So if a userspace application sends multiple messages concurrently, it must
> have some mechanism to correlate the incoming replies with the original
> request state. All of the upper-layer protocols that I have seen have facilities
> for this (Instance ID in MCTP Control protocol, Command Slot Index in NVMe-
> MI, Instance ID in PLDM, ...)
>
> (You could also use the MCTP tags to achieve this correlation, but there are
> very likely better ways in the upper-layer protocol)
>
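> (As a concrete illustration of tag-based correlation: recvfrom() can
> fill in a struct sockaddr_mctp for each reply, and the application can
> match its smctp_tag against the tag of each outstanding request,
> typically learned up front via the SIOCMCTPALLOCTAG ioctl. A rough
> sketch, not a recommendation over upper-layer IDs:)
>
>   struct sockaddr_mctp rx_addr;
>   socklen_t addrlen = sizeof(rx_addr);
>   ssize_t len;
>
>   len = recvfrom(sd, rsp, sizeof(rsp), 0,
>                  (struct sockaddr *)&rx_addr, &addrlen);
>   if (len >= 0) {
>       /* the low 3 bits identify the flow this reply belongs to */
>       __u8 tag = rx_addr.smctp_tag & MCTP_TAG_MASK;
>       /* ... look up request state by (rx_addr.smctp_addr.s_addr, tag) ... */
>   }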


> > b. Does the kernel maintain a queue per socket connection?
>
> MCTP is datagram-oriented; there is no "connection".
>
> In terms of per-socket queues: there is the incoming socket queue that holds
> received messages that are waiting for userspace to dequeue via recvmsg() (or
> similar). However, there is nothing MCTP-specific about this, it's all generic
> socket code.
>
> > 2. Is FASYNC a mechanism for handling asynchronous events associated
> > with a file descriptor, rather than something that provides parallelism
> > for multiple send operations?
>
> The non-blocking socket interfaces (FASYNC, O_NONBLOCK, MSG_DONTWAIT)
> are mostly unrelated to whether your application sends multiple messages at
> once. It's entirely possible to have multiple messages in flight while using the
> blocking interfaces.
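>
> (A minimal sketch of those non-blocking variants, assuming an existing
> socket sd and destination addr; they change only whether a call may
> sleep, not how many messages can be in flight:)
>
>   #include <fcntl.h>
>
>   /* option 1: mark the whole socket non-blocking */
>   fcntl(sd, F_SETFL, fcntl(sd, F_GETFL) | O_NONBLOCK);
>
>   /* option 2: make just this one call non-blocking */
>   sendto(sd, msg, len, MSG_DONTWAIT,
>          (struct sockaddr *)&addr, sizeof(addr));
>
>   /* either way, calls that would have blocked now return -1
>    * with errno set to EAGAIN */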
>
> Cheers,
>
>
> Jeremy
