[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAFJW8X_1CNxe0PhxyaBjYSu0xNw0ymyDVFmsA0C5XsHK9F2D8Q@mail.gmail.com>
Date: Wed, 15 Feb 2012 11:43:35 +0100
From: Piergiorgio Beruto <piergiorgio.beruto@...il.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: davem@...emloft.net, netdev@...r.kernel.org
Subject: Re: Possible bugfix for AF_UNIX, SOCK_SEQPACKET sockets
Il 15 febbraio 2012 07:25, Eric Dumazet <eric.dumazet@...il.com> ha scritto:
> Le mardi 14 février 2012 à 23:05 +0100, Piergiorgio Beruto a écrit :
>> Hello,
>> I would like to have your opinion regarding what I think could be a
>> misbehaviour of the SIOCINQ ioctl on AF_UNIX domain sockets.
>>
>> I was implementing an IPC user-space library which makes use of a
>> variety of sockets for sending data all around.
>> When dealing with datagram sockets (i.e. SOCK_DGRAM, SOCK_RDM and
>> SOCK_SEQPACKET) I used to implement the following paradigm:
>>
>> 1) issue the SIOCINQ (aka FIONREAD) ioctl and get the "size of the
>> first queued datagram"
>> 2) allocate memory of that size and fill it reading out data from the socket
>>
>> This works very fine for UDP and TIPC (including SOCK_SEQPACKET)
>> sockets but when using AF_UNIX (SEQPACKET) sockets the same ioctl
>> returns instead the number of bytes queued (sum of all pending
>> datagrams) .
>>
>> In short, there is no way (as far as I know) to work out how much
>> memory to allocate in userspace before reading out the datagram.
>> The workaround for this is to use a fixed swap buffer of the max
>> possible size of your packets (assuming you know / can retrieve it
>> somehow) and make a copy of your data (which is bad for performance
>> anyway).
>>
>
> I dont understand.... SIOCINQ returns an upper limit to you.
Yes, there's nothing that "doesn't work", it's a matter of performance
(I am working on strong embedded so I'm quite concerned about both
memory usage and "speed").
The problem is that when the socket queue is filled with short sized
packets, dequeue operation would allocate a lot of "big" chunks of
memory, progressively smaller (first one the size of the queue, second
one = queue size - first packet size and so on).
Besides the waste of memory, you get less perfromance as malloc()
would use the heap or mmap() depending on the size of the chunk
(usually 64 bytes) and the use of mmap is more memory efficient but
quite slower.
>
> So your app can malloc() a big enough buffer.
>
> Yes, SEQPACKET preserves message boundaries so your recvmsg() can fill
> part of the application buffer.
>
>> Looking at the kernel code I've found this in the
>> linux/net/unix/af_unix.c file (my comment with ////)
>>
>> case SIOCINQ:
>> {
>> struct sk_buff *skb;
>>
>> if (sk->sk_state == TCP_LISTEN) {
>> err = -EINVAL;
>> break;
>> }
>>
>> spin_lock(&sk->sk_receive_queue.lock);
>> if (sk->sk_type == SOCK_STREAM ||
>> sk->sk_type == SOCK_SEQPACKET) { //// <--------
>> treats SEQPACKET as SOCK_STREAM
>> skb_queue_walk(&sk->sk_receive_queue, skb)
>> amount += skb->len;
>> } else {
>> skb = skb_peek(&sk->sk_receive_queue);
>> if (skb)
>> amount = skb->len;
>> }
>> spin_unlock(&sk->sk_receive_queue.lock);
>> err = put_user(amount, (int __user *)arg);
>> break;
>> }
>>
>> In my opinion SOCK_SEQPACKETS are similar to SOCK_STREAM because they
>> are "connection oriented" sockets but from the I/O point of view are
>> more similar to datagram ones.
>>
>> My proposal is to remove the above condition from the test and let the
>> function process SOCK_SEQPACKETS as if they were SOCK_DGRAM for this
>> particular ioctl.
>>
>
> This behavior is undocumented in "man unix", so it is "implementation
> dependant"
>
> This change could break some applications.
Ok, that's why I asked :) But if you agree with my objection regarding
performance, what about adding a brand new (linux only) ioctl which
implements the other behaviour? So the code would look something like
this:
case SIOCINQ:
case SIOCPSZ: //// new packet size ioctl
{
struct sk_buff *skb;
if (sk->sk_state == TCP_LISTEN) {
err = -EINVAL;
break;
}
spin_lock(&sk->sk_receive_queue.lock);
if ((sk->sk_type == SOCK_STREAM ||
sk->sk_type == SOCK_SEQPACKET) &&
ioctl_code != SIOCPSZ) { //// have SIOCPSZ
behave as for datagram sockets
skb_queue_walk(&sk->sk_receive_queue, skb)
amount += skb->len;
} else {
skb = skb_peek(&sk->sk_receive_queue);
if (skb)
amount = skb->len;
}
spin_unlock(&sk->sk_receive_queue.lock);
err = put_user(amount, (int __user *)arg);
break;
}
Thank you again,
Piergiorgio
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists