netdev - Re: [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements

Message-ID: <0aabb09c-4f53-c581-1996-153072779108@gmail.com>
Date:   Thu, 18 Mar 2021 13:00:55 +0000
From:   Pavel Begunkov <asml.silence@...il.com>
To:     Stefan Metzmacher <metze@...ba.org>, Jens Axboe <axboe@...nel.dk>,
        io-uring@...r.kernel.org
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements

On 18/03/2021 00:15, Stefan Metzmacher wrote:
> Hi Pavel,
> 
>>>>>> here are patches which fix linking of send[msg]()/recv[msg]() calls
>>>>>> and make sure io_uring_enter() never generates a SIGPIPE.
>>>>
>>>> 1/2 breaks userspace.
>>>
>>> Can you explain that a bit please, how could some application ever
>>> have a useful use of IOSQE_IO_LINK with these socket calls?
>>
>> Packet delivery of variable size, i.e. recv(max_size). Byte stream
>> that consumes whatever you've got and links something (e.g. notification
>> delivery, or poll). Not sure about netlink, but maybe. Or some
>> "create a file via send" crap, or some made-up custom protocols
> 
> Ok, then we need a flag or a new opcode to provide that behavior?
> 
> For recv() and recvmsg() MSG_WAITALL might be usable.

Hmm, unrelated, but there is a good chance MSG_WAITALL with io_uring
is broken, because our first (non-blocking) attempt is issued with
MSG_DONTWAIT.

> It's not defined in 'man 2 sendmsg', but should we use it anyway
> for IORING_OP_SEND[MSG] in order to activate the short send check?
> The low-level sock_sendmsg() call seems to ignore unused flags,
> which seems to be the reason for the following logic in tcp_sendmsg_locked:
> 
> if (flags & MSG_ZEROCOPY && size && sock_flag(sk, SOCK_ZEROCOPY)) {

Yep, it maintains compatibility because of unchecked unsupported flags.
Alleviating an old design problem, IIRC.

> 
> You need to set SOCK_ZEROCOPY in the socket in order to give a meaning
> to MSG_ZEROCOPY.
> 
> Should I prepare an add-on patch to make the short send/recv logic depend
> on MSG_WAITALL?

IMHO, conceptually it would make much more sense with MSG_WAITALL.
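
To make the difference concrete, here is a minimal userspace model of the
two checks under discussion. It is only a sketch: strict_check_fails() and
waitall_check_fails() are hypothetical helper names, not kernel functions,
and the MAX_RW_COUNT adjustment from the real code is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>   /* MSG_WAITALL */

/*
 * Hypothetical model of the check that breaks existing users:
 * any result other than the full requested length fails the link,
 * so a recv(max_size) that returns a shorter packet fails it too.
 */
static bool strict_check_fails(long ret, size_t requested)
{
    return ret != (long)requested;
}

/*
 * Hypothetical model of the proposed check: a short transfer only
 * fails the link when the caller opted in with MSG_WAITALL.
 * Without the flag min_ret stays 0, so only errors (ret < 0) fail.
 */
static bool waitall_check_fails(long ret, size_t requested, unsigned int flags)
{
    long min_ret = 0;

    if (flags & MSG_WAITALL)
        min_ret = (long)requested;
    return ret < min_ret;
}
```

With a 100-byte buffer and a 10-byte result, the strict check fails the
link, while the MSG_WAITALL-based check fails it only if the flag was set.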

> 
> I'm cc'ing netdev@...r.kernel.org in order to get more feedback on
> whether MSG_WAITALL can be passed to sendmsg without fear of
> triggering -EINVAL.
> 
> The example for io_sendmsg() would look like this:
> 
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -4383,7 +4383,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>         struct io_async_msghdr iomsg, *kmsg;
>         struct socket *sock;
>         unsigned flags;
> -       int expected_ret;
> +       int min_ret = 0;
>         int ret;
> 
>         sock = sock_from_file(req->file);
> @@ -4404,9 +4404,11 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>         else if (issue_flags & IO_URING_F_NONBLOCK)
>                 flags |= MSG_DONTWAIT;
> 
> -       expected_ret = iov_iter_count(&kmsg->msg.msg_iter);
> -       if (unlikely(expected_ret == MAX_RW_COUNT))
> -               expected_ret += 1;
> +       if (flags & MSG_WAITALL) {
> +               min_ret = iov_iter_count(&kmsg->msg.msg_iter);
> +               if (unlikely(min_ret == MAX_RW_COUNT))
> +                       min_ret += 1;
> +       }
>         ret = __sys_sendmsg_sock(sock, &kmsg->msg, flags);
>         if ((issue_flags & IO_URING_F_NONBLOCK) && ret == -EAGAIN)
>                 return io_setup_async_msg(req, kmsg);
> @@ -4417,7 +4419,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>         if (kmsg->free_iov)
>                 kfree(kmsg->free_iov);
>         req->flags &= ~REQ_F_NEED_CLEANUP;
> -       if (ret != expected_ret)
> +       if (ret < min_ret)
>                 req_set_fail_links(req);
>         __io_req_complete(req, issue_flags, ret, 0);
>         return 0;
> 
> Which means the default of min_ret = 0 would result in:
> 
>         if (ret < 0)
>                 req_set_fail_links(req);
> 
> again...
> 
>>>> Sounds like 2/2 might too, does it?
>>>
>>> Do you think any application really expects to get a SIGPIPE
>>> when calling io_uring_enter()?
>>
>> If it were up to what I think, I would remove lots of old garbage :)
>> I doubt it was working well before, e.g. because of iowq, but
>> who knows
> 
> Yes, it was inconsistent before and now it's reliable.
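
For reference, the plain-syscall analogue of "never generate a SIGPIPE"
is MSG_NOSIGNAL on send(2). The sketch below assumes that 2/2 aims for the
same observable behaviour (the patch itself is not quoted in this mail);
send_to_closed_peer() and on_sigpipe() are hypothetical names used only
for illustration.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set by the handler so the caller can see whether SIGPIPE was raised. */
static volatile sig_atomic_t sigpipe_seen;

static void on_sigpipe(int sig)
{
    (void)sig;
    sigpipe_seen = 1;
}

/*
 * Send one byte on an AF_UNIX stream socket whose peer is already closed.
 * Returns the errno of the failed send(), 0 if the send unexpectedly
 * succeeded, or -1 if the socketpair could not be created.
 */
static int send_to_closed_peer(int send_flags)
{
    int sv[2];
    ssize_t n;
    int err;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    close(sv[1]);                     /* peer goes away */

    sigpipe_seen = 0;
    signal(SIGPIPE, on_sigpipe);      /* observe the signal instead of dying */

    n = send(sv[0], "x", 1, send_flags);
    err = (n < 0) ? errno : 0;

    close(sv[0]);
    return err;
}
```

On Linux both send_to_closed_peer(0) and send_to_closed_peer(MSG_NOSIGNAL)
report EPIPE; only the first one also sets sigpipe_seen.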

-- 
Pavel Begunkov