netdev - Re: [PATCH RFC net-next 1/6] sock: MSG_PEEK support for sk_error

Message-ID: <CAF=yD-K4auN9L=ijJpq+72XoUsmWiwiz2zCxkE7_7EJPBP=mjg@mail.gmail.com>
Date:   Thu, 18 Jan 2018 18:09:10 -0500
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Sowmini Varadhan <sowmini.varadhan@...cle.com>
Cc:     Eric Dumazet <eric.dumazet@...il.com>,
        Network Development <netdev@...r.kernel.org>,
        David Miller <davem@...emloft.net>, rds-devel@....oracle.com,
        santosh.shilimkar@...cle.com
Subject: Re: [PATCH RFC net-next 1/6] sock: MSG_PEEK support for sk_error_queue

On Thu, Jan 18, 2018 at 6:03 PM, Sowmini Varadhan
<sowmini.varadhan@...cle.com> wrote:
> On (01/18/18 17:54), Willem de Bruijn wrote:
>> > 2. If we have the option of passing completion-notification up as ancillary
>> >    data on the pollin/recvmsg channel itself (instead of MSG_ERRQUEUE)
>>
>> This assumes a somewhat symmetric workload, where there are enough recv
>> calls to reap the notification associated with the send calls.
>
> Your comment about the assumption is true, but at least for the database
> use-cases, we have a request-response model, so the assumption works out.
> I don't know if many other workloads that send large buffers have this
> pattern.

If that is true in general for PF_RDS, then it is a reasonable approach.
How about treating it as a (follow-on) optimization path? Opportunistic
piggybacking of notifications on data reads is more widely applicable.

>
>> I would stay with MSG_ERRQUEUE processing. One option is to pass data
>> up to userspace in the data portion of the notification skb instead of
>> encoding it in ancillary data, like tcp_get_timestamping_opt_stats.
>
> that's similar to what I have, except that it does not have the
> MSG_PEEK part (you'd need to enforce that the data portion
> is upper-bounded, and that the application has the responsibility
> of sending down "enough" buffer with recvmsg).

Right. I think that an upper bound is the simplest solution here.
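
To make the upper-bound idea concrete, here is a minimal userspace sketch
of reading one notification from the error queue into a fixed-size buffer.
The payload layout (a packed array of 32-bit cookies), the bound
NOTE_MAX_COOKIES, and the helper names parse_cookies() and reap_one() are
assumptions for illustration only, not anything defined by the patch.
MSG_ERRQUEUE is Linux-specific.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical upper bound on cookies per notification (an assumption). */
#define NOTE_MAX_COOKIES 8

/* Copy at most `max` whole cookies out of a notification payload. */
size_t parse_cookies(const void *buf, size_t len, uint32_t *out, size_t max)
{
	size_t n = len / sizeof(uint32_t);	/* ignore a trailing partial cookie */

	if (n > max)
		n = max;
	memcpy(out, buf, n * sizeof(uint32_t));
	return n;
}

/*
 * Read one notification from the socket error queue.
 * Returns the number of cookies copied to `out`, 0 if the queue is empty,
 * or -1 on error (errno is EMSGSIZE if the payload exceeded the bound).
 */
ssize_t reap_one(int fd, uint32_t *out, size_t max)
{
	uint32_t payload[NOTE_MAX_COOKIES];
	struct iovec iov = { .iov_base = payload, .iov_len = sizeof(payload) };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
	ssize_t ret;

	ret = recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT);
	if (ret < 0)
		return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;

	/* The kernel flags a payload that did not fit in the buffer. */
	if (msg.msg_flags & MSG_TRUNC) {
		errno = EMSGSIZE;
		return -1;
	}
	return (ssize_t)parse_cookies(payload, (size_t)ret, out, max);
}
```

With a fixed bound, the application can size its buffer once and never
needs MSG_PEEK to discover the payload length first.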

By the way, if you allocate an skb immediately on page pinning, then
there are always sufficient skbs to store all notifications. On errqueue
enqueue, just drop the new skb and copy its notification into the body of
the skb already on the queue, if one exists and has room. That is
essentially what the tcp zerocopy code does with the [data, info] range.

> Note that any one of these choices is OK with me; I have no
> special attachments to any of them.