netdev - Re: [PATCH net-next 4/7] rds: support for zcopy completion notification

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAF=yD-KO_n1y7M7oBxfrK0uXkNfxjZiwWX6i6d3r2FXRBOBmPg@mail.gmail.com>
Date:   Sun, 28 Jan 2018 19:46:07 +0100
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Sowmini Varadhan <sowmini.varadhan@...cle.com>
Cc:     Network Development <netdev@...r.kernel.org>,
        David Miller <davem@...emloft.net>, rds-devel@....oracle.com,
        santosh.shilimkar@...cle.com
Subject: Re: [PATCH net-next 4/7] rds: support for zcopy completion notification

On Sun, Jan 28, 2018 at 5:15 PM, Sowmini Varadhan
<sowmini.varadhan@...cle.com> wrote:
>
> thanks for taking the time to go through the code!
>
>>
>> An alternative that does not require a timer is to batch on the sk
>> error queue itself, like tcp zerocopy. That queues the first notification
>> skb on the error queue without any notification latency.
>>
>> Then, if a subsequent notification comes in while another is pending
>> with < MAX zcookies, it coalesces the new notification onto the pending
>> skb and consumes the other. For RDS notifications, the implementation
>> is an extra skb_put + uint32_t assignment.
>
> This is an interesting idea, let me give it a try.
>
>> Optionally, the socket can trigger another sk_error_report on each
>> new notification.
>
> I was trying to avoid that- an upcall for each message is not a good
> idea when you have high traffic and are able to free multiple rds_messages
> at a  time.

Agreed. It was only a suggestion if that would be a reason
for you to not try the above idea.

>> > +static void rds_rm_zerocopy_callback(struct rds_sock *rs,
>> > +                                    struct rds_znotifier *znotifier,
>> > +                                    bool force)
>> > +{
>> > +       struct sock *sk = rds_rs_to_sk(rs);
>> > +       struct sk_buff *skb;
>> > +       struct sock_exterr_skb *serr;
>> > +       unsigned long flags;
>> > +       u32 *ptr;
>> > +       int ncookies = 0, i;
>> > +       struct rds_znotifier *znotif, *ztmp, *first;
>> > +       LIST_HEAD(tmp_list);
>> > +
>> > +       spin_lock_irqsave(&rs->rs_lock, flags);
>> > +       ncookies = rs->rs_ncookies;
>> > +       if (ncookies < SO_EE_ORIGIN_MAX_ZCOOKIES && !force) {
>> > +               if (znotifier) { /* add this cookie to the list and return */
>>
>> can be checked before taking lock.
>>
>> More importantly, when is this ever NULL?
>
> It is null when invoked from tthe timer callback (rs_zcopy_notify()
> going off because havent had any traffic for the expiration interval
> so we want to send out pending notifications, but dont have any znotifier
> in this case). But you are right in that:
>
>> This function is a callback
>> for a zerocopy struct of type znotifier. Is it doing double duty to flush
>> any outstanding if znotifier == NULL && force == true? If so, the first
>> condition probably never occurs unless force == true and thus the
>> second is redundant.
>
> yes, force can simply be !znotifier.
>
> I dont quite follow the "can be checked before taking the lock
> comment though"- the lock is needed to make sure we atomically do
> the lists "add new entry and potentially flush" operation.
> the check is for the "potentially" part of that operation, so
> I'm not seeing how it would help to move it out of the lock.
>
> having said  all that, I like the earlier suggestion of just
> batching on the error_queue itself. If that works out without any
> issues, all of this stuff may not be needed, so let me give that
> a shot first.

Sounds great!

>
>> This adds unnecessary notification latency to delivery of current
>> notification. The latest notification can be appended to tmp_list and
>> sent up immediately.
>
> true it if fits within the MAX cookies limit.
>
>> > +               (void)mod_timer(&rs->rs_cookie_timer, RDS_REAP_TIMEOUT);
>>
>> Resetting timeout on each queued notification causes unbound
>> notification latency for previous notifications on the queue.
>
> I'm not sure I get that comment. RDS_REAP_TIMEOUT is the upper bound
> for sending notification. If, in the interim, we get back a TCP
> ack that lets us reap a bunch of messages, we'd go and flush the
> queue anyway, so the RDS_REAP_TIMEOUT will not matter. Can you
> elaborate on your concern?

I meant that if packet rate is low enough to require the timer
to fire, then by resetting the timer on each notification, the maximum
timeout for the first enqueued notification is the max cookie limit
* timeout, as opposed to timeout if set once and never modified.

Some physical device drivers do the same for their interrupt
moderation. You really want the configured timeout to be the
upper bound.