Message-ID: <CAF=yD-KEZBds_SRDFnOjqvidW30E=NG-2X=hBdcGx_--PAmjew@mail.gmail.com>
Date: Tue, 25 Jun 2019 09:37:17 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Neil Horman <nhorman@...driver.com>
Cc: Network Development <netdev@...r.kernel.org>,
Matteo Croce <mcroce@...hat.com>,
"David S. Miller" <davem@...emloft.net>
Subject: Re: [PATCH v3 net] af_packet: Block execution of tasks waiting for
transmit to complete in AF_PACKET
On Tue, Jun 25, 2019 at 7:03 AM Neil Horman <nhorman@...driver.com> wrote:
>
> On Mon, Jun 24, 2019 at 06:15:29PM -0400, Willem de Bruijn wrote:
> > > > > + if (need_wait && !packet_next_frame(po, &po->tx_ring, TP_STATUS_SEND_REQUEST)) {
> > > > > + po->wait_on_complete = 1;
> > > > > + timeo = sock_sndtimeo(&po->sk, msg->msg_flags & MSG_DONTWAIT);
> > > >
> > > > This resets timeout on every loop. should only set above the loop once.
> > > >
> > > I explained exactly why I did that in the change log. It's because I reuse the
> > > timeout variable to get the return value of the wait_for_complete call.
> > > Otherwise I need to add additional data to the stack, which I don't want to do.
> > > Sock_sndtimeo is an inline function and really doesn't add any overhead to this
> > > path, so I see no reason not to reuse the variable.
> >
> > The issue isn't the reuse. It is that timeo is reset to sk_sndtimeo
> > each time. Whereas wait_for_common and its variants return the
> > number of jiffies left in case the loop needs to sleep again later.
> >
> > Reading sock_sndtimeo once and passing it to wait_.. repeatedly is a
> > common pattern across the stack.
> >
> But those patterns are unique to those situations. For instance, in
> tcp_sendmsg_locked, we acquire the value of the socket timeout, and use that to
> wait for the entire message send operation to complete, which consists of
> potentially several blocking operations (waiting for the tcp connection to be
> established, waiting for socket memory, etc). In that situation we want to wait
> for all of those operations to complete to send a single message, and fail if
> they exceed the timeout in aggregate. The semantics are different with
> AF_PACKET. In this use case, the message is in effect empty, and just used to
> pass some control information. tpacket_snd sends as many frames from the
> memory mapped buffer as possible, and on each iteration we want to wait for the
> specified timeout for those frames to complete sending. I think resetting the
> timeout on each wait instance is the right way to go here.
I disagree. If a SO_SNDTIMEO of a given time is set, the thread should
not wait beyond that. Else what is the point of passing a specific
duration in the syscall?
Btw, we can always drop the timeo and go back to unconditional (bar
signals) waiting.
>
> > > > > @@ -2728,6 +2755,11 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> > > > > err = net_xmit_errno(err);
> > > > > if (err && __packet_get_status(po, ph) ==
> > > > > TP_STATUS_AVAILABLE) {
> > > > > + /* re-init completion queue to avoid subsequent fallthrough
> > > > > + * on a future thread calling wait_on_complete_interruptible_timeout
> > > > > + */
> > > > > + po->wait_on_complete = 0;
> > > >
> > > > If it is set where we sleep, there is no need to reset it when a failure
> > > > happens between those blocks.
> > > >
> > > > > + init_completion(&po->skb_completion);
> > > >
> > > > no need to reinit between each use?
> > > >
> > > I explained exactly why I did this in the comment above. We have to set
> > > wait_for_complete prior to calling transmit, so as to ensure that we call
> > > wait_for_completion before we exit the loop. However, in this error case, we
> > > exit the loop prior to calling wait_for_complete, so we need to reset the
> > > completion variable and the wait_for_complete flag. Otherwise we will be in a
> > > case where, on the next entrance to this loop we will have a completion variable
> > > with completion->done > 0, meaning the next wait will be a fall through case,
> > > which we don't want.
> >
> > By moving back to the point where schedule() is called, hopefully this
> > complexity automatically goes away. Same as my comment to the line
> > immediately above.
> >
> It's going to change what the complexity is, actually. I was looking at this
> last night, and I realized that your assertion that we could remove
> packet_next_frame came at a cost. This is because we need to determine if we
> have to wait before we call po->xmit, but we need to actually do the wait after
> po->xmit
Why? The existing method using schedule() doesn't.
Can we not just loop while sending and sleep immediately when
__packet_get_status returns !TP_STATUS_AVAILABLE?
I don't understand the need to probe the next packet to send instead
of the current.
This seems to be the crux of the disagreement. My guess is that it has
to do with setting wait_on_complete, but I don't see the problem. It
can be set immediately before going to sleep.
I don't mean to draw this out, btw, or add to your workload. If you
prefer, I can take a stab at my (admittedly hand-wavy) suggestion.
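Concretely, something along these lines is what I am picturing in
tpacket_snd() (rough sketch only, not compile-tested; it reuses your
skb_completion/wait_on_complete fields plus the existing
packet_current_frame() and packet_read_pending() helpers):

	do {
		ph = packet_current_frame(po, &po->tx_ring,
					  TP_STATUS_SEND_REQUEST);
		if (unlikely(ph == NULL)) {
			if (need_wait && packet_read_pending(&po->tx_ring)) {
				/* set the flag only here, immediately before
				 * sleeping, then wait for
				 * tpacket_destruct_skb() to signal
				 */
				po->wait_on_complete = 1;
				timeo = wait_for_completion_interruptible_timeout(
						&po->skb_completion, timeo);
				po->wait_on_complete = 0;
				if (timeo <= 0) {
					err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
					goto out_put;
				}
			}
			/* re-check the ring for the next frame */
			continue;
		}

		/* ... build the skb and po->xmit(skb) exactly as today ... */
	} while (likely((ph != NULL) ||
			(need_wait && packet_read_pending(&po->tx_ring))));

That keeps the wait at the point where schedule() sits today, so no
probing of the next frame is needed.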
> (to ensure that wait_on_complete is set properly when the destructor is
> called). By moving the wait to the top of the loop and getting rid of
> packet_next_frame we now have a race condition in which we might call
> tpacket_destruct_skb with wait_on_complete set to false, causing us to wait for
> the maximum timeout erroneously. So I'm going to have to find a new way to
> signal the need to call complete, which I think will introduce a different level
> of complexity.
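If I follow, the interleaving you are worried about is this one
(restating it to make sure we are talking about the same thing):

	/*
	 * CPU A (tpacket_snd)            CPU B (tpacket_destruct_skb)
	 *
	 * po->xmit(skb)
	 *                                if (po->wait_on_complete)  <- still 0
	 *                                        complete(...)      <- skipped
	 * po->wait_on_complete = 1;
	 * wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
	 *                                <- no further completions arrive
	 * sender sleeps for the full timeout before re-checking the ring
	 */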