Message-ID: <20190626105403.GA31355@hmswarspite.think-freely.org>
Date:   Wed, 26 Jun 2019 06:54:03 -0400
From:   Neil Horman <nhorman@...driver.com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc:     Network Development <netdev@...r.kernel.org>,
        Matteo Croce <mcroce@...hat.com>,
        "David S. Miller" <davem@...emloft.net>
Subject: Re: [PATCH v4 net] af_packet: Block execution of tasks waiting for
 transmit to complete in AF_PACKET

On Tue, Jun 25, 2019 at 06:30:08PM -0400, Willem de Bruijn wrote:
> > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > index a29d66da7394..a7ca6a003ebe 100644
> > --- a/net/packet/af_packet.c
> > +++ b/net/packet/af_packet.c
> > @@ -2401,6 +2401,9 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
> >
> >                 ts = __packet_set_timestamp(po, ph, skb);
> >                 __packet_set_status(po, ph, TP_STATUS_AVAILABLE | ts);
> > +
> > +               if (!packet_read_pending(&po->tx_ring))
> > +                       complete(&po->skb_completion);
> >         }
> >
> >         sock_wfree(skb);
> > @@ -2585,7 +2588,7 @@ static int tpacket_parse_header(struct packet_sock *po, void *frame,
> >
> >  static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> >  {
> > -       struct sk_buff *skb;
> > +       struct sk_buff *skb = NULL;
> >         struct net_device *dev;
> >         struct virtio_net_hdr *vnet_hdr = NULL;
> >         struct sockcm_cookie sockc;
> > @@ -2600,6 +2603,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> >         int len_sum = 0;
> >         int status = TP_STATUS_AVAILABLE;
> >         int hlen, tlen, copylen = 0;
> > +       long timeo = 0;
> >
> >         mutex_lock(&po->pg_vec_lock);
> >
> > @@ -2646,12 +2650,21 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> >         if ((size_max > dev->mtu + reserve + VLAN_HLEN) && !po->has_vnet_hdr)
> >                 size_max = dev->mtu + reserve + VLAN_HLEN;
> >
> > +       reinit_completion(&po->skb_completion);
> > +
> >         do {
> >                 ph = packet_current_frame(po, &po->tx_ring,
> >                                           TP_STATUS_SEND_REQUEST);
> >                 if (unlikely(ph == NULL)) {
> > -                       if (need_wait && need_resched())
> > -                               schedule();
> > +                       if (need_wait && skb) {
> > +                               timeo = sock_sndtimeo(&po->sk, msg->msg_flags & MSG_DONTWAIT);
> > +                               timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
> 
> This looks really nice.
> 
> But isn't it still susceptible to the race where tpacket_destruct_skb
> is called in between po->xmit and this
> wait_for_completion_interruptible_timeout?
> 
That's not an issue, since the call to complete() is gated only on
packet_read_pending() reaching 0 in tpacket_destruct_skb().  Previously it was
gated on my wait_on_complete flag being non-zero, so we had to set that flag
prior to calling po->xmit, or the complete() call might never be made,
resulting in a hang.  Now we always call complete(), and the completion API
allows arbitrary ordering of complete() and wait_for_completion() (since the
completion's internal done counter gets incremented), so a call to
wait_for_completion() effectively falls through if complete() was called
first.
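
As a hedged illustration of that ordering property (generic completion API
usage, not code from the patch; the demo_* names are made up):

#include <linux/completion.h>

static DECLARE_COMPLETION(demo_done);

/* May run before or after the waiter reaches its wait. */
static void demo_signal(void)
{
	complete(&demo_done);		/* done++ and wake any waiter */
}

static long demo_wait(unsigned long timeo)
{
	/* If complete() already ran, done > 0 and this returns at once;
	 * otherwise we sleep until complete(), a signal, or the timeout.
	 */
	return wait_for_completion_interruptible_timeout(&demo_done, timeo);
}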

There is an odd path here though.  If an application calls sendmsg on a packet
socket with MSG_DONTWAIT set, then need_wait will be zero, and we will
eventually exit this loop without ever having called wait_for_completion(),
but tpacket_destruct_skb() will still have called complete() when all the
frames complete transmission.  In and of itself that's fine, but it leaves the
completion structure in a state where its done counter will have been
incremented at least once (specifically it will be set to N, where N is the
number of frames transmitted during the call where MSG_DONTWAIT is set).  If
the application then calls sendmsg on this socket with MSG_DONTWAIT clear, we
will call wait_for_completion() but return from it immediately (due to the
previously made calls to complete()).  I've corrected this by adding that call
to reinit_completion() prior to loop entry, so that we are always guaranteed
to have the completion set up to wait only for the frames being sent in this
particular invocation of sendmsg.
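
Roughly, the fixed pattern looks like this (an illustrative sketch with
made-up names, not the actual af_packet.c code, apart from the completion
API itself):

#include <linux/completion.h>
#include <linux/types.h>

static void demo_send(struct completion *c, bool need_wait,
		      unsigned long timeo)
{
	reinit_completion(c);	/* done = 0: discard completions left over
				 * from a prior MSG_DONTWAIT send */

	/* ... queue frames; the skb destructor calls complete(c) once the
	 * last pending frame finishes transmission ... */

	if (need_wait)		/* MSG_DONTWAIT clear */
		wait_for_completion_interruptible_timeout(c, timeo);

	/* With MSG_DONTWAIT set we return without waiting; any complete()
	 * made later is discarded by the next reinit_completion(). */
}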

> The test for skb is shorthand for packet_read_pending  != 0, right?
> 
Sort of.  Gating on skb guarantees that we have sent at least one frame in
this call to tpacket_snd().  If we didn't do that, then it would be possible
for an application to call sendmsg without setting any frames in the buffer to
TP_STATUS_SEND_REQUEST, which would cause us to wait for a completion without
having sent any frames, meaning we would block waiting for an event
(tpacket_destruct_skb) that will never happen.  The check for skb ensures that
tpacket_destruct_skb() will get called, and that we will get a wakeup from the
call to wait_for_completion().  It does suggest that packet_read_pending() !=
0, but that's not guaranteed, because tpacket_destruct_skb() may already have
been called (see the above explanation regarding the ordering of complete()
and wait_for_completion()).
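
Condensed, that exit path of the send loop looks roughly like this
(illustrative only; timeout/error handling omitted, not the literal patch
hunk):

	if (unlikely(ph == NULL)) {
		if (need_wait && skb) {
			/* skb != NULL: at least one frame went out in this
			 * call, so tpacket_destruct_skb() is guaranteed to
			 * run and call complete() (possibly already done, in
			 * which case this returns immediately).
			 */
			timeo = wait_for_completion_interruptible_timeout(
					&po->skb_completion, timeo);
		}
		/* skb == NULL: nothing was sent, so never wait. */
		continue;
	}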

Neil
