netdev - Re: [PATCH net-next 0/2] skbuff: Fix applications not being woken for errors

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Wed, 14 Mar 2018 17:39:58 +0000
From:   Soheil Hassas Yeganeh <soheil.kdev@...il.com>
To:     vinicius.gomes@...el.com
Cc:     netdev <netdev@...r.kernel.org>, randy.e.witt@...el.com,
        David Miller <davem@...emloft.net>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>
Subject: Re: [PATCH net-next 0/2] skbuff: Fix applications not being woken for errors

On Wed, Mar 14, 2018 at 12:32 PM Willem de Bruijn <
willemdebruijn.kernel@...il.com> wrote:

> On Tue, Mar 13, 2018 at 4:35 PM, Vinicius Costa Gomes
> <vinicius.gomes@...el.com> wrote:
> > Hi,
> >
> > Changes from the RFC:
> >  - tweaked commit messages;
> >
> > Original cover letter:
> >
> > This is actually a "bug report"-RFC instead of the more usual "new
> > feature"-RFC.
> >
> > We are developing an application that uses TX hardware timestamping to
> > make some measurements, and during development Randy Witt initially
> > reported that the application poll() never unblocked when TX hardware
> > timestamping was enabled.
> >
> > After some investigation, it turned out the problem wasn't only
> > exclusive to hardware timestamping, and could be reproduced with
> > software timestamping.
> >
> > Applying patch (1), and running txtimestamp like this, for example:
> >
> > $ ./txtimestamp -u -4 192.168.1.71 -c 1000 -D -l 1000 -F
> >
> > ('-u' to use UDP only, '-4' for ipv4 only, '-c 1000' to send 1000
> > packets for each test, '-D' to remove the delay between packets, '-l
> > 1000' to set the payload to 1000 bytes, '-F' for configuring poll() to
> > wait forever)
> >
> > will cause the application to become stuck in the poll() call in most
> > of the times. (Note: I couldn't reproduce the issue running against an
> > address that is routed through loopback.)
> >
> > Another interesting fact is that if the POLLIN event is added to the
> > poll() .events, poll() no longer becomes stuck,

> The process has registered interest only in POLLIN, which the call to
> sk_data_read (sock_def_readable) will trigger.

> > and more interestingly
> > the returned event in .revents is only POLLERR.

> datagram_poll will set (E)POLLERR based on non-empty sk_error_queue.

> > After a few debugging sessions, we got to 'sock_queue_err_skb()' and
> > how it notifies applications of the error just enqueued. Changing it
> > to use 'sk->sk_error_report()', fixes the issue for hardware and
> > software timestamping. That is patch (2).
> >
> > The "solution" proposed in patch (2) looks like too big a hammer,

> It looks fine to me. POLLERR is returned regardless of the mask a
> process sets up in pollfd.events. So waking with sk_error_report
> will fix this while still waking callers waiting on POLLIN.

> Note that on sock_dequeue_err_skb, if another notification (of the
> right kind) is waiting, sk_error_report is already called instead of
> sk_data_ready.

Thank you for the fix. It looks fine to me too, because we already only arm
POLLERR for sock_dequeue_err_skb, and POLLERR is always returned when error
queue is not empty:
   if (...skb_queue_empty(&sk->sk_error_queue))
      mask |= POLLERR ...

> This should perhaps go to net, instead of net-next (though not the test).

+1 I think the fix should go to net.

> If resending, a small nit in the test: please keep the alphabetical
> order in getopt. The filepath also looks a bit fishy, but git am applied
> the mbox from patchwork without issue.