netdev - [PATCH net-next v2 0/2] skbuff: Fix applications not being woken for errors

Message-Id: <20180314215437.14726-1-vinicius.gomes@intel.com>
Date:   Wed, 14 Mar 2018 14:54:35 -0700
From:   Vinicius Costa Gomes <vinicius.gomes@...el.com>
To:     netdev@...r.kernel.org
Cc:     Vinicius Costa Gomes <vinicius.gomes@...el.com>,
        randy.e.witt@...el.com, davem@...emloft.net, eric.dumazet@...il.com
Subject: [PATCH net-next v2 0/2] skbuff: Fix applications not being woken for errors

Hi,

Changes from v1:
 - Addressed comments from Willem de Bruijn about the order of the
 options passed to getopt();
 - Added Reviewed-by and Fixes tags to patch (2);

Changes from the RFC:
 - tweaked commit messages;

Original cover letter:

This is actually a "bug report"-RFC instead of the more usual "new
feature"-RFC.

We are developing an application that uses TX hardware timestamping to
make some measurements, and during development Randy Witt initially
reported that the application poll() never unblocked when TX hardware
timestamping was enabled.

After some investigation, it turned out that the problem wasn't
exclusive to hardware timestamping and could also be reproduced with
software timestamping.

Applying patch (1) and running txtimestamp like this, for example:

$ ./txtimestamp -u -4 192.168.1.71 -c 1000 -D -l 1000 -F

('-u' to use UDP only, '-4' for ipv4 only, '-c 1000' to send 1000
packets for each test, '-D' to remove the delay between packets, '-l
1000' to set the payload to 1000 bytes, '-F' for configuring poll() to
wait forever)

will cause the application to become stuck in the poll() call most of
the time. (Note: I couldn't reproduce the issue when running against
an address that is routed through loopback.)

Another interesting fact is that if POLLIN is added to the .events
passed to poll(), poll() no longer gets stuck and, more interestingly,
the only event returned in .revents is POLLERR.
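
As a reminder of the poll() semantics involved: per poll(2), POLLERR
is reported in .revents whether or not it was requested in .events. A
minimal sketch of a wait that asks for nothing but errors, assuming
the application leaves .events at 0; wait_for_pollerr() is a
hypothetical helper, not code from the application or the selftest:

```c
#include <poll.h>

/*
 * Wait for an error condition on fd without requesting POLLIN.
 * POLLERR is always reported in .revents, so .events stays 0.
 * timeout_ms = -1 makes poll() block indefinitely.
 *
 * Returns 1 if POLLERR was reported, 0 on timeout or if only some
 * other event was reported, and -1 if poll() itself failed.
 */
static int wait_for_pollerr(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = 0, .revents = 0 };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;
	return (ret > 0 && (pfd.revents & POLLERR)) ? 1 : 0;
}
```

With a timeout of -1, a wait like this can only return once something
wakes the task for POLLERR, which is where it was getting stuck.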

After a few debugging sessions, we got to 'sock_queue_err_skb()' and
how it notifies applications of the error just enqueued. Changing it
to use 'sk->sk_error_report()' fixes the issue for both hardware and
software timestamping. That is patch (2).
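
One way to picture why the choice of callback could matter is the toy
user-space model below. It is not kernel code. It assumes (this is an
assumption, not something established in this letter) that a "data
ready" style notification wakes sleepers with a readable-type key,
which a waiter interested only in errors filters out. All names are
hypothetical.

```c
#include <stdbool.h>
#include <poll.h>

/* Toy stand-in for a task sleeping in poll(); not a kernel structure. */
struct toy_waiter {
	short events; /* what the task put in .events */
	bool woken;
};

/*
 * Wake the waiter only if the wakeup key overlaps what it waits for.
 * POLLERR and POLLHUP are always waited for, as in poll(2).
 */
static void toy_wake(struct toy_waiter *w, short key)
{
	short mask = w->events | POLLERR | POLLHUP;

	if (key & mask)
		w->woken = true;
}

/* Models a "data ready" notification (assumed readable-type key). */
static void toy_data_ready(struct toy_waiter *w)
{
	toy_wake(w, POLLIN);
}

/* Models an "error report" notification. */
static void toy_error_report(struct toy_waiter *w)
{
	toy_wake(w, POLLERR);
}
```

In this model a waiter with .events = 0 sleeps through
toy_data_ready() but is woken by toy_error_report(), while a waiter
with .events = POLLIN is woken by either, which would be consistent
with the POLLIN observation above.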

The "solution" proposed in patch (2) looks like too big a hammer. If
it is not, then it seems that this problem has existed for a long time
(since before git) and that it was uncommon for folks to reach the
conditions necessary to trigger it (my hypothesis is that it only
triggers when the error is reported from a different task context than
the application's).

Am I missing something here?

Cheers,
--

Vinicius Costa Gomes (2):
  selftests/txtimestamp: Add more configurable parameters
  skbuff: Fix not waking applications when errors are enqueued

 net/core/skbuff.c                                   |  2 +-
 .../selftests/networking/timestamping/txtimestamp.c | 21 ++++++++++++++++++---
 2 files changed, 19 insertions(+), 4 deletions(-)

--
2.16.2