netdev - Re: How do I avoid recvmsg races with IP_RECVERR?

Message-Id: <1433281373.2485127.285160441.6F17A081@webmail.messagingengine.com>
Date:	Tue, 02 Jun 2015 23:42:53 +0200
From:	Hannes Frederic Sowa <hannes@...essinduktion.org>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Andy Lutomirski <luto@...nel.org>,
	Network Development <netdev@...r.kernel.org>
Subject: Re: How do I avoid recvmsg races with IP_RECVERR?

On Tue, Jun 2, 2015, at 23:33, Andy Lutomirski wrote:
> On Tue, Jun 2, 2015 at 2:17 PM, Hannes Frederic Sowa
> <hannes@...essinduktion.org> wrote:
> > On Tue, Jun 2, 2015, at 21:40, Andy Lutomirski wrote:
> >> As far as I can tell, enabling IP_RECVERR causes a queued error to
> >> make recvmsg, etc. return an error (once).  It's worse, though: a new
> >> error can be queued asynchronously at any time, thus setting sk_err to
> >> a nonzero value.  How do I sensibly distinguish recvmsg failures due
> >> to genuine errors receiving messages from recvmsg failures caused by a
> >> queued error?
> >>
> >> The only way I can see to get reliable error handling is to literally
> >> call recvmsg in a loop:
> >>
> >> while (true /* or while POLLIN is set */) {
> >>   int ret = recvmsg(..., 0);  /* MSG_ERRQUEUE not set */
> >>   if (ret < 0 && /* what goes here? */) {
> >>     /* Whoops!  This might be a harmless asynchronous error!
> >>        Take no action! */
> >>   }
> >> }
> >
> > I see two possibilities:
> >
> > Either we export the icmp_err_convert tables along with the udp_lib_err
> > error conversions to user space and spice them up with flags marking
> > whether they are transient (icmp_err_convert already has a fatal flag).
> 
> This seems overcomplicated.  I'd rather have a flag I pass to tell the
> kernel that I don't want to see transient errors (and that I'll clear
> them myself using POLLERR and either MSG_ERRQUEUE or SO_ERROR).
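The opt-out flag is only a proposal here, but the "clear them myself" half can already be written with existing calls. A minimal sketch, assuming a Linux datagram socket; clear_socket_errors is a hypothetical helper name. Without the proposed flag, the SO_ERROR read below can still lose the race against a plain recvmsg(), since both read and clear the same sk_err.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Intended to run when poll() reports POLLERR on fd.
 * 1. getsockopt(SO_ERROR) reads and clears the pending socket error.
 * 2. recvmsg(MSG_ERRQUEUE) is repeated until the error queue is empty.
 * Returns the number of queue entries drained, or -1 on failure. */
static int clear_socket_errors(int fd, int *so_error)
{
    socklen_t len = sizeof(*so_error);
    int drained = 0;

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, so_error, &len) < 0)
        return -1;

    for (;;) {
        char data[512];
        union {
            char buf[512];
            struct cmsghdr align;    /* keeps the buffer cmsg-aligned */
        } control;
        struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = control.buf, .msg_controllen = sizeof(control.buf),
        };

        if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return drained;      /* the error queue is empty */
            if (errno == EINTR)
                continue;
            return -1;
        }
        drained++;                   /* a real program would inspect the cmsgs here */
    }
}
```

POLLERR is always reported in revents whether or not it was requested in events, so a poll() loop only needs to check for it before calling this helper.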
> 
> >
> > Otherwise you should be able to call recvmsg with MSG_ERRQUEUE set after
> > you got a ret < 0 when calling without MSG_ERRQUEUE and inspect the
> > sock_extended_err, no?
> 
> I do this already, which makes me think that there's a bug or another
> race somewhere.  I've only seen a failure once in several years of
> operation.
> 
> The failure happened on a ping socket.  I suspect that the race is:
> 
> ping_err: ip_icmp_error(...);
> 
> user: recvmsg(MSG_ERRQUEUE) and dequeues the error.
> 
> ping_err: sk_err = err;
> 
> user: recvmsg(MSG_ERRQUEUE not set), and recvmsg sees and clears the
> error via sock_error.
> 
> user: recvmsg(MSG_ERRQUEUE), and recvmsg returns -EAGAIN.
> 
> Now the user code thinks that it was a real (non-transient) error and
> aborts.
> 
> Shouldn't that sk->sk_err = err assignment at least use WRITE_ONCE?
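The interleaving above can be replayed deterministically in a simplified, single-threaded toy model. This is not kernel code: struct toy_sock and its helpers are hypothetical stand-ins for sk_err, sk_error_queue and the two recvmsg paths. Because every step is one complete store or read, the misclassification happens without any torn access, which is consistent with the reply below that WRITE_ONCE alone would not help.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model (not kernel code): the two places the error lives. */
struct toy_sock {
    int sk_err;      /* read-and-cleared by a plain recvmsg */
    int queue[8];    /* ee_errno values waiting on the error queue */
    int qlen;
};

/* recvmsg(MSG_ERRQUEUE): pops the oldest queued error, or -EAGAIN. */
static int toy_recv_errqueue(struct toy_sock *sk)
{
    if (sk->qlen == 0)
        return -EAGAIN;
    int err = sk->queue[0];
    for (int i = 1; i < sk->qlen; i++)
        sk->queue[i - 1] = sk->queue[i];
    sk->qlen--;
    return err;
}

/* recvmsg() without MSG_ERRQUEUE: reports and clears sk_err, if set. */
static int toy_recv_plain(struct toy_sock *sk)
{
    int err = sk->sk_err;
    sk->sk_err = 0;
    return -err;                       /* 0 stands for "no error reported" */
}

/* Replays the ping_err()/user interleaving. Returns true if user space
 * ends up treating a queued error as a genuine recvmsg failure. */
static bool misclassified(int err, bool racy)
{
    struct toy_sock sk = { 0 };

    sk.queue[sk.qlen++] = err;         /* ping_err: ip_icmp_error() */
    if (racy)
        toy_recv_errqueue(&sk);        /* user dequeues it in between */
    sk.sk_err = err;                   /* ping_err: sk->sk_err = err */
    int r = toy_recv_plain(&sk);       /* user: plain recvmsg fails */
    int q = toy_recv_errqueue(&sk);    /* user: MSG_ERRQUEUE follow-up */
    return r < 0 && q == -EAGAIN;
}
```

With racy set to false, the follow-up MSG_ERRQUEUE call finds the error and the failure is attributed correctly; only the ordering differs between the two runs.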

Hmm, I don't think this will help.

> Even if this race were fixed, this interface still sucks IMO.

Yes. :/

My proposal would be to make the error conversion lazy:

Keeping duplicate data is generally not a good idea, so we shouldn't use
sk->sk_err at all if IP_RECVERR is set, but instead let sock_error use
the sk_error_queue and extract the error code from there.

Only if IP_RECVERR is not set do we use the sk->sk_err logic.
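To see why a single source of truth would close the window, the same interleaving can be rerun in a toy model of the proposed semantics. This is a sketch under an assumption the proposal leaves open: here the proposed sock_error peeks at the head of the error queue without consuming it. All names are hypothetical; this is not kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model (not kernel code) of the proposed lazy error conversion. */
struct toy_sock {
    bool recverr;        /* IP_RECVERR enabled */
    int sk_err;          /* only consulted when recverr is false */
    int queue[8];        /* ee_errno values on the error queue */
    int qlen;
};

/* Assumed semantics of the proposed sock_error(): with IP_RECVERR the
 * error is read from the head of the error queue (peeked, not consumed);
 * otherwise the existing read-and-clear of sk_err is kept. */
static int toy_sock_error(struct toy_sock *sk)
{
    if (sk->recverr)
        return sk->qlen ? -sk->queue[0] : 0;
    int err = sk->sk_err;
    sk->sk_err = 0;
    return -err;
}

/* recvmsg(MSG_ERRQUEUE): pops the oldest queued error, or -EAGAIN. */
static int toy_recv_errqueue(struct toy_sock *sk)
{
    if (sk->qlen == 0)
        return -EAGAIN;
    int err = sk->queue[0];
    for (int i = 1; i < sk->qlen; i++)
        sk->queue[i - 1] = sk->queue[i];
    sk->qlen--;
    return err;
}

/* Same interleaving as in the race report: returns true if a plain
 * recvmsg() fails but the follow-up MSG_ERRQUEUE check finds nothing. */
static bool misclassified_lazy(int err, bool racy)
{
    struct toy_sock sk = { .recverr = true };

    sk.queue[sk.qlen++] = err;         /* ping_err: ip_icmp_error() */
    if (racy)
        toy_recv_errqueue(&sk);        /* user dequeues it in between */
    sk.sk_err = err;                   /* ping_err: store is now ignored */
    int r = toy_sock_error(&sk);       /* plain recvmsg's error check */
    int q = toy_recv_errqueue(&sk);    /* user: MSG_ERRQUEUE follow-up */
    return r < 0 && q == -EAGAIN;
}
```

Peeking means a plain recvmsg keeps failing until user space drains the queue with MSG_ERRQUEUE; that trade-off comes from the assumed semantics, not necessarily from the proposal itself.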

What do you think?

Bye,
Hannes

