linux-kernel - Re: [PATCH v2] kcm: remove any offset before parsing messages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190220041151.GA13520@nautica>
Date:   Wed, 20 Feb 2019 05:11:52 +0100
From:   Dominique Martinet <asmadeus@...ewreck.org>
To:     Tom Herbert <tom@...ntonium.net>
Cc:     Tom Herbert <tom@...bertland.com>,
        David Miller <davem@...emloft.net>,
        Doron Roberts-Kedes <doronrk@...com>,
        Dave Watson <davejwatson@...com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

Dominique Martinet wrote on Fri, Feb 15, 2019:
> With all that said I guess my patch should work correctly then, I'll try
> to find some time to check the error does come back up the tcp socket in
> my reproducer but I have no reason to believe it doesn't.

Ok, so I can confirm this part - the 'csock' does come back up with
POLLERR if the parse function returns ENOMEM in the current code.

It also comes back up with POLLERR when the remote side closes the
connection, which is expected, but I'm having a very hard time
understanding how an application is supposed to deal with these
POLLERR after having read the documentation and a bit of
experimentation.
I'm not sure how much it would matter for real life (if the other end
closes the connection most servers would not care about what they said
just before closing, but I can imagine some clients doing that in real
life e.g. a POST request they don't care if it succeeds or not)...
My test program works like this:
 - client connects, sends 10k messages and close()s the socket
 - server loops recving and close()s after 10k messages; it used to be
recvmsg() directly but it's now using poll then recvmsg.

When the client closes the socket, some messages are obviously still "in
flight", and the server will recv a POLLERR notification on the csock at
some point with many messages left.
The documentation says to unattach the csock when you get POLLER. If I
do that, the kcm socket will no longer give me any message, so all the
messages still in flight at the time are lost.

If I just ignore the csock like I used to, all the messages do come just
fine, but as said previously on a real error this will just make recvmsg
or the polling hang forever and I see no way to distinguish a "real"
error vs. a connection shut down from the remote side with data left in
the pipe.
I thought of checking POLLERR on csock and POLLIN not set on kcmsock,
but even that seems to happen fairly regularily - the kcm sock hasn't
been filled up, it's still reading from the csock.

On the other hand, checking POLLIN on the csock does say there is still
data left, so I know there is data left on the csock, but this is also
the case on a real error (e.g. if parser returns -ENOMEM)
... And this made me try to read from the csock after detaching it and I
can resume manual tcp parsing for a few messages until read() fails with
EPROTO ?! and I cannot seem to be able to get anything out of attaching
it back to kcm (for e.g. an ENOMEM error that was transient)...

I'm honestly not sure how the POLLERR notification mechanism works but I
think it would be much easier to use KCM if we could somehow delay that
error until KCM is done feeding from the csock (when netparser really
stops reading from it like on real error, e.g. abort callback maybe?)
I think it's fine if the csock is closed before the kcm sock message is
read, but we should not lose messages like this.

> I'd like to see some retry on ENOMEM before this is merged though, so
> while I'm there I'll resend this with a second patch doing that
> retry,.. I think just not setting strp->interrupted and not reporting
> the error up might be enough? Will have to try either way.

I also tried playing with that without much success.
I had assumed just not calling strp_parser_err() (which calls the
abort_parser cb) would be enough, eventually calling strp_start_timer()
like the !len case, but no can do.
With that said, returning 0 from the parse function also raises POLLERR
on the csock and hangs netparser, so things aren't that simple...

I could use a bit of help again.

Thanks,
-- 
Dominique