linux-kernel - Re: Why add the general notification queue and its sources

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <930B6F39-4174-46C2-B556-E98ED72E27F8@amacapital.net>
Date:   Fri, 6 Sep 2019 10:14:17 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Steven Whitehouse <swhiteho@...hat.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        David Howells <dhowells@...hat.com>,
        Ray Strode <rstrode@...hat.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Nicolas Dichtel <nicolas.dichtel@...nd.com>, raven@...maw.net,
        keyrings@...r.kernel.org, linux-usb@...r.kernel.org,
        linux-block <linux-block@...r.kernel.org>,
        Christian Brauner <christian@...uner.io>,
        LSM List <linux-security-module@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Linux API <linux-api@...r.kernel.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        "Ray, Debarshi" <debarshi.ray@...il.com>,
        Robbie Harwood <rharwood@...hat.com>
Subject: Re: Why add the general notification queue and its sources



> On Sep 6, 2019, at 9:12 AM, Steven Whitehouse <swhiteho@...hat.com> wrote:
> 
> Hi,
> 
>> On 06/09/2019 16:53, Linus Torvalds wrote:
>> On Fri, Sep 6, 2019 at 8:35 AM Linus Torvalds
>> <torvalds@...ux-foundation.org> wrote:
>>> This is why I like pipes. You can use them today. They are simple, and
>>> extensible, and you don't need to come up with a new subsystem and
>>> some untested ad-hoc thing that nobody has actually used.
>> The only _real_ complexity is to make sure that events are reliably parseable.
>> 
>> That's where you really want to use the Linux-only "packet pipe"
>> thing, becasue otherwise you have to have size markers or other things
>> to delineate events. But if you do that, then it really becomes
>> trivial.
>> 
>> And I checked, we made it available to user space, even if the
>> original reason for that code was kernel-only autofs use: you just
>> need to make the pipe be O_DIRECT.
>> 
>> This overly stupid program shows off the feature:
>> 
>>         #define _GNU_SOURCE
>>         #include <fcntl.h>
>>         #include <unistd.h>
>> 
>>         int main(int argc, char **argv)
>>         {
>>                 int fd[2];
>>                 char buf[10];
>> 
>>                 pipe2(fd, O_DIRECT | O_NONBLOCK);
>>                 write(fd[1], "hello", 5);
>>                 write(fd[1], "hi", 2);
>>                 read(fd[0], buf, sizeof(buf));
>>                 read(fd[0], buf, sizeof(buf));
>>                 return 0;
>>         }
>> 
>> and it you strace it (because I was too lazy to add error handling or
>> printing of results), you'll see
>> 
>>     write(4, "hello", 5)                    = 5
>>     write(4, "hi", 2)                       = 2
>>     read(3, "hello", 10)                    = 5
>>     read(3, "hi", 10)                       = 2
>> 
>> note how you got packets of data on the reader side, instead of
>> getting the traditional "just buffer it as a stream".
>> 
>> So now you can even have multiple readers of the same event pipe, and
>> packetization is obvious and trivial. Of course, I'm not sure why
>> you'd want to have multiple readers, and you'd lose _ordering_, but if
>> all events are independent, this _might_ be a useful thing in a
>> threaded environment. Maybe.
>> 
>> (Side note: a zero-sized write will not cause a zero-sized packet. It
>> will just be dropped).
>> 
>>                Linus
> 
> The events are generally not independent - we would need ordering either implicit in the protocol or explicit in the messages. We also need to know in case messages are dropped too - doesn't need to be anything fancy, just some idea that since we last did a read, there are messages that got lost, most likely due to buffer overrun.

This could be a bit fancier: if the pipe recorded the bitwise or of the first few bytes of dropped message, then the messages could set a bit in the header indicating the type, and readers could then learn which *types* of messages were dropped.

Or they could just use multiple pipes.

If this whole mechanism catches on, I wonder if implementing recvmmsg() on pipes would be worthwhile.