linux-kernel - Re: [PATCH 1/1] eventfd new tag EFD


Message-ID: <cd20672aaf13f939b4f798d0839d2438@suse.de>
Date:   Fri, 31 May 2019 13:48:39 +0200
From:   Roman Penyaev <rpenyaev@...e.de>
To:     Renzo Davoli <renzo@...unibo.it>
Cc:     Greg KH <gregkh@...uxfoundation.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Davide Libenzi <davidel@...ilserver.org>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-api@...r.kernel.org, linux-kernel-owner@...r.kernel.org
Subject: Re: [PATCH 1/1] eventfd new tag EFD_VPOLL: generate epoll events

On 2019-05-31 12:45, Renzo Davoli wrote:
> Hi Roman,
> 
> On Fri, May 31, 2019 at 11:34:08AM +0200, Roman Penyaev wrote:
>> On 2019-05-27 15:36, Renzo Davoli wrote:
>> > Unfortunately this approach cannot be applied to
>> > poll/select/ppoll/pselect/epoll.
>> 
>> If you have to override other system calls anyway, what is the problem
>> with overriding the poll family too?  It will add, let's say, 50 extra
>> lines of complexity to your userspace code.  All you need is to be woken
>> up by *any* event and then check one mask variable to understand what
>> you need to do: read or write.  That is basically what you do in your
>> eventfd modification, but only in userspace.
> 
> This approach would not scale. If I want to use both a (user-space)
> network stack and an (emulated) device (or more stacks and devices),
> which (overridden) poll would I use?
> 
> The poll of the first stack is not able to deal with the third
> device.

Since each such stack has a set of read/write/etc. functions, you can
always extend your stack with another call which returns an event mask
specifying exactly what you have to do, e.g.:

     nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
     for (n = 0; n < nfds; ++n) {
          struct sock *sock = events[n].data.ptr;
          /* use a separate variable so the events[] array is not clobbered */
          poll_t mask = sock->get_events(sock, &events[n]);

          if (mask & EPOLLIN)
              sock->read(sock);
          if (mask & EPOLLOUT)
              sock->write(sock);
     }


With such a virtual table you can mix all userspace stacks, and even
normal sockets, for which the 'get_events' function can be declared as

static poll_t kernel_sock_get_events(struct sock *sock,
                                     struct epoll_event *ev)
{
     return ev->events;
}

Am I missing something?
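
A minimal, self-contained sketch of the virtual-table idea above, for
illustration only.  The names struct sock_ops, user_get_events,
sock_register and dispatch_once, and the 'pending' field, are hypothetical
and do not come from this thread.  It assumes each userspace stack owns an
eventfd that it signals whenever its own readiness mask changes, while
kernel-backed fds simply pass through the mask that epoll reports.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define MAX_EVENTS 8

struct sock;

/* Hypothetical per-stack virtual table; all names are illustrative. */
struct sock_ops {
    uint32_t (*get_events)(struct sock *s, const struct epoll_event *ev);
    void (*read)(struct sock *s);
    void (*write)(struct sock *s);
};

struct sock {
    const struct sock_ops *ops;
    int fd;            /* fd registered with epoll (socket, pipe or wakeup eventfd) */
    uint32_t pending;  /* userspace stacks only: readiness mask kept by the stack */
    int reads, writes; /* counters, so the dispatch can be observed */
};

/* Kernel-backed fd: epoll already reports the real mask. */
static uint32_t kernel_get_events(struct sock *s, const struct epoll_event *ev)
{
    (void)s;
    return ev->events;
}

/* Userspace stack: epoll only says "something happened" on the wakeup
 * eventfd, so drain it and return the mask the stack keeps itself. */
static uint32_t user_get_events(struct sock *s, const struct epoll_event *ev)
{
    uint64_t cnt;
    uint32_t mask = s->pending;

    (void)ev;
    if (read(s->fd, &cnt, sizeof(cnt)) != (ssize_t)sizeof(cnt))
        return 0;
    s->pending = 0;
    return mask;
}

/* Placeholder handlers; a real stack would do its I/O here. */
static void count_read(struct sock *s)  { s->reads++; }
static void count_write(struct sock *s) { s->writes++; }

static const struct sock_ops kernel_ops = { kernel_get_events, count_read, count_write };
static const struct sock_ops user_ops   = { user_get_events, count_read, count_write };

/* Every sock is registered for EPOLLIN with a pointer back to itself. */
static int sock_register(int epfd, struct sock *s)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = s };

    return epoll_ctl(epfd, EPOLL_CTL_ADD, s->fd, &ev);
}

/* One pass of the loop from the mail, dispatching through the vtable. */
static int dispatch_once(int epfd, int timeout_ms)
{
    struct epoll_event events[MAX_EVENTS];
    int n, nfds = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);

    for (n = 0; n < nfds; ++n) {
        struct sock *s = events[n].data.ptr;
        uint32_t mask;

        assert(s && s->ops);
        mask = s->ops->get_events(s, &events[n]);
        if (mask & EPOLLIN)
            s->ops->read(s);
        if (mask & EPOLLOUT)
            s->ops->write(s);
    }
    return nfds;
}
```

In this sketch a userspace stack would set s->pending and write a
non-zero 64-bit value to its eventfd to wake the loop; the loop itself
does not need to know which kind of sock it is dispatching.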


>> > > Why can it not be less than 64?
>> > This is the implementation of 'write'. The 64 bits include the
>> > 'command' EFD_VPOLL_ADDEVENTS, EFD_VPOLL_DELEVENTS or
>> > EFD_VPOLL_MODEVENTS (in the most significant 32 bits) and the set
>> > of events (in the lowest 32 bits).
>> 
>> Do you really need add/del/mod semantics?  Userspace still has to keep
>> the mask somewhere, so you can have one simple command, which does:
>>    ctx->count = events;
>> in the kernel, so no masks and no games with bits are needed.  That
>> will simplify the API.
> 
> It is true, at the price of having more complex code in user space.
> Other system calls could have been implemented as "set the value";
> instead there are ADD/DEL modification flags.
> I mean, for example, sigprocmask (SIG_BLOCK, SIG_UNBLOCK, SIG_SETMASK),
> or even epoll_ctl.
> While poll requires the program to keep the struct pollfd array stored
> somewhere, epoll is more powerful and flexible, as different file
> descriptors can be added and deleted by different modules/components.
> 
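
To make the two alternatives discussed above concrete, here is a small
userspace model of both, as a sketch only.  The numeric command values
and the helper names (vpoll_pack, vpoll_apply, vpoll_set) are
hypothetical; the thread names the EFD_VPOLL_* commands but not their
encoding, and the OR / AND-NOT / assign semantics are an assumption
based on the command names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical command values; the real encoding is not given in the thread. */
enum vpoll_cmd { VPOLL_ADD = 1, VPOLL_DEL = 2, VPOLL_MOD = 3 };

/* Command in the most significant 32 bits, event set in the lowest 32 bits. */
static uint64_t vpoll_pack(enum vpoll_cmd cmd, uint32_t events)
{
    assert(cmd >= VPOLL_ADD && cmd <= VPOLL_MOD);
    return ((uint64_t)cmd << 32) | events;
}

/* add/del/mod variant: the current mask is updated relative to its old value. */
static uint32_t vpoll_apply(uint32_t cur, uint64_t word)
{
    uint32_t cmd = (uint32_t)(word >> 32);
    uint32_t events = (uint32_t)word;

    switch (cmd) {
    case VPOLL_ADD: return cur | events;
    case VPOLL_DEL: return cur & ~events;
    case VPOLL_MOD: return events;
    default:        return cur;   /* unknown command: leave the mask alone */
    }
}

/* "set the value" variant: the caller tracks the full mask and overwrites it. */
static uint32_t vpoll_set(uint32_t cur, uint32_t events)
{
    (void)cur;
    return events;
}
```

With the first variant each component can add or remove its own bits
without knowing the rest of the mask; with the second, whoever writes
must already hold the complete mask.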
> If I have two threads implementing the send and receive path of a
> socket in a user-space

Eventually you will end up with such a lock to protect your TCP (or
whatever) state machine.  Or do you have a real example where the read
and write paths can work completely independently?

--
Roman