linux-kernel - [take23 1/5] kevent: Description.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1162918248891@2ka.mipt.ru>
Date:	Tue, 7 Nov 2006 19:50:48 +0300
From:	Evgeniy Polyakov <johnpol@....mipt.ru>
To:	Evgeniy Polyakov <johnpol@....mipt.ru>
Cc:	David Miller <davem@...emloft.net>,
	Ulrich Drepper <drepper@...hat.com>,
	Andrew Morton <akpm@...l.org>,
	Evgeniy Polyakov <johnpol@....mipt.ru>,
	netdev <netdev@...r.kernel.org>,
	Zach Brown <zach.brown@...cle.com>,
	Christoph Hellwig <hch@...radead.org>,
	Chase Venters <chase.venters@...entec.com>,
	Johann Borck <johann.borck@...sedata.com>,
	linux-kernel@...r.kernel.org, Jeff Garzik <jeff@...zik.org>
Subject: [take23 1/5] kevent: Description.


Description.

int kevent_ctl(int fd, unsigned int cmd, unsigned int num, struct ukevent *arg);

fd - is the file descriptor referring to the kevent queue to manipulate. 
It is created by opening "/dev/kevent" char device, which is created with dynamic 
minor number and major number assigned for misc devices. 

cmd - is the requested operation. It can be one of the following:
    KEVENT_CTL_ADD - add event notification 
    KEVENT_CTL_REMOVE - remove event notification 
    KEVENT_CTL_MODIFY - modify existing notification 

num - number of struct ukevent in the array pointed to by arg 
arg - array of struct ukevent

When called, kevent_ctl will carry out the operation specified in the cmd parameter.
-------------------------------------------------------------------------------------

 int kevent_get_events(int ctl_fd, unsigned int min_nr, unsigned int max_nr, __u64 timeout, struct ukevent *buf, unsigned flags)

ctl_fd - file descriptor referring to the kevent queue 
min_nr - minimum number of completed events that kevent_get_events will block waiting for 
max_nr - number of struct ukevent in buf 
timeout - number of nanoseconds to wait before returning less than min_nr events. 
	If this is -1, then wait forever. 
buf - pointer to an array of struct ukevent. 
flags - unused 

kevent_get_events will wait timeout milliseconds for at least min_nr completed events, 
copying completed struct ukevents to buf and deleting any KEVENT_REQ_ONESHOT event requests. 
In nonblocking mode it returns as many events as possible, but not more than max_nr. 
In blocking mode it waits until timeout or if at least min_nr events are ready.
-------------------------------------------------------------------------------------

 int kevent_wait(int ctl_fd, unsigned int num, __u64 timeout)

ctl_fd - file descriptor referring to the kevent queue 
num - number of processed kevents 
timeout - this timeout specifies number of nanoseconds to wait until there is free space in kevent queue 

This syscall waits until either timeout expires or at least one event becomes ready. 
It also copies that num events into special ring buffer and requeues them (or removes depending on flags). 
-------------------------------------------------------------------------------------

 int kevent_ring_init(int ctl_fd, struct kevent_ring *ring, unsigned int num)

ctl_fd - file descriptor referring to the kevent queue 
num - size of the ring buffer in events 

 struct kevent_ring
 {
   unsigned int ring_kidx;
   struct ukevent event[0];
 }

ring_kidx - is an index in the ring buffer where kernel will put new events when 
  kevent_wait() or kevent_get_events() is called 

Example userspace code (ring_buffer.c) can be found on project's homepage.

Each kevent syscall can be so called cancellation point in glibc, i.e. when thread has 
been cancelled in kevent syscall, thread can be safely removed and no events will be lost, 
since each syscall (kevent_wait() or kevent_get_events()) will copy event into special ring buffer, 
accessible from other threads or even processes (if shared memory is used).

When kevent is removed (not dequeued when it is ready, but just removed), even if it was ready, 
it is not copied into ring buffer, since if it is removed, no one cares about it (otherwise user 
would wait until it becomes ready and got it through usual way using kevent_get_events() or kevent_wait()) 
and thus no need to copy it to the ring buffer.

It is possible with userspace ring buffer, that events in the ring buffer can be replaced without knowledge 
for the thread currently reading them (when other thread calls kevent_get_events() or kevent_wait()), 
so appropriate locking between threads or processes, which can simultaneously access the same ring buffer, 
is required.
-------------------------------------------------------------------------------------

The bulk of the interface is entirely done through the ukevent struct. 
It is used to add event requests, modify existing event requests, 
specify which event requests to remove, and return completed events.

struct ukevent contains the following members:

struct kevent_id id
    Id of this request, e.g. socket number, file descriptor and so on 
__u32 type
    Event type, e.g. KEVENT_SOCK, KEVENT_INODE, KEVENT_TIMER and so on 
__u32 event
    Event itself, e.g. SOCK_ACCEPT, INODE_CREATED, TIMER_FIRED 
__u32 req_flags
    Per-event request flags,

    KEVENT_REQ_ONESHOT
        event will be removed when it is ready 

    KEVENT_REQ_WAKEUP_ONE
        When several threads wait on the same kevent queue and requested the same event, 
	for example 'wake me up when new client has connected, so I could call accept()', 
	then all threads will be awakened when new client has connected, but only one of 
	them can process the data. This problem is known as thundering nerd problem. 
	Events which have this flag set will not be marked as ready (and appropriate threads 
	will not be awakened) if at least one event has been already marked. 

    KEVENT_REQ_ET
        Edge Triggered behaviour. It is an optimisation which allows to move ready and dequeued 
	(i.e. copied to userspace) event to move into set of interest for given storage (socket, 
	inode and so on) again. It is very usefull for cases when the same event should be used 
	many times (like reading from pipe). It is similar to epoll()'s EPOLLET flag. 

__u32 ret_flags
    Per-event return flags

    KEVENT_RET_BROKEN
        Kevent is broken 

    KEVENT_RET_DONE
        Kevent processing was finished successfully 

    KEVENT_RET_COPY_FAILED
        Kevent was not copied into ring buffer due to some error conditions. 

__u32 ret_data
    Event return data. Event originator fills it with anything it likes (for example 
    timer notifications put number of milliseconds when timer has fired 
union { __u32 user[2]; void *ptr; }
    User's data. It is not used, just copied to/from user. The whole structure is aligned 
    to 8 bytes already, so the last union is aligned properly. 

---------------------------------------------------------------------------------

Usage

For KEVENT_CTL_ADD, all fields relevant to the event type must be filled 
(id, type, possibly event, req_flags). After kevent_ctl(..., KEVENT_CTL_ADD, ...) 
returns each struct's ret_flags should be checked to see if the event is already broken or done.

For KEVENT_CTL_MODIFY, the id, req_flags, and user and event fields must be set and an 
existing kevent request must have matching id and user fields. If a match is found, 
req_flags and event are replaced with the newly supplied values and requeueing is started, 
so modified kevent can be checked and probably marked as ready immediately. If a match can't
be found, the passed in ukevent's ret_flags has KEVENT_RET_BROKEN set. KEVENT_RET_DONE is always set.

For KEVENT_CTL_REMOVE, the id and user fields must be set and an existing kevent request must 
have matching id and user fields. If a match is found, the kevent request is removed. 
If a match can't be found, the passed in ukevent's ret_flags has KEVENT_RET_BROKEN set. 
KEVENT_RET_DONE is always set.

For kevent_get_events, the entire structure is returned.

---------------------------------------------------------------------------------

Usage cases

kevent_timer
struct ukevent should contain following fields:
    type - KEVENT_TIMER 
    event - KEVENT_TIMER_FIRED 
    req_flags - KEVENT_REQ_ONESHOT if you want to fire that timer only once 
    id.raw[0] - number of seconds after commit when this timer shout expire 
    id.raw[0] - additional to number of seconds number of nanoseconds 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/