[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200608231451.07499.dada1@cosmosbay.com>
Date: Wed, 23 Aug 2006 14:51:06 +0200
From: Eric Dumazet <dada1@...mosbay.com>
To: Evgeniy Polyakov <johnpol@....mipt.ru>
Cc: lkml <linux-kernel@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Ulrich Drepper <drepper@...hat.com>,
Andrew Morton <akpm@...l.org>, netdev <netdev@...r.kernel.org>,
Zach Brown <zach.brown@...cle.com>,
Christoph Hellwig <hch@...radead.org>
Subject: Re: [take13 1/3] kevent: Core files.
Again Evgeniy I really begin to like kevent :)
On Wednesday 23 August 2006 13:24, Evgeniy Polyakov wrote:
+struct kevent
+{
+ /* Used for kevent freeing.*/
+ struct rcu_head rcu_head;
+ struct ukevent event;
+ /* This lock protects ukevent manipulations, e.g. ret_flags changes.
*/
+ spinlock_t ulock;
+
+ /* Entry of user's queue. */
+ struct list_head kevent_entry;
+ /* Entry of origin's queue. */
+ struct list_head storage_entry;
+ /* Entry of user's ready. */
+ struct list_head ready_entry;
+
+ u32 flags;
+
+ /* User who requested this kevent. */
+ struct kevent_user *user;
+ /* Kevent container. */
+ struct kevent_storage *st;
+
+ struct kevent_callbacks callbacks;
+
+ /* Private data for different storages.
+ * poll()/select storage has a list of wait_queue_t containers
+ * for each ->poll() { poll_wait()' } here.
+ */
+ void *priv;
+};
I wonder if you can reorder fields in this structure, so that 'read mostly'
fields are grouped together, maybe in a different cache line.
This should help reduce false sharing in SMP.
read mostly fields are (but you know better than me) : callbacks, rcu_head,
priv, user, event, ...
+#define KEVENT_MAX_EVENTS 4096
Could you please tell me (Forgive me if you already clarified this point) ,
what happens if the number of queued events reaches this value ?
+int kevent_init(struct kevent *k)
+{
+ spin_lock_init(&k->ulock);
+ k->flags = 0;
+
+ if (unlikely(k->event.type >= KEVENT_MAX))
+ return kevent_break(k);
+
As long you are sure we cannot call kevent_enqueue()/kevent_dequeue() after a
failed kevent_init() it should be fine.
+int kevent_add_callbacks(const struct kevent_callbacks *cb, int pos)
+{
+ struct kevent_callbacks *p;
+
+ if (pos >= KEVENT_MAX)
+ return -EINVAL;
if a negative pos is used here we might crash. KEVENT_MAX is a signed too, so
the compare is done on signed values.
If we consider callers always give a sane value, the test can be suppressed.
If we consider callers may be wrong, then we must do a correct test.
If you dont want to change function prototype, then change the test to :
if ((unsigned)pos >= KEVENT_MAX)
return -EINVAL;
Some people on lkml will prefer:
if (pos < 0 || pos >= KEVENT_MAX)
return -EINVAL;
or
#define KEVENT_MAX 6U /* unsigned constant */
+static kmem_cache_t *kevent_cache;
You probably want to add __read_mostly here to avoid false sharing.
+static kmem_cache_t *kevent_cache __read_mostly;
Same for other caches :
+static kmem_cache_t *kevent_poll_container_cache;
+static kmem_cache_t *kevent_poll_priv_cache;
About the hash table :
+struct kevent_user
+{
+ struct list_head kevent_list[KEVENT_HASH_MASK+1];
+ spinlock_t kevent_lock;
epoll used to use a hash table too (its size was configurable at init time),
and was converted to a RB-tree for good reasons...(avoid a user to allocate a
big hash table in pinned memory and DOS)
Are you sure a process handling one million sockets will succeed using kevent
instead of epoll ?
Do you have a pointer to sample source code using mmap()/kevent interface ?
It's not clear to me how we can use it (and notice that a full wrap occured,
user app could miss x*KEVENT_MAX_EVENTS events ?). Do we still must use a
syscall to dequeue events ?
In particular you state sizeof(mukevent) is 40, while its 12:
+/*
+ * Note that kevents does not exactly fill the page (each mukevent is 40
bytes),
+ * so we reuse 4 bytes at the begining of the first page to store index.
+ * Take that into account if you want to change size of struct ukevent.
+ */
+struct mukevent
+{
+ struct kevent_id id; /* size()=8 */
+ __u32 ret_flags; /* size()=4 */
+};
Thank you
Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists