[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250521145320.oqUIOaRG@linutronix.de>
Date: Wed, 21 May 2025 16:53:20 +0200
From: Nam Cao <namcao@...utronix.de>
To: Valentin Schneider <vschneid@...hat.com>
Cc: Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
John Ogness <john.ogness@...utronix.de>,
Clark Williams <clrkwllms@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-rt-devel@...ts.linux.dev,
linux-rt-users@...r.kernel.org, Joe Damato <jdamato@...tly.com>,
Martin Karsten <mkarsten@...terloo.ca>,
Jens Axboe <axboe@...nel.dk>,
Frederic Weisbecker <frederic@...nel.org>
Subject: Re: [PATCH] eventpoll: Fix priority inversion problem
On Wed, May 21, 2025 at 04:25:00PM +0200, Valentin Schneider wrote:
> On 19/05/25 09:40, Nam Cao wrote:
> > @@ -136,14 +136,28 @@ struct epitem {
> > struct rcu_head rcu;
> > };
> >
> > - /* List header used to link this structure to the eventpoll ready list */
> > - struct list_head rdllink;
> > + /*
> > + * Whether this item can be added to the eventpoll ready list. Adding to the list can be
> > + * blocked for two reasons:
> > + *
> > + * 1. This item is already on the list.
> > + * 2. A waiter is consuming the ready list and has consumed this item. The waiter therefore
> > + * is blocking this item from being added again, preventing seeing the same item twice.
> > + * If adding is blocked due to this reason, the waiter will add this item to the list
> > + * once consuming is done.
> > + */
> > + bool link_locked;
>
> Nit: IIUC it's not just the readylist, it's anytime the link is used
> (e.g. to punt it on a txlist), so how about:
>
> /*
> * Whether epitem.rdllink is currently used in a list. When used, it
> * cannot be detached or inserted elsewhere.
> *
> * It may be in use for two reasons:
> *
> * 1. This item is on the eventpoll ready list
> * 2. This item is being consumed by a waiter and stashed on a temporary
> * list. If adding is blocked due to this reason, the waiter will add
> * this item to the list once consuming is done.
> */
> bool link_used;
Acked.
> >
> > /*
> > - * Works together "struct eventpoll"->ovflist in keeping the
> > - * single linked chain of items.
> > + * Indicate whether this item is ready for consumption. All items on the ready list has this
> > + * flag set. Item that should be on the ready list, but cannot be added because of
> > + * link_locked (in other words, a waiter is consuming the ready list), also has this flag
> > + * set. When a waiter is done consuming, the waiter will add ready items to the ready list.
> > */
> > - struct epitem *next;
> > + bool ready;
> > +
> > + /* List header used to link this structure to the eventpoll ready list */
> > + struct llist_node rdllink;
> >
> > /* The file descriptor information this item refers to */
> > struct epoll_filefd ffd;
>
> > @@ -361,10 +368,27 @@ static inline int ep_cmp_ffd(struct epoll_filefd *p1,
> > (p1->file < p2->file ? -1 : p1->fd - p2->fd));
> > }
> >
> > -/* Tells us if the item is currently linked */
> > -static inline int ep_is_linked(struct epitem *epi)
> > +static void epitem_ready(struct epitem *epi)
> > {
> > - return !list_empty(&epi->rdllink);
> > + /*
> > + * Mark it ready, just in case a waiter is blocking this item from going into the ready
> > + * list. This will tell the waiter to add this item to the ready list, after the waiter is
> > + * finished.
> > + */
> > + xchg(&epi->ready, true);
>
> Perhaps a stupid question, why use an xchg() for .ready and .link_locked
> (excepted for that epitem_ready() cmpxchg()) writes when the return value
> is always discarded? Wouldn't e.g. smp_store_release() suffice, considering
> the reads are smp_load_acquire()?
That's me being stupid, smp_store_release() is good enough.
> > +
> > + /*
> > + * If this item is not blocked, add it to the ready list. This item could be blocked for two
> > + * reasons:
> > + *
> > + * 1. It is already on the ready list. Then nothing further is required.
> > + * 2. A waiter is consuming the ready list, and has consumed this item. The waiter is now
> > + * blocking this item, so that this item won't be seen twice. In this case, the waiter
> > + * will add this item to the ready list after the waiter is finished.
> > + */
> > + if (!cmpxchg(&epi->link_locked, false, true))
> > + llist_add(&epi->rdllink, &epi->ep->rdllist);
> > +
> > }
> >
> > static inline struct eppoll_entry *ep_pwq_from_wait(wait_queue_entry_t *p)
>
> > @@ -1924,19 +1874,39 @@ static int ep_send_events(struct eventpoll *ep,
> > * Trigger mode, we need to insert back inside
> > * the ready list, so that the next call to
> > * epoll_wait() will check again the events
> > - * availability. At this point, no one can insert
> > - * into ep->rdllist besides us. The epoll_ctl()
> > - * callers are locked out by
> > - * ep_send_events() holding "mtx" and the
> > - * poll callback will queue them in ep->ovflist.
> > + * availability.
> > */
> > - list_add_tail(&epi->rdllink, &ep->rdllist);
> > + xchg(&epi->ready, true);
> > + }
> > + }
> > +
> > + llist_for_each_entry_safe(epi, tmp, txlist.first, rdllink) {
> > + /*
> > + * We are done iterating. Allow the items we took to be added back to the ready
> > + * list.
> > + */
> > + xchg(&epi->link_locked, false);
> > +
> > + /*
> > + * In the loop above, we may mark some items ready, and they should be added back.
> > + *
> > + * Additionally, someone else may also attempt to add the item to the ready list,
> > + * but got blocked by us. Add those blocked items now.
> > + */
> > + if (smp_load_acquire(&epi->ready)) {
> > ep_pm_stay_awake(epi);
> > + epitem_ready(epi);
> > }
>
> Isn't this missing a:
>
> list_del_init(&epi->rdllink);
>
> AFAICT we're always going to overwrite that link when next marking the item
> as ready, but I'd say it's best to exit this with a clean state. That would
> have to be before the clearing of link_locked so it doesn't race with a
> concurrent epitem_ready() call.
To confirm I understand you: there is no functional problem, and your
comment is more of a "just to be safe"?
Thanks so much for the review,
Nam
Powered by blists - more mailing lists