lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <xhsmh8qmq9h37.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Wed, 21 May 2025 16:25:00 +0200
From: Valentin Schneider <vschneid@...hat.com>
To: Nam Cao <namcao@...utronix.de>, Alexander Viro
 <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, Jan
 Kara <jack@...e.cz>, Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 John Ogness <john.ogness@...utronix.de>, Clark Williams
 <clrkwllms@...nel.org>, Steven Rostedt <rostedt@...dmis.org>,
 linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-rt-devel@...ts.linux.dev, linux-rt-users@...r.kernel.org, Joe Damato
 <jdamato@...tly.com>, Martin Karsten <mkarsten@...terloo.ca>, Jens Axboe
 <axboe@...nel.dk>
Cc: Nam Cao <namcao@...utronix.de>, Frederic Weisbecker <frederic@...nel.org>
Subject: Re: [PATCH] eventpoll: Fix priority inversion problem

On 19/05/25 09:40, Nam Cao wrote:
> @@ -136,14 +136,28 @@ struct epitem {
>               struct rcu_head rcu;
>       };
>
> -	/* List header used to link this structure to the eventpoll ready list */
> -	struct list_head rdllink;
> +	/*
> +	 * Whether this item can be added to the eventpoll ready list. Adding to the list can be
> +	 * blocked for two reasons:
> +	 *
> +	 *  1. This item is already on the list.
> +	 *  2. A waiter is consuming the ready list and has consumed this item. The waiter therefore
> +	 *     is blocking this item from being added again, preventing seeing the same item twice.
> +	 *     If adding is blocked due to this reason, the waiter will add this item to the list
> +	 *     once consuming is done.
> +	 */
> +	bool link_locked;

Nit: IIUC it's not just the readylist, it's anytime the link is used
(e.g. to punt it on a txlist), so how about:

        /*
         * Whether epitem.rdllink is currently used in a list. When used, it
         * cannot be detached or inserted elsewhere.
         *
         * It may be in use for two reasons:
         *
         * 1. This item is on the eventpoll ready list
         * 2. This item is being consumed by a waiter and stashed on a temporary
         *    list. If adding is blocked due to this reason, the waiter will add
         *    this item to the list once consuming is done.
         */
         bool link_used;

>
>       /*
> -	 * Works together "struct eventpoll"->ovflist in keeping the
> -	 * single linked chain of items.
> +	 * Indicate whether this item is ready for consumption. All items on the ready list has this
> +	 * flag set. Item that should be on the ready list, but cannot be added because of
> +	 * link_locked (in other words, a waiter is consuming the ready list), also has this flag
> +	 * set. When a waiter is done consuming, the waiter will add ready items to the ready list.
>        */
> -	struct epitem *next;
> +	bool ready;
> +
> +	/* List header used to link this structure to the eventpoll ready list */
> +	struct llist_node rdllink;
>
>       /* The file descriptor information this item refers to */
>       struct epoll_filefd ffd;

> @@ -361,10 +368,27 @@ static inline int ep_cmp_ffd(struct epoll_filefd *p1,
>               (p1->file < p2->file ? -1 : p1->fd - p2->fd));
>  }
>
> -/* Tells us if the item is currently linked */
> -static inline int ep_is_linked(struct epitem *epi)
> +static void epitem_ready(struct epitem *epi)
>  {
> -	return !list_empty(&epi->rdllink);
> +	/*
> +	 * Mark it ready, just in case a waiter is blocking this item from going into the ready
> +	 * list. This will tell the waiter to add this item to the ready list, after the waiter is
> +	 * finished.
> +	 */
> +	xchg(&epi->ready, true);

Perhaps a stupid question, why use an xchg() for .ready and .link_locked
(excepted for that epitem_ready() cmpxchg()) writes when the return value
is always discarded? Wouldn't e.g. smp_store_release() suffice, considering
the reads are smp_load_acquire()?

> +
> +	/*
> +	 * If this item is not blocked, add it to the ready list. This item could be blocked for two
> +	 * reasons:
> +	 *
> +	 * 1. It is already on the ready list. Then nothing further is required.
> +	 * 2. A waiter is consuming the ready list, and has consumed this item. The waiter is now
> +	 *    blocking this item, so that this item won't be seen twice. In this case, the waiter
> +	 *    will add this item to the ready list after the waiter is finished.
> +	 */
> +	if (!cmpxchg(&epi->link_locked, false, true))
> +		llist_add(&epi->rdllink, &epi->ep->rdllist);
> +
>  }
>
>  static inline struct eppoll_entry *ep_pwq_from_wait(wait_queue_entry_t *p)

> @@ -1924,19 +1874,39 @@ static int ep_send_events(struct eventpoll *ep,
>                        * Trigger mode, we need to insert back inside
>                        * the ready list, so that the next call to
>                        * epoll_wait() will check again the events
> -			 * availability. At this point, no one can insert
> -			 * into ep->rdllist besides us. The epoll_ctl()
> -			 * callers are locked out by
> -			 * ep_send_events() holding "mtx" and the
> -			 * poll callback will queue them in ep->ovflist.
> +			 * availability.
>                        */
> -			list_add_tail(&epi->rdllink, &ep->rdllist);
> +			xchg(&epi->ready, true);
> +		}
> +	}
> +
> +	llist_for_each_entry_safe(epi, tmp, txlist.first, rdllink) {
> +		/*
> +		 * We are done iterating. Allow the items we took to be added back to the ready
> +		 * list.
> +		 */
> +		xchg(&epi->link_locked, false);
> +
> +		/*
> +		 * In the loop above, we may mark some items ready, and they should be added back.
> +		 *
> +		 * Additionally, someone else may also attempt to add the item to the ready list,
> +		 * but got blocked by us. Add those blocked items now.
> +		 */
> +		if (smp_load_acquire(&epi->ready)) {
>                       ep_pm_stay_awake(epi);
> +			epitem_ready(epi);
>               }

Isn't this missing a:

                list_del_init(&epi->rdllink);

AFAICT we're always going to overwrite that link when next marking the item
as ready, but I'd say it's best to exit this with a clean state. That would
have to be before the clearing of link_locked so it doesn't race with a
concurrent epitem_ready() call.

>       }
> -	ep_done_scan(ep, &txlist);
> +
> +	__pm_relax(ep->ws);
>       mutex_unlock(&ep->mtx);
>
> +	if (!llist_empty(&ep->rdllist)) {
> +		if (waitqueue_active(&ep->wq))
> +			wake_up(&ep->wq);
> +	}
> +
>       return res;
>  }
>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ