Message-ID: <Y8gPhTGkfCbGwoUu@linutronix.de>
Date:   Wed, 18 Jan 2023 16:25:57 +0100
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        Davidlohr Bueso <dave@...olabs.net>,
        Linux-RT <linux-rt-users@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] locking/rwbase: Prevent indefinite writer starvation

On 2023-01-17 16:50:21 [+0000], Mel Gorman wrote:

> diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
> index c201aadb9301..99d81e8d1f25 100644
> --- a/kernel/locking/rwbase_rt.c
> +++ b/kernel/locking/rwbase_rt.c
> @@ -65,6 +69,64 @@ static __always_inline int rwbase_read_trylock(struct rwbase_rt *rwb)
>  	return 0;
>  }
>  
> +/*
> + * Allow reader bias with a pending writer for a minimum of 4ms or 1 tick.
> + * This matches RWSEM_WAIT_TIMEOUT for the generic RWSEM implementation.
> + * The granularity is not exact as the lowest bit in rwbase_rt->waiter_timeout
> + * is used to detect recent DL / RT tasks taking a read lock.
> + */
> +#define RWBASE_RT_WAIT_TIMEOUT DIV_ROUND_UP(HZ, 250)
> +
> +static void __sched update_dlrt_reader(struct rwbase_rt *rwb)
> +{
> +	/* No update required if DL / RT tasks already identified. */
> +	if (rwb->waiter_timeout & 1)
> +		return;
> +
> +	/*
> +	 * Record a DL / RT task acquiring the lock for read. This may result
> +	 * in indefinite writer starvation but DL / RT tasks should avoid such
> +	 * behaviour.
> +	 */
> +	if (rt_task(current)) {
> +		struct rt_mutex_base *rtm = &rwb->rtmutex;
> +		unsigned long flags;
> +
> +		raw_spin_lock_irqsave(&rtm->wait_lock, flags);
> +		rwb->waiter_timeout |= 1;

Let me see if I parsed the whole logic right:

_After_ the RT reader acquired the lock, the lowest bit is set. This may
happen immediately, even if the timeout has not expired yet.
With this flag set, all following readers, incl. SCHED_OTHER, will
acquire the lock.
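
In code, my reading of the reader bias check under this patch would be
roughly the following (rwbase_allow_reader_bias() is a name I made up,
the real check presumably sits in rwbase_read_trylock() /
__rwbase_read_lock(), which is not quoted here):

/*
 * Sketch of my reading only, not the actual patch.  Assumes
 * rwb->waiter_timeout holds the jiffies deadline, with the lowest bit
 * (ab)used as the "DL/RT reader seen" flag.  time_before() is from
 * <linux/jiffies.h>.
 */
static __always_inline bool rwbase_allow_reader_bias(struct rwbase_rt *rwb)
{
	/* Once a DL/RT reader was seen, every later reader keeps the bias. */
	if (rwb->waiter_timeout & 1)
		return true;

	/* Otherwise keep the bias only until the writer waited long enough. */
	return time_before(jiffies, rwb->waiter_timeout);
}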

If so, then I don't know why this is a good idea.

If _only_ the RT reader is allowed to acquire the lock while the writer
is waiting, then it makes sense to prefer the RT tasks. (So the check is
on current and not on the lowest bit.)
All other (SCHED_OTHER) readers would have to block on the rtmutex after
the timeout. This makes sense to avoid the starvation.

If we drop that "we prefer the RT reader" part, then it would block on
the rtmutex. It will _still_ be preferred over the writer because it
will be enqueued before the writer in the queue due to its RT priority.
The only downside is that it has to wait until all readers have left.
So by allowing the RT reader to always acquire the lock as long as
WRITER_BIAS isn't set, we would allow it to enter early while the other
readers are still in, and after the timeout you would only have RT
readers going in and out. All SCHED_OTHER readers block on the rtmutex.

I think I like this.


> +		raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
> +	}
> +}
> +

Sebastian
