linux-kernel - Re: [PATCH] io_uring: Use io

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230724161654.cjh7pd63uas5grmz@awork3.anarazel.de>
Date:   Mon, 24 Jul 2023 09:16:54 -0700
From:   Andres Freund <andres@...razel.de>
To:     Jens Axboe <axboe@...nel.dk>
Cc:     Phil Elwell <phil@...pberrypi.com>, asml.silence@...il.com,
        david@...morbit.com, hch@....de, io-uring@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>, linux-xfs@...r.kernel.org,
        stable <stable@...r.kernel.org>
Subject: Re: [PATCH] io_uring: Use io_schedule* in cqring wait

Hi,

On 2023-07-24 09:48:58 -0600, Jens Axboe wrote:
> On 7/24/23 9:35?AM, Phil Elwell wrote:
> > Hi Andres,
> > 
> > With this commit applied to the 6.1 and later kernels (others not
> > tested) the iowait time ("wa" field in top) in an ARM64 build running
> > on a 4 core CPU (a Raspberry Pi 4 B) increases to 25%, as if one core
> > is permanently blocked on I/O. The change can be observed after
> > installing mariadb-server (no configuration or use is required). After
> > reverting just this commit, "wa" drops to zero again.
> 
> There are a few other threads on this...
> 
> > I can believe that this change hasn't negatively affected performance,
> > but the result is misleading. I also think it's pushing the boundaries
> > of what a back-port to stable should do.

FWIW, I think this partially just mpstat reporting something quite bogus. It
makes no sense to say that a cpu is 100% busy waiting for IO, when the one
process is doing IO is just waiting.


> +static bool current_pending_io(void)
> +{
> +	struct io_uring_task *tctx = current->io_uring;
> +
> +	if (!tctx)
> +		return false;
> +	return percpu_counter_read_positive(&tctx->inflight);
> +}
> +
>  /* when returns >0, the caller should retry */
>  static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
>  					  struct io_wait_queue *iowq)
>  {
> -	int token, ret;
> +	int io_wait, ret;
>  
>  	if (unlikely(READ_ONCE(ctx->check_cq)))
>  		return 1;
> @@ -2511,17 +2520,19 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
>  		return 0;
>  
>  	/*
> -	 * Use io_schedule_prepare/finish, so cpufreq can take into account
> -	 * that the task is waiting for IO - turns out to be important for low
> -	 * QD IO.
> +	 * Mark us as being in io_wait if we have pending requests, so cpufreq
> +	 * can take into account that the task is waiting for IO - turns out
> +	 * to be important for low QD IO.
>  	 */
> -	token = io_schedule_prepare();
> +	io_wait = current->in_iowait;

I don't know the kernel "rules" around this, but ->in_iowait is only modified
in kernel/sched, so it seemed a tad "unfriendly" to scribble on it here...


Building a kernel to test with the patch applied, will reboot into it once the
call I am on has finished. Unfortunately the performance difference didn't
reproduce nicely in VM...

Greetings,

Andres Freund