lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <30cf5639-ff99-9e73-42cd-21955088c283@kernel.dk>
Date:   Wed, 19 Apr 2023 07:32:44 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     luhongfei <luhongfei@...o.com>,
        Pavel Begunkov <asml.silence@...il.com>,
        "open list:IO_URING" <io-uring@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>
Cc:     opensource.kernel@...o.com
Subject: Re: [PATCH] io_uring: Optimization of buffered random write

On 4/19/23 3:22?AM, luhongfei wrote:
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 4a865f0e85d0..64bb91beb4d6
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -2075,8 +2075,23 @@ static inline void io_queue_sqe(struct io_kiocb *req)
>  	__must_hold(&req->ctx->uring_lock)
>  {
>  	int ret;
> +	bool is_write;
>  
> -	ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
> +	switch (req->opcode) {
> +	case IORING_OP_WRITEV:
> +	case IORING_OP_WRITE_FIXED:
> +	case IORING_OP_WRITE:
> +		is_write = true;
> +		break;
> +	default:
> +		is_write = false;
> +		break;
> +	}
> +
> +	if (!is_write || (req->rw.kiocb.ki_flags & IOCB_DIRECT))
> +		ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
> +	else
> +		ret = io_issue_sqe(req, 0);
>  
>  	/*
>  	 * We async punt it if the file wasn't marked NOWAIT, or if the file

We really can't just do that, implicitly. What you are doing is making
any of write synchronous. What are you writing to in terms of device or
file? If file, what file system is being used? Curious if the target
supports async buffered writes, guessing it does not which is why you
see io-wq activity for all of them.

That said, I did toss out a test patch a while back that explicitly sets
up the ring such that we'll do blocking IO rather than do a non-blocking
attempt and then punt it if that fails. And I do think there's a use
case for that, in case you just want to use io_uring for batched
syscalls and don't care about if you end up blocking for some IO.

Let's do a primer on what happens for io_uring issue:

1) Non-blocking issue is attempted for IO. If successful, we're done for
   now.

2) Case 1 failed. Now we have two options
	a) We can poll the file. We arm poll, and we're done for now
	   until that triggers.
	b) File cannot be polled, we punt to io-wq which then does a
	   blocking attempt.

For case 2b, this is the one where we could've just done a blocking
attempt initially if the ring was setup with a flag explicitly saying
that's what the application wants. Or io_uring_enter() had a flag passed
in that explicitly said this is what the applications wants. I suspect
we'll want both, to cover both SQPOLL and !SQPOLL.

I'd recommend we still retain non-blocking issue for pollable files, as
you could very quickly block forever otherwise. Imagine an empty pipe
and a read issued to it in the blocking mode.

A solution like that would cater to your case too, without potentially
breaking a lot of things like your patch could. The key here is the
explicit nature of it, we cannot just go and make odd assumptions about
a particular opcode type (writes) and ring type (SQPOLL) and say "oh
this one is fine for just ignoring blocking off the issue path".

-- 
Jens Axboe

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ