Message-Id: <D74C05BD-EC64-4A24-B7D8-E126056E831A@gmail.com>
Date: Wed, 30 Apr 2025 12:12:48 -0700
From: Thomas Haynes <loghyr@...il.com>
To: linux-kernel@...r.kernel.org
Subject: [QUESTION] io_uring: Handling -EAGAIN and potential duplicate
submissions
Hi LKML,
I am using kernel version 6.14.4-300.fc42.x86_64 and performing RPC
handling of NFSv3 requests in a userland server.
I'm working with io_uring and have a question about the correct way
to handle -EAGAIN from io_uring_submit(), specifically to avoid
potential duplicate submissions.
I have a submission loop that looks like this:
    for (int i = 0; i < MAX_RETRIES; i++) {
        ret = io_uring_submit(ring);
        if (ret >= 0)
            break;
        if (ret == -EAGAIN) {
            TRACE(write_fragment_trace,
                  "Context=%p resubmission %d", (void *)ic, i);
            usleep(IO_URING_WAIT_US);
        } else {
            break;
        }
    }
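For illustration, the retry policy of the loop above can be factored
into a small helper and exercised without a real ring. This is only a
sketch: submit_fn, submit_with_retry(), struct fake_ring and
fake_submit() are hypothetical names, not part of liburing. In real
code the callback would wrap io_uring_submit(ring). The sketch assumes
the SQE was obtained and prepared once, before the helper is called, so
only the submit call itself is repeated.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback type: returns >= 0 on success or -errno,
 * following the return convention of io_uring_submit(). */
typedef int (*submit_fn)(void *ctx);

/* Call submit() until it returns something other than -EAGAIN, making
 * at most max_retries calls. Nothing is re-prepared between calls.
 * Returns the last result; *attempts receives the number of calls. */
static int submit_with_retry(submit_fn submit, void *ctx,
                             int max_retries, unsigned int wait_us,
                             int *attempts)
{
    int ret = -EAGAIN;

    assert(submit != NULL && attempts != NULL);
    *attempts = 0;
    for (int i = 0; i < max_retries; i++) {
        ret = submit(ctx);
        (*attempts)++;
        if (ret != -EAGAIN)
            break;          /* success or a non-retryable error */
        usleep(wait_us);    /* back off before the next attempt */
    }
    return ret;
}

/* Stand-in for a ring (assumption, for testing only): fails with
 * -EAGAIN a fixed number of times, then reports one SQE submitted. */
struct fake_ring {
    int failures_left;
    int calls;
};

static int fake_submit(void *ctx)
{
    struct fake_ring *r = ctx;

    r->calls++;
    if (r->failures_left > 0) {
        r->failures_left--;
        return -EAGAIN;
    }
    return 1;
}
```

Counting the attempts makes it easy to log how often the retry path is
actually taken, separately from how many completions later arrive.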
My understanding is that -EAGAIN from io_uring_submit() indicates
that the kernel's submission queue was temporarily full and the
submission should be retried. However, I'm observing a behavior
that suggests a potential for duplicate operations:
* I submit a request.
* io_uring_submit() returns -EAGAIN. The SQE remains in the
submission queue.
* I retry the io_uring_submit().
* Eventually, io_uring_submit() returns a positive value.
It appears that both the original SQE (from the -EAGAIN case) and
the SQE submitted in the successful call are processed, leading to
the operation being performed twice. It also leads to a
heap-use-after-free, because I release the associated memory after
processing the first CQE and the second CQE still refers to it.
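One way to check whether a request really completes twice, without
touching memory that may already have been freed, is to carry a small
tag in the SQE's user_data instead of a raw pointer and validate it
when the CQE arrives. The following is a minimal diagnostic sketch:
struct req_slot, req_start() and req_complete() are hypothetical names,
not part of liburing, and the fixed-size table is an assumption made
for brevity. In real code the tag could be stored with
io_uring_sqe_set_data64() and read back with
io_uring_cqe_get_data64().

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REQS 64

enum req_state { REQ_FREE = 0, REQ_IN_FLIGHT };

struct req_slot {
    enum req_state state;
    uint32_t generation;    /* bumped every time the slot is reused */
};

static struct req_slot req_table[MAX_REQS];

/* Claim a free slot and return a non-zero 64-bit tag that encodes the
 * slot index (low 32 bits) and its generation (high 32 bits).
 * Returns 0 if every slot is in flight. */
static uint64_t req_start(void)
{
    for (uint32_t i = 0; i < MAX_REQS; i++) {
        struct req_slot *s = &req_table[i];

        if (s->state == REQ_FREE) {
            s->state = REQ_IN_FLIGHT;
            if (++s->generation == 0)
                s->generation = 1;  /* keep tags non-zero on wrap */
            return ((uint64_t)s->generation << 32) | i;
        }
    }
    return 0;
}

/* Returns 1 if this is the first completion for the tag, 0 if the tag
 * is a duplicate or stale. Only on 1 should the caller release the
 * resources tied to the request. */
static int req_complete(uint64_t tag)
{
    uint32_t slot = (uint32_t)(tag & 0xffffffffu);
    uint32_t gen = (uint32_t)(tag >> 32);
    struct req_slot *s;

    if (slot >= MAX_REQS)
        return 0;
    s = &req_table[slot];
    if (s->state != REQ_IN_FLIGHT || s->generation != gen)
        return 0;
    s->state = REQ_FREE;
    return 1;
}
```

A completion for which req_complete() returns 0 can be logged rather
than processed, which turns a heap-use-after-free into a countable
event while the cause is being tracked down.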
This raises a few questions:
* Is this behavior expected? Does -EAGAIN in io_uring_submit()
imply that the SQE may or may not have been partially processed
or queued for processing, even though the submit call itself
failed?
* If this is expected, what is the recommended way to handle
-EAGAIN to guarantee that each SQE is submitted and processed
exactly once, even under temporary queue pressure? Should I be
modifying the SQE or the submission queue in some way before
retrying?
* Are there any specific io_uring setup flags or other considerations
that might influence this behavior?
I'm concerned about the potential for data corruption or other
issues if operations are performed multiple times.
Any insights or best practices on handling -EAGAIN in this context
would be greatly appreciated.
Thanks,
Tom Haynes