[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251218204239.4159453-9-sashal@kernel.org>
Date: Thu, 18 Dec 2025 15:42:30 -0500
From: Sasha Levin <sashal@...nel.org>
To: linux-api@...r.kernel.org
Cc: linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org,
tools@...nel.org,
gpaoloni@...hat.com,
Sasha Levin <sashal@...nel.org>
Subject: [RFC PATCH v5 08/15] kernel/api: add API specification for io_cancel
Signed-off-by: Sasha Levin <sashal@...nel.org>
---
fs/aio.c | 246 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 237 insertions(+), 9 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index f6f1b3790c88b..710517c9a990d 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2843,15 +2843,243 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
}
#endif
-/* sys_io_cancel:
- * Attempts to cancel an iocb previously passed to io_submit. If
- * the operation is successfully cancelled, the resulting event is
- * copied into the memory pointed to by result without being placed
- * into the completion queue and 0 is returned. May fail with
- * -EFAULT if any of the data structures pointed to are invalid.
- * May fail with -EINVAL if aio_context specified by ctx_id is
- * invalid. May fail with -EAGAIN if the iocb specified was not
- * cancelled. Will fail with -ENOSYS if not implemented.
+/**
+ * sys_io_cancel - Attempt to cancel an outstanding asynchronous I/O operation
+ * @ctx_id: AIO context handle returned by io_setup
+ * @iocb: Pointer to the iocb structure that was previously submitted
+ * @result: Unused parameter (historically for result storage, now ignored)
+ *
+ * long-desc: Attempts to cancel an asynchronous I/O operation that was
+ * previously submitted via io_submit(). The syscall searches for the
+ * specified iocb in the context's active request list and invokes the
+ * operation-specific cancellation callback if found.
+ *
+ * The cancellation behavior depends on the type of I/O operation:
+ * - For poll operations (IOCB_CMD_POLL): The request is marked as cancelled
+ * and a work item is scheduled to complete the cancellation.
+ * - For USB gadget I/O: The USB endpoint dequeue function is called, which
+ * triggers the completion callback with -ECONNRESET status.
+ * - For most direct I/O operations: Cancellation is typically not supported
+ * as these operations do not register a cancel callback.
+ *
+ * If the iocb is found and has a registered cancellation callback, that
+ * callback is invoked and the iocb is removed from the active request list.
+ * The completion event is delivered via the ring buffer (not via the result
+ * parameter, which is now unused for this purpose).
+ *
+ * On successful cancellation initiation, the syscall returns -EINPROGRESS
+ * (not 0) to indicate that cancellation is in progress. This is because
+ * the actual completion may occur asynchronously via the cancel callback.
+ *
+ * Important limitations:
+ * - Most file I/O operations do not support cancellation
+ * - The iocb must still be pending (not yet completed)
+ * - The iocb must have been submitted via io_submit (aio_key == KIOCB_KEY)
+ * - Only operations that register a ki_cancel callback can be cancelled
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_ATOMIC
+ *
+ * param: ctx_id
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_CUSTOM
+ * constraint: Must be a valid AIO context handle previously returned by
+ * io_setup() for the current process. The context must not have been
+ * destroyed via io_destroy(). A value of 0 is always invalid. The handle
+ * is actually the virtual address of the ring buffer mapping, and must
+ * belong to the calling process's address space.
+ *
+ * param: iocb
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * size: sizeof(struct iocb)
+ * constraint-type: KAPI_CONSTRAINT_USER_PTR
+ * constraint: Must be a valid userspace pointer to a struct iocb that was
+ * previously submitted via io_submit(). The iocb's aio_key field must
+ * contain KIOCB_KEY (0), which is written by the kernel during io_submit.
+ * A NULL pointer will result in EFAULT. The iocb must still be pending
+ * (present in the context's active_reqs list) for cancellation to succeed.
+ *
+ * param: result
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL
+ * constraint-type: KAPI_CONSTRAINT_NONE
+ * constraint: This parameter is no longer used by the kernel. It was
+ * historically intended to receive the io_event result on successful
+ * cancellation, but completion events are now always delivered via the
+ * ring buffer. May be NULL.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_ERROR_CHECK
+ * success: -EINPROGRESS
+ * desc: Returns -EINPROGRESS when the cancellation callback was successfully
+ * invoked and the request is being cancelled. This is the expected return
+ * value on successful cancellation initiation. The completion event will
+ * be delivered via the ring buffer. Note that this is different from the
+ * man page which claims 0 is returned on success.
+ *
+ * error: EFAULT, Cannot read iocb from userspace
+ * desc: Returned if the iocb pointer is invalid or points to memory that
+ * cannot be read. Specifically, the kernel attempts to read the aio_key
+ * field from the iocb via get_user() and returns EFAULT if this fails.
+ * A NULL iocb pointer will trigger this error.
+ *
+ * error: EINVAL, iocb not submitted via io_submit
+ * desc: Returned if the aio_key field of the iocb does not contain KIOCB_KEY
+ * (which is 0). The kernel sets aio_key to KIOCB_KEY when an iocb is
+ * successfully submitted via io_submit(). If aio_key contains a different
+ * value, it indicates the iocb was never successfully submitted, is
+ * corrupted, or the memory has been reused.
+ *
+ * error: EINVAL, Invalid AIO context
+ * desc: Returned if ctx_id does not refer to a valid AIO context. This can
+ * occur if: (1) the context was never created, (2) the context was
+ * destroyed via io_destroy(), (3) the ctx_id is 0, (4) the ring buffer
+ * header cannot be read from userspace, (5) the context belongs to a
+ * different process, or (6) the context's internal ID doesn't match.
+ *
+ * error: EINVAL, iocb not found or not cancellable
+ * desc: Returned if the specified iocb is not present in the context's
+ * active request list. This occurs when: (1) the operation has already
+ * completed and the completion event is in the ring buffer, (2) the
+ * operation was never submitted to this context, (3) the iocb pointer
+ * does not match any pending operation (comparison is by pointer value
+ * converted to u64), or (4) the operation did not register a cancellation
+ * callback (though in this case EINVAL comes from the default ret value).
+ * Note: The man page documents EAGAIN for this case, but the actual
+ * implementation returns EINVAL.
+ *
+ * error: ENOSYS, AIO not implemented
+ * desc: Returned if the kernel was compiled without CONFIG_AIO support.
+ * This error is returned by the syscall dispatch mechanism before the
+ * io_cancel implementation is even reached.
+ *
+ * error: (driver-specific), Cancellation callback failed
+ * desc: If the iocb is found and its ki_cancel callback is invoked, the
+ * callback's return value is propagated to userspace if non-zero. For
+ * USB gadget operations, usb_ep_dequeue() may return various errors
+ * including EINVAL if the request wasn't queued. The aio_poll_cancel
+ * callback always returns 0. Driver-specific cancellation functions
+ * may return other error codes.
+ *
+ * lock: RCU read lock
+ * type: KAPI_LOCK_RCU
+ * desc: Acquired in lookup_ioctx() during context lookup. Protects against
+ * concurrent modification of the mm->ioctx_table while searching for the
+ * context. Released before any spinlocks are acquired.
+ *
+ * lock: ctx->ctx_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * desc: Per-context spinlock acquired with interrupts disabled via
+ * spin_lock_irq(). Held while iterating through the active_reqs list
+ * searching for the iocb, while invoking the ki_cancel callback, and
+ * while removing the iocb from the list. The cancel callback is invoked
+ * with this lock held, so callbacks must not sleep and must be IRQ-safe.
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: ctx->active_reqs list
+ * desc: If the iocb is found and its cancellation callback is invoked, the
+ * kiocb is removed from the context's active_reqs list via list_del_init().
+ * This prevents the iocb from being found by subsequent io_cancel calls.
+ * condition: iocb found and ki_cancel callback invoked
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: Pending I/O operation
+ * desc: The cancellation callback may modify the state of the underlying
+ * I/O operation. For poll operations, the cancelled flag is set. For USB
+ * operations, the USB request is dequeued which triggers the completion
+ * callback. The completion event is delivered via the ring buffer.
+ * condition: ki_cancel callback is invoked
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_SCHEDULE
+ * target: aio_poll work queue
+ * desc: For poll operations (IOCB_CMD_POLL), the aio_poll_cancel callback
+ * schedules a work item via schedule_work() to complete the cancellation
+ * asynchronously. This work item will eventually deliver the completion
+ * event to the ring buffer.
+ * condition: Cancelling a poll operation
+ * reversible: no
+ *
+ * state-trans: kiocb state
+ * from: in_flight (in active_reqs list)
+ * to: cancelling (removed from list, cancel callback invoked)
+ * condition: iocb found and ki_cancel invoked
+ * desc: When the iocb is found in the active_reqs list and its cancellation
+ * callback is invoked, the kiocb transitions from in-flight to cancelling
+ * state. The kiocb is removed from the active_reqs list, preventing
+ * duplicate cancellation attempts. Final completion occurs asynchronously.
+ *
+ * state-trans: poll_iocb cancelled flag
+ * from: false
+ * to: true
+ * condition: aio_poll_cancel is invoked
+ * desc: For poll operations, the aio_poll_cancel callback sets the cancelled
+ * flag on the poll_iocb structure. This signals to the poll completion
+ * handler that the operation was cancelled rather than completed normally.
+ *
+ * constraint: Operation must support cancellation
+ * desc: Only operations that register a ki_cancel callback can be cancelled.
+ * Operations that don't set this callback (most direct I/O operations)
+ * will never appear in the active_reqs list and thus cannot be cancelled.
+ * Currently, only IOCB_CMD_POLL operations in the kernel AIO subsystem
+ * and USB gadget operations support cancellation.
+ *
+ * constraint: Timing window for cancellation
+ * desc: The iocb must still be pending at the time io_cancel is called.
+ * There is an inherent race condition: the operation may complete
+ * naturally between the time the application decides to cancel and when
+ * io_cancel is invoked. In this case, EINVAL is returned because the
+ * iocb is no longer in the active_reqs list.
+ *
+ * constraint: CONFIG_AIO required
+ * desc: The kernel must be compiled with CONFIG_AIO=y for this syscall
+ * to be available. If not configured, ENOSYS is returned.
+ *
+ * examples: io_cancel(ctx, &iocb, NULL); // Cancel with unused result param
+ * if (io_cancel(ctx, &iocb, NULL) == -EINPROGRESS) handle_cancellation();
+ * ret = io_cancel(ctx, &iocb, NULL); if (ret && ret != -EINPROGRESS) error();
+ *
+ * notes: The return value semantics are unusual: -EINPROGRESS indicates
+ * successful cancellation initiation, not an error. This is because the
+ * actual cancellation may complete asynchronously, with the completion
+ * event delivered via the ring buffer.
+ *
+ * The result parameter is completely ignored by current kernels. It was
+ * historically used to return the io_event directly, but since commit
+ * 28468cbed92e ("Revert 'fs/aio: Make io_cancel() generate completions
+ * again'"), completion events are always delivered via the ring buffer.
+ * Applications should use io_getevents() to retrieve the cancelled
+ * operation's completion event.
+ *
+ * The man page documents EAGAIN as a possible error when "the iocb specified
+ * was not cancelled", but code analysis shows that EINVAL is actually
+ * returned in this case. The man page is outdated in this regard.
+ *
+ * The aio_key field must equal KIOCB_KEY (0) because the kernel writes this
+ * value during io_submit. If an application attempts to cancel an iocb
+ * before submitting it, or after the memory has been reused, this check
+ * will fail with EINVAL.
+ *
+ * For poll operations specifically, the cancellation is marked but the
+ * actual completion may be delayed until a worker processes it. The
+ * -EINPROGRESS return value reflects this asynchronous completion model.
+ *
+ * USB gadget operations are an exception: when usb_ep_dequeue() is called,
+ * it typically completes the request synchronously with -ECONNRESET status
+ * in the completion callback.
+ *
+ * There is no glibc wrapper for this syscall. Applications must use
+ * syscall(SYS_io_cancel, ...) or the libaio library. The libaio wrapper
+ * returns negative error numbers directly rather than returning -1 and
+ * setting errno.
+ *
+ * io_uring (since Linux 5.1) provides a more capable and widely-supported
+ * async I/O interface with better cancellation support via IORING_OP_ASYNC_CANCEL.
+ *
+ * since-version: 2.5
*/
SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
struct io_event __user *, result)
--
2.51.0
Powered by blists - more mailing lists