lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251218204239.4159453-14-sashal@kernel.org>
Date: Thu, 18 Dec 2025 15:42:35 -0500
From: Sasha Levin <sashal@...nel.org>
To: linux-api@...r.kernel.org
Cc: linux-doc@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	tools@...nel.org,
	gpaoloni@...hat.com,
	Sasha Levin <sashal@...nel.org>
Subject: [RFC PATCH v5 13/15] kernel/api: add API specification for sys_close

Signed-off-by: Sasha Levin <sashal@...nel.org>
---
 fs/open.c | 247 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 243 insertions(+), 4 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 343e6d3798ec3..26d8ee8336405 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1868,10 +1868,249 @@ int filp_close(struct file *filp, fl_owner_t id)
 }
 EXPORT_SYMBOL(filp_close);
 
-/*
- * Careful here! We test whether the file pointer is NULL before
- * releasing the fd. This ensures that one clone task can't release
- * an fd while another clone is opening it.
+/**
+ * sys_close - Close a file descriptor
+ * @fd: The file descriptor to close
+ *
+ * long-desc: Terminates access to an open file descriptor, releasing the file
+ *   descriptor for reuse by subsequent open(), dup(), or similar syscalls. Any
+ *   advisory record locks (POSIX locks, OFD locks, and flock locks) held on the
+ *   associated file are released. When this is the last file descriptor
+ *   referring to the underlying open file description, associated resources are
+ *   freed. If the file was previously unlinked, the file itself is deleted when
+ *   the last reference is closed.
+ *
+ *   CRITICAL: The file descriptor is ALWAYS closed, even when close() returns
+ *   an error. This differs from POSIX semantics where the state of the file
+ *   descriptor is unspecified after EINTR. On Linux, the fd is released early
+ *   in close() processing before flush operations that may fail. Therefore,
+ *   retrying close() after an error return is DANGEROUS and may close an
+ *   unrelated file descriptor that was assigned to another thread.
+ *
+ *   Errors returned from close() (EIO, ENOSPC, EDQUOT) indicate that the final
+ *   flush of buffered data failed. These errors commonly occur on network
+ *   filesystems like NFS when write errors are deferred to close time. A
+ *   successful return from close() does NOT guarantee that data has been
+ *   successfully written to disk; the kernel uses buffer cache to defer writes.
+ *   To ensure data persistence, call fsync() before close().
+ *
+ *   On close, the following cleanup operations are performed: POSIX advisory
+ *   locks are removed, dnotify registrations are cleaned up, the file is
+ *   flushed if the file operations define a flush callback, and the file
+ *   reference is released. If this was the last reference, additional cleanup
+ *   includes: fsnotify close notification, epoll cleanup, flock and lease
+ *   removal, FASYNC cleanup, the file's release callback invocation, and
+ *   the file structure deallocation.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: fd
+ *   type: KAPI_TYPE_FD
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 0, INT_MAX
+ *   constraint: Must be a valid, open file descriptor for the current process.
+ *     The value 0, 1, or 2 (stdin, stdout, stderr) may be closed like any other
+ *     fd, though this is unusual and may cause issues with libraries that assume
+ *     these descriptors are valid. The parameter is unsigned int to match kernel
+ *     file descriptor table indexing, but values exceeding INT_MAX are effectively
+ *     invalid due to internal checks.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_EXACT
+ *   success: 0
+ *   desc: Returns 0 on success. On error, returns a negative error code.
+ *     IMPORTANT: Even when an error is returned, the file descriptor is still
+ *     closed and must not be used again. The error indicates a problem with
+ *     the final flush operation, not that the fd remains open.
+ *
+ * error: EBADF, Bad file descriptor
+ *   desc: The file descriptor fd is not a valid open file descriptor, or was
+ *     already closed. This is the only error that indicates the fd was NOT
+ *     closed (because it was never open to begin with). Occurs when fd is out
+ *     of range, has no file assigned, or was already closed.
+ *
+ * error: EINTR, Interrupted system call
+ *   desc: The flush operation was interrupted by a signal before completion.
+ *     This occurs when a file's flush callback (e.g., NFS) performs an
+ *     interruptible wait that receives a signal. IMPORTANT: Despite this error,
+ *     the file descriptor IS closed and must not be used again. This error
+ *     is generated by converting kernel-internal restart codes (ERESTARTSYS,
+ *     ERESTARTNOINTR, ERESTARTNOHAND, ERESTART_RESTARTBLOCK) to EINTR because
+ *     restarting the syscall would be incorrect once the fd is freed.
+ *
+ * error: EIO, I/O error
+ *   desc: An I/O error occurred during the flush of buffered data to the
+ *     underlying storage. This typically indicates a hardware error, network
+ *     failure on NFS, or other storage system error. The file descriptor is
+ *     still closed. Previously buffered write data may have been lost.
+ *
+ * error: ENOSPC, No space left on device
+ *   desc: There was insufficient space on the storage device to flush buffered
+ *     writes. This is common on NFS when the server runs out of space between
+ *     write() and close(). The file descriptor is still closed.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ *   desc: The user's disk quota was exceeded while attempting to flush buffered
+ *     writes. Common on NFS when quota is exceeded between write() and close().
+ *     The file descriptor is still closed.
+ *
+ * lock: files->file_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   acquired: true
+ *   released: true
+ *   desc: Acquired via file_close_fd() to atomically lookup and remove the fd
+ *     from the file descriptor table. Held only during the table manipulation;
+ *     released before flush and final cleanup operations. This ensures that
+ *     another thread cannot allocate the same fd number while close is in
+ *     progress.
+ *
+ * lock: file->f_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   acquired: true
+ *   released: true
+ *   desc: Acquired during epoll cleanup (eventpoll_release_file) and dnotify
+ *     cleanup to safely unlink the file from monitoring structures. May also
+ *     be acquired during lock context operations.
+ *
+ * lock: ep->mtx
+ *   type: KAPI_LOCK_MUTEX
+ *   acquired: true
+ *   released: true
+ *   desc: Acquired during epoll cleanup if the file was monitored by epoll.
+ *     Used to safely remove the file from epoll interest lists.
+ *
+ * lock: flc_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   acquired: true
+ *   released: true
+ *   desc: File lock context spinlock, acquired during locks_remove_file() to
+ *     safely remove POSIX, flock, and lease locks associated with the file.
+ *
+ * signal: pending_signals
+ *   direction: KAPI_SIGNAL_RECEIVE
+ *   action: KAPI_SIGNAL_ACTION_RETURN
+ *   condition: When flush callback performs interruptible wait
+ *   desc: If the file's flush callback (e.g., nfs_file_flush) performs an
+ *     interruptible wait and a signal is pending, the wait is interrupted.
+ *     Any kernel restart codes are converted to EINTR since close cannot be
+ *     restarted after the fd is freed.
+ *   error: -EINTR
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *   restartable: no
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_DESTROY | KAPI_EFFECT_IRREVERSIBLE
+ *   target: File descriptor table entry
+ *   desc: The file descriptor is removed from the process's file descriptor
+ *     table, making the fd number available for reuse by subsequent open(),
+ *     dup(), or similar calls. This occurs BEFORE any flush or cleanup that
+ *     might fail, making the operation irreversible regardless of return value.
+ *   condition: Always (when fd is valid)
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_LOCK_RELEASE
+ *   target: POSIX advisory locks, OFD locks, flock locks
+ *   desc: All advisory locks held on the file by this process are removed.
+ *     POSIX locks are removed via locks_remove_posix() during filp_flush().
+ *     All lock types (POSIX, OFD, flock) are removed via locks_remove_file()
+ *     during __fput() when this is the last reference.
+ *   condition: File has FMODE_OPENED and !(FMODE_PATH)
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_DESTROY
+ *   target: File leases
+ *   desc: Any file leases held on the file are removed during locks_remove_file()
+ *     when this is the last reference to the open file description.
+ *   condition: File had leases and this is the last close
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: dnotify registrations
+ *   desc: Directory notification (dnotify) registrations associated with this
+ *     file are cleaned up via dnotify_flush(). This only applies to directories.
+ *   condition: File is a directory with dnotify registrations
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: epoll interest lists
+ *   desc: If the file was being monitored by epoll instances, it is removed
+ *     from those interest lists via eventpoll_release().
+ *   condition: File was added to epoll instances
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ *   target: Buffered data
+ *   desc: The file's flush callback is invoked if defined (e.g., NFS calls
+ *     nfs_file_flush). This attempts to write any buffered data to storage
+ *     and may return errors (EIO, ENOSPC, EDQUOT) if the flush fails. The
+ *     success of this flush is NOT guaranteed even with a 0 return; use
+ *     fsync() before close() to ensure data persistence.
+ *   condition: File has a flush callback and was opened for writing
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FREE_MEMORY
+ *   target: struct file and related structures
+ *   desc: When this is the last reference to the file, __fput() is called
+ *     synchronously (fput_close_sync), which frees the file structure, releases
+ *     the dentry and mount references, and invokes the file's release callback.
+ *   condition: This is the last reference to the file
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ *   target: Unlinked file deletion
+ *   desc: If the file was previously unlinked (deleted) but kept open, closing
+ *     the last reference causes the actual file data to be removed from the
+ *     filesystem and the inode to be freed.
+ *   condition: File was unlinked and this is the last reference
+ *   reversible: no
+ *
+ * state-trans: file_descriptor
+ *   from: open
+ *   to: closed/free
+ *   condition: Valid fd passed to close
+ *   desc: The file descriptor transitions from open (usable) to closed (invalid).
+ *     The fd number becomes available for reuse. This transition occurs early
+ *     in close() processing, before any operations that might fail.
+ *
+ * state-trans: file_reference_count
+ *   from: n
+ *   to: n-1 (or freed if n was 1)
+ *   condition: Always on successful fd lookup
+ *   desc: The file's reference count is decremented. If this was the last
+ *     reference, the file is fully cleaned up and freed.
+ *
+ * constraint: File Descriptor Reuse Race
+ *   desc: Because the fd is freed early in close() processing, another thread
+ *     may receive the same fd number from a concurrent open() before close()
+ *     returns. Applications must not retry close() after an error return, as
+ *     this could close an unrelated file opened by another thread.
+ *   expr: After close(fd) returns (even with error), fd is invalid
+ *
+ * examples: close(fd);  // Basic usage - ignore errors (common but not ideal)
+ *   if (close(fd) == -1) perror("close");  // Log errors for debugging
+ *   fsync(fd); close(fd);  // Ensure data persistence before closing
+ *
+ * notes: This syscall has subtle non-POSIX semantics: the fd is ALWAYS closed
+ *   regardless of the return value. POSIX specifies that on EINTR, the state
+ *   of the fd is unspecified, but Linux always closes it. HP-UX requires
+ *   retrying close() on EINTR, but doing so on Linux may close an unrelated
+ *   fd that was reassigned by another thread. For portable code, the safest
+ *   approach is to check for errors but never retry close().
+ *
+ *   Error codes from the flush callback (EIO, ENOSPC, EDQUOT) indicate that
+ *   previously written data may have been lost. These errors are particularly
+ *   common on NFS where write errors are often deferred to close time.
+ *
+ *   The driver's release() callback errors are explicitly ignored by the
+ *   kernel, so device driver cleanup errors are not propagated to userspace.
+ *
+ *   Calling close() on a file descriptor while another thread is using it
+ *   (e.g., in a blocking read() or write()) has implementation-defined
+ *   behavior. On Linux, the blocked operation continues on the underlying
+ *   file and may complete even after close() returns.
+ *
+ * since-version: 1.0
  */
 SYSCALL_DEFINE1(close, unsigned int, fd)
 {
-- 
2.51.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ