Message-ID: <CAPEB=f0QxT2S9q2znpML35z+YyMVJ7p2kmr2sON+m7eJ=XbMpw@mail.gmail.com>
Date: Wed, 28 May 2025 11:11:55 +0300
From: Andrey Khalturin <andrey.khalturin@...il.com>
To: linux-kernel@...r.kernel.org
Subject: BUG: Process termination deadlock in exit_aio() vs device release

Problem Description

We have encountered a deadlock in the Linux kernel's process
termination sequence when libaio is used with a custom kernel module.
It manifests when the OOM killer terminates a process that has pending
asynchronous I/O operations managed by that module.

Root Cause Analysis

The issue stems from the order of operations in do_exit():

    do_exit()
      -> exit_aio()                            // Waits for completion of all AIO operations
           -> wait_for_completion(&ctx->comp)  // BLOCKS HERE
      -> exit_files()                          // This code is never reached
           -> close_files()
                -> filp_close()
                     -> fops->release()        // Custom module's release() never called

(exit_aio() is reached through exit_mm() and mmput(); those
intermediate frames are omitted above.)

This creates an unresolvable circular dependency:

1. The AIO subsystem waits for all pending operations to complete
   before proceeding.
2. The custom kernel module cannot complete operations because it is
   unaware that the process is terminating.
3. Module notification (file_operations->release()) only happens
   after AIO cleanup.
4. The process cannot exit until the AIO operations complete.
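
The cycle above can be reproduced in a small userspace model. This is
only a sketch: every name in it (model_task, model_exit_aio, and so on)
is hypothetical and merely stands in for the kernel step with the
similar name, and the indefinite wait is replaced by a bounded poll so
the model can report the hang instead of hanging itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_task {
    int  pending_aio;      /* requests the driver still holds */
    bool driver_notified;  /* has the driver's release() run yet? */
};

/* The driver only completes requests once it knows the owner is gone. */
static void model_driver_poll(struct model_task *t)
{
    if (t->driver_notified)
        t->pending_aio = 0;
}

/*
 * Stand-in for exit_aio(). The real code waits indefinitely; here the
 * wait is bounded and the function returns false if requests remain.
 */
static bool model_exit_aio(struct model_task *t, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls && t->pending_aio > 0; i++)
        model_driver_poll(t);
    return t->pending_aio == 0;
}

/* Stand-in for exit_files() ending in fops->release(). */
static void model_exit_files(struct model_task *t)
{
    t->driver_notified = true;
}

/* Current order: wait for AIO first, close files second. */
static bool model_do_exit_current(struct model_task *t)
{
    if (!model_exit_aio(t, 1000))
        return false;      /* the real task would stay blocked here */
    model_exit_files(t);
    return true;
}
```

With pending_aio > 0 the model never reaches model_exit_files(), which
mirrors items 1 to 4: the notification that would unblock the driver
sits behind the wait that the driver is supposed to end.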

Scenario Details

1. A service process uses libaio to communicate with a custom kernel
   module.
2. The OOM killer sends SIGKILL (uncatchable) to the service process.
3. The process enters do_exit() but hangs in exit_aio(), waiting for
   operation completion.
4. The kernel module never receives a device close notification via
   its release() callback.
5. The AIO operations remain pending indefinitely, preventing process
   termination.

Current Workarounds

Kernel module developers currently must implement workarounds such as:

- Process state monitoring: checking current->flags & PF_EXITING
- Timeout mechanisms: forcibly completing operations after a timeout
- Task work callbacks: using task_work_add() to detect process
  termination

Proposed Solution

The process termination sequence should be reordered to notify drivers
before waiting for AIO completion:

    do_exit()
      -> notify_drivers_of_exit()  // Signal all file descriptors about termination
      -> exit_aio()                // Then wait for AIO completion
      -> exit_files()              // Finally close file descriptors

This would allow:

- Kernel modules to receive early notification of process termination
- Proper cleanup and completion of pending AIO operations
- Graceful process termination even under memory pressure
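
The proposed order can be modelled in userspace as follows. This is a
sketch under stated assumptions: notify_drivers_of_exit() is the name
used in the proposal, while the owner_exiting hook and all sim_* types
and functions are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sim_file;

/* Hypothetical ops table: release() plus an assumed early hook. */
struct sim_file_ops {
    void (*owner_exiting)(struct sim_file *f);  /* may be NULL */
    void (*release)(struct sim_file *f);
};

struct sim_file {
    const struct sim_file_ops *ops;
    int  pending_aio;   /* requests the driver still holds */
    bool released;
};

/* Driver side: drop all pending work once the owner is exiting. */
static void sim_drv_owner_exiting(struct sim_file *f) { f->pending_aio = 0; }
static void sim_drv_release(struct sim_file *f)       { f->released = true; }

static const struct sim_file_ops sim_drv_ops = {
    .owner_exiting = sim_drv_owner_exiting,
    .release       = sim_drv_release,
};

/* Step 1: model of the proposed notify_drivers_of_exit(). */
static void sim_notify_drivers_of_exit(struct sim_file *files, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (files[i].ops->owner_exiting)
            files[i].ops->owner_exiting(&files[i]);
}

/* Step 2: exit_aio() stand-in; reports whether the wait would return. */
static bool sim_exit_aio(const struct sim_file *files, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (files[i].pending_aio > 0)
            return false;
    return true;
}

/* Step 3: exit_files() stand-in, ending in release(). */
static void sim_exit_files(struct sim_file *files, size_t n)
{
    for (size_t i = 0; i < n; i++)
        files[i].ops->release(&files[i]);
}

static bool sim_do_exit_proposed(struct sim_file *files, size_t n)
{
    assert(files != NULL || n == 0);
    sim_notify_drivers_of_exit(files, n);
    if (!sim_exit_aio(files, n))
        return false;   /* a driver ignored the hook: still stuck */
    sim_exit_files(files, n);
    return true;
}
```

In this model the reordering only helps drivers that implement the
early hook; a file whose ops leave it NULL still blocks the exit path
exactly as before.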

Impact

This architectural issue affects:

- Custom kernel modules using asynchronous I/O
- High-load systems where OOM killer activation is common
- Any driver that legitimately expects release() notification for
  cleanup

Environment

- Kernel versions: multiple (observed on the 5.x and 6.x series)
- Architecture: x86_64
- Subsystems affected: AIO, VFS, process management

Request

We request consideration of this architectural issue for future kernel
development. The current design creates an unavoidable deadlock
scenario for legitimate kernel module implementations that rely on
proper file descriptor lifecycle notifications.

The fundamental problem is that asynchronous operations should not
block process termination when the process cannot provide the
necessary completion signals due to the termination sequence itself.