Message-Id: <20250711114248.2288591-4-sashal@kernel.org>
Date: Fri, 11 Jul 2025 07:42:47 -0400
From: Sasha Levin <sashal@...nel.org>
To: linux-kernel@...r.kernel.org
Cc: linux-doc@...r.kernel.org,
linux-api@...r.kernel.org,
tools@...nel.org,
Sasha Levin <sashal@...nel.org>
Subject: [RFC v3 3/4] mm/mlock: add API specification for mlock
Add kernel API specification for the mlock() system call.
Signed-off-by: Sasha Levin <sashal@...nel.org>
---
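The `param-constraint` entries for `start` and `len` in this patch say the range is widened to whole pages. Here is a minimal sketch of that rounding. It assumes a power-of-two page size and mirrors the `start &= PAGE_MASK` and `PAGE_ALIGN(len + offset_in_page(start))` steps in `do_mlock()`. The helper name `mlock_effective_range()` is hypothetical, not a kernel or libc API.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the page-aligned range that mlock() operates
 * on. page_size must be a power of two (e.g. sysconf(_SC_PAGESIZE)).
 * start is rounded down to a page boundary; len is extended by start's
 * offset within its page and then rounded up to a whole number of pages.
 */
static void mlock_effective_range(uintptr_t start, size_t len, size_t page_size,
                                  uintptr_t *out_start, size_t *out_len)
{
    size_t offset = (size_t)(start & (page_size - 1)); /* offset within page */

    *out_start = start & ~(uintptr_t)(page_size - 1);   /* round down */
    /* Round up to a page multiple; wraps around for len close to SIZE_MAX. */
    *out_len = (len + offset + page_size - 1) & ~(page_size - 1);
}
```

For example, with 4096-byte pages a 32-byte buffer at 0x1ff0 straddles a page boundary. Both pages, starting at 0x1000, are therefore locked, for a total of 8192 bytes.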
mm/mlock.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 85 insertions(+)
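The RLIMIT_MEMLOCK `constraint-expr` in this patch can be read as the predicate below. This is a simplified sketch with two differences from the kernel. The kernel compares page counts rather than bytes, and before refusing a request it subtracts pages in the range that are already locked; this sketch does neither. `memlock_allowed()` and `memlock_limit_bytes()` are hypothetical helper names; `getrlimit()` and `RLIMIT_MEMLOCK` are the standard interfaces.

```c
#include <limits.h>
#include <stdbool.h>
#include <sys/resource.h>

/*
 * Hypothetical predicate mirroring the constraint-expr:
 *   locked_memory + request_size <= RLIMIT_MEMLOCK || CAP_IPC_LOCK
 * All sizes are in bytes. Written as a subtraction so that
 * locked + request cannot overflow.
 */
static bool memlock_allowed(unsigned long long locked,
                            unsigned long long request,
                            unsigned long long limit,
                            bool has_cap_ipc_lock)
{
    if (has_cap_ipc_lock)
        return true;                /* CAP_IPC_LOCK bypasses the limit */
    if (locked > limit)
        return false;
    return request <= limit - locked;
}

/* Hypothetical helper: the caller's soft RLIMIT_MEMLOCK in bytes,
 * ULLONG_MAX when unlimited, or 0 if getrlimit() fails. */
static unsigned long long memlock_limit_bytes(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return 0;
    if (rl.rlim_cur == RLIM_INFINITY)
        return ULLONG_MAX;
    return (unsigned long long)rl.rlim_cur;
}
```

The predicate does not model the zero-limit case. With a soft limit of 0 and no CAP_IPC_LOCK, the kernel fails earlier with EPERM, as the EPERM entry in the specification describes.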
diff --git a/mm/mlock.c b/mm/mlock.c
index 3cb72b579ffd3..06e260da5aba6 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -658,6 +658,91 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla
return 0;
}
+/**
+ * sys_mlock - Lock pages in memory
+ * @start: Starting address of memory range to lock
+ * @len: Length of memory range to lock in bytes
+ *
+ * long-desc: Locks pages in the specified address range into RAM, preventing
+ * them from being paged to swap. Requires CAP_IPC_LOCK or a nonzero
+ * RLIMIT_MEMLOCK large enough to cover the request.
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ * param-type: start, KAPI_TYPE_UINT
+ * param-flags: start, KAPI_PARAM_IN
+ * param-constraint-type: start, KAPI_CONSTRAINT_NONE
+ * param-constraint: start, Rounded down to page boundary
+ * param-type: len, KAPI_TYPE_UINT
+ * param-flags: len, KAPI_PARAM_IN
+ * param-constraint-type: len, KAPI_CONSTRAINT_RANGE
+ * param-range: len, 0, LONG_MAX
+ * param-constraint: len, Rounded up to page boundary
+ * return-type: KAPI_TYPE_INT
+ * return-check-type: KAPI_RETURN_ERROR_CHECK
+ * return-success: 0
+ * error-code: -ENOMEM, ENOMEM, Address range or limit issue,
+ * Some of the specified range is not mapped or has unmapped gaps, the unprivileged
+ * caller would exceed RLIMIT_MEMLOCK, or the number of mappings would exceed the limit.
+ * error-code: -EPERM, EPERM, Insufficient privileges,
+ * The caller is not privileged (no CAP_IPC_LOCK) and RLIMIT_MEMLOCK is 0.
+ * error-code: -EINVAL, EINVAL, Address overflow,
+ * The result of the addition start+len was less than start (arithmetic overflow).
+ * error-code: -EAGAIN, EAGAIN, Some or all memory could not be locked,
+ * Some or all of the range could not be locked; an -ENOMEM during page population is reported as EAGAIN.
+ * error-code: -EINTR, EINTR, Interrupted by signal,
+ * The operation was interrupted by a fatal signal before completion.
+ * error-code: -EFAULT, EFAULT, Bad address (not visible to userspace),
+ * An -EFAULT from page population is converted to -ENOMEM before mlock() returns.
+ * since-version: 2.0
+ * lock: mmap_lock, KAPI_LOCK_RWLOCK
+ * lock-acquired: true
+ * lock-released: true
+ * lock-desc: Process memory map write lock
+ * signal: FATAL
+ * signal-direction: KAPI_SIGNAL_RECEIVE
+ * signal-action: KAPI_SIGNAL_ACTION_RETURN
+ * signal-condition: Fatal signal pending
+ * signal-desc: Fatal signals (e.g., SIGKILL) can interrupt the operation at two points:
+ * when acquiring mmap_write_lock_killable() and during page population
+ * in __mm_populate(). Returns -EINTR. Non-fatal signals (handled, ignored or
+ * blocked) do not interrupt mlock; the operation continues if SIGINT/SIGTERM arrive.
+ * signal-error: -EINTR
+ * signal-timing: KAPI_SIGNAL_TIME_DURING
+ * signal-priority: 0
+ * signal-interruptible: yes
+ * signal-state-req: KAPI_SIGNAL_STATE_RUNNING
+ * examples: mlock(addr, 4096); // Lock one page
+ * mlock(addr, len); // Lock range of pages
+ * notes: Memory locks do not stack: pages locked by several mlock() calls are
+ * unlocked by a single munlock(). Locks are not inherited by children created via fork().
+ * Pages are locked on whole page boundaries. Commonly used by real-time
+ * applications to prevent page faults during time-critical operations.
+ * Also used for security to prevent sensitive data (e.g., cryptographic keys)
+ * from being written to swap. Note: locked pages may still be saved to
+ * swap when the system hibernates (suspend to disk).
+ *
+ * Tagged addresses are handled via untagged_addr(). The operation occurs in
+ * two phases: first VMAs are marked VM_LOCKED under the mmap write lock, then
+ * pages are populated after that lock is dropped. If a request would exceed
+ * RLIMIT_MEMLOCK, already-locked pages in the range are not counted twice.
+ * side-effect: KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY, process memory, Locks pages into physical memory, preventing swapping, reversible=yes
+ * side-effect: KAPI_EFFECT_MODIFY_STATE, mm->locked_vm, Increases process locked memory counter, reversible=yes
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY, physical pages, May allocate and populate page table entries, condition=Pages not already present, reversible=yes
+ * side-effect: KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY, page faults, Triggers page faults to bring pages into memory, condition=Pages not already resident
+ * side-effect: KAPI_EFFECT_MODIFY_STATE, VMA splitting, May split existing VMAs at lock boundaries, condition=Lock range partially overlaps existing VMA
+ * state-trans: memory pages, swappable, locked in RAM, Pages become non-swappable and pinned in physical memory
+ * state-trans: VMA flags, unlocked, VM_LOCKED set, Virtual memory area marked as locked
+ * capability: CAP_IPC_LOCK, KAPI_CAP_BYPASS_CHECK, CAP_IPC_LOCK capability
+ * capability-allows: Lock unlimited amount of memory (no RLIMIT_MEMLOCK enforcement)
+ * capability-without: Must respect RLIMIT_MEMLOCK resource limit
+ * capability-condition: Checked when RLIMIT_MEMLOCK is 0 or locking would exceed limit
+ * capability-priority: 0
+ * constraint: RLIMIT_MEMLOCK Resource Limit, The RLIMIT_MEMLOCK soft resource limit specifies the maximum bytes of memory that may be locked into RAM. Unprivileged processes are restricted to this limit. CAP_IPC_LOCK capability allows bypassing this limit entirely. The limit is enforced per-process, not per-user.
+ * constraint-expr: RLIMIT_MEMLOCK Resource Limit, locked_memory + request_size <= RLIMIT_MEMLOCK || CAP_IPC_LOCK
+ * constraint: Memory Pressure and OOM, Locking large amounts of memory can cause system-wide memory pressure and potentially trigger the OOM killer. The kernel does not prevent locking memory that would destabilize the system.
+ * constraint: Special Memory Areas, Some memory types are silently skipped without error: VM_IO/VM_PFNMAP areas (device mappings); hugetlb mappings, whose pages are not swappable; DAX mappings, which are always present in memory; secret memory (memfd_secret) mappings; VM_DROPPABLE mappings, which cannot be locked; and the gate VMA (kernel entry point). Areas that already have VM_LOCKED set are left unchanged.
+ *
+ * Context: Process context. May sleep. Takes mmap_lock for write.
+ */
SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
{
return do_mlock(start, len, VM_LOCKED);
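From userspace, the error list above translates into handling like the following sketch. It locks a page-sized buffer intended for sensitive data, as the `notes` entry suggests, and reports failures using the errno values from the specification. EINTR is left out because a task hit by a fatal signal does not return to userspace. `mlock_error_reason()`, `lock_secret_page()` and `release_secret_page()` are hypothetical names. `explicit_bzero()` is assumed to be available (glibc 2.25+ or a BSD-style libc).

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map mlock() errno values to the reasons given in the API specification. */
static const char *mlock_error_reason(int err)
{
    switch (err) {
    case EPERM:  return "RLIMIT_MEMLOCK is 0 and caller lacks CAP_IPC_LOCK";
    case ENOMEM: return "range not fully mapped, RLIMIT_MEMLOCK exceeded, or too many mappings";
    case EAGAIN: return "some or all of the range could not be locked";
    case EINVAL: return "start + len overflowed";
    default:     return strerror(err);
    }
}

/* Map and lock an anonymous buffer for sensitive data.
 * Returns NULL with errno set on failure. */
static void *lock_secret_page(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    if (mlock(p, len) != 0) {
        int err = errno;            /* save before other calls clobber it */

        fprintf(stderr, "mlock: %s\n", mlock_error_reason(err));
        munmap(p, len);
        errno = err;
        return NULL;
    }
    return p;
}

/* Wipe, unlock and unmap the buffer. Because locks do not stack,
 * one munlock() is enough even if mlock() was called repeatedly. */
static void release_secret_page(void *p, size_t len)
{
    explicit_bzero(p, len);         /* not optimized away, unlike memset */
    munlock(p, len);
    munmap(p, len);
}
```

A caller would obtain the page with `lock_secret_page(sysconf(_SC_PAGESIZE))`, store key material in it, and pass the same pointer and length to `release_secret_page()` when done.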
--
2.39.5