Message-ID: <87v7obpoxn.fsf@trenco.lwn.net>
Date: Tue, 01 Jul 2025 15:43:32 -0600
From: Jonathan Corbet <corbet@....net>
To: Sasha Levin <sashal@...nel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>,
 linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
 linux-api@...r.kernel.org, workflows@...r.kernel.org, tools@...nel.org,
 Kate Stewart <kstewart@...uxfoundation.org>, Gabriele Paoloni
 <gpaoloni@...hat.com>, Chuck Wolber <chuckwolber@...il.com>
Subject: Re: [RFC v2 01/22] kernel/api: introduce kernel API specification
 framework

Sasha Levin <sashal@...nel.org> writes:

> So I have a proof of concept which, during the build process, creates
> .apispec.h files that are generated from kerneldoc and contain macros
> identical to the ones in my RFC.
>
> Here's an example of sys_mlock() spec:

So I'm getting ahead of the game, but I have to ask some questions...

> /**
>   * sys_mlock - Lock pages in memory
>   * @start: Starting address of memory range to lock
>   * @len: Length of memory range to lock in bytes
>   *
>   * Locks pages in the specified address range into RAM, preventing them from
>   * being paged to swap. Requires CAP_IPC_LOCK capability or RLIMIT_MEMLOCK
>   * resource limit.
>   *
>   * long-desc: Locks pages in the specified address range into RAM, preventing
>   *   them from being paged to swap. Requires CAP_IPC_LOCK capability
>   *   or RLIMIT_MEMLOCK resource limit.

Why duplicate the long description?

>   * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
>   * param-type: start, KAPI_TYPE_UINT

This is something I wondered before; rather than a bunch of lengthy
KAPI_* symbols, why not just say __u64 (or some other familiar type)
here?

>   * param-flags: start, KAPI_PARAM_IN
>   * param-constraint-type: start, KAPI_CONSTRAINT_NONE
>   * param-constraint: start, Rounded down to page boundary
>   * param-type: len, KAPI_TYPE_UINT
>   * param-flags: len, KAPI_PARAM_IN
>   * param-constraint-type: len, KAPI_CONSTRAINT_RANGE
>   * param-range: len, 0, LONG_MAX
>   * param-constraint: len, Rounded up to page boundary
>   * return-type: KAPI_TYPE_INT
>   * return-check-type: KAPI_RETURN_ERROR_CHECK
>   * return-success: 0
>   * error-code: -ENOMEM, ENOMEM, Address range issue,
>   *   Some of the specified range is not mapped, has unmapped gaps,
>   *   or the lock would cause the number of mapped regions to exceed the limit.
>   * error-code: -EPERM, EPERM, Insufficient privileges,
>   *   The caller is not privileged (no CAP_IPC_LOCK) and RLIMIT_MEMLOCK is 0.
>   * error-code: -EINVAL, EINVAL, Address overflow,
>   *   The result of the addition start+len was less than start (arithmetic overflow).
>   * error-code: -EAGAIN, EAGAIN, Some or all memory could not be locked,
>   *   Some or all of the specified address range could not be locked.
>   * error-code: -EINTR, EINTR, Interrupted by signal,
>   *   The operation was interrupted by a fatal signal before completion.
>   * error-code: -EFAULT, EFAULT, Bad address,
>   *   The specified address range contains invalid addresses that cannot be accessed.
>   * since-version: 2.0
>   * lock: mmap_lock, KAPI_LOCK_RWLOCK
>   * lock-acquired: true
>   * lock-released: true
>   * lock-desc: Process memory map write lock
>   * signal: FATAL
>   * signal-direction: KAPI_SIGNAL_RECEIVE
>   * signal-action: KAPI_SIGNAL_ACTION_RETURN
>   * signal-condition: Fatal signal pending
>   * signal-desc: Fatal signals (SIGKILL) can interrupt the operation at two points:
>   *   when acquiring mmap_write_lock_killable() and during page population
>   *   in __mm_populate(). Returns -EINTR. Non-fatal signals do NOT interrupt
>   *   mlock - the operation continues even if SIGINT/SIGTERM are received.
>   * signal-error: -EINTR
>   * signal-timing: KAPI_SIGNAL_TIME_DURING
>   * signal-priority: 0
>   * signal-interruptible: yes
>   * signal-state-req: KAPI_SIGNAL_STATE_RUNNING
>   * examples: mlock(addr, 4096);  // Lock one page
>   *   mlock(addr, len);   // Lock range of pages
>   * notes: Memory locks do not stack - multiple calls on the same range can be
>   *   undone by a single munlock. Locks are not inherited by child processes.
>   *   Pages are locked on whole page boundaries. Commonly used by real-time
>   *   applications to prevent page faults during time-critical operations.
>   *   Also used for security to prevent sensitive data (e.g., cryptographic keys)
>   *   from being written to swap. Note: locked pages may still be saved to
>   *   swap during system suspend/hibernate.
>   *
>   *   Tagged addresses are automatically handled via untagged_addr(). The operation
>   *   occurs in two phases: first VMAs are marked with VM_LOCKED, then pages are
>   *   populated into memory. When checking RLIMIT_MEMLOCK, the kernel optimizes
>   *   by recounting locked memory to avoid double-counting overlapping regions.
>   * side-effect: KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY, process memory, Locks pages into physical memory, preventing swapping, reversible=yes

I hope the really long lines starting here aren't the intended way to go...:)

>   * side-effect: KAPI_EFFECT_MODIFY_STATE, mm->locked_vm, Increases process locked memory counter, reversible=yes
>   * side-effect: KAPI_EFFECT_ALLOC_MEMORY, physical pages, May allocate and populate page table entries, condition=Pages not already present, reversible=yes
>   * side-effect: KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY, page faults, Triggers page faults to bring pages into memory, condition=Pages not already resident
>   * side-effect: KAPI_EFFECT_MODIFY_STATE, VMA splitting, May split existing VMAs at lock boundaries, condition=Lock range partially overlaps existing VMA
>   * state-trans: memory pages, swappable, locked in RAM, Pages become non-swappable and pinned in physical memory
>   * state-trans: VMA flags, unlocked, VM_LOCKED set, Virtual memory area marked as locked
>   * capability: CAP_IPC_LOCK, KAPI_CAP_BYPASS_CHECK, CAP_IPC_LOCK capability
>   * capability-allows: Lock unlimited amount of memory (no RLIMIT_MEMLOCK enforcement)
>   * capability-without: Must respect RLIMIT_MEMLOCK resource limit
>   * capability-condition: Checked when RLIMIT_MEMLOCK is 0 or locking would exceed limit
>   * capability-priority: 0
>   * constraint: RLIMIT_MEMLOCK Resource Limit, The RLIMIT_MEMLOCK soft resource limit specifies the maximum bytes of memory that may be locked into RAM. Unprivileged processes are restricted to this limit. CAP_IPC_LOCK capability allows bypassing this limit entirely. The limit is enforced per-process, not per-user.
>   * constraint-expr: RLIMIT_MEMLOCK Resource Limit, locked_memory + request_size <= RLIMIT_MEMLOCK || CAP_IPC_LOCK
>   * constraint: Memory Pressure and OOM, Locking large amounts of memory can cause system-wide memory pressure and potentially trigger the OOM killer. The kernel does not prevent locking memory that would destabilize the system.
>   * constraint: Special Memory Areas, Some memory types cannot be locked or are silently skipped: VM_IO/VM_PFNMAP areas (device mappings) are skipped; Hugetlb pages are inherently pinned and skipped; DAX mappings are always present in memory and skipped; Secret memory (memfd_secret) mappings are skipped; VM_DROPPABLE memory cannot be locked and is skipped; Gate VMA (kernel entry point) is skipped; VM_LOCKED areas are already locked. These special areas are silently excluded without error.
>   *
>   * Context: Process context. May sleep. Takes mmap_lock for write.
>   *
>   * Return: 0 on success, negative error code on failure

Both of these, of course, are much less informative versions of the data
you have put up above; it would be nice to unify them somehow.

Thanks,

jon
