Message-ID: <fdd3bd7d-526d-d441-00f1-e8321441174e@redhat.com>
Date:   Thu, 16 Feb 2023 12:01:43 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Alistair Popple <apopple@...dia.com>, linux-mm@...ck.org,
        cgroups@...r.kernel.org
Cc:     linux-kernel@...r.kernel.org, jgg@...dia.com, jhubbard@...dia.com,
        tjmercier@...gle.com, hannes@...xchg.org, surenb@...gle.com,
        mkoutny@...e.com, daniel@...ll.ch,
        "Daniel P . Berrange" <berrange@...hat.com>,
        Alex Williamson <alex.williamson@...hat.com>
Subject: Re: [PATCH 00/19] mm: Introduce a cgroup to limit the amount of
 locked and pinned memory
On 06.02.23 08:47, Alistair Popple wrote:
> Having large amounts of unmovable or unreclaimable memory in a system
> can lead to system instability due to increasing the likelihood of
> encountering out-of-memory conditions. Therefore it is desirable to
> limit the amount of memory users can lock or pin.
> 
> From userspace such limits can be enforced by setting
> RLIMIT_MEMLOCK. However there is no standard method that drivers and
> other in-kernel users can use to check and enforce this limit.
> 
> This has led to a large number of inconsistencies in how limits are
> enforced. For example some drivers will use mm->locked_vm while others
> will use mm->pinned_vm or user->locked_vm. It is therefore possible to
> have up to three times RLIMIT_MEMLOCK pinned.
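
To make the "three times" point concrete, here is a minimal userspace
sketch, not kernel code. The struct and helper names are hypothetical
stand-ins for mm->locked_vm, mm->pinned_vm and user->locked_vm. It assumes
each caller checks only its own counter against the same limit, which is
the inconsistency described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three kernel counters. */
struct counters {
	size_t mm_locked_vm;
	size_t mm_pinned_vm;
	size_t user_locked_vm;
};

/*
 * Charge npages against a single counter. The check only looks at
 * that one counter, so the other two are invisible to it.
 */
static bool charge(size_t *counter, size_t npages, size_t limit)
{
	if (*counter > limit || npages > limit - *counter)
		return false;
	*counter += npages;
	return true;
}

/* Total pages actually held across all three counters. */
static size_t total_pinned(const struct counters *c)
{
	return c->mm_locked_vm + c->mm_pinned_vm + c->user_locked_vm;
}
```

With a limit of 16 pages, three callers that each pick a different counter
can each charge 16 pages, so total_pinned() reaches 48 while every
individual check still passes.
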
> 
> Having pinned memory limited per-task also makes it easy for users to
> exceed the limit. For example memory that drivers pin with
> pin_user_pages() tends to remain pinned after fork. To deal with
> this and other issues this series introduces a cgroup for tracking and
> limiting the number of pages pinned or locked by tasks in the group.
> 
> However the existing behaviour with regard to the rlimit needs to be
> maintained. Therefore the lesser of the two limits is
> enforced. Furthermore having CAP_IPC_LOCK usually bypasses the rlimit,
> but this bypass is not allowed for the cgroup.
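
As I read the rule above, it can be sketched as follows. This is a
userspace illustration only: may_pin() and all of its parameters are
hypothetical and are not the interface the series adds. It assumes the
rlimit is checked against a per-task (or per-user) count and the cgroup
limit against a per-cgroup count.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if used + npages would go past limit (overflow-safe). */
static bool exceeds(size_t used, size_t npages, size_t limit)
{
	return used > limit || npages > limit - used;
}

/*
 * Decide whether npages more pages may be pinned or locked.
 * Both limits apply, so the lesser one wins in practice.
 * CAP_IPC_LOCK skips the rlimit check but never the cgroup check.
 */
static bool may_pin(size_t task_pages, size_t cgroup_pages, size_t npages,
		    size_t rlimit_pages, size_t cgroup_limit_pages,
		    bool has_cap_ipc_lock)
{
	if (exceeds(cgroup_pages, npages, cgroup_limit_pages))
		return false;
	if (!has_cap_ipc_lock && exceeds(task_pages, npages, rlimit_pages))
		return false;
	return true;
}
```

The cgroup check comes first only to make it obvious that the capability
cannot bypass it. The order does not change the result.
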
> 
> The first part of this series converts existing drivers which
> open-code the use of locked_vm/pinned_vm over to a common interface
> which manages the refcounts of the associated task/mm/user
> structs. This ensures accounting of pages is consistent and makes it
> easier to add charging of the cgroup.
> 
> The second part of the series adds the cgroup controller and converts
> core mm code such as mlock over to charging the cgroup before finally
> introducing some selftests.
> 
> Rather than adding onto an existing cgroup controller such as memcg
> we introduce a new controller. This is primarily because we wish to
> limit the total number of pages tasks within a cgroup may pin/lock.
> 
> As I don't have access to systems with all the various devices I
> haven't been able to test all driver changes. Any help there would be
> appreciated.
> 
> Note that this series is based on v6.2-rc5 and
> https://lore.kernel.org/linux-rdma/20230201115540.360353-1-bmt@zurich.ibm.com/
> which makes updating the siw driver easier (thanks Bernard).
> 
> Changes from initial RFC:
> 
>   - Fixes to some driver error handling.
> 
>   - Pages charged with vm_account will always increment mm->pinned_vm
>     and enforce the limit against user->locked_vm or mm->pinned_vm
>     depending on initialisation flags.
> 
>   - Moved vm_account prototypes and struct definitions into a separate header.
> 
>   - Minor updates to commit messages and kernel docs (thanks to Jason,
>     Christoph, Yosry and T.J.).
> 
> Outstanding issues:
> 
>   - David H pointed out that the vm_account naming is potentially
>     confusing and I agree. However I have yet to come up with something
>     better so will rename this in a subsequent version of this series
>     (suggestions welcome).
vm_lockaccount? vm_pinaccount?
Either would be less confusing than reusing VM_ACCOUNT, which translates to
"commit accounting".
It might also make sense to rename VM_ACCOUNT to VM_COMMIT or something like that.
-- 
Thanks,
David / dhildenb