Message-ID: <CADnq5_Ng7oe_NMSb6GdL=_T_zw22Gk0B6ePDXRiU7Ljind6Gww@mail.gmail.com>
Date: Tue, 31 May 2022 18:00:51 -0400
From: Alex Deucher <alexdeucher@...il.com>
To: Christian König <ckoenig.leichtzumerken@...il.com>,
Mailing list - DRI developers
<dri-devel@...ts.freedesktop.org>
Cc: linux-media <linux-media@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Intel Graphics Development <intel-gfx@...ts.freedesktop.org>,
amd-gfx list <amd-gfx@...ts.freedesktop.org>,
nouveau <nouveau@...ts.freedesktop.org>,
linux-tegra@...r.kernel.org,
Linux-Fsdevel <linux-fsdevel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>,
Andrey Grodzovsky <andrey.grodzovsky@....com>,
Hugh Dickens <hughd@...gle.com>,
Alexander Viro <viro@...iv.linux.org.uk>,
Daniel Vetter <daniel@...ll.ch>,
"Deucher, Alexander" <alexander.deucher@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Christian Koenig <christian.koenig@....com>
Subject: Re: Per file OOM badness
+ dri-devel
On Tue, May 31, 2022 at 6:00 AM Christian König
<ckoenig.leichtzumerken@...il.com> wrote:
>
> Hello everyone,
>
> To summarize the issue I'm trying to address here: Processes can allocate
> resources through a file descriptor without being held responsible for them.
>
> Especially for the DRM graphics driver subsystem this is rather
> problematic. Modern games tend to allocate huge amounts of system memory
> through the DRM drivers to make it accessible to GPU rendering.
>
> But even outside of the DRM subsystem this problem exists and it is
> trivial to exploit. See the following simple example of
> using memfd_create():
>
> fd = memfd_create("test", 0);
> while (1)
>         write(fd, page, 4096);
>
> Compile this and you can bring down any standard desktop system within
> seconds.
>
> The background is that the OOM killer will kill every process in the
> system except the one which holds the only reference to the memory
> allocated by the memfd.
>
> Those problems were brought up on the mailing list multiple times now
> [1][2][3], but without any final conclusion on how to address them. Since
> file descriptors are considered shared, the process cannot directly be held
> accountable for allocations made through them. In addition, file
> descriptors can also easily move between processes.
>
> So instead of trying to account the allocated memory to a specific
> process, this patch set adds a callback to struct file_operations which
> the OOM killer can use to query the specific OOM badness of this file
> reference. This badness is then divided by the file_count, so that every
> process using a shmem file, DMA-buf or DRM driver will get its equal
> share of OOM badness.
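
A minimal userspace sketch of the split described in the quoted paragraph, not the code from the patch set. All names here (struct model_file, file_badness_share, process_file_badness) are hypothetical, and counting badness in pages is an assumption; only the idea of a per-file callback whose result is divided by file_count comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model only. These types and names are hypothetical and are
 * not the kernel's struct file or struct file_operations.
 */
struct model_file {
	/* Optional callback reporting memory held through this file. */
	long (*oom_badness)(const struct model_file *f);
	long file_count;	/* number of references to this file */
	long pages;		/* assumed unit: pages backing the file */
};

/* Example callback: report the pages backing the file. */
long model_pages_badness(const struct model_file *f)
{
	return f->pages;
}

/* Share of the file's badness attributed to one reference. */
long file_badness_share(const struct model_file *f)
{
	if (f->oom_badness == NULL)
		return 0;	/* file type does not report badness */
	assert(f->file_count > 0);
	return f->oom_badness(f) / f->file_count;
}

/* Sum the shares over all files a process has open. */
long process_file_badness(const struct model_file *const *files, size_t n)
{
	long total = 0;

	for (size_t i = 0; i < n; i++)
		total += file_badness_share(files[i]);
	return total;
}
```

In this model, a file backed by 1000 pages and referenced four times contributes 250 pages to each holder, while a file without a callback contributes nothing.
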
>
> Callbacks are then implemented for the two core users (memfd and DMA-buf)
> as well as 72 DRM based graphics drivers.
>
> The result is that the OOM killer can now much better judge if a process
> is worth killing to free up memory. This results in noticeably better
> system stability in OOM situations, especially while running games.
>
> The only other possibility I can see would be to change the accounting of
> resources whenever references to the file structure change, but this would
> mean quite some additional overhead for a rather common operation.
>
> Additionally I think trying to limit device driver allocations using
> cgroups is orthogonal to this effort. While cgroups is very useful, it
> works on per process limits and tries to enforce a collaborative model on
> memory management while the OOM killer enforces a competitive model.
>
> Please comment and/or review. We have had this problem floating around for
> years now and are now at a point where we finally need to find a solution
> for it.
>
> Regards,
> Christian.
>
> [1] https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> [2] https://lkml.org/lkml/2018/1/18/543
> [3] https://lkml.org/lkml/2021/2/4/799
>
>