Message-ID: <YbHFVxd34P0CvfpG@dhcp22.suse.cz>
Date: Thu, 9 Dec 2021 09:59:03 +0100
From: Michal Hocko <mhocko@...e.com>
To: Suren Baghdasaryan <surenb@...gle.com>
Cc: akpm@...ux-foundation.org, rientjes@...gle.com,
willy@...radead.org, hannes@...xchg.org, guro@...com,
riel@...riel.com, minchan@...nel.org, kirill@...temov.name,
aarcange@...hat.com, christian@...uner.io, hch@...radead.org,
oleg@...hat.com, david@...hat.com, jannh@...gle.com,
shakeelb@...gle.com, luto@...nel.org, christian.brauner@...ntu.com,
fweimer@...hat.com, jengelh@...i.de, timmurray@...gle.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
kernel-team@...roid.com
Subject: Re: [PATCH v4 3/3] mm/oom_kill: allow process_mrelease to run under
mmap_lock protection
On Wed 08-12-21 13:22:11, Suren Baghdasaryan wrote:
> With exit_mmap holding mmap_write_lock during free_pgtables call,
> process_mrelease does not need to elevate mm->mm_users in order to
> prevent exit_mmap from destroying pagetables while __oom_reap_task_mm
> is walking the VMA tree. The change prevents process_mrelease from
> calling the last mmput, which can lead to waiting for IO completion
> in exit_aio.
>
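For readers less familiar with the two mm reference counts, the point of the
changelog above can be sketched in userspace. This is a toy model only, not
kernel code: struct toy_mm and every toy_* helper are hypothetical stand-ins
for mm_users, mm_count, mmget_not_zero(), mmput(), mmgrab() and mmdrop(), and
the teardown is reduced to setting a flag. It only illustrates which caller
ends up doing the teardown.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct mm_struct (hypothetical, illustration only). */
struct toy_mm {
	int users;       /* models mm_users: users of the address space */
	int count;       /* models mm_count: references to the struct itself */
	bool torn_down;  /* address space teardown has run */
	bool freed;      /* the struct itself would have been freed */
};

static void toy_mmgrab(struct toy_mm *mm)
{
	mm->count++;
}

static void toy_mmdrop(struct toy_mm *mm)
{
	assert(mm->count > 0);
	if (--mm->count == 0)
		mm->freed = true;
}

static bool toy_mmget_not_zero(struct toy_mm *mm)
{
	if (mm->users == 0)
		return false;
	mm->users++;
	return true;
}

static void toy_mmput(struct toy_mm *mm)
{
	assert(mm->users > 0);
	if (--mm->users == 0) {
		/* The last user does the teardown in its own context. */
		mm->torn_down = true;
		/* All users together hold a single struct reference. */
		toy_mmdrop(mm);
	}
}

/* Old scheme: the reaper pins mm_users, so its put can be the last one. */
static bool reaper_does_teardown_with_mmget(void)
{
	struct toy_mm mm = { .users = 1, .count = 1 };
	bool before_reaper_put;

	toy_mmget_not_zero(&mm);            /* reaper pins the address space */
	toy_mmput(&mm);                     /* exiting task drops its reference */
	before_reaper_put = mm.torn_down;   /* still false here */
	toy_mmput(&mm);                     /* reaper's put is the last one */
	return !before_reaper_put && mm.torn_down;
}

/* New scheme: the reaper pins only the struct, never the address space. */
static bool reaper_does_teardown_with_mmgrab(void)
{
	struct toy_mm mm = { .users = 1, .count = 1 };
	bool torn_down_by_task;

	toy_mmgrab(&mm);                    /* reaper pins the struct only */
	toy_mmput(&mm);                     /* exiting task is the last user */
	torn_down_by_task = mm.torn_down;   /* teardown ran in the task */
	toy_mmdrop(&mm);                    /* reaper releases the struct */
	return !torn_down_by_task;
}
```

In this model the first scenario leaves the teardown to the reaper and the
second leaves it to the exiting task, which is the behaviour the changelog is
after.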
> Fixes: 337546e83fc7 ("mm/oom_kill.c: prevent a race between process_mrelease and exit_mmap")
I am not sure whether I have brought this up already, but I do not think
the Fixes tag is a good fit. 337546e83fc7 handles the race correctly. It
is just slightly less optimal than this fix.
> Signed-off-by: Suren Baghdasaryan <surenb@...gle.com>
Acked-by: Michal Hocko <mhocko@...e.com>
Thanks!
> ---
> mm/oom_kill.c | 27 +++++++++++++++------------
> 1 file changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 1ddabefcfb5a..67780386f478 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -1169,15 +1169,15 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
>  		goto put_task;
>  	}
>
> -	if (mmget_not_zero(p->mm)) {
> -		mm = p->mm;
> -		if (task_will_free_mem(p))
> -			reap = true;
> -		else {
> -			/* Error only if the work has not been done already */
> -			if (!test_bit(MMF_OOM_SKIP, &mm->flags))
> -				ret = -EINVAL;
> -		}
> +	mm = p->mm;
> +	mmgrab(mm);
> +
> +	if (task_will_free_mem(p))
> +		reap = true;
> +	else {
> +		/* Error only if the work has not been done already */
> +		if (!test_bit(MMF_OOM_SKIP, &mm->flags))
> +			ret = -EINVAL;
>  	}
>  	task_unlock(p);
>
> @@ -1188,13 +1188,16 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
>  		ret = -EINTR;
>  		goto drop_mm;
>  	}
> -	if (!__oom_reap_task_mm(mm))
> +	/*
> +	 * Check MMF_OOM_SKIP again under mmap_read_lock protection to ensure
> +	 * possible change in exit_mmap is seen
> +	 */
> +	if (!test_bit(MMF_OOM_SKIP, &mm->flags) && !__oom_reap_task_mm(mm))
>  		ret = -EAGAIN;
>  	mmap_read_unlock(mm);
>
>  drop_mm:
> -	if (mm)
> -		mmput(mm);
> +	mmdrop(mm);
>  put_task:
>  	put_task_struct(task);
>  	return ret;
> --
> 2.34.1.400.ga245620fadb-goog
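The second hunk re-reads MMF_OOM_SKIP once mmap_read_lock is held, because the
decision to reap was made earlier without that lock. A single-threaded sketch
of the idea follows. It is an assumption-laden toy and not the kernel
implementation: struct skip_mm, race_hook and the toy_* functions are
hypothetical, the lock is only marked by a comment, and exit_mmap is reduced to
setting the flag inside the window before the lock is taken.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the mm state relevant here (hypothetical). */
struct skip_mm {
	bool oom_skip;   /* models the MMF_OOM_SKIP bit */
	int reaps;       /* reaps that ran before exit_mmap finished */
	int late_reaps;  /* reaps attempted after exit_mmap finished */
};

/* Hook that simulates another context running before the lock is taken. */
typedef void (*race_hook)(struct skip_mm *mm);

/* Models exit_mmap completing and marking the mm as already handled. */
static void toy_exit_mmap(struct skip_mm *mm)
{
	mm->oom_skip = true;
}

/* Models the reap itself and records whether it came too late. */
static int toy_reap(struct skip_mm *mm)
{
	if (mm->oom_skip)
		mm->late_reaps++;
	else
		mm->reaps++;
	return 1;
}

/*
 * Models the tail of the syscall. Returns 1 if a reap was attempted and
 * 0 if it was skipped because the flag was seen after taking the lock.
 */
static int toy_mrelease(struct skip_mm *mm, race_hook before_lock, bool recheck)
{
	/* The decision to reap was made earlier, without the mmap lock. */
	if (before_lock)
		before_lock(mm);

	/* mmap_read_lock(mm) would be taken here. */
	if (recheck && mm->oom_skip)
		return 0;

	return toy_reap(mm);
}
```

With recheck set, a toy_exit_mmap that slips into the window makes the call
skip the reap. Without it, the model records a late reap attempt.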
--
Michal Hocko
SUSE Labs