[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJuCfpEt9SSrELZzfmcqJ7JL_nEzWGz-YE9GRUZTjU5unqQjQg@mail.gmail.com>
Date: Tue, 10 May 2022 09:31:50 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Michal Hocko <mhocko@...e.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
David Rientjes <rientjes@...gle.com>,
Matthew Wilcox <willy@...radead.org>,
Johannes Weiner <hannes@...xchg.org>,
Roman Gushchin <guro@...com>, Minchan Kim <minchan@...nel.org>,
"Kirill A. Shutemov" <kirill@...temov.name>,
Andrea Arcangeli <aarcange@...hat.com>,
Christian Brauner <brauner@...nel.org>,
Christoph Hellwig <hch@...radead.org>,
Oleg Nesterov <oleg@...hat.com>,
David Hildenbrand <david@...hat.com>,
Jann Horn <jannh@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
Peter Xu <peterx@...hat.com>,
John Hubbard <jhubbard@...dia.com>, shuah@...nel.org,
LKML <linux-kernel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>, linux-kselftest@...r.kernel.org,
kernel-team <kernel-team@...roid.com>
Subject: Re: [PATCH 2/3] mm: drop oom code from exit_mmap
On Tue, May 10, 2022 at 6:06 AM Michal Hocko <mhocko@...e.com> wrote:
>
> On Mon 09-05-22 20:00:13, Suren Baghdasaryan wrote:
> > With the oom-killer being able to operate on locked pages, exit_mmap
> > does not need to ensure that oom_reap_task_mm is done before it can
> > proceed. Instead it can rely on mmap_lock write lock to prevent
> > oom-killer from operating on the vma tree while it's freeing page
> > tables. exit_mmap can hold mmap_lock read lock when unmapping vmas
> > and then take mmap_lock write lock before freeing page tables.
>
> The changelog is rather light on nasty details which might be good but
> for the sake of our future us let's be more verbose so that we do not
> have to reinvent the prior history each time we are looking into this
> code. I would go with something like this instead:
> "
> The primary reason to invoke the oom reaper from the exit_mmap path used
> to be a prevention of an excessive oom killing if the oom victim exit
> races with the oom reaper (see 212925802454 ("mm: oom: let oom_reap_task
> and exit_mmap run concurrently") for more details. The invocation has
> moved around since then because of the interaction with the munlock
> logic but the underlying reason has remained the same (see 27ae357fa82b
> ("mm, oom: fix concurrent munlock and oom reaper unmap, v3").
>
> Munlock code is no longer a problem since a213e5cf71cb ("mm/munlock:
> delete munlock_vma_pages_all(), allow oomreap") and there shouldn't be
> any blocking operation before the memory is unmapped by exit_mmap so
> the oom reaper invocation can be dropped. The unmapping part can be done
> with the non-exclusive mmap_sem and the exclusive one is only required
> when page tables are freed.
>
> Remove the oom_reaper from exit_mmap which will make the code easier to
> read. This is really unlikely to make any observable difference although
> some microbenchmarks could benefit from one less branch that needs to be
> evaluated even though it almost never is true.
> "
Looks great! Thanks for collecting all the history. Will update the description.
>
> One minor comment below. Other than that \o/ this is finally going away.
> I strongly suspect that the history of this code is a nice example about how
> over optimizing code can cause more harm than good.
>
> Acked-by: Michal Hocko <mhocko@...e.com>
Thanks.
>
> Thanks!
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@...gle.com>
> > ---
> > include/linux/oom.h | 2 --
> > mm/mmap.c | 25 ++++++-------------------
> > mm/oom_kill.c | 2 +-
> > 3 files changed, 7 insertions(+), 22 deletions(-)
> >
> [...]
> > @@ -3138,6 +3121,10 @@ void exit_mmap(struct mm_struct *mm)
> > /* update_hiwater_rss(mm) here? but nobody should be looking */
> > /* Use -1 here to ensure all VMAs in the mm are unmapped */
> > unmap_vmas(&tlb, vma, 0, -1);
> > + mmap_read_unlock(mm);
> > + /* Set MMF_OOM_SKIP to disregard this mm from further consideration.*/
> > + set_bit(MMF_OOM_SKIP, &mm->flags);
>
> I think that it would be slightly more readable to add an empty line
> above and below of this. Also the comment would be more helpful if it
> explaind what the further consideration actually means. I would go with
>
> /*
> * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
> * because the memory has been already freed. Do not bother
> * checking mm_is_oom_victim because setting a bit
> * unconditionally is just cheaper.
> */
>
Ack.
> > + mmap_write_lock(mm);
> > free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
> > tlb_finish_mmu(&tlb);
>
> --
> Michal Hocko
> SUSE Labs
Powered by blists - more mailing lists