Message-ID: <20230324135550.q2xfuj4hjs7odbu5@box>
Date: Fri, 24 Mar 2023 16:55:50 +0300
From: "Kirill A. Shutemov" <kirill@...temov.name>
To: Joel Fernandes <joel@...lfernandes.org>
Cc: Michal Hocko <mhocko@...e.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Naresh Kamboju <naresh.kamboju@...aro.org>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: WARN_ON in move_normal_pmd
On Fri, Mar 24, 2023 at 09:43:10AM -0400, Joel Fernandes wrote:
> On Fri, Mar 24, 2023 at 9:05 AM Kirill A. Shutemov <kirill@...temov.name> wrote:
> >
> > On Fri, Mar 24, 2023 at 12:15:24PM +0100, Michal Hocko wrote:
> > > Hi,
> > > our QA is regularly hitting
> > > [ 544.198822][T20518] WARNING: CPU: 1 PID: 20518 at ../mm/mremap.c:255 move_pgt_entry+0x4c6/0x510
> > > triggered by thp01 LTP test. This has been brought up in the past and
> > > resulted in f81fdd0c4ab7 ("mm: document warning in move_normal_pmd() and
> > > make it warn only once"). While it is good that the underlying problem
> > > is understood, it doesn't seem there is enough interest to fix it
> > > properly. Which is fair but I am wondering whether the WARN_ON gives
> > > us anything here.
> > >
> > > Our QA process collects all unexpected side effects of tests and a WARN*
> > > in the log is certainly one of those. This triggers bugs which are mostly
> > > ignored as there is no upstream fix for them. This alone is nothing
> > > really critical but there are workloads which do run with panic on warn
> > > configured and this issue would put the system down which is unnecessary
> > > IMHO. Would it be sufficient to replace the WARN_ON_ONCE by
> > > pr_warn_once?
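The trade-off Michal describes can be sketched as a small userspace model. It assumes that a one-time WARN report escalates to a panic when panic_on_warn is set, while a pr_warn_once() message is only logged. `report_once()`, `may_move_pmd_warn()` and `may_move_pmd_pr_warn()` are hypothetical names, and the panic is counted rather than performed. In both variants the PMD move is still refused, so only the reporting changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins for illustration only; not the kernel's implementation. */
static bool panic_on_warn; /* models the panic_on_warn setting */
static int messages;       /* reports that reached the log */
static int panics;         /* times the system would have gone down */

/*
 * Report `what` the first time `cond` is true at a given call site
 * (tracked by *once) and return `cond` unchanged, like WARN_ON_ONCE().
 * Only WARN-style reports escalate under panic_on_warn; pr_warn-style
 * messages are just logged.
 */
static bool report_once(bool cond, bool *once, bool is_warn, const char *what)
{
	assert(once != NULL);
	if (cond && !*once) {
		*once = true;
		messages++;
		fprintf(stderr, "%s: %s\n", is_warn ? "WARNING" : "warn", what);
		if (is_warn && panic_on_warn)
			panics++; /* the real kernel would panic here */
	}
	return cond;
}

/* Current shape: if (WARN_ON_ONCE(!pmd_none(*new_pmd))) return false; */
static bool may_move_pmd_warn(bool new_pmd_populated)
{
	static bool once;

	if (report_once(new_pmd_populated, &once, true, "!pmd_none(*new_pmd)"))
		return false;
	return true;
}

/* Suggested shape: same fallback, reported via a pr_warn_once()-style message. */
static bool may_move_pmd_pr_warn(bool new_pmd_populated)
{
	static bool once;

	if (report_once(new_pmd_populated, &once, false,
			"destination PMD already populated"))
		return false;
	return true;
}
```

With panic_on_warn set, two calls to may_move_pmd_warn(true) both return false, log once and count one panic; may_move_pmd_pr_warn(true) also returns false and logs once, but adds no panic.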
> >
> > What about relaxing the check to exclude the temporary stack from the WARN
> > condition:
> >
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index 411a85682b58..eb0778b9d645 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -247,15 +247,12 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
> >  	 * of any 4kB pages, but still there) PMD in the page table
> >  	 * tree.
> >  	 *
> > -	 * Warn on it once - because we really should try to figure
> > -	 * out how to do this better - but then say "I won't move
> > -	 * this pmd".
> > -	 *
> > -	 * One alternative might be to just unmap the target pmd at
> > -	 * this point, and verify that it really is empty. We'll see.
> > +	 * Warn on it once unless it is initial (temporary) stack.
> >  	 */
> > -	if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
> > +	if (!pmd_none(*new_pmd)) {
> > +		WARN_ON_ONCE(!vma_is_temporary_stack(vma));
> >  		return false;
> > +	}
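The effect of the relaxed check can be modeled in userspace as below. This assumes vma_is_temporary_stack() tests for a VM_GROWSDOWN or VM_GROWSUP VMA that still has both bits of VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ) set; the flag values, `struct vma_model` and `may_move_normal_pmd()` are made up for this sketch. A populated destination still makes the PMD move fail, but only non-stack VMAs consume the one-time warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Flag bits for this sketch only: the real VM_* values are defined in
 * include/linux/mm.h and are not these numbers.
 */
#define VM_GROWSDOWN (1u << 0)
#define VM_GROWSUP   (1u << 1)
#define VM_SEQ_READ  (1u << 2)
#define VM_RAND_READ (1u << 3)
/* Both read hints stay set until execve() has moved the stack into place. */
#define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ)

struct vma_model {
	unsigned int vm_flags;
};

/* Assumed helper logic: a growable VMA with both setup bits still set. */
static bool vma_is_temporary_stack(const struct vma_model *vma)
{
	if (!(vma->vm_flags & (VM_GROWSDOWN | VM_GROWSUP)))
		return false;
	return (vma->vm_flags & VM_STACK_INCOMPLETE_SETUP) ==
	       VM_STACK_INCOMPLETE_SETUP;
}

static int warnings; /* stands in for the one-time WARN splat */

/* Refuse any populated destination; warn once unless it is the temporary stack. */
static bool may_move_normal_pmd(const struct vma_model *vma, bool new_pmd_populated)
{
	static bool warned;

	assert(vma != NULL);
	if (new_pmd_populated) {
		if (!vma_is_temporary_stack(vma) && !warned) {
			warned = true;
			warnings++;
			fprintf(stderr, "WARNING: populated destination PMD\n");
		}
		return false;
	}
	return true;
}
```

Because the once-flag is only set on the non-stack path, hitting the known execve() case first does not hide a later, unexpected occurrence in some other VMA.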
>
> Wouldn't it be better to instead fix it from the caller side? Like
> making it non-overlapping.
>
> Reading some old threads, I had tried to fix it [1] along these lines
> but Linus was rightfully concerned about that fix [2]. Maybe we can
> revisit and fix it properly this time.
>
> Personally I feel the safest thing to do is to not do an
> overlapping mremap and get rid of the warning. Or is there a
> better way like unmapping the target from the caller side first,
> before the move?
Making it non-overlapping limits the effectiveness of stack randomization.
We need to at least quantify that.
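To see why overlap matters, here is a toy userspace model of the downward stack move, loosely following the extent logic in mm/mremap.c. It assumes 512 pages per PMD and tracks only whether a page table is installed in each PMD slot; `move_down()`, `present[]` and `to_boundary()` are hypothetical, and locking, THP and the later free_pgtables() cleanup are ignored. It counts how often a whole-PMD move would find its destination already populated.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Toy model, not kernel code: addresses are page numbers and one "PMD"
 * slot covers PMD_PAGES pages (512 x 4 KiB = 2 MiB on x86-64).
 */
#define PMD_PAGES 512UL
#define NR_PMDS   64UL

/* present[i] is true while a page table is installed in PMD slot i. */
static bool present[NR_PMDS];

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Number of pages from addr up to the next PMD boundary. */
static unsigned long to_boundary(unsigned long addr)
{
	return PMD_PAGES - addr % PMD_PAGES;
}

/*
 * Move [old_start, old_end) down by shift pages. Whole-PMD moves are used
 * only when both sides are PMD-aligned; everything else moves at PTE
 * granularity. Returns how often a whole-PMD move found its destination
 * slot already populated, i.e. the !pmd_none(*new_pmd) case.
 */
static int move_down(unsigned long old_start, unsigned long old_end,
		     unsigned long shift)
{
	unsigned long old_addr, new_addr, extent;
	int conflicts = 0;

	assert(shift > 0 && shift <= old_start && old_start < old_end);
	assert(old_end <= NR_PMDS * PMD_PAGES);

	/* Start with page tables only under the source range. */
	memset(present, 0, sizeof(present));
	for (unsigned long i = old_start / PMD_PAGES; i <= (old_end - 1) / PMD_PAGES; i++)
		present[i] = true;

	for (old_addr = old_start, new_addr = old_start - shift; old_addr < old_end;
	     old_addr += extent, new_addr += extent) {
		extent = min_ul(min_ul(to_boundary(old_addr), old_end - old_addr),
				to_boundary(new_addr));
		if (extent == PMD_PAGES) {
			if (!present[new_addr / PMD_PAGES]) {
				/* Hand the whole table over; the source slot becomes empty. */
				present[new_addr / PMD_PAGES] = true;
				present[old_addr / PMD_PAGES] = false;
				continue;
			}
			/* Destination already holds a (possibly empty) table. */
			conflicts++;
		}
		/*
		 * PTE-level move: reuse or allocate the destination table. The
		 * source table stays installed even once it maps nothing.
		 */
		present[new_addr / PMD_PAGES] = true;
	}
	return conflicts;
}
```

For pages 4086..5631 (starting 10 pages below a PMD boundary), move_down() reports 3 conflicts for a 512-page shift, 1 for a 1536-page shift (a 10-page overlap), and 0 for a 2048-page shift (no overlap); the PMD-aligned range 4096..5631 shifted by 512 pages reports 0. The randomization cost of forcing a non-overlapping move is not captured here.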
--
Kiryl Shutsemau / Kirill A. Shutemov