Message-ID: <CAEXW_YT1qr9F1QaABthUx6qxWPYYom-oW7XMVExzrHLWdhUGKg@mail.gmail.com>
Date:   Fri, 19 May 2023 23:17:37 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
        linux-mm@...ck.org, Shuah Khan <shuah@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Michal Hocko <mhocko@...e.com>,
        Lorenzo Stoakes <lstoakes@...il.com>,
        Kirill A Shutemov <kirill@...temov.name>,
        "Liam R. Howlett" <liam.howlett@...cle.com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>
Subject: Re: [PATCH v2 1/4] mm/mremap: Optimize the start addresses in move_page_tables()

Hi Linus,

On Fri, May 19, 2023 at 10:34 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Fri, May 19, 2023 at 3:52 PM Joel Fernandes <joel@...lfernandes.org> wrote:
> > >
> > > I *suspect* that the test is literally just for the stack movement
> > > case by execve, where it catches the case where we're doing the
> > > movement entirely within the one vma we set up.
> >
> > Yes that's right, the test is only for the stack movement case. For
> > the regular mremap case, I don't think there is a way for it to
> > trigger.
>
> So I feel the test is simply redundant.
>
> For the regular mremap case, it never triggers.

Unfortunately, I just found that mremap-ing a range purely within a
VMA can actually cause the old and new VMA passed to
move_page_tables() to be the same.

I added a printk() at the beginning of move_page_tables() that prints all the args:

    printk("move_page_tables(vma=(%lx,%lx), old_addr=%lx, "
           "new_vma=(%lx,%lx), new_addr=%lx, len=%lx)\n",
           vma->vm_start, vma->vm_end, old_addr,
           new_vma->vm_start, new_vma->vm_end, new_addr, len);

Then I wrote a simple test to move 1MB purely within a 10MB range and
I found on running the test that the old and new vma passed to
move_page_tables() are exactly the same.

[   19.697596] move_page_tables(vma=(7f1f985f7000,7f1f98ff7000),
old_addr=7f1f987f7000, new_vma=(7f1f985f7000,7f1f98ff7000),
new_addr=7f1f98af7000, len=100000)
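
A minimal sketch of such a test, under the assumption that it maps 10MB of
anonymous memory and moves the 1MB at offset 2MB to offset 5MB (the offsets
match the log above). The helper name move_within_vma() is hypothetical and
is not taken from mremap_test.c; it only checks that the data arrives at the
destination and does not inspect the VMAs itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define MB (1024UL * 1024UL)

/*
 * Hypothetical helper: map 10MB, then move 1MB from offset 2MB to
 * offset 5MB of the same mapping.
 * Returns 0 if the pattern is found at the destination, -1 otherwise.
 */
int move_within_vma(void)
{
	const size_t region = 10 * MB, len = 1 * MB;
	unsigned char *base, *src, *dst, *moved;
	size_t i;
	int ret = 0;

	base = mmap(NULL, region, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	src = base + 2 * MB;
	dst = base + 5 * MB;
	assert(src + len <= dst);	/* source and destination must not overlap */

	memset(src, 0xAB, len);		/* fault in the source pages with a pattern */

	/* MREMAP_FIXED requires MREMAP_MAYMOVE; the destination range is replaced. */
	moved = mremap(src, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
	if (moved != dst) {
		munmap(base, region);
		return -1;
	}

	for (i = 0; i < len; i++) {
		if (dst[i] != 0xAB) {
			ret = -1;
			break;
		}
	}

	/* The source range is now a hole; munmap() tolerates unmapped gaps. */
	munmap(base, region);
	return ret;
}
```

Calling move_within_vma() with the printk in place should produce a
move_page_tables() line like the one above, so the vma and new_vma ranges
can be compared.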

That is a bit counterintuitive, as I really thought we'd be splitting
the VMAs with such a move. Any idea what I am missing?

Also, such a use case will break with my patch, as we may accidentally
overwrite parts of a range that were not part of the mremap request.
Maybe I should just turn off the optimization if vma == new_vma.
However, that would also turn it off for the stack move, so another
way might be to special-case stack moves in move_page_tables().
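
To make the overwrite risk concrete, here is a small userspace sketch using
the addresses from the log above. It assumes a 2MB PMD size (as on x86-64
with 4K pages); the macro, struct, and function names are made up for
illustration and are not kernel code. It only shows where old_addr would
land if it were aligned down to a PMD boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 2MB PMD, as on x86-64 with 4K pages. */
#define PMD_SIZE_ASSUMED ((uint64_t)1 << 21)
#define PMD_MASK_ASSUMED (~(PMD_SIZE_ASSUMED - 1))

/* Hypothetical stand-in for the [vm_start, vm_end) range of a VMA. */
struct range_model {
	uint64_t start;
	uint64_t end;
};

/* Align an address down to the assumed PMD boundary. */
uint64_t pmd_align_down(uint64_t addr)
{
	return addr & PMD_MASK_ASSUMED;
}

/* Bytes below addr that a PMD-aligned move would additionally cover. */
uint64_t extra_bytes_below(uint64_t addr)
{
	return addr - pmd_align_down(addr);
}

/* True if the aligned-down address still lies inside the given range. */
bool aligned_addr_in_vma(struct range_model vma, uint64_t addr)
{
	uint64_t masked = pmd_align_down(addr);

	assert(vma.start < vma.end);
	return vma.start <= masked && masked < vma.end;
}
```

With vma=(7f1f985f7000,7f1f98ff7000) and old_addr=7f1f987f7000, the
aligned-down address is 7f1f98600000, which is still inside the VMA. An
aligned move would therefore cover 0x1f7000 bytes below old_addr that were
never part of the request.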

So this means I have to go back to the drawing board a bit on this
patch, and also add more tests in mremap_test.c to test such
within-VMA moving. I believe there are no such existing tests... More
work to do for me. :-)

> And for the stack movement case by execve, I don't think it matters if
> you just were to change the logic of the subsequent checks a bit.
>
> In particular, you do this:
>
>         /* If the masked address is within vma, there is no prev mapping of concern. */
>         if (vma->vm_start <= addr_masked)
>                 return false;
>
>         /*
>          * Attempt to find vma before prev that contains the address.
>          * On any issue, assume the address is within a previous mapping.
>          * @mmap write lock is held here, so the lookup is safe.
>          */
>         cur = find_vma_prev(vma->vm_mm, vma->vm_start, &prev);
>         if (!cur || cur != vma || !prev)
>                 return true;
>         /* The masked address fell within a previous mapping. */
>         if (prev->vm_end > addr_masked)
>                 return true;
>
>         return false;
>
> And I think that
>
>         if (!cur || cur != vma || !prev)
>                 return true;
>
> is actively wrong, because if there is no 'prev', then you should return false.

During my tests, there was always an existing, unrelated mapping
before the new region allocated by mmap. From that I concluded that a
NULL prev indicates a potential problem with find_vma_prev(), so I
made the function return true in that case, meaning the masked address
is not suitable for the optimization.

That's obviously confusing so I'll try to rewrite this part of the
patch a bit better with appropriate comments.

> So I *think* all of the above could just be replaced with this instead:
>
>         find_vma_prev(vma->vm_mm, vma->vm_start, &prev);
>         return prev && prev->vm_end  > addr_masked;
>
> because only if we have a 'prev', and the prev is into that masked
> address, do we need to avoid doing the masking.
>
> With that simplified test, do you even care about that whole "the
> masked address was already in the vma"? Not that I can see.
>
> And we don't even care about the return value of 'find_vma_prev()',
> because it had better be 'vma'. We're giving it 'vma->vm_start' as an
> address, for chrissake!
>
> So if you *really* wanted to, you could do something like
>
>         cur = find_vma_prev(..);
>         if (WARN_ON_ONCE(cur != vma))
>                 return true;
>
> but even that WARN_ON_ONCE() seems pretty bogus. If it triggers, we
> have some serious corruption going on.
>
> So I still find that whole "vma->vm_start <= addr_masked" test a bit
> confusing, since it seems entirely redundant.
>
> Is it just because you wanted to avoid calling "find_vma_prev()" at
> all? Maybe just say that in the comment.

Yes exactly, I did not want to run find_vma_prev() unnecessarily. I
will add such clarifications in the comments.
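
A userspace sketch of the simplified check suggested above, with the early
vma->vm_start test kept in front purely to skip the lookup. struct vma_model
and masked_addr_hits_prev() are hypothetical stand-ins; the real code would
use struct vm_area_struct and obtain prev from find_vma_prev() with the mmap
write lock held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of struct vm_area_struct used here. */
struct vma_model {
	uint64_t vm_start;
	uint64_t vm_end;
};

/*
 * Returns true if aligning down to addr_masked would reach into the
 * previous mapping, i.e. the alignment should be skipped.
 * prev models the result of the find_vma_prev() lookup and may be NULL.
 */
bool masked_addr_hits_prev(const struct vma_model *vma,
			   const struct vma_model *prev,
			   uint64_t addr_masked)
{
	assert(vma && vma->vm_start < vma->vm_end);

	/*
	 * Shortcut only: if the masked address is still inside vma, prev
	 * cannot contain it, so the real code can avoid the lookup.
	 */
	if (vma->vm_start <= addr_masked)
		return false;

	/* No prev means there is nothing below vma to run into. */
	return prev && prev->vm_end > addr_masked;
}
```

In this model a NULL prev yields false, which matches the point that a
missing previous mapping is not a reason to skip the alignment.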

Thanks for all the comments so far, I will continue to work on this.

 - Joel
