lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 29 Oct 2022 17:15:55 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Peter Xu <peterx@...hat.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Naoya Horiguchi <naoya.horiguchi@...ux.dev>,
        David Hildenbrand <david@...hat.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Mina Almasry <almasrymina@...gle.com>,
        Rik van Riel <riel@...riel.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Matthew Wilcox <willy@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Wei Chen <harperchen1110@...il.com>, stable@...r.kernel.org
Subject: Re: [PATCH v2] hugetlb: don't delete vma_lock in hugetlb
 MADV_DONTNEED processing

On 10/28/22 19:20, Peter Xu wrote:
> On Fri, Oct 28, 2022 at 02:17:01PM -0700, Mike Kravetz wrote:
> > On 10/28/22 12:13, Peter Xu wrote:
> > > On Fri, Oct 28, 2022 at 08:23:25AM -0700, Mike Kravetz wrote:
> > > > On 10/26/22 21:12, Peter Xu wrote:
> > > > > On Wed, Oct 26, 2022 at 04:54:01PM -0700, Mike Kravetz wrote:
> > > > > > On 10/26/22 17:42, Peter Xu wrote:
> > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > index c7105ec6d08c..d8b4d7e56939 100644
> > > > --- a/mm/madvise.c
> > > > +++ b/mm/madvise.c
> > > > @@ -790,7 +790,10 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
> > > >  static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
> > > >  					unsigned long start, unsigned long end)
> > > >  {
> > > > -	zap_page_range(vma, start, end - start);
> > > > +	if (!is_vm_hugetlb_page(vma))
> > > > +		zap_page_range(vma, start, end - start);
> > > > +	else
> > > > +		clear_hugetlb_page_range(vma, start, end);
> > > 
> > > With the new ZAP_FLAG_UNMAP flag, clear_hugetlb_page_range() can be dropped
> > > completely?  As zap_page_range() won't be with ZAP_FLAG_UNMAP so we can
> > > identify things?
> > > 
> > > IIUC that's the major reason why I thought the zap flag could be helpful..
> > 
> > Argh.  I went to drop clear_hugetlb_page_range() but there is one issue.
> > In zap_page_range() the MMU_NOTIFY_CLEAR notifier is certainly called.
> > However, we really need to have a 'adjust_range_if_pmd_sharing_possible'
> > call in there because the 'range' may be part of a shared pmd.  :(
> > 
> > I think we need to either have a separate routine like clear_hugetlb_page_range
> > that sets up the appropriate range, or special case hugetlb in zap_page_range.
> > What do you think?
> > I think clear_hugetlb_page_range is the least bad of the two options.
> 
> How about special case hugetlb as you mentioned?  If I'm not wrong, it
> should be 3 lines change:
> 
> ---8<---
> diff --git a/mm/memory.c b/mm/memory.c
> index c5599a9279b1..0a1632e44571 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1706,11 +1706,13 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
>         lru_add_drain();
>         mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
>                                 start, start + size);
> +       if (is_vm_hugetlb_page(vma))
> +               adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
>         tlb_gather_mmu(&tlb, vma->vm_mm);
>         update_hiwater_rss(vma->vm_mm);
>         mmu_notifier_invalidate_range_start(&range);
>         do {
> -               unmap_single_vma(&tlb, vma, start, range.end, NULL);
> +               unmap_single_vma(&tlb, vma, start, start + size, NULL);
>         } while ((vma = mas_find(&mas, end - 1)) != NULL);
>         mmu_notifier_invalidate_range_end(&range);
>         tlb_finish_mmu(&tlb);
> ---8<---
> 
> As zap_page_range() is already vma-oriented anyway.  But maybe I missed
> something important?

zap_page_range is a bit confusing.  It appears that the passed range can
span multiple vmas.  Otherwise, there would be no do while loop.  Yet, there
is only one mmu_notifier_range_init call specifying the passed vma.

It appears all callers pass a range entirely within a single vma.

The modifications above would work for a range within a single vma.  However,
things would be more complicated if the range can indeed span multiple vmas.
For multiple vmas, we would need to check the first and last vmas for
pmd sharing.

Anyone know more about this seeming confusing behavior?  Perhaps, range
spanning multiple vmas was left over earlier code?
-- 
Mike Kravetz

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ