Message-ID: <Y8cZHv6dNHIg99tW@x1n>
Date:   Tue, 17 Jan 2023 16:54:38 -0500
From:   Peter Xu <peterx@...hat.com>
To:     James Houghton <jthoughton@...gle.com>
Cc:     Mike Kravetz <mike.kravetz@...cle.com>,
        Muchun Song <songmuchun@...edance.com>,
        David Hildenbrand <david@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Mina Almasry <almasrymina@...gle.com>,
        Zach O'Keefe <zokeefe@...gle.com>,
        Manish Mishra <manish.mishra@...anix.com>,
        Naoya Horiguchi <naoya.horiguchi@....com>,
        "Dr . David Alan Gilbert" <dgilbert@...hat.com>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Yang Shi <shy828301@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 35/46] hugetlb: add MADV_COLLAPSE for hugetlb

On Tue, Jan 17, 2023 at 01:38:24PM -0800, James Houghton wrote:
> > > +             if (curr < end) {
> > > +                     /* Don't hold the VMA lock for too long. */
> > > +                     hugetlb_vma_unlock_write(vma);
> > > +                     cond_resched();
> > > +                     hugetlb_vma_lock_write(vma);
> >
> > The intention is good here, but IIUC this will cause the vma lock to be
> > taken after the i_mmap_rwsem, which can cause circular deadlocks.  To do
> > this properly we would also need to release the i_mmap_rwsem.
> 
> Sorry if you spent a long time debugging this! I sent a reply a week
> ago about this too.

Oops, yes, I somehow missed that one.  No worries - it was reported by
lockdep. :)
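
To illustrate the ordering point, here is a minimal userspace sketch with
pthread mutexes.  It is only an analogy, not kernel code: "outer" stands in
for the hugetlb vma lock and "inner" for the i_mmap_rwsem, and the struct and
function names are made up for the example.  The idea is that a yield point
drops both locks and then retakes them in the original order, so the outer
lock is never acquired while the inner one is held.

```c
#include <pthread.h>
#include <sched.h>

/*
 * Userspace stand-ins (assumptions for illustration only):
 *   outer ~ hugetlb vma lock, inner ~ i_mmap_rwsem.
 * The lock order is always outer first, then inner.
 */
struct lock_pair {
	pthread_mutex_t outer;
	pthread_mutex_t inner;
};

/*
 * Called with both locks held.  Dropping only "outer" and retaking it
 * while "inner" is still held would invert the order, so release both
 * (inner first), yield, and reacquire in the original order.
 */
static void yield_with_lock_order(struct lock_pair *lp)
{
	pthread_mutex_unlock(&lp->inner);
	pthread_mutex_unlock(&lp->outer);

	sched_yield();	/* stand-in for cond_resched() */

	pthread_mutex_lock(&lp->outer);
	pthread_mutex_lock(&lp->inner);
}
```

On return both locks are held again, exactly as on entry.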

> 
> >
> > However, it may make the resched() logic overcomplicated.  Meanwhile, for
> > 2M huge pages I think this will be called for each 2M range, which can be
> > too fine-grained, so the "curr < end" check looks a bit too aggressive.
> >
> > The other thing I noticed is that the long period of mmu notifier
> > invalidation between start -> end will (in a real-life VM context) cause
> > vcpu threads to spin.
> >
> > I _think_ it's because is_page_fault_stale() (during a vmexit following a
> > kvm page fault) always reports true during the long procedure of
> > MADV_COLLAPSE when it is called on a large range, so even if we release
> > both locks here it may not help tremendously with the VM migration use
> > case, because of the long-standing mmu notifier invalidation procedure.
> 
> Oh... indeed. Thanks for pointing that out.
> 
> >
> > To summarize: I think a simpler initial version of hugetlb MADV_COLLAPSE
> > can drop this "if" block and let the userapp decide the step size of
> > COLLAPSE?
> 
> I'll drop this resched logic. Thanks Peter.

Sounds good, thanks.
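
For reference, a rough sketch of what "userapp decides the step size" could
look like.  It assumes that hugetlb MADV_COLLAPSE, as proposed in this series,
is invoked through plain madvise(2) like the existing advice values;
collapse_in_steps() and next_chunk() are hypothetical helper names, and the
caller is assumed to pass a step that is a multiple of the huge page size.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* asm-generic uapi value, Linux 6.1+ */
#endif

/* Size of the next chunk to collapse, or 0 once the range is covered. */
static size_t next_chunk(size_t off, size_t len, size_t step)
{
	size_t left = off < len ? len - off : 0;

	return left < step ? left : step;
}

/*
 * Hypothetical helper: collapse [addr, addr + len) in pieces of at most
 * "step" bytes, so that each madvise() call (and each mmu notifier
 * invalidation window that comes with it) stays short.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int collapse_in_steps(void *addr, size_t len, size_t step)
{
	size_t off = 0, chunk;

	if (step == 0) {
		errno = EINVAL;
		return -1;
	}

	while ((chunk = next_chunk(off, len, step)) != 0) {
		if (madvise((char *)addr + off, chunk, MADV_COLLAPSE) != 0)
			return -1;
		off += chunk;
	}
	return 0;
}
```

A smaller step keeps each call short at the cost of more syscalls; the right
value is left to the application.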

-- 
Peter Xu
