linux-kernel - Re: [RFC PATCH V2 4/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu


Message-ID: <20180907090312.GF19621@dhcp22.suse.cz>
Date:   Fri, 7 Sep 2018 11:03:12 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>
Cc:     akpm@...ux-foundation.org, Alexey Kardashevskiy <aik@...abs.ru>,
        mpe@...erman.id.au, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org
Subject: Re: [RFC PATCH V2 4/4] powerpc/mm/iommu: Allow migration of cma
 allocated pages during mm_iommu_get

On Thu 06-09-18 19:00:43, Aneesh Kumar K.V wrote:
> On 09/06/2018 06:23 PM, Michal Hocko wrote:
> > On Thu 06-09-18 11:13:42, Aneesh Kumar K.V wrote:
> > > Current code doesn't do page migration if the page allocated is a compound page.
> > > With HugeTLB migration support, we can end up allocating hugetlb pages from
> > > CMA region. Also THP pages can be allocated from CMA region. This patch updates
> > > the code to handle compound pages correctly.
> > > 
> > > This uses the new helper get_user_pages_cma_migrate(). It does one
> > > get_user_pages() call with the right count, instead of one get_user_pages()
> > > call per page. That avoids reading the page table multiple times.
> > > 
> > > The patch also converts the hpas member of mm_iommu_table_group_mem_t to a union.
> > > We use the same storage location to store pointers to struct page. We cannot
> > > update all the code paths to use struct page *, because we access hpas in real mode
> > > and we can't do the struct page * to pfn conversion in real mode.
> > 
> > I am not familiar with this code so bear with me. I am completely
> > missing the purpose of this patch. The changelog doesn't really explain
> > that AFAICS. I can only guess that you do not want to establish long
> > pins on CMA pages, right? So whenever you are about to pin a page that
> > is in CMA you migrate it away to a different !__GFP_MOVABLE page, right?
> 
> That is right.
> 
> > If that is the case then how do you handle pins which are already in
> > zone_movable? I do not see any specific check for those.
> 
> 
> > 
> > Btw. why is this a proper thing to do? Problems with longterm pins are
> > not only for CMA/ZONE_MOVABLE pages. Pinned pages are not reclaimable as
> > well so there is a risk of OOMs if there are too many of them. We have
> > discussed approaches that would allow to force pin invalidation/revocation
> > at LSF/MM. Isn't that a more appropriate solution to the problem you are
> > seeing?
> > 
> 
> The CMA area is used on powerpc platforms to allocate the guest-specific page
> table (hash page table). If we don't have sufficient free pages, we fail to
> allocate the hash page table, which results in a failure to start the guest.
> 
> Now with vfio, we end up pinning the entire guest RAM. There is a
> possibility that these guest RAM pages got allocated from the CMA region. We
> already support migrating those pages out, except for compound pages.
> What this patch does is start supporting migration of compound pages that got
> allocated out of the CMA region (i.e., THP pages, and hugetlb pages if the
> platform supports hugetlb migration).

This definitely belongs in the changelog.

> Now to do that I added a helper get_user_pages_cma_migrate().
> 
> I agree that long term pinned pages do have other issues. The patchset is
> not solving that issue.

It would be great to note why a generic approach is not viable. I assume
the main reason is that those pins are pretty much permanent for the
guest lifetime so the situation has to be handled in advance. In other
words, more information please. 
-- 
Michal Hocko
SUSE Labs
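
The idea discussed in this thread (move a page out of the CMA region before
taking a long-term pin on it) can be sketched as a small userspace toy model.
This is not the kernel implementation and not the code from the patch: every
name below (toy_frame, toy_init, toy_alloc_non_cma, toy_pin_migrating_cma) is
hypothetical, the "frames" are plain array slots, and compound pages are not
modeled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: a "frame" stands in for a physical page. */
struct toy_frame {
    bool in_cma;    /* frame lies inside the (modeled) CMA region */
    bool in_use;    /* frame currently backs some mapping */
    int  pin_count; /* number of long-term pins on this frame */
};

#define TOY_NR_FRAMES 16
#define TOY_CMA_START 8 /* assumption: frames [8, 16) model the CMA region */

static struct toy_frame toy_frames[TOY_NR_FRAMES];

/* Reset the model: mark which frames belong to CMA, nothing used or pinned. */
static void toy_init(void)
{
    for (size_t i = 0; i < TOY_NR_FRAMES; i++) {
        toy_frames[i].in_cma = (i >= TOY_CMA_START);
        toy_frames[i].in_use = false;
        toy_frames[i].pin_count = 0;
    }
}

/* Claim a free frame outside CMA; returns its index, or -1 if none is free. */
static int toy_alloc_non_cma(void)
{
    for (int i = 0; i < TOY_CMA_START; i++) {
        if (!toy_frames[i].in_use) {
            toy_frames[i].in_use = true;
            return i;
        }
    }
    return -1;
}

/*
 * Pin the nr frames referenced by map[]. Any frame that sits in CMA is
 * first "migrated": the slot is repointed at a non-CMA frame and the CMA
 * frame is released. Returns 0 on success. Returns -1 if no non-CMA frame
 * is available; in that case nothing is pinned, although earlier slots
 * may already have been migrated.
 */
static int toy_pin_migrating_cma(int *map, size_t nr)
{
    /* Pass 1: migrate everything out of CMA before taking any pin. */
    for (size_t i = 0; i < nr; i++) {
        assert(map[i] >= 0 && map[i] < TOY_NR_FRAMES);
        if (!toy_frames[map[i]].in_cma)
            continue;
        int dst = toy_alloc_non_cma();
        if (dst < 0)
            return -1;
        toy_frames[map[i]].in_use = false; /* CMA frame is free again */
        map[i] = dst;
    }
    /* Pass 2: take the pins; no pinned frame is in CMA at this point. */
    for (size_t i = 0; i < nr; i++)
        toy_frames[map[i]].pin_count++;
    return 0;
}
```

Migrating in a separate pass before pinning mirrors the point made above that
pins held for the guest lifetime have to be handled in advance: once a frame
is pinned it can no longer be moved, so the CMA check has to come first.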