Message-ID: <20180907112528.GI19621@dhcp22.suse.cz>
Date: Fri, 7 Sep 2018 13:25:28 +0200
From: Michal Hocko <mhocko@...nel.org>
To: "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>
Cc: akpm@...ux-foundation.org, Alexey Kardashevskiy <aik@...abs.ru>,
mpe@...erman.id.au, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org
Subject: Re: [RFC PATCH V2 4/4] powerpc/mm/iommu: Allow migration of cma
allocated pages during mm_iommu_get
On Fri 07-09-18 16:45:09, Aneesh Kumar K.V wrote:
> On 09/07/2018 02:33 PM, Michal Hocko wrote:
> > On Thu 06-09-18 19:00:43, Aneesh Kumar K.V wrote:
> > > On 09/06/2018 06:23 PM, Michal Hocko wrote:
> > > > On Thu 06-09-18 11:13:42, Aneesh Kumar K.V wrote:
> > > > > Current code doesn't do page migration if the page allocated is a compound page.
> > > > > With HugeTLB migration support, we can end up allocating hugetlb pages from
> > > > > CMA region. Also THP pages can be allocated from CMA region. This patch updates
> > > > > the code to handle compound pages correctly.
> > > > >
> > > > > This uses the new helper get_user_pages_cma_migrate(). It does one get_user_pages()
> > > > > with the right count, instead of doing one get_user_pages() per page. That avoids
> > > > > reading the page table multiple times.
> > > > >
> > > > > The patch also converts the hpas member of mm_iommu_table_group_mem_t to a union.
> > > > > We use the same storage location to store pointers to struct page. We cannot
> > > > > update all the code paths to use struct page *, because we access hpas in real mode
> > > > > and we can't do the struct page * to pfn conversion in real mode.
> > > >
> > > > I am not familiar with this code so bear with me. I am completely
> > > > missing the purpose of this patch. The changelog doesn't really explain
> > > > that AFAICS. I can only guess that you do not want to establish long
> > > > pins on CMA pages, right? So whenever you are about to pin a page that
> > > > is in CMA you migrate it away to a different !__GFP_MOVABLE page, right?
> > >
> > > That is right.
> > >
> > > > If that is the case then how do you handle pins which are already in
> > > > zone_movable? I do not see any specific check for those.
> > >
> > >
> > > >
> > > > Btw. why is this a proper thing to do? Problems with longterm pins are
> > > > not only for CMA/ZONE_MOVABLE pages. Pinned pages are not reclaimable as
> > > > well so there is a risk of OOMs if there are too many of them. We have
> > > > discussed approaches that would allow to force pin invalidation/revocation
> > > > at LSF/MM. Isn't that a more appropriate solution to the problem you are
> > > > seeing?
> > > >
> > >
> > > The CMA area is used on powerpc platforms to allocate the guest-specific page
> > > table (hash page table). If we don't have sufficient free pages, we fail to
> > > allocate the hash page table, which results in a failure to start the guest.
> > >
> > > Now with vfio, we end up pinning the entire guest RAM. There is a
> > > possibility that these guest RAM pages got allocated from the CMA region. We
> > > already support migrating those pages out, except for compound pages.
> > > What this patch does is add support for migrating compound pages that were
> > > allocated out of the CMA region (i.e., THP pages, and hugetlb pages if the platform
> > > supports hugetlb migration).
> >
> > This definitely belongs to the changelog.
> >
> > > Now to do that I added a helper get_user_pages_cma_migrate().
> > >
> > > I agree that long term pinned pages do have other issues. The patchset is
> > > not solving that issue.
> >
> > It would be great to note why a generic approach is not viable. I assume
> > the main reason is that those pins are pretty much permanent for the
> > guest lifetime so the situation has to be handled in advance. In other
> > words, more information please.
> >
>
> That is correct. I will add these details to the commit message, and will also
> write a cover letter for the patch series.
OK, then the early migration makes some sense. I suspect this will lead
to other issues (OOM in kernel zones), but the revocation approach is
clearly not usable here. Excessive pinning simply sucks.
Thanks a lot for the updated information though!
--
Michal Hocko
SUSE Labs
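
The early-migration idea discussed in this thread (move a page out of CMA
before taking a long-lived pin on it, and treat a compound page as one unit)
can be illustrated with a toy user-space model. This is only a sketch, not the
kernel code from the patch: Page, ToyAllocator, pin_long_term and the
migrate_compound flag are hypothetical names invented for the illustration,
and "migration" here just assigns a new frame number outside the pretend CMA
region.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Toy stand-in for a (possibly compound) page; not the kernel's struct page."""
    pfn: int              # frame number of the head page
    in_cma: bool          # True if the frame lies in the pretend CMA region
    order: int = 0        # 0 = base page, >0 = compound page of 2**order frames
    pinned: bool = False


class ToyAllocator:
    """Hands out frame numbers that are, by assumption, outside CMA."""

    def __init__(self, first_free_pfn):
        self.next_pfn = first_free_pfn

    def alloc_non_cma(self, order):
        nr_frames = 1 << order
        # Round up so a compound page starts on a naturally aligned frame.
        pfn = (self.next_pfn + nr_frames - 1) // nr_frames * nr_frames
        self.next_pfn = pfn + nr_frames
        return pfn


def pin_long_term(pages, allocator, migrate_compound=True):
    """Pin every page, migrating CMA-backed pages out of CMA first.

    With migrate_compound=False, compound pages are pinned in place, which
    models the behaviour the thread describes as the pre-patch state.
    Returns the number of pages that end up pinned inside CMA.
    """
    stuck_in_cma = 0
    for page in pages:
        if page.in_cma:
            if page.order == 0 or migrate_compound:
                # "Migrate": the whole (compound) page moves as one unit.
                page.pfn = allocator.alloc_non_cma(page.order)
                page.in_cma = False
            else:
                stuck_in_cma += 1
        page.pinned = True
    return stuck_in_cma
```

In this model, pinning a list that contains an order-9 page from CMA with
migrate_compound=False leaves that page pinned inside CMA, while the default
call moves it out before pinning. The model deliberately ignores the concern
raised above that migrated pages then consume memory in other zones.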