Message-ID: <49f6a0f1-c6fa-4642-2db0-69f090e8a392@oracle.com>
Date: Wed, 16 Dec 2020 14:49:36 -0800
From: Mike Kravetz <mike.kravetz@...cle.com>
To: Oscar Salvador <osalvador@...e.de>
Cc: Muchun Song <songmuchun@...edance.com>, corbet@....net,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, x86@...nel.org,
hpa@...or.com, dave.hansen@...ux.intel.com, luto@...nel.org,
peterz@...radead.org, viro@...iv.linux.org.uk,
akpm@...ux-foundation.org, paulmck@...nel.org,
mchehab+huawei@...nel.org, pawan.kumar.gupta@...ux.intel.com,
rdunlap@...radead.org, oneukum@...e.com, anshuman.khandual@....com,
jroedel@...e.de, almasrymina@...gle.com, rientjes@...gle.com,
willy@...radead.org, mhocko@...e.com, song.bao.hua@...ilicon.com,
david@...hat.com, duanxiongchun@...edance.com,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v9 03/11] mm/hugetlb: Free the vmemmap pages associated
with each HugeTLB page
On 12/16/20 2:25 PM, Oscar Salvador wrote:
> On Wed, Dec 16, 2020 at 02:08:30PM -0800, Mike Kravetz wrote:
>>> + * vmemmap_rmap_walk - walk vmemmap page table
>>> +
>>> +static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
>>> +			      unsigned long end, struct vmemmap_rmap_walk *walk)
>>> +{
>>> +	pte_t *pte;
>>> +
>>> +	pte = pte_offset_kernel(pmd, addr);
>>> +	do {
>>> +		BUG_ON(pte_none(*pte));
>>> +
>>> +		if (!walk->reuse)
>>> +			walk->reuse = pte_page(pte[VMEMMAP_TAIL_PAGE_REUSE]);
>>
>> It may be just me, but I don't like the pte[-1] here. It certainly does work
>> as designed because we want to remap all pages in the range to the page before
>> the range (at offset -1). But, we do not really validate this 'reuse' page.
>> There is the BUG_ON(pte_none(*pte)) as a sanity check, but we do nothing similar
>> for pte[-1]. Based on the usage for HugeTLB pages, we can be confident that
>> pte[-1] is actually a pte. In discussions with Oscar, you mentioned another
>> possible use for these routines.
>
> Without giving it much thought, I guess we could duplicate the
> BUG_ON for the pte outside the loop, and add a new one for pte[-1].
> Also, since walk->reuse does not seem to change once it is set, could we
> set it outside the loop? e.g.:
>
> pte_t *pte;
>
> pte = pte_offset_kernel(pmd, addr);
> BUG_ON(pte_none(*pte));
> BUG_ON(pte_none(pte[VMEMMAP_TAIL_PAGE_REUSE]));
> walk->reuse = pte_page(pte[VMEMMAP_TAIL_PAGE_REUSE]);
> do {
> 	....
> } while...
>
> Or maybe we do want to keep it inside the loop, in case future users
> change walk->reuse during the operation.
> But to be honest, I do not think we can realistically anticipate every
> future use of this, so I would rather keep it simple for now.
I was thinking about possibly passing the 'reuse' address as another parameter
to vmemmap_remap_reuse(). We could add this addr to the vmemmap_rmap_walk
struct and set walk->reuse when we get to the pte for that address. Of
course, this would imply that the addr would need to be part of the range.
Ideally, we would walk the page table to get to the reuse page.

My concern was not specifically about adding the BUG_ON. In more general use,
*pte could be the first entry on a pte page, and then pte[-1] may not even be
a pte. Again, I don't think this matters for the current HugeTLB use case; I
am just a little concerned about the code being put to use for other purposes.
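Going back to the reuse address idea, here is a rough, untested sketch of what
I mean. The reuse_addr member is purely illustrative and is not in the posted
patch; everything else uses the names from the patch:

struct vmemmap_rmap_walk {
	/* ... existing members from the patch ... */
	struct page	*reuse;
	unsigned long	reuse_addr;	/* illustrative only: vmemmap addr of the reuse page */
};

static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
			      unsigned long end, struct vmemmap_rmap_walk *walk)
{
	pte_t *pte = pte_offset_kernel(pmd, addr);

	do {
		BUG_ON(pte_none(*pte));

		/*
		 * Pick up the reuse page from a pte we actually walk and
		 * validate, instead of peeking at pte[-1].
		 */
		if (!walk->reuse && addr == walk->reuse_addr)
			walk->reuse = pte_page(*pte);

		/* ... remap/free the other tail pages as in the patch ... */
	} while (pte++, addr += PAGE_SIZE, addr != end);
}

vmemmap_remap_reuse() would then set walk->reuse_addr before starting the
walk, and the walked range would have to include that address, as noted above.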
--
Mike Kravetz