[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <223acdf2-ba25-4c31-94df-a212d31bd6c3@arm.com>
Date: Fri, 19 Jul 2024 09:56:36 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: Barry Song <baohua@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Hugh Dickins
<hughd@...gle.com>, Jonathan Corbet <corbet@....net>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
David Hildenbrand <david@...hat.com>, Lance Yang <ioworker0@...il.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>, Gavin Shan <gshan@...hat.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v2 3/3] mm: mTHP stats for pagecache folio allocations
On 19/07/2024 01:12, Barry Song wrote:
> On Wed, Jul 17, 2024 at 1:59 AM Ryan Roberts <ryan.roberts@....com> wrote:
>>
>> Expose 3 new mTHP stats for file (pagecache) folio allocations:
>>
>> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_alloc
>> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback
>> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback_charge
>>
>> This will provide some insight on the sizes of large folios being
>> allocated for file-backed memory, and how often allocation is failing.
>>
>> All non-order-0 (and most order-0) folio allocations are currently done
>> through filemap_alloc_folio(), and folios are charged in a subsequent
>> call to filemap_add_folio(). So count file_fallback when allocation
>> fails in filemap_alloc_folio() and count file_alloc or
>> file_fallback_charge in filemap_add_folio(), based on whether charging
>> succeeded or not. There are some users of filemap_add_folio() that
>> allocate their own order-0 folio by other means, so we would not count
>> an allocation failure in this case, but we also don't care about order-0
>> allocations. This approach feels like it should be good enough and
>> doesn't require any (impractically large) refactoring.
>>
>> The existing mTHP stats interface is reused to provide consistency to
>> users. And because we are reusing the same interface, we can reuse the
>> same infrastructure on the kernel side.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@....com>
>> ---
>> Documentation/admin-guide/mm/transhuge.rst | 13 +++++++++++++
>> include/linux/huge_mm.h | 3 +++
>> include/linux/pagemap.h | 16 ++++++++++++++--
>> mm/filemap.c | 6 ++++--
>> mm/huge_memory.c | 7 +++++++
>> 5 files changed, 41 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>> index 058485daf186..d4857e457add 100644
>> --- a/Documentation/admin-guide/mm/transhuge.rst
>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>> @@ -512,6 +512,19 @@ shmem_fallback_charge
>> falls back to using small pages even though the allocation was
>> successful.
>>
>> +file_alloc
>> + is incremented every time a file huge page is successfully
>> + allocated.
>> +
>> +file_fallback
>> + is incremented if a file huge page is attempted to be allocated
>> + but fails and instead falls back to using small pages.
>> +
>> +file_fallback_charge
>> + is incremented if a file huge page cannot be charged and instead
>> + falls back to using small pages even though the allocation was
>> + successful.
>> +
>
> I realized that when we talk about fallback, it doesn't necessarily mean
> small pages; it could also refer to smaller huge pages.
Yes good point, I'll update the documentation to reflect that for all memory types.
>
> anon_fault_alloc
> is incremented every time a huge page is successfully
> allocated and charged to handle a page fault.
>
> anon_fault_fallback
> is incremented if a page fault fails to allocate or charge
> a huge page and instead falls back to using huge pages with
> lower orders or small pages.
>
> anon_fault_fallback_charge
> is incremented if a page fault fails to charge a huge page and
> instead falls back to using huge pages with lower orders or
> small pages even though the allocation was successful.
>
> This also applies to files, right?
It does in the place you highlight below, but page_cache_ra_order() just falls
back immediately to order-0. Regardless, I think we should just document the
user facing docs to allow for a lower high order. That gives the implementation
more flexibility.
>
> do {
> gfp_t alloc_gfp = gfp;
>
> err = -ENOMEM;
> if (order > 0)
> alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
> folio = filemap_alloc_folio(alloc_gfp, order);
> if (!folio)
> continue;
>
> /* Init accessed so avoid atomic
> mark_page_accessed later */
> if (fgp_flags & FGP_ACCESSED)
> __folio_set_referenced(folio);
>
> err = filemap_add_folio(mapping, folio, index, gfp);
> if (!err)
> break;
> folio_put(folio);
> folio = NULL;
> } while (order-- > 0);
>
>
>> split
>> is incremented every time a huge page is successfully split into
>> smaller orders. This can happen for a variety of reasons but a
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index b8c63c3e967f..4f9109fcdded 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -123,6 +123,9 @@ enum mthp_stat_item {
>> MTHP_STAT_SHMEM_ALLOC,
>> MTHP_STAT_SHMEM_FALLBACK,
>> MTHP_STAT_SHMEM_FALLBACK_CHARGE,
>> + MTHP_STAT_FILE_ALLOC,
>> + MTHP_STAT_FILE_FALLBACK,
>> + MTHP_STAT_FILE_FALLBACK_CHARGE,
>> MTHP_STAT_SPLIT,
>> MTHP_STAT_SPLIT_FAILED,
>> MTHP_STAT_SPLIT_DEFERRED,
>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>> index 6e2f72d03176..95a147b5d117 100644
>> --- a/include/linux/pagemap.h
>> +++ b/include/linux/pagemap.h
>> @@ -562,14 +562,26 @@ static inline void *detach_page_private(struct page *page)
>> }
>>
>> #ifdef CONFIG_NUMA
>> -struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
>> +struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
>> #else
>> -static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>> +static inline struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>> {
>> return folio_alloc_noprof(gfp, order);
>> }
>> #endif
>>
>> +static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>> +{
>> + struct folio *folio;
>> +
>> + folio = __filemap_alloc_folio_noprof(gfp, order);
>> +
>> + if (!folio)
>> + count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
>> +
>> + return folio;
>> +}
>
> Do we need to add and export __filemap_alloc_folio_noprof()?
It is exported. See the below change in filemap.c. Previously
filemap_alloc_folio_noprof() was exported, but that is now an inline and
__filemap_alloc_folio_noprof() is exported in its place.
> In any case,
> we won't call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK) and
> will only allocate the folio instead?
Sorry I don't understand what you mean by this?
>
>> +
>> #define filemap_alloc_folio(...) \
>> alloc_hooks(filemap_alloc_folio_noprof(__VA_ARGS__))
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 53d5d0410b51..131d514fca29 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -963,6 +963,8 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
>> int ret;
>>
>> ret = mem_cgroup_charge(folio, NULL, gfp);
>> + count_mthp_stat(folio_order(folio),
>> + ret ? MTHP_STAT_FILE_FALLBACK_CHARGE : MTHP_STAT_FILE_ALLOC);
>> if (ret)
>> return ret;
>
> Would the following be better?
>
> ret = mem_cgroup_charge(folio, NULL, gfp);
> if (ret) {
> count_mthp_stat(folio_order(folio),
> MTHP_STAT_FILE_FALLBACK_CHARGE);
> return ret;
> }
> count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
>
> Anyway, it's up to you. The code just feels a bit off to me :-)
Yes, agree your version is better. I also noticed that anon and shmem increment
FALLBACK whenever FALLBACK_CHARGE is incremented so I should apply those same
semantics here.
Thanks,
Ryan
>
>>
>> @@ -990,7 +992,7 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
>> EXPORT_SYMBOL_GPL(filemap_add_folio);
>>
>> #ifdef CONFIG_NUMA
>> -struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>> +struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>> {
>> int n;
>> struct folio *folio;
>> @@ -1007,7 +1009,7 @@ struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
>> }
>> return folio_alloc_noprof(gfp, order);
>> }
>> -EXPORT_SYMBOL(filemap_alloc_folio_noprof);
>> +EXPORT_SYMBOL(__filemap_alloc_folio_noprof);
>> #endif
>>
>> /*
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 578ac212c172..26d558e3e80f 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -608,7 +608,14 @@ static struct attribute_group anon_stats_attr_grp = {
>> .attrs = anon_stats_attrs,
>> };
>>
>> +DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
>> +DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
>> +DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
>> +
>> static struct attribute *file_stats_attrs[] = {
>> + &file_alloc_attr.attr,
>> + &file_fallback_attr.attr,
>> + &file_fallback_charge_attr.attr,
>> #ifdef CONFIG_SHMEM
>> &shmem_alloc_attr.attr,
>> &shmem_fallback_attr.attr,
>> --
>> 2.43.0
>>
>
> Thanks
> Barry
Powered by blists - more mailing lists