Message-ID: <CAGsJ_4xc4zhdCtMKtBUmHXNpEzG5r9Enb2=9VUMS4XY8gji7MQ@mail.gmail.com>
Date: Wed, 24 Jul 2024 22:23:12 +1200
From: Barry Song <baohua@...nel.org>
To: Ryan Roberts <ryan.roberts@....com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Hugh Dickins <hughd@...gle.com>,
Jonathan Corbet <corbet@....net>, "Matthew Wilcox (Oracle)" <willy@...radead.org>,
David Hildenbrand <david@...hat.com>, Lance Yang <ioworker0@...il.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>, Gavin Shan <gshan@...hat.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v2 3/3] mm: mTHP stats for pagecache folio allocations
On Wed, Jul 24, 2024 at 8:12 PM Ryan Roberts <ryan.roberts@....com> wrote:
>
> On 23/07/2024 23:07, Barry Song wrote:
> > On Fri, Jul 19, 2024 at 8:56 PM Ryan Roberts <ryan.roberts@....com> wrote:
> >>
> >> On 19/07/2024 01:12, Barry Song wrote:
> >>> On Wed, Jul 17, 2024 at 1:59 AM Ryan Roberts <ryan.roberts@....com> wrote:
> >>>>
> >>>> Expose 3 new mTHP stats for file (pagecache) folio allocations:
> >>>>
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_alloc
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback_charge
> >>>>
> >>>> This will provide some insight on the sizes of large folios being
> >>>> allocated for file-backed memory, and how often allocation is failing.
> >>>>
> >>>> All non-order-0 (and most order-0) folio allocations are currently done
> >>>> through filemap_alloc_folio(), and folios are charged in a subsequent
> >>>> call to filemap_add_folio(). So count file_fallback when allocation
> >>>> fails in filemap_alloc_folio() and count file_alloc or
> >>>> file_fallback_charge in filemap_add_folio(), based on whether charging
> >>>> succeeded or not. There are some users of filemap_add_folio() that
> >>>> allocate their own order-0 folio by other means, so we would not count
> >>>> an allocation failure in this case, but we also don't care about order-0
> >>>> allocations. This approach feels like it should be good enough and
> >>>> doesn't require any (impractically large) refactoring.
> >>>>
> >>>> The existing mTHP stats interface is reused to provide consistency to
> >>>> users. And because we are reusing the same interface, we can reuse the
> >>>> same infrastructure on the kernel side.
> >>>>
> >>>> Signed-off-by: Ryan Roberts <ryan.roberts@....com>
> >>>> ---
> >>>> Documentation/admin-guide/mm/transhuge.rst | 13 +++++++++++++
> >>>> include/linux/huge_mm.h | 3 +++
> >>>> include/linux/pagemap.h | 16 ++++++++++++++--
> >>>> mm/filemap.c | 6 ++++--
> >>>> mm/huge_memory.c | 7 +++++++
> >>>> 5 files changed, 41 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> >>>> index 058485daf186..d4857e457add 100644
> >>>> --- a/Documentation/admin-guide/mm/transhuge.rst
> >>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
> >>>> @@ -512,6 +512,19 @@ shmem_fallback_charge
> >>>> falls back to using small pages even though the allocation was
> >>>> successful.
> >>>>
> >>>> +file_alloc
> >>>> + is incremented every time a file huge page is successfully
> >>>> + allocated.
> >>>> +
> >>>> +file_fallback
> >>>> + is incremented if a file huge page is attempted to be allocated
> >>>> + but fails and instead falls back to using small pages.
> >>>> +
> >>>> +file_fallback_charge
> >>>> + is incremented if a file huge page cannot be charged and instead
> >>>> + falls back to using small pages even though the allocation was
> >>>> + successful.
> >>>> +
> >>>
> >>> I realized that when we talk about fallback, it doesn't necessarily mean
> >>> small pages; it could also refer to smaller huge pages.
> >>
> >> Yes good point, I'll update the documentation to reflect that for all memory types.
> >>
> >>>
> >>> anon_fault_alloc
> >>> is incremented every time a huge page is successfully
> >>> allocated and charged to handle a page fault.
> >>>
> >>> anon_fault_fallback
> >>> is incremented if a page fault fails to allocate or charge
> >>> a huge page and instead falls back to using huge pages with
> >>> lower orders or small pages.
> >>>
> >>> anon_fault_fallback_charge
> >>> is incremented if a page fault fails to charge a huge page and
> >>> instead falls back to using huge pages with lower orders or
> >>> small pages even though the allocation was successful.
> >>>
> >>> This also applies to files, right?
> >>
> >> It does in the place you highlight below, but page_cache_ra_order() just falls
> >> back immediately to order-0. Regardless, I think we should just document the
> >> user facing docs to allow for a lower high order. That gives the implementation
> >> more flexibility.
> >>
> >>>
> >>> do {
> >>> gfp_t alloc_gfp = gfp;
> >>>
> >>> err = -ENOMEM;
> >>> if (order > 0)
> >>> alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
> >>> folio = filemap_alloc_folio(alloc_gfp, order);
> >>> if (!folio)
> >>> continue;
> >>>
> >>>                 /* Init accessed so avoid atomic mark_page_accessed later */
> >>> if (fgp_flags & FGP_ACCESSED)
> >>> __folio_set_referenced(folio);
> >>>
> >>> err = filemap_add_folio(mapping, folio, index, gfp);
> >>> if (!err)
> >>> break;
> >>> folio_put(folio);
> >>> folio = NULL;
> >>> } while (order-- > 0);
> >>>
> >>>
> >>>> split
> >>>> is incremented every time a huge page is successfully split into
> >>>> smaller orders. This can happen for a variety of reasons but a
> >>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >>>> index b8c63c3e967f..4f9109fcdded 100644
> >>>> --- a/include/linux/huge_mm.h
> >>>> +++ b/include/linux/huge_mm.h
> >>>> @@ -123,6 +123,9 @@ enum mthp_stat_item {
> >>>> MTHP_STAT_SHMEM_ALLOC,
> >>>> MTHP_STAT_SHMEM_FALLBACK,
> >>>> MTHP_STAT_SHMEM_FALLBACK_CHARGE,
> >>>> + MTHP_STAT_FILE_ALLOC,
> >>>> + MTHP_STAT_FILE_FALLBACK,
> >>>> + MTHP_STAT_FILE_FALLBACK_CHARGE,
> >>>> MTHP_STAT_SPLIT,
> >>>> MTHP_STAT_SPLIT_FAILED,
> >>>> MTHP_STAT_SPLIT_DEFERRED,
> >>>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> >>>> index 6e2f72d03176..95a147b5d117 100644
> >>>> --- a/include/linux/pagemap.h
> >>>> +++ b/include/linux/pagemap.h
> >>>> @@ -562,14 +562,26 @@ static inline void *detach_page_private(struct page *page)
> >>>> }
> >>>>
> >>>> #ifdef CONFIG_NUMA
> >>>> -struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
> >>>> +struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
> >>>> #else
> >>>> -static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>>> +static inline struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>>> {
> >>>> return folio_alloc_noprof(gfp, order);
> >>>> }
> >>>> #endif
> >>>>
> >>>> +static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>>> +{
> >>>> + struct folio *folio;
> >>>> +
> >>>> + folio = __filemap_alloc_folio_noprof(gfp, order);
> >>>> +
> >>>> + if (!folio)
> >>>> + count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
> >>>> +
> >>>> + return folio;
> >>>> +}
> >>>
> >>> Do we need to add and export __filemap_alloc_folio_noprof()?
> >>
> >> It is exported. See the below change in filemap.c. Previously
> >> filemap_alloc_folio_noprof() was exported, but that is now an inline and
> >> __filemap_alloc_folio_noprof() is exported in its place.
> >>
> >>> In any case,
> >>> we won't call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK) and
> >>> will only allocate the folio instead?
> >>
> >> Sorry I don't understand what you mean by this?
> >
> > Ryan, my question is whether exporting __filemap_alloc_folio_noprof() might lead
> > to situations where people call __filemap_alloc_folio_noprof() and
> > forget to call
> > count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK). Currently, it seems
> > that counting is always necessary?
>
> OK to make sure I've understood, I think you are saying that there is a risk
> that people will call __filemap_alloc_folio_noprof() and bypass the stat
> counting? But it's the same problem with all "_noprof" functions; if those are
> called directly, it bypasses the memory allocation profiling infrastructure. So
> this just needs to be a case of "don't do that" IMHO. filemap_alloc_folio() is
> the public API.
Indeed. Maybe I'm overthinking it.
>
> > Another option is that we still call count_mthp_stat() within
> > filemap_alloc_folio_noprof() and make it noinline if
> > __filemap_alloc_folio_noprof() is not inline?
>
> I could certainly make filemap_alloc_folio_noprof() always out-of-line and then
> handle the counting privately in the compilation unit. But before my change
> filemap_alloc_folio_noprof() was inline for the !CONFIG_NUMA case and I was
> trying not to change that. I think what you're suggesting would be tidier
> though. I'll make the change. But I don't think it solves the wider problem of
> the possibility that people can call private APIs. That's what review is for.
Agreed.
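
Just to make sure we're describing the same shape, here is a quick userspace
model of the wrapper-plus-counting arrangement (the stubbed allocator, the
`fail` knob and the flat counter array are mine for illustration, not the
kernel APIs):

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel stats; purely illustrative. */
enum mthp_stat_item { MTHP_STAT_FILE_FALLBACK, __MTHP_STAT_COUNT };
static unsigned long mthp_stats[__MTHP_STAT_COUNT];

static void count_mthp_stat(unsigned int order, enum mthp_stat_item item)
{
	(void)order; /* the real code keeps per-order counters */
	mthp_stats[item]++;
}

/* Stand-in for __filemap_alloc_folio_noprof(); fails when 'fail' is set. */
static void *__filemap_alloc_folio_stub(unsigned int order, int fail)
{
	(void)order;
	return fail ? NULL : malloc(1);
}

/*
 * Wrapper with the counting folded in: as long as callers go through the
 * public API, an allocation failure always bumps FILE_FALLBACK. Making
 * this out-of-line keeps the counting private to one compilation unit.
 */
void *filemap_alloc_folio_stub(unsigned int order, int fail)
{
	void *folio = __filemap_alloc_folio_stub(order, fail);

	if (!folio)
		count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
	return folio;
}
```
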
>
> Thanks,
> Ryan
>
>
> >
> >>
> >>>
> >>>> +
> >>>> #define filemap_alloc_folio(...) \
> >>>> alloc_hooks(filemap_alloc_folio_noprof(__VA_ARGS__))
> >>>>
> >>>> diff --git a/mm/filemap.c b/mm/filemap.c
> >>>> index 53d5d0410b51..131d514fca29 100644
> >>>> --- a/mm/filemap.c
> >>>> +++ b/mm/filemap.c
> >>>> @@ -963,6 +963,8 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
> >>>> int ret;
> >>>>
> >>>> ret = mem_cgroup_charge(folio, NULL, gfp);
> >>>> + count_mthp_stat(folio_order(folio),
> >>>> + ret ? MTHP_STAT_FILE_FALLBACK_CHARGE : MTHP_STAT_FILE_ALLOC);
> >>>> if (ret)
> >>>> return ret;
> >>>
> >>> Would the following be better?
> >>>
> >>> ret = mem_cgroup_charge(folio, NULL, gfp);
> >>> if (ret) {
> >>> count_mthp_stat(folio_order(folio),
> >>> MTHP_STAT_FILE_FALLBACK_CHARGE);
> >>> return ret;
> >>> }
> >>> count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
> >>>
> >>> Anyway, it's up to you. The code just feels a bit off to me :-)
> >>
> >> Yes, agree your version is better. I also noticed that anon and shmem increment
> >> FALLBACK whenever FALLBACK_CHARGE is incremented so I should apply those same
> >> semantics here.
> >>
> >> Thanks,
> >> Ryan
> >>
> >>
> >>>
> >>>>
> >>>> @@ -990,7 +992,7 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
> >>>> EXPORT_SYMBOL_GPL(filemap_add_folio);
> >>>>
> >>>> #ifdef CONFIG_NUMA
> >>>> -struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>>> +struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>>> {
> >>>> int n;
> >>>> struct folio *folio;
> >>>> @@ -1007,7 +1009,7 @@ struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>>> }
> >>>> return folio_alloc_noprof(gfp, order);
> >>>> }
> >>>> -EXPORT_SYMBOL(filemap_alloc_folio_noprof);
> >>>> +EXPORT_SYMBOL(__filemap_alloc_folio_noprof);
> >>>> #endif
> >>>>
> >>>> /*
> >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>> index 578ac212c172..26d558e3e80f 100644
> >>>> --- a/mm/huge_memory.c
> >>>> +++ b/mm/huge_memory.c
> >>>> @@ -608,7 +608,14 @@ static struct attribute_group anon_stats_attr_grp = {
> >>>> .attrs = anon_stats_attrs,
> >>>> };
> >>>>
> >>>> +DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
> >>>> +DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
> >>>> +DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
> >>>> +
> >>>> static struct attribute *file_stats_attrs[] = {
> >>>> + &file_alloc_attr.attr,
> >>>> + &file_fallback_attr.attr,
> >>>> + &file_fallback_charge_attr.attr,
> >>>> #ifdef CONFIG_SHMEM
> >>>> &shmem_alloc_attr.attr,
> >>>> &shmem_fallback_attr.attr,
> >>>> --
> >>>> 2.43.0
> >>>>
> >>>
> >
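One last note: with the semantics you agreed to above (bumping FALLBACK
whenever FALLBACK_CHARGE is bumped, matching anon/shmem), I'd expect the
charge step in filemap_add_folio() to behave roughly like this userspace
model (the counters and the plain-int `ret` in place of mem_cgroup_charge()
are mine for illustration):

```c
#include <assert.h>

/* Userspace stand-ins for the kernel stats; purely illustrative. */
enum mthp_stat_item {
	MTHP_STAT_FILE_ALLOC,
	MTHP_STAT_FILE_FALLBACK,
	MTHP_STAT_FILE_FALLBACK_CHARGE,
	__MTHP_STAT_COUNT,
};
static unsigned long mthp_stats[__MTHP_STAT_COUNT];

static void count_mthp_stat(unsigned int order, enum mthp_stat_item item)
{
	(void)order; /* the real code keeps per-order counters */
	mthp_stats[item]++;
}

/*
 * Models the charge step: 'ret' stands in for the mem_cgroup_charge()
 * result. On failure, count both FALLBACK and FALLBACK_CHARGE; on
 * success, count ALLOC.
 */
int file_charge_model(unsigned int order, int ret)
{
	if (ret) {
		count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
		count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK_CHARGE);
		return ret;
	}
	count_mthp_stat(order, MTHP_STAT_FILE_ALLOC);
	return 0;
}
```
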
Thanks
Barry