Message-ID: <5dd62fee-56bf-0b54-2e91-c31068a2b040@redhat.com>
Date: Fri, 26 May 2023 17:06:55 +0300
From: Mika Penttilä <mpenttil@...hat.com>
To: David Howells <dhowells@...hat.com>, Yunsheng Lin
<linyunsheng@...wei.com>, Matthew Wilcox <willy@...radead.org>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
David Ahern <dsahern@...nel.org>, Jens Axboe <axboe@...nel.dk>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Jeroen de Borst <jeroendb@...gle.com>, Catherine Sullivan
<csully@...gle.com>, Shailend Chand <shailend@...gle.com>,
Felix Fietkau <nbd@....name>, John Crispin <john@...ozen.org>,
Sean Wang <sean.wang@...iatek.com>, Mark Lee <Mark-MC.Lee@...iatek.com>,
Lorenzo Bianconi <lorenzo@...nel.org>,
Matthias Brugger <matthias.bgg@...il.com>,
AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>,
Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...com>,
Christoph Hellwig <hch@....de>, Sagi Grimberg <sagi@...mberg.me>,
Chaitanya Kulkarni <kch@...dia.com>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-arm-kernel@...ts.infradead.org, linux-mediatek@...ts.infradead.org,
linux-nvme@...ts.infradead.org
Subject: Re: [PATCH net-next 04/12] mm: Make the page_frag_cache allocator use
multipage folios
Hi,
On 26.5.2023 15.47, David Howells wrote:
> Yunsheng Lin <linyunsheng@...wei.com> wrote:
>
>>> Change the page_frag_cache allocator to use multipage folios rather than
>>> groups of pages. This reduces page_frag_free to just a folio_put() or
>>> put_page().
>>
>> put_page() is not used in this patch, perhaps remove it to avoid
>> the confusion?
>
> Will do if I need to respin the patches.
>
>> Also, is there any significant difference between __free_pages()
>> and folio_put()? IOW, what does the 'reduces' part mean here?
>
> I meant that the folio code handles page compounding for us and we don't need
> to work out how big the page is for ourselves.
>
> If you look at __free_pages(), you can see a PageHead() call. folio_put()
> doesn't need that.
>
>> I followed some discussion about folios before, but have not really
>> understood the real difference between 'multipage folios' and
>> 'groups of pages' yet. Is folio mostly used to avoid the confusion
>> about whether a page is 'headpage of compound page', 'base page' or
>> 'tailpage of compound page'? Or is there any obvious benefit of
>> folio that I missed?
>
> There is a benefit: a folio pointer always points to the head page and so we
> never need to do "is this compound? where's the head?" logic to find it. When
> going from a page pointer, we still have to find the head.
>
But page_frag_free() uses folio_put(virt_to_folio(addr)), and
virt_to_folio() still depends on the compound page infrastructure to
find the head page and, from it, the folio.
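
To illustrate the point, here is a toy userspace sketch (not kernel code) of
why going from an address or page pointer to the folio still needs the
"is this a tail? where's the head?" step. It only mimics the encoding where a
tail page stores the head's address with bit 0 set; struct toy_page and all
toy_* names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: mimics the "head address | 1" tail encoding, nothing else. */
struct toy_page {
	uintptr_t compound_head;	/* tail pages: address of head page, bit 0 set */
	int refcount;			/* only meaningful on the head page */
};

/* Mark pages[0 .. (1 << order) - 1] as one compound allocation. */
static void toy_prep_compound(struct toy_page *pages, unsigned int order)
{
	size_t i, nr = (size_t)1 << order;

	pages[0].compound_head = 0;
	pages[0].refcount = 1;
	for (i = 1; i < nr; i++) {
		pages[i].compound_head = (uintptr_t)&pages[0] | 1;
		pages[i].refcount = 0;
	}
}

/* Page pointer -> head page: the tail-bit test is needed on every lookup. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return (struct toy_page *)(head - 1);
	return page;
}

/*
 * Stand-in for folio_put(virt_to_folio(addr)): find the head first, then
 * drop a reference on it. Returns the remaining reference count.
 */
static int toy_put_via_page(struct toy_page *page)
{
	struct toy_page *head = toy_compound_head(page);

	return --head->refcount;
}

/* Small self-check: a tail page resolves to the head, and the put hits it. */
static void toy_demo(void)
{
	struct toy_page pages[4];

	toy_prep_compound(pages, 2);
	assert(toy_compound_head(&pages[3]) == &pages[0]);
	assert(toy_put_via_page(&pages[1]) == 0);
}
```

So the head lookup is only avoided when the caller already holds a folio
pointer; starting from a virtual address, as page_frag_free() does, it is
still there.
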
> Ultimately, the aim is to reduce struct page to a typed pointer to massively
> reduce the amount of space consumed by mem_map[]. A page struct will then
> point at a folio or a slab struct or one of a number of different types. But
> to get to that point, we have to stop a whole lot of things from using page
> structs, but rather use some other type, such as folio.
>
> Eventually, there won't be a need for head pages and tail pages per se - just
> memory objects of different sizes.
>
>>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>>> index 306a3d1a0fa6..d7c52a5979cc 100644
>>> --- a/include/linux/mm_types.h
>>> +++ b/include/linux/mm_types.h
>>> @@ -420,18 +420,13 @@ static inline void *folio_get_private(struct folio *folio)
>>> }
>>>
>>> struct page_frag_cache {
>>> - void * va;
>>> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>>> - __u16 offset;
>>> - __u16 size;
>>> -#else
>>> - __u32 offset;
>>> -#endif
>>> + struct folio *folio;
>>> + unsigned int offset;
>>> /* we maintain a pagecount bias, so that we dont dirty cache line
>>> * containing page->_refcount every time we allocate a fragment.
>>> */
>>> - unsigned int pagecnt_bias;
>>> - bool pfmemalloc;
>>> + unsigned int pagecnt_bias;
>>> + bool pfmemalloc;
>>> };
>>
>> It seems the 'va' and 'size' fields are used to avoid touching 'struct page',
>> and thus possible cache bouncing, when more frags can be allocated from the
>> page while other frags are being freed at the same time, before this patch?
>
> Hmmm... fair point, though va is calculated from the page pointer on most
> arches without the need to dereference struct page (only arc, m68k and sparc
> define WANT_PAGE_VIRTUAL).
>
> David
>
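
For reference, the "no dereference" remark can be pictured with a toy
userspace sketch (again, not kernel code). It assumes a flat array of page
descriptors and a linear direct mapping, so the virtual address falls out of
pointer arithmetic alone. TOY_PAGE_SHIFT, toy_mem_map, toy_direct_map_base and
the other toy_* names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumed 4 KiB pages */

struct toy_frame {
	unsigned long flags;
	int refcount;
};

/* Hypothetical flat memmap and base address of the linear mapping. */
static struct toy_frame toy_mem_map[16];
static const uintptr_t toy_direct_map_base = 0x100000;

/* Descriptor pointer -> frame number: pure pointer arithmetic. */
static size_t toy_frame_to_pfn(const struct toy_frame *frame)
{
	return (size_t)(frame - toy_mem_map);
}

/* Descriptor pointer -> virtual address, without reading *frame. */
static uintptr_t toy_frame_address(const struct toy_frame *frame)
{
	return toy_direct_map_base +
	       ((uintptr_t)toy_frame_to_pfn(frame) << TOY_PAGE_SHIFT);
}

/* Small self-check: frame 3 sits three pages above the mapping base. */
static void toy_frame_demo(void)
{
	assert(toy_frame_address(&toy_mem_map[3]) ==
	       toy_direct_map_base + 3 * 4096);
}
```

Neither helper loads anything from the descriptor itself, which is why the
descriptor's cache line is not touched on this path in the sketch.
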
--Mika