Message-ID: <112b4bcd-230a-4482-ae2e-67fa22b3596f@redhat.com>
Date: Mon, 11 Aug 2025 11:52:11 +0200
From: David Hildenbrand <david@...hat.com>
To: Kiryl Shutsemau <kirill@...temov.name>,
"Pankaj Raghav (Samsung)" <kernel@...kajraghav.com>
Cc: Suren Baghdasaryan <surenb@...gle.com>,
Ryan Roberts <ryan.roberts@....com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>, Vlastimil Babka
<vbabka@...e.cz>, Zi Yan <ziy@...dia.com>, Mike Rapoport <rppt@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>, Michal Hocko <mhocko@...e.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>, Nico Pache <npache@...hat.com>,
Dev Jain <dev.jain@....com>, "Liam R . Howlett" <Liam.Howlett@...cle.com>,
Jens Axboe <axboe@...nel.dk>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, willy@...radead.org,
Ritesh Harjani <ritesh.list@...il.com>, linux-block@...r.kernel.org,
linux-fsdevel@...r.kernel.org, "Darrick J . Wong" <djwong@...nel.org>,
mcgrof@...nel.org, gost.dev@...sung.com, hch@....de,
Pankaj Raghav <p.raghav@...sung.com>
Subject: Re: [PATCH v3 0/5] add persistent huge zero folio support
On 11.08.25 11:49, David Hildenbrand wrote:
> On 11.08.25 11:43, Kiryl Shutsemau wrote:
>> On Mon, Aug 11, 2025 at 10:41:08AM +0200, Pankaj Raghav (Samsung) wrote:
>>> From: Pankaj Raghav <p.raghav@...sung.com>
>>>
>>> Many places in the kernel need to zero out larger chunks, but the
>>> maximum segment we can zero out at a time by ZERO_PAGE is limited by
>>> PAGE_SIZE.
>>>
>>> This concern was raised during the review of adding Large Block Size support
>>> to XFS[2][3].
>>>
>>> This is especially annoying in block devices and filesystems where
>>> multiple ZERO_PAGEs are attached to the bio in different bvecs. With multipage
>>> bvec support in the block layer, it is much more efficient to send out
>>> larger zero pages as part of a single bvec.
>>>
>>> Some examples of places in the kernel where this could be useful:
>>> - blkdev_issue_zero_pages()
>>> - iomap_dio_zero()
>>> - vmalloc.c:zero_iter()
>>> - rxperf_process_call()
>>> - fscrypt_zeroout_range_inline_crypt()
>>> - bch2_checksum_update()
>>> ...
>>>
>>> Usually huge_zero_folio is allocated on demand, and it will be
>>> deallocated by the shrinker if there are no users of it left. At the moment,
>>> the refcount in the huge_zero_folio infrastructure is tied to the lifetime
>>> of the process that created it. This might not work for the bio layer, as
>>> the completions can be async and the process that created the
>>> huge_zero_folio might no longer be alive. Also, one of the main points that
>>> came up during the discussion is to have something bigger than the zero
>>> page as a drop-in replacement.
>>>
>>> Add a config option PERSISTENT_HUGE_ZERO_FOLIO that will always allocate
>>> the huge_zero_folio, and disable the shrinker so that huge_zero_folio is
>>> never freed.
>>> This makes it possible to use the huge_zero_folio without passing any mm
>>> struct and does not tie the lifetime of the zero folio to anything, making
>>> it a drop-in replacement for ZERO_PAGE.
>>>
>>> I have converted blkdev_issue_zero_pages() as an example as a part of
>>> this series. I also noticed close to 4% performance improvement just by
>>> replacing ZERO_PAGE with persistent huge_zero_folio.
>>>
>>> I will send patches to individual subsystems using the huge_zero_folio
>>> once this gets upstreamed.
>>>
>>> Looking forward to some feedback.
>>
>> Why does it need to be compile-time? Maybe whoever needs huge zero page
>> would just call get_huge_zero_page()/folio() on initialization to get it
>> pinned?
>
> That's what v2 did, and this way here is cleaner.
Sorry, RFC v2 I think. It got a bit confusing with series names/versions.
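
To put the bvec argument from the quoted cover letter in numbers, here is a
minimal userspace sketch (not kernel code). It assumes 4 KiB base pages and a
2 MiB huge zero folio; the helper zero_segments_needed() and both size macros
are hypothetical names used only for illustration.

```c
#include <stddef.h>

/*
 * Assumed sizes, for illustration only: 4 KiB base pages and a 2 MiB
 * (PMD-sized) huge zero folio. Real values depend on architecture and
 * kernel configuration.
 */
#define ASSUMED_PAGE_SIZE      ((size_t)4096)
#define ASSUMED_HUGE_ZERO_SIZE ((size_t)2 * 1024 * 1024)

/*
 * Hypothetical helper: number of segments needed to cover `len` bytes of
 * zeroes when a single segment can reference at most `seg_size` bytes.
 * This is a ceiling division written so that it cannot overflow.
 */
static size_t zero_segments_needed(size_t len, size_t seg_size)
{
	if (seg_size == 0)
		return 0;
	return len / seg_size + (len % seg_size != 0);
}
```

Under those assumptions, zeroing 8 MiB takes 2048 page-sized segments but only
4 segments backed by the huge zero folio.
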
--
Cheers,
David / dhildenb