Message-ID: <rr6kkjxizlpruc46hjnx72jl5625rsw3mcpkc5h4bvtp3wbmjf@g45yhep3ogjo>
Date: Mon, 11 Aug 2025 11:07:48 +0100
From: Kiryl Shutsemau <kirill@...temov.name>
To: David Hildenbrand <david@...hat.com>
Cc: "Pankaj Raghav (Samsung)" <kernel@...kajraghav.com>, 
	Suren Baghdasaryan <surenb@...gle.com>, Ryan Roberts <ryan.roberts@....com>, 
	Baolin Wang <baolin.wang@...ux.alibaba.com>, Vlastimil Babka <vbabka@...e.cz>, Zi Yan <ziy@...dia.com>, 
	Mike Rapoport <rppt@...nel.org>, Dave Hansen <dave.hansen@...ux.intel.com>, 
	Michal Hocko <mhocko@...e.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, 
	Andrew Morton <akpm@...ux-foundation.org>, Thomas Gleixner <tglx@...utronix.de>, 
	Nico Pache <npache@...hat.com>, Dev Jain <dev.jain@....com>, 
	"Liam R . Howlett" <Liam.Howlett@...cle.com>, Jens Axboe <axboe@...nel.dk>, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org, willy@...radead.org, Ritesh Harjani <ritesh.list@...il.com>, 
	linux-block@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	"Darrick J . Wong" <djwong@...nel.org>, mcgrof@...nel.org, gost.dev@...sung.com, hch@....de, 
	Pankaj Raghav <p.raghav@...sung.com>
Subject: Re: [PATCH v3 0/5] add persistent huge zero folio support

On Mon, Aug 11, 2025 at 11:52:11AM +0200, David Hildenbrand wrote:
> On 11.08.25 11:49, David Hildenbrand wrote:
> > On 11.08.25 11:43, Kiryl Shutsemau wrote:
> > > On Mon, Aug 11, 2025 at 10:41:08AM +0200, Pankaj Raghav (Samsung) wrote:
> > > > From: Pankaj Raghav <p.raghav@...sung.com>
> > > > 
> > > > Many places in the kernel need to zero out larger chunks, but the
> > > > maximum segment we can zero out at a time by ZERO_PAGE is limited by
> > > > PAGE_SIZE.
> > > > 
> > > > This concern was raised during the review of adding Large Block Size support
> > > > to XFS[2][3].
> > > > 
> > > > This is especially annoying in block devices and filesystems where
> > > > multiple ZERO_PAGEs are attached to the bio in different bvecs. With multipage
> > > > bvec support in block layer, it is much more efficient to send out
> > > > larger zero pages as a part of single bvec.
> > > > 
> > > > Some examples of places in the kernel where this could be useful:
> > > > - blkdev_issue_zero_pages()
> > > > - iomap_dio_zero()
> > > > - vmalloc.c:zero_iter()
> > > > - rxperf_process_call()
> > > > - fscrypt_zeroout_range_inline_crypt()
> > > > - bch2_checksum_update()
> > > > ...
> > > > 
> > > > Usually huge_zero_folio is allocated on demand, and it will be
> > > > deallocated by the shrinker if there are no users of it left. At the moment,
> > > > the huge_zero_folio infrastructure's refcount is tied to the lifetime of
> > > > the process that created it. This might not work for the bio layer, as
> > > > completions can be async and the process that created the huge_zero_folio
> > > > might no longer be alive. Also, one of the main points that came up during
> > > > discussion was to have something bigger than the zero page as a drop-in replacement.
> > > > 
> > > > Add a config option PERSISTENT_HUGE_ZERO_FOLIO that will always allocate
> > > > the huge_zero_folio, and disable the shrinker so that huge_zero_folio is
> > > > never freed.
> > > > This makes it possible to use the huge_zero_folio without passing any mm
> > > > struct and does not tie the lifetime of the zero folio to anything, making
> > > > it a drop-in replacement for ZERO_PAGE.
> > > > 
> > > > I have converted blkdev_issue_zero_pages() as an example as part of
> > > > this series. I also noticed a performance improvement of close to 4% just
> > > > by replacing ZERO_PAGE with the persistent huge_zero_folio.
> > > > 
> > > > I will send patches to individual subsystems using the huge_zero_folio
> > > > once this gets upstreamed.
> > > > 
> > > > Looking forward to some feedback.
> > > 
> > > Why does it need to be compile-time? Maybe whoever needs huge zero page
> > > would just call get_huge_zero_page()/folio() on initialization to get it
> > > pinned?
> > 
> > That's what v2 did, and this way here is cleaner.
> 
> Sorry, RFC v2 I think. It got a bit confusing with series names/versions.

Well, my worry is that 2M can be a high tax for smaller machines.
Compile-time might be cleaner, but it has downsides.

It is also not clear whether these users actually need a physical HZP or
whether a virtual one is enough. Virtual is cheap.
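
For a rough sense of the trade-off, here is a back-of-the-envelope sketch in
userspace C. It is not kernel code. The 4 KiB base page and 2 MiB huge page
sizes are x86-64 assumptions, and all names (zero_segments(),
pinned_share_bp(), the SKETCH_* macros) are made up for illustration. It only
counts how many zero-filled segments are needed to cover a range and what
share of RAM a permanently allocated 2M folio takes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes (x86-64 defaults); other architectures differ. */
#define SKETCH_PAGE_SIZE      (4ULL * 1024)         /* 4 KiB base page */
#define SKETCH_HUGE_PAGE_SIZE (2ULL * 1024 * 1024)  /* 2 MiB huge page */

/*
 * Number of zero-filled segments needed to cover len bytes when each
 * segment can reference at most seg_size bytes of zeroes (rounds up).
 */
static uint64_t zero_segments(uint64_t len, uint64_t seg_size)
{
	return (len + seg_size - 1) / seg_size;
}

/*
 * Share of total RAM, in hundredths of a percent (rounded down), taken
 * by one huge zero folio that is never freed.
 */
static uint64_t pinned_share_bp(uint64_t ram_bytes)
{
	return SKETCH_HUGE_PAGE_SIZE * 10000 / ram_bytes;
}

/* Sanity checks for the numbers quoted in the note below. */
static void sketch_selfcheck(void)
{
	const uint64_t gib = 1024ULL * 1024 * 1024;
	const uint64_t mib = 1024ULL * 1024;

	assert(zero_segments(gib, SKETCH_PAGE_SIZE) == 262144);
	assert(zero_segments(gib, SKETCH_HUGE_PAGE_SIZE) == 512);
	assert(pinned_share_bp(64 * mib) == 312);   /* about 3.1% */
	assert(pinned_share_bp(16 * gib) == 1);     /* about 0.01% */
}
```

Under these assumptions, zeroing 1 GiB takes 262144 page-sized segments
versus 512 huge-page-sized ones, while the pinned 2 MiB is about 3% of RAM
on a 64 MiB machine and about 0.01% on a 16 GiB one.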

-- 
Kiryl Shutsemau / Kirill A. Shutemov
