lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 20 Feb 2024 10:26:48 +0000
From: Daniel Gomez <da.gomez@...sung.com>
To: Hugh Dickins <hughd@...gle.com>
CC: "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
	"brauner@...nel.org" <brauner@...nel.org>, "jack@...e.cz" <jack@...e.cz>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "dagmcr@...il.com"
	<dagmcr@...il.com>, "linux-fsdevel@...r.kernel.org"
	<linux-fsdevel@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
	"willy@...radead.org" <willy@...radead.org>, "hch@...radead.org"
	<hch@...radead.org>, "mcgrof@...nel.org" <mcgrof@...nel.org>, Pankaj Raghav
	<p.raghav@...sung.com>, "gost.dev@...sung.com" <gost.dev@...sung.com>
Subject: Re: [RFC PATCH 0/9] shmem: fix llseek in hugepages

On Mon, Feb 19, 2024 at 02:15:47AM -0800, Hugh Dickins wrote:
> On Wed, 14 Feb 2024, Daniel Gomez wrote:
> > On Fri, Feb 09, 2024 at 02:29:01PM +0000, Daniel Gomez wrote:
> > > Hi,
> > > 
> > > The following series fixes the generic/285 and generic/436 fstests for huge
> > > pages (huge=always). These are tests for llseek (SEEK_HOLE and SEEK_DATA).
> > > 
> > > The implementation to fix above tests is based on iomap per-block tracking for
> > > uptodate and dirty states but applied to shmem uptodate flag.
> > 
> > Hi Hugh, Andrew,
> > 
> > Could you kindly provide feedback on these patches/fixes? I'd appreciate your
> > input on whether we're headed in the right direction, or maybe not.
> 
> I am sorry, Daniel, but I see this series as misdirected effort.
> 
> We do not want to add overhead to tmpfs and the kernel, just to pass two
> tests which were (very reasonably) written for fixed block size, before
> the huge page possibility ever came in.

Is this overhead a concern in performance? Can you clarify what do you mean?

I guess is a matter of which kind of granularity we want for a filesystem. Then,
we can either adapt the test to work with different block sizes or change the
filesystem to support this fixed and minimum block size.

I believe the tests should remain unchanged if we still want to operate at this
fixed block size, regardless of how the memory is managed in the filesystem side
(whether is a huge page or a large folio with arbitrary order).

> 
> If one opts for transparent huge pages in the filesystem, then of course
> the dividing line between hole and data becomes more elastic than before.

I'm uncertain when we may want to be more elastic. In the case of XFS with iomap
and support for large folios, for instance, we are 'less' elastic than here So,
what exactly is the rationale behind wanting shmem to be 'more elastic'?

If we ever move shmem to large folios [1], and we use them in an oportunistic way,
then we are going to be more elastic in the default path.

[1] https://lore.kernel.org/all/20230919135536.2165715-1-da.gomez@samsung.com

In addition, I think that having this block granularity can benefit quota
support and the reclaim path. For example, in the generic/100 fstest, around
~26M of data are reported as 1G of used disk when using tmpfs with huge pages.

> 
> It would be a serious bug if lseek ever reported an area of non-0 data as
> in a hole; but I don't think that is what generic/285 or generic/436 find.

I agree this is not the case here. We mark the entire folio (huge page) as
uptodate, hence we report that full area as data, making steps of 2M.

> 
> Beyond that, "man 2 lseek" is very forgiving of filesystem implementation.

Thanks for bringing that up. This got me thinking along the same lines as
before, wanting to understand where we want to draw the line and the reasons
benhind it.

> 
> I'll send you my stack of xfstests patches (which, as usual, I cannot
> afford the time now to re-review and post): there are several tweaks to
> seek_sanity_test in there for tmpfs huge pages, along with other fixes
> for tmpfs (and some fixes to suit an old 32-bit build environment).
> 
> With those tweaks, generic/285 and generic/436 and others (but not all)
> have been passing on huge tmpfs for several years.  If you see something
> you'd like to add your name to in that stack, or can improve upon, please
> go ahead and post to the fstests list (Cc me).

Thanks for the patches Hugh. I see how you are making the seeking tests a bit
more 'elastic'. I will post them shortly and see if we can make sure we can
minimize the number of failures [2].

In kdevops [3], we are discussing the possibility to add tmpfs to 0-day and
track for any regressions.

[2] https://github.com/linux-kdevops/kdevops/tree/master/workflows/fstests/expunges/6.8.0-rc2/tmpfs/unassigned
[3] https://github.com/linux-kdevops/kdevops

> 
> Thanks,
> Hugh
> 
> > 
> > Thanks,
> > Daniel
> > 
> > > 
> > > The motivation is to avoid any regressions in tmpfs once it gets support for
> > > large folios.
> > > 
> > > Testing with kdevops
> > > Testing has been performed using fstests with kdevops for the v6.8-rc2 tag.
> > > There are currently different profiles supported [1] and for each of these,
> > > a baseline of 20 loops has been performed with the following failures for
> > > hugepages profiles: generic/080, generic/126, generic/193, generic/245,
> > > generic/285, generic/436, generic/551, generic/619 and generic/732.
> > > 
> > > If anyone interested, please find all of the failures in the expunges directory:
> > > https://protect2.fireeye.com/v1/url?k=9a7b8131-fbf09401-9a7a0a7e-000babffaa23-2e83e8b120fdf45e&q=1&e=e25c026a-1bb5-45f4-8acb-884e4a5e4d91&u=https%3A%2F%2Fgithub.com%2Flinux-kdevops%2Fkdevops%2Ftree%2Fmaster%2Fworkflows%2Ffstests%2Fexpunges%2F6.8.0-rc2%2Ftmpfs%2Funassigned
> > > 
> > > [1] tmpfs profiles supported in kdevops: default, tmpfs_noswap_huge_never,
> > > tmpfs_noswap_huge_always, tmpfs_noswap_huge_within_size,
> > > tmpfs_noswap_huge_advise, tmpfs_huge_always, tmpfs_huge_within_size and
> > > tmpfs_huge_advise.
> > > 
> > > More information:
> > > https://protect2.fireeye.com/v1/url?k=70096f39-11827a09-7008e476-000babffaa23-4c0e0d7b2ec659b6&q=1&e=e25c026a-1bb5-45f4-8acb-884e4a5e4d91&u=https%3A%2F%2Fgithub.com%2Flinux-kdevops%2Fkdevops%2Ftree%2Fmaster%2Fworkflows%2Ffstests%2Fexpunges%2F6.8.0-rc2%2Ftmpfs%2Funassigned
> > > 
> > > All the patches has been tested on top of v6.8-rc2 and rebased onto latest next
> > > tag available (next-20240209).
> > > 
> > > Daniel
> > > 
> > > Daniel Gomez (8):
> > >   shmem: add per-block uptodate tracking for hugepages
> > >   shmem: move folio zero operation to write_begin()
> > >   shmem: exit shmem_get_folio_gfp() if block is uptodate
> > >   shmem: clear_highpage() if block is not uptodate
> > >   shmem: set folio uptodate when reclaim
> > >   shmem: check if a block is uptodate before splice into pipe
> > >   shmem: clear uptodate blocks after PUNCH_HOLE
> > >   shmem: enable per-block uptodate
> > > 
> > > Pankaj Raghav (1):
> > >   splice: don't check for uptodate if partially uptodate is impl
> > > 
> > >  fs/splice.c |  17 ++-
> > >  mm/shmem.c  | 340 ++++++++++++++++++++++++++++++++++++++++++++++++----
> > >  2 files changed, 332 insertions(+), 25 deletions(-)
> > > 
> > > -- 
> > > 2.43.0

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ