Message-ID: <20201026151404.GR20115@casper.infradead.org>
Date:   Mon, 26 Oct 2020 15:14:04 +0000
From:   Matthew Wilcox <willy@...radead.org>
To:     Jan Kara <jack@...e.cz>
Cc:     linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org,
        linux-xfs@...r.kernel.org
Subject: Re: Strange SEEK_HOLE / SEEK_DATA behavior

On Mon, Oct 26, 2020 at 03:57:10PM +0100, Jan Kara wrote:
> Hello!
> 
> When reviewing Matthew's THP patches I've noticed one odd behavior which
> got copied from current iomap seek hole/data helpers. Currently we have:
> 
> # fallocate -l 4096 testfile
> # xfs_io -x -c "seek -h 0" testfile
> Whence	Result
> HOLE	0
> # dd if=testfile bs=4096 count=1 of=/dev/null
> # xfs_io -x -c "seek -h 0" testfile
> Whence	Result
> HOLE	4096
> 
> So once we read from an unwritten extent, the areas with cached pages
> suddenly become treated as data. Later when pages get evicted, they become
> treated as holes again. Strictly speaking I wouldn't say this is a bug
> since nobody promises we won't treat holes as data but it looks weird.
> Shouldn't we treat clean pages over unwritten extents still as holes and
> only once the page becomes dirty treat it as data? What do other people
> think?
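
For readers who want to reproduce the observation above without xfs_io,
here is a minimal C sketch of the same sequence, assuming Linux with glibc
(SEEK_HOLE needs _GNU_SOURCE). The helper name seek_hole_before_after() is
hypothetical. It preallocates 4 KiB, asks lseek(SEEK_HOLE) where the first
hole at or after offset 0 begins, reads the page into the page cache, and
asks again. The offsets it reports depend on the filesystem and kernel
version; the transcript above shows 0 before the read and 4096 after.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a 4 KiB preallocated file at `path`, then
 * report where SEEK_HOLE (searching from offset 0) lands before and after
 * the file's contents are read into the page cache.
 * Returns 0 on success, -1 on failure. The file is removed afterwards.
 */
int seek_hole_before_after(const char *path, off_t *before, off_t *after)
{
    char buf[4096];
    int ret = -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Like `fallocate -l 4096 testfile`. If the filesystem lacks
     * fallocate(), glibc emulates it by writing zeroes, which produces
     * ordinary written data rather than an unwritten extent. */
    if (posix_fallocate(fd, 0, sizeof(buf)) != 0)
        goto out;

    /* First query: nothing from this file is in the page cache yet. */
    *before = lseek(fd, 0, SEEK_HOLE);
    if (*before < 0)
        goto out;

    /* Like the dd step: read the page so that it becomes cached. */
    if (pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
        goto out;

    /* Second query: the page is now cached (clean, all zeroes). */
    *after = lseek(fd, 0, SEEK_HOLE);
    if (*after < 0)
        goto out;

    ret = 0;
out:
    close(fd);
    unlink(path);
    return ret;
}
```

A caller passes a scratch path and prints both offsets. On a filesystem
without real SEEK_HOLE support, both values are 4096, because Linux treats
the end of a file as an implicit hole.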

I think we actually discussed this recently.  Unless I misunderstood
one or both messages:

https://lore.kernel.org/linux-fsdevel/20201014223743.GD7391@dread.disaster.area/

I agree it's not great, but I'm not sure it's worth getting it "right"
by tracking whether a page contains only zeroes.
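
To make that cost concrete, here is one naive way to decide whether a
cached page "contains only zeroes": scan every byte. This is an
illustrative sketch, not how any filesystem does it; the helper name
page_is_all_zero() and the fixed 4096-byte page size are assumptions. The
alternative to scanning is maintaining a flag on every write path, which is
the bookkeeping being weighed here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size for this sketch; real page sizes vary by architecture. */
#define SKETCH_PAGE_SIZE 4096

/* Hypothetical helper: returns true only if every byte of the page is zero.
 * Cost is linear in the page size on every call. */
bool page_is_all_zero(const unsigned char *page)
{
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (page[i] != 0)
            return false;   /* stop at the first nonzero byte */
    }
    return true;
}
```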

I have been vaguely thinking about optimising for read-mostly workloads
on sparse files by storing a magic entry that means "use the zero
page" in the page cache instead of a page, like DAX does (only better).
It hasn't risen to the top of my list yet.  Does anyone have a workload
that would benefit from it?

(I don't mean "can anybody construct one"; that's trivially possible.
I mean, do any customers care about the performance of that workload?)
