Date:   Mon, 26 Oct 2020 17:48:10 +0100
From:   Jan Kara <jack@...e.cz>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org,
        linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org
Subject: Re: Strange SEEK_HOLE / SEEK_DATA behavior

On Mon 26-10-20 15:14:04, Matthew Wilcox wrote:
> On Mon, Oct 26, 2020 at 03:57:10PM +0100, Jan Kara wrote:
> > Hello!
> > 
> > When reviewing Matthew's THP patches I've noticed one odd behavior which
> > got copied from the current iomap seek hole/data helpers. Currently we have:
> > 
> > # fallocate -l 4096 testfile
> > # xfs_io -x -c "seek -h 0" testfile
> > Whence	Result
> > HOLE	0
> > # dd if=testfile bs=4096 count=1 of=/dev/null
> > # xfs_io -x -c "seek -h 0" testfile
> > Whence	Result
> > HOLE	4096
> > 
> > So once we read from an unwritten extent, the areas with cached pages
> > suddenly start being treated as data. Later, when the pages get evicted,
> > they are treated as holes again. Strictly speaking, I wouldn't say this
> > is a bug, since nobody promises we won't treat holes as data, but it
> > looks weird. Shouldn't we still treat clean pages over unwritten extents
> > as holes, and treat a page as data only once it becomes dirty? What do
> > other people think?
> 
> I think we actually discussed this recently.  Unless I misunderstood
> one or both messages:
> 
> https://lore.kernel.org/linux-fsdevel/20201014223743.GD7391@dread.disaster.area/

Thanks for the link. That indeed explains it: the concern is that if we
checked PageDirty as I suggested, the check would be racy (the page could
have been written out just before we found it but after we had received
the block mapping from the filesystem). So using PageUptodate is less racy
(although still somewhat racy, because the page could also be reclaimed).

> I agree it's not great, but I'm not sure it's worth getting it "right"
> by tracking whether a page contains only zeroes.

Yeah, I don't think it's worth it just for this.

> I have been vaguely thinking about optimising for read-mostly workloads
> on sparse files by storing a magic entry that means "use the zero
> page" in the page cache instead of a page, like DAX does (only better).
> It hasn't risen to the top of my list yet.  Does anyone have a workload
> that would benefit from it?
> 
> (I don't mean "can anybody construct one"; that's trivially possible.
> I mean, do any customers care about the performance of that workload?)

No workload comes to my mind now.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR