linux-kernel - Re: [PATCH 02/11] vfs: Add better VFS support for page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20090629055421.GG11450@wotan.suse.de>
Date:	Mon, 29 Jun 2009 07:54:21 +0200
From:	Nick Piggin <npiggin@...e.de>
To:	Jan Kara <jack@...e.cz>, OM@...e.de
Cc:	LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
	linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 02/11] vfs: Add better VFS support for page_mkwrite when blocksize < pagesize

On Fri, Jun 26, 2009 at 06:08:51PM +0200, Jan Kara wrote:
> On Fri 26-06-09 14:55:05, Nick Piggin wrote:
> > On Fri, Jun 26, 2009 at 02:21:41PM +0200, Jan Kara wrote:
> > >   So if you have any idea how to better solve this, you are welcome ;).
> > 
> > Ah thanks, the write(2) case I missed. That does get complex to
> > do with the page lock.
> > 
> > I agree with the semantics you are aiming for, and I agree we should
> > not try to allocate blocks when extending i_size.
> > 
> > We actually could update i_size after dropping the page lock in
> > these paths. That would give a window where we can page_mkclean
> > the old partial page before the i_size update.
>   Yes, that would be fine and make things simpler...

Hopefully.

 
> > However this does actually require that we remove the partial-page
> > zeroing that writepage does. I think it does it in order to attempt
> > to write zeroes into the fs even if the app does mmaped writes
> > past i_size... but it is pretty dumb anyway really because the
> > behaviour is undefined anyway so there is no problem if weird
> > stuff gets written there (it should be zeroed out when extending
> > the file anyway), and also there is nothing to prevent races of
> > subsequent mmapped writes before the DMA completes.
>   We definitely don't zero out the last page when extending the file. But
> if we do it, we should be fine as you write. I'll try to write a patch...
> (I'm on vacation next week though so probably after that).

What I mean is that as of today, write(2) is required to hold page lock
of the page it is operating on if it writes anything past i_size. It
must hold that lock until i_size is extended to include the new data.
If it does not hold the lock, then eg. block_write_full_page can zero
out that data incorrectly

        /*
         * The page straddles i_size.  It must be zeroed out on each and every
         * writepage invokation because it may be mmapped.  "A file is mapped
         * in multiples of the page size.  For a file that is not a multiple of
         * the  page size, the remaining memory is zeroed when mapped, and
         * writes to that region are not written out to the file."
         */

But I argue this is bogus anyway because it is completely racy, and
it should be undefined behaviour anyway. So I think it would be fine
to remove it.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/