[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOQ4uxgofERYwN7AfYFWqQMpQH5y3LV+6UuGfjU29gZXNf7-vQ@mail.gmail.com>
Date: Sun, 2 Oct 2022 10:08:51 +0300
From: Amir Goldstein <amir73il@...il.com>
To: Jeff Layton <jlayton@...nel.org>
Cc: tytso@....edu, adilger.kernel@...ger.ca, djwong@...nel.org,
david@...morbit.com, trondmy@...merspace.com, neilb@...e.de,
viro@...iv.linux.org.uk, zohar@...ux.ibm.com, xiubli@...hat.com,
chuck.lever@...cle.com, lczerner@...hat.com, jack@...e.cz,
bfields@...ldses.org, brauner@...nel.org, fweimer@...hat.com,
linux-btrfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, ceph-devel@...r.kernel.org,
linux-ext4@...r.kernel.org, linux-nfs@...r.kernel.org,
linux-xfs@...r.kernel.org
Subject: Re: [PATCH v6 8/9] vfs: update times after copying data in __generic_file_write_iter
On Fri, Sep 30, 2022 at 2:30 PM Jeff Layton <jlayton@...nel.org> wrote:
>
> The c/mtime and i_version currently get updated before the data is
> copied (or a DIO write is issued), which is problematic for NFS.
>
> READ+GETATTR can race with a write (even a local one) in such a way as
> to make the client associate the state of the file with the wrong change
> attribute. That association can persist indefinitely if the file sees no
> further changes.
>
> Move the setting of times to the bottom of the function in
> __generic_file_write_iter and only update it if something was
> successfully written.
>
This solution is wrong for several reasons:
1. There is still file_update_time() in ->page_mkwrite() so you haven't
solved the problem completely
2. The other side of the coin is that post crash state is more likely to end
up data changes without mtime/ctime change
If I read the problem description correctly, then a solution that invalidates
the NFS cache before AND after the write would be acceptable. Right?
Would an extra i_version bump after the write solve the race?
> If the time update fails, log a warning once, but don't fail the write.
> All of the existing callers use update_time functions that don't fail,
> so we should never trip this.
>
> Signed-off-by: Jeff Layton <jlayton@...nel.org>
> ---
> mm/filemap.c | 17 +++++++++++++----
> 1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 15800334147b..72c0ceb75176 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3812,10 +3812,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> if (err)
> goto out;
>
> - err = file_update_time(file);
> - if (err)
> - goto out;
> -
> if (iocb->ki_flags & IOCB_DIRECT) {
> loff_t pos, endbyte;
>
> @@ -3868,6 +3864,19 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> iocb->ki_pos += written;
> }
> out:
> + if (written > 0) {
> + err = file_update_time(file);
> + /*
> + * There isn't much we can do at this point if updating the
> + * times fails after a successful write. The times and i_version
> + * should still be updated in the inode, and it should still be
> + * marked dirty, so hopefully the next inode update will catch it.
> + * Log a warning once so we have a record that something untoward
> + * has occurred.
> + */
> + WARN_ONCE(err, "Failed to update m/ctime after write: %ld\n", err);
pr_warn_once() please - this is not a programming assertion.
Thanks,
Amir.
Powered by blists - more mailing lists