[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0133ed11-bb90-9337-e787-66851cbbc379@redhat.com>
Date: Fri, 8 Apr 2022 10:47:10 +0800
From: Xiubo Li <xiubli@...hat.com>
To: Luís Henriques <lhenriques@...e.de>,
Jeff Layton <jlayton@...nel.org>,
Ilya Dryomov <idryomov@...il.com>
Cc: ceph-devel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] ceph: invalidate pages when doing direct/sync writes
On 4/7/22 11:15 PM, Luís Henriques wrote:
> When doing a direct/sync write, we need to invalidate the page cache in
> the range being written to. If we don't do this, the cache will include
> invalid data as we just did a write that avoided the page cache.
>
> Signed-off-by: Luís Henriques <lhenriques@...e.de>
> ---
> fs/ceph/file.c | 19 ++++++++++++++-----
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
> Changes since v3:
> - Dropped initial call to invalidate_inode_pages2_range()
> - Added extra comment to document invalidation
>
> Changes since v2:
> - Invalidation needs to be done after a write
>
> Changes since v1:
> - Replaced truncate_inode_pages_range() by invalidate_inode_pages2_range
> - Call fscache_invalidate with FSCACHE_INVAL_DIO_WRITE if we're doing DIO
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 5072570c2203..97f764b2fbdd 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -1606,11 +1606,6 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> return ret;
>
> ceph_fscache_invalidate(inode, false);
> - ret = invalidate_inode_pages2_range(inode->i_mapping,
> - pos >> PAGE_SHIFT,
> - (pos + count - 1) >> PAGE_SHIFT);
> - if (ret < 0)
> - dout("invalidate_inode_pages2_range returned %d\n", ret);
>
> while ((len = iov_iter_count(from)) > 0) {
> size_t left;
> @@ -1938,6 +1933,20 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> break;
> }
> ceph_clear_error_write(ci);
> +
> + /*
> + * we need to invalidate the page cache here, otherwise the
> + * cache will include invalid data in direct/sync writes.
> + */
> + ret = invalidate_inode_pages2_range(
> + inode->i_mapping,
> + pos >> PAGE_SHIFT,
> + (pos + len - 1) >> PAGE_SHIFT);
> + if (ret < 0) {
> + dout("invalidate_inode_pages2_range returned %d\n",
> + ret);
> + ret = 0;
For this, IMO it's not safe. If we just ignore it the pagecache will
still have invalid data.
I think what the 'ceph_direct_read_write()' does is more correct, it
will make sure all the dirty pages are writeback from the pagecaches by
using 'invalidate_inode_pages2_range()' without blocking and later will
do the invalidate blocked by using 'truncate_inode_pages_range()' if
some pages are not unmaped in 'invalidate_inode_pages2_range()' when EBUSY.
This can always be sure that the pagecache has no invalid data after
write finishes. I think why it use the truncate helper here is because
it's safe and there shouldn't have any buffer write happen for DIO ?
But from my understanding the 'ceph_direct_read_write()' is still buggy.
What if the page fault happen just after 'truncate_inode_pages_range()'
? Will this happen ? Should we leave this to use the file lock to
guarantee it in user space ?
Thought ?
-- Xiubo
> + }
> pos += len;
> written += len;
> dout("sync_write written %d\n", written);
>
Powered by blists - more mailing lists