linux-kernel - Re: [PATCH] ceph: Update the pages in fscache in writepages() path


Message-ID: <alpine.DEB.2.00.1311051543170.9474@cobra.newdream.net>
Date:	Tue, 5 Nov 2013 15:56:11 -0800 (PST)
From:	Sage Weil <sage@...tank.com>
To:	Milosz Tanski <milosz@...in.com>
cc:	Li Wang <liwang@...ntukylin.com>,
	ceph-devel <ceph-devel@...r.kernel.org>,
	"linux-cachefs@...hat.com" <linux-cachefs@...hat.com>,
	linux-kernel@...r.kernel.org, Min Chen <minchen@...ntukylin.com>,
	Yunchuan Wen <yunchuanwen@...ntukylin.com>
Subject: Re: [PATCH] ceph: Update the pages in fscache in writepages() path

On Tue, 5 Nov 2013, Milosz Tanski wrote:
> Li,
> 
> First, sorry for the late reply on this.
> 
> Currently fscache is only supported for files that are open in read
> only mode. I originally was going to let fscache cache in the write
> path as well, as long as the file was opened with O_LAZY. I abandoned
> that idea. When a user opens the file with O_LAZY we can cache things
> locally with the assumption that the user will take care of the
> synchronization in some other manner. Since there is no way of
> invalidating a subset of the pages in an object cached by fscache,
> there is no way we can make O_LAZY work well.
> 
> The ceph_readpage_to_fscache() call in writepage has no effect and
> should be removed. ceph_readpage_to_fscache() calls cache_valid() to
> see if it should perform the page save, and since the file can't have
> a CACHE cap at that point in time, it doesn't do it.

(Hmm, dusting off my understanding of fscache and reading 
fs/ceph/cache.c; watch out!)  It looks like cache_valid is

static inline int cache_valid(struct ceph_inode_info *ci)
{
	return ((ceph_caps_issued(ci) & CEPH_CAP_FILE_CACHE) &&
		(ci->i_fscache_gen == ci->i_rdcache_gen));
}

and in the FILE_EXCL case, the MDS will issue CACHE|BUFFER caps.  But I 
think the aux key (size+mtime) will prevent any use of the cache as soon 
as the first write happens and mtime changes, right?
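
To make the two conditions in cache_valid() concrete, here is a small 
user-space model of the check. It is only a sketch: the MODEL_CAP_* bit 
values and struct model_inode are made up for illustration and are not the 
kernel's definitions; only the shape of the test mirrors the function 
quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only; not the kernel's. */
#define MODEL_CAP_FILE_CACHE  (1u << 0)
#define MODEL_CAP_FILE_BUFFER (1u << 1)

/* Hypothetical stand-in for the fields cache_valid() looks at. */
struct model_inode {
	unsigned int caps_issued;  /* stands in for ceph_caps_issued(ci) */
	uint32_t fscache_gen;      /* stands in for ci->i_fscache_gen */
	uint32_t rdcache_gen;      /* stands in for ci->i_rdcache_gen */
};

/* Valid only if the CACHE cap is held AND the generations agree. */
static bool model_cache_valid(const struct model_inode *ci)
{
	return (ci->caps_issued & MODEL_CAP_FILE_CACHE) &&
	       ci->fscache_gen == ci->rdcache_gen;
}
```

In this model a holder of CACHE|BUFFER with matching generations passes the 
check, while a holder without the CACHE bit, or one with a stale 
generation, does not.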

I think that in order to make this work, we need to fix/create a 
file_version (or something similar) field in the (mds) inode_t to have 
some useful value.  I.e., increment it any time

 - a different client/writer comes along
 - a file is modified by the mds (e.g., truncated or recovered)

but allow it to otherwise remain the same as long as only a single client 
is working with the file exclusively.  This will be more precise than the 
(size, mtime) check that is currently used, and would remain valid when a 
single client opens the same file for exclusive read/write multiple times 
but there are no other intervening changes.
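
A rough user-space sketch of the difference between the two validity 
checks, under stated assumptions: struct model_file, its version and 
last_writer fields, and the model_* helpers are all hypothetical and exist 
only to illustrate the proposal; none of this is existing MDS or kernel 
client code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the inode state relevant to cache validation. */
struct model_file {
	uint64_t size;
	uint64_t mtime;
	uint64_t version;  /* hypothetical file_version field */
	int last_writer;   /* hypothetical client id; -1 means none yet */
};

/* The (size, mtime) pair used as the aux key today. */
struct aux_size_mtime {
	uint64_t size;
	uint64_t mtime;
};

static struct aux_size_mtime snapshot_size_mtime(const struct model_file *f)
{
	struct aux_size_mtime k = { f->size, f->mtime };
	return k;
}

static bool size_mtime_matches(const struct aux_size_mtime *k,
			       const struct model_file *f)
{
	return k->size == f->size && k->mtime == f->mtime;
}

/* A client write always changes mtime, but bumps the version only
 * when a different writer comes along. */
static void model_write(struct model_file *f, int client,
			uint64_t new_size, uint64_t now)
{
	if (f->last_writer != client) {
		f->version++;
		f->last_writer = client;
	}
	f->size = new_size;
	f->mtime = now;
}

/* An MDS-side modification (e.g. truncate) always bumps the version. */
static void model_mds_truncate(struct model_file *f,
			       uint64_t new_size, uint64_t now)
{
	f->version++;
	f->size = new_size;
	f->mtime = now;
}
```

With this model, a second write by the same client invalidates a saved 
(size, mtime) key but leaves the version unchanged, whereas a write by 
another client or an MDS truncate changes the version.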

Milosz, if that were in place, is there any reason not to wire up 
writepage and allow the fscache to be used write-through?

sage




> 
> Thanks,
> - Milosz
> 
> On Thu, Oct 31, 2013 at 11:56 PM, Li Wang <liwang@...ntukylin.com> wrote:
> > Currently, the pages in fscache are only updated in the writepage()
> > path; add the same processing to writepages().
> >
> > Signed-off-by: Min Chen <minchen@...ntukylin.com>
> > Signed-off-by: Li Wang <liwang@...ntukylin.com>
> > Signed-off-by: Yunchuan Wen <yunchuanwen@...ntukylin.com>
> > ---
> >  fs/ceph/addr.c |    8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index 6df8bd4..cc57911 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -746,7 +746,7 @@ retry:
> >
> >         while (!done && index <= end) {
> >                 int num_ops = do_sync ? 2 : 1;
> > -               unsigned i;
> > +               unsigned i, j;
> >                 int first;
> >                 pgoff_t next;
> >                 int pvec_pages, locked_pages;
> > @@ -894,7 +894,6 @@ get_more_pages:
> >                 if (!locked_pages)
> >                         goto release_pvec_pages;
> >                 if (i) {
> > -                       int j;
> >                         BUG_ON(!locked_pages || first < 0);
> >
> >                         if (pvec_pages && i == pvec_pages &&
> > @@ -924,7 +923,10 @@ get_more_pages:
> >
> >                 osd_req_op_extent_osd_data_pages(req, 0, pages, len, 0,
> >                                                         !!pool, false);
> > -
> > +               for(j = 0; j < locked_pages; j++) {
> > +                       struct page *page = pages[j];
> > +                       ceph_readpage_to_fscache(inode, page);
> > +               }
> >                 pages = NULL;   /* request message now owns the pages array */
> >                 pool = NULL;
> >
> > --
> > 1.7.9.5
> >
> 
> 
> 
> -- 
> Milosz Tanski
> CTO
> 10 East 53rd Street, 37th floor
> New York, NY 10022
> 
> p: 646-253-9055
> e: milosz@...in.com
> 
> 