linux-kernel - Re: [RFC PATCH 28/35] netfs: Adjust group handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3173668.1742484160@warthog.procyon.org.uk>
Date: Thu, 20 Mar 2025 15:22:40 +0000
From: David Howells <dhowells@...hat.com>
To: Viacheslav Dubeyko <Slava.Dubeyko@....com>
Cc: dhowells@...hat.com, Alex Markuze <amarkuze@...hat.com>,
    "slava@...eyko.com" <slava@...eyko.com>,
    "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
    "idryomov@...il.com" <idryomov@...il.com>,
    "jlayton@...nel.org" <jlayton@...nel.org>,
    "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
    "ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>,
    "dongsheng.yang@...ystack.cn" <dongsheng.yang@...ystack.cn>,
    "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 28/35] netfs: Adjust group handling

Viacheslav Dubeyko <Slava.Dubeyko@....com> wrote:

> >  		if (unlikely(group != netfs_group) &&
> > -		    group != NETFS_FOLIO_COPY_TO_CACHE)
> > +		    group != NETFS_FOLIO_COPY_TO_CACHE &&
> > +		    (group || folio_test_dirty(folio)))
> 
> I am trying to follow to this complex condition. Is it possible case that
> folio is dirty but we don't flush the content?

It's slightly complicated by fscache.

The way I have made local caching work for things that use netfslib fully is
that the writeback code copies the data to the cache.  We achieve this by
marking the pages dirty when we read them from the server.

However, so that we don't *also* write the clean data back to the server, the
writeback group[*] field is set to a special value (NETFS_FOLIO_COPY_TO_CACHE)
and we make the assumption that the writeback group is only actually going to
be used by the filesystem if the page is actually modified - in which case the
writeback group field is overwritten.

[*] This is either folio->private or in a netfs_folio struct attached to
folio->private.  Note that folio->private is set to be removed in the future.

In the event that a page is modified it will be written back to the server(s)
and the cache, assuming there is a cache.  Also note the netfs_io_stream
struct.  There are two in the netfs_io_request struct and these are used to
separately manage and divide up the writes to a server and to the cache.  I've
also left the possibility open that we can have more than two streams in the
event that we need to write the data to multiple servers.

Further, another reason for making writeback write the data to both the cache
and the server is that if you are using content encryption, the data is
encrypted and then the ciphertext is written to both the server and the cache.

> Is it possible case that folio is dirty but we don't flush the content?

Anyway, to answer the question more specifically, yes.  If the folio is dirty
and in the same writeback group (e.g. most recent ceph snap context), then we
can presumably keep modifying it.

And if the folio is marked dirty and is marked NETFS_FOLIO_COPY_TO_CACHE, then
we can just overwrite it, replace or clear the NETFS_FOLIO_COPY_TO_CACHE mark
and then it just becomes a regular dirty page.  It will get written to fscache
either way.

> > +		if ((++flush_counter & 0xf) == 0xf)
> > +			msleep(10);
> 
> Do we really need to use sleep? And why is it 10 ms? And even if we would
> like to use sleep, then it is better to introduce the named constant. And
> what is teh justification for 10 ms?

At the moment, debugging and stopping it from running wild in a tight loop
when a mistake is made.  Remember: at this point, this is a WIP.

But in reality, we might see this if we're indulging in cache ping-pong
between two clients.  I'm not sure how this might be mitigated in the ceph
environment - if that's not already done.

> > -		kdebug("wrong group");
> > +		kdebug("wrong group %px != %px", fgroup, wreq->group);
> 
> I believe to use the %px is not very good practice. Do we really need to show
> the real pointer?

At some point I need to test interference from someone cranking the snaps and
I'll probably need this then - though it might be better to make a tracepoint
for it.

> > +/*
> > + * Get a ref on a netfs group attached to a dirty page (e.g. a ceph snap).
> > + */
> > +static inline struct netfs_group *netfs_get_group(struct netfs_group *netfs_group)
> > +{
> > +	if (netfs_group && netfs_group != NETFS_FOLIO_COPY_TO_CACHE)
> 
> The netfs_group is a pointer. Is it correct comparison of pointer with the
> NETFS_FOLIO_COPY_TO_CACHE constant?

This constant?

#define NETFS_FOLIO_COPY_TO_CACHE ((struct netfs_group *)0x356UL) /* Write to the cache only */

Yes.  See explanation above.

David