Open Source and information security mailing list archives
Date:   Fri, 27 May 2022 09:08:47 +0800
From:   Xiubo Li <xiubli@...hat.com>
To:     Jeff Layton <jlayton@...nel.org>,
        Luís Henriques <lhenriques@...e.de>,
        Ilya Dryomov <idryomov@...il.com>
Cc:     ceph-devel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v2] ceph: prevent a client from exceeding the MDS
 maximum xattr size


On 5/27/22 8:44 AM, Jeff Layton wrote:
> On Fri, 2022-05-27 at 08:36 +0800, Xiubo Li wrote:
>> On 5/27/22 2:39 AM, Jeff Layton wrote:
>>> On Wed, 2022-05-25 at 18:24 +0100, Luís Henriques wrote:
>>>> The MDS tries to enforce a limit on the total key/values in extended
>>>> attributes.  However, this limit is enforced only if doing a synchronous
>>>> operation (MDS_OP_SETXATTR) -- if we're buffering the xattrs, the MDS
>>>> doesn't have a chance to enforce these limits.
>>>>
>>>> This patch adds support for an extra feature bit that will allow the
>>>> client to get the MDS max_xattr_pairs_size setting in the session message.
>>>> Then, when setting an xattr, the kernel will revert to do a synchronous
>>>> operation if that maximum size is exceeded.
>>>>
>>>> While there, fix a dout() that would trigger a printk warning:
>>>>
>>>> [   98.718078] ------------[ cut here ]------------
>>>> [   98.719012] precision 65536 too large
>>>> [   98.719039] WARNING: CPU: 1 PID: 3755 at lib/vsprintf.c:2703 vsnprintf+0x5e3/0x600
>>>> ...
>>>>
>>>> URL: https://tracker.ceph.com/issues/55725
>>>> Signed-off-by: Luís Henriques <lhenriques@...e.de>
>>>> ---
>>>>    fs/ceph/mds_client.c | 12 ++++++++++++
>>>>    fs/ceph/mds_client.h | 15 ++++++++++++++-
>>>>    fs/ceph/xattr.c      | 12 ++++++++----
>>>>    3 files changed, 34 insertions(+), 5 deletions(-)
>>>>
>>>> * Changes since v1
>>>>
>>>> Added support for new feature bit to get the MDS max_xattr_pairs_size
>>>> setting.
>>>>
>>>> Also note that this patch relies on a patch that hasn't been merged yet
>>>> ("ceph: use correct index when encoding client supported features"),
>>>> otherwise the new feature bit won't be correctly encoded.
>>>>
>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>> index 35597fafb48c..87a25b7cf496 100644
>>>> --- a/fs/ceph/mds_client.c
>>>> +++ b/fs/ceph/mds_client.c
>>>> @@ -3500,6 +3500,7 @@ static void handle_session(struct ceph_mds_session *session,
>>>>    	struct ceph_mds_session_head *h;
>>>>    	u32 op;
>>>>    	u64 seq, features = 0;
>>>> +	u64 max_xattr_pairs_size = 0;
>>>>    	int wake = 0;
>>>>    	bool blocklisted = false;
>>>>    
>>>> @@ -3545,6 +3546,9 @@ static void handle_session(struct ceph_mds_session *session,
>>>>    		}
>>>>    	}
>>>>    
>>>> +	if (msg_version >= 6)
>>>> +		ceph_decode_64_safe(&p, end, max_xattr_pairs_size, bad);
>>>> +
>>>>    	mutex_lock(&mdsc->mutex);
>>>>    	if (op == CEPH_SESSION_CLOSE) {
>>>>    		ceph_get_mds_session(session);
>>>> @@ -3552,6 +3556,12 @@ static void handle_session(struct ceph_mds_session *session,
>>>>    	}
>>>>    	/* FIXME: this ttl calculation is generous */
>>>>    	session->s_ttl = jiffies + HZ*mdsc->mdsmap->m_session_autoclose;
>>>> +
>>>> +	if (max_xattr_pairs_size && (op == CEPH_SESSION_OPEN)) {
>>>> +		dout("Changing MDS max xattrs pairs size: %llu => %llu\n",
>>>> +		     mdsc->max_xattr_pairs_size, max_xattr_pairs_size);
>>>> +		mdsc->max_xattr_pairs_size = max_xattr_pairs_size;
>>>> +	}
>>>>    	mutex_unlock(&mdsc->mutex);
>>>>    
>>>>    	mutex_lock(&session->s_mutex);
>>>> @@ -4761,6 +4771,8 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
>>>>    	strscpy(mdsc->nodename, utsname()->nodename,
>>>>    		sizeof(mdsc->nodename));
>>>>    
>>>> +	mdsc->max_xattr_pairs_size = MDS_MAX_XATTR_PAIRS_SIZE;
>>>> +
>>>>    	fsc->mdsc = mdsc;
>>>>    	return 0;
>>>>    
>>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>>>> index ca32f26f5eed..3db777df6d88 100644
>>>> --- a/fs/ceph/mds_client.h
>>>> +++ b/fs/ceph/mds_client.h
>>>> @@ -29,8 +29,11 @@ enum ceph_feature_type {
>>>>    	CEPHFS_FEATURE_MULTI_RECONNECT,
>>>>    	CEPHFS_FEATURE_DELEG_INO,
>>>>    	CEPHFS_FEATURE_METRIC_COLLECT,
>>>> +	CEPHFS_FEATURE_ALTERNATE_NAME,
>>>> +	CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
>>>> +	CEPHFS_FEATURE_MAX_XATTR_PAIRS_SIZE,
>>> Having to make this feature-bit-dependent kind of sucks. I wonder if it
>>> could be avoided...
>>>
>>> A question:
>>>
>>> How do the MDS's discover this setting? Do they get it from the mons? If
>>> so, I wonder if there is a way for the clients to query the mon for this
>>> instead of having to extend the MDS protocol?
>> It sounds like what "max_file_size" does, which is recorded in the
>> 'mdsmap'.
>>
>> Currently, though, "max_xattr_pairs_size" is a per-daemon MDS option,
>> so each MDS could be configured with a different value.
>>
>>
> Right, but the MDS's in general don't use local config files. Where are
> these settings stored? Could the client (potentially) query for them?

AFAIK, each process in ceph has its own copy of the "CephContext". I
don't know how to query all of them, but there are APIs such as
"rados_conf_set/get" that can do similar things.

I'm not sure whether that would work in our case.
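Whatever the transport for the setting ends up being, the client-side
rule the patch implements stays the same: revert to a synchronous
setxattr whenever buffering can no longer be trusted to stay under the
MDS limit. A minimal sketch in plain C (illustrative names, not the
actual kernel code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the fallback rule __ceph_setxattr() gains in this patch:
 * do a synchronous MDS_OP_SETXATTR (so the MDS can enforce its limit)
 * when the local xattr state is unusable, when we lack the XATTR_EXCL
 * cap, or when the blob would reach the advertised maximum. */
static bool must_use_sync_setxattr(uint64_t xattrs_version,
				   bool have_xattr_excl_cap,
				   uint64_t required_blob_size,
				   uint64_t max_xattr_pairs_size)
{
	return xattrs_version == 0 ||
	       !have_xattr_excl_cap ||
	       required_blob_size >= max_xattr_pairs_size;
}
```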

>
> I'm pretty sure the client does fetch and parse the mdsmap. If it's
> there then it could grab the setting for all of the MDS's at mount time
> and settle on the lowest one.
>
> I think a solution like that might be more resilient than having to
> fiddle with feature bits and such...

Yeah, IMO making this option work like "max_file_size" would be more 
appropriate.
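A mount-time "settle on the lowest" pass over per-MDS values, as Jeff
suggests, would then be a simple min reduction. A hypothetical sketch
(the helper and its inputs are illustrative, not existing kernel code):

```c
#include <stddef.h>
#include <stdint.h>

/* Default from the patch: 64K aggregate xattr size per inode. */
#define MDS_MAX_XATTR_PAIRS_SIZE (1 << 16)

/* Hypothetical: given the limits advertised by each MDS daemon (0 for
 * an MDS that reported nothing), pick the most conservative one so the
 * client never buffers past what any MDS would accept. */
static uint64_t pick_max_xattr_pairs_size(const uint64_t *per_mds, size_t n)
{
	uint64_t min = MDS_MAX_XATTR_PAIRS_SIZE;
	size_t i;

	for (i = 0; i < n; i++)
		if (per_mds[i] && per_mds[i] < min)
			min = per_mds[i];
	return min;
}
```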

-- Xiubo

>
>>>>    
>>>> -	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
>>>> +	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_MAX_XATTR_PAIRS_SIZE,
>>>>    };
>>>>    
>>>>    /*
>>>> @@ -45,9 +48,16 @@ enum ceph_feature_type {
>>>>    	CEPHFS_FEATURE_MULTI_RECONNECT,		\
>>>>    	CEPHFS_FEATURE_DELEG_INO,		\
>>>>    	CEPHFS_FEATURE_METRIC_COLLECT,		\
>>>> +	CEPHFS_FEATURE_MAX_XATTR_PAIRS_SIZE,	\
>>>>    }
>>>>    #define CEPHFS_FEATURES_CLIENT_REQUIRED {}
>>>>    
>>>> +/*
>>>> + * Maximum size of xattrs the MDS can handle per inode by default.  This
>>>> + * includes the attribute name and 4+4 bytes for the key/value sizes.
>>>> + */
>>>> +#define MDS_MAX_XATTR_PAIRS_SIZE (1<<16) /* 64K */
>>>> +
>>>>    /*
>>>>     * Some lock dependencies:
>>>>     *
>>>> @@ -404,6 +414,9 @@ struct ceph_mds_client {
>>>>    	struct rb_root		quotarealms_inodes;
>>>>    	struct mutex		quotarealms_inodes_mutex;
>>>>    
>>>> +	/* maximum aggregate size of extended attributes on a file */
>>>> +	u64			max_xattr_pairs_size;
>>>> +
>>>>    	/*
>>>>    	 * snap_rwsem will cover cap linkage into snaprealms, and
>>>>    	 * realm snap contexts.  (later, we can do per-realm snap
>>>> diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
>>>> index 8c2dc2c762a4..175a8c1449aa 100644
>>>> --- a/fs/ceph/xattr.c
>>>> +++ b/fs/ceph/xattr.c
>>>> @@ -1086,7 +1086,7 @@ static int ceph_sync_setxattr(struct inode *inode, const char *name,
>>>>    			flags |= CEPH_XATTR_REMOVE;
>>>>    	}
>>>>    
>>>> -	dout("setxattr value=%.*s\n", (int)size, value);
>>>> +	dout("setxattr value size: %ld\n", size);
>>>>    
>>>>    	/* do request */
>>>>    	req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
>>>> @@ -1184,8 +1184,14 @@ int __ceph_setxattr(struct inode *inode, const char *name,
>>>>    	spin_lock(&ci->i_ceph_lock);
>>>>    retry:
>>>>    	issued = __ceph_caps_issued(ci, NULL);
>>>> -	if (ci->i_xattrs.version == 0 || !(issued & CEPH_CAP_XATTR_EXCL))
>>>> +	required_blob_size = __get_required_blob_size(ci, name_len, val_len);
>>>> +	if ((ci->i_xattrs.version == 0) || !(issued & CEPH_CAP_XATTR_EXCL) ||
>>>> +	    (required_blob_size >= mdsc->max_xattr_pairs_size)) {
>>>> +		dout("%s do sync setxattr: version: %llu size: %d max: %llu\n",
>>>> +		     __func__, ci->i_xattrs.version, required_blob_size,
>>>> +		     mdsc->max_xattr_pairs_size);
>>>>    		goto do_sync;
>>>> +	}
>>>>    
>>>>    	if (!lock_snap_rwsem && !ci->i_head_snapc) {
>>>>    		lock_snap_rwsem = true;
>>>> @@ -1201,8 +1207,6 @@ int __ceph_setxattr(struct inode *inode, const char *name,
>>>>    	     ceph_cap_string(issued));
>>>>    	__build_xattrs(inode);
>>>>    
>>>> -	required_blob_size = __get_required_blob_size(ci, name_len, val_len);
>>>> -
>>>>    	if (!ci->i_xattrs.prealloc_blob ||
>>>>    	    required_blob_size > ci->i_xattrs.prealloc_blob->alloc_len) {
>>>>    		struct ceph_buffer *blob;
