Message-Id: <EA72E40F-1AE0-4157-85F5-24AB58AE6A6B@dilger.ca>
Date:   Mon, 17 Apr 2017 13:19:18 -0600
From:   Andreas Dilger <adilger@...ger.ca>
To:     Alexey Lyashkov <alexey.lyashkov@...il.com>
Cc:     Ts'o Theodore <tytso@....edu>,
        linux-ext4 <linux-ext4@...r.kernel.org>,
        James Simmons <jsimmons@...radead.org>
Subject: Re: [PATCH] ext4: xattr-in-inode support

On Apr 16, 2017, at 1:09 PM, Alexey Lyashkov <alexey.lyashkov@...il.com> wrote:
> 
> Andreas,
> 
> I'm not sure it's a good idea to allocate one more inode to store a large EA.
> It dramatically decreases the speed of accessing the EA data in this case.
> And we have already hit the inode count limit with large disks.
> I think this code needs to be rewritten to use special extents to store a
> large EA, as that avoids many problems related to bad credits while unlinking
> a parent inode, some problems with integer overflow since the backlink is stored in the mtime field, and others.
> 
> I know we haven't hit problems in this area for the last year, but anyway, I prefer a different solution.

We are of course not able to change the format of the large xattrs that
existing filesystems have been using for many years (the first version of
this feature has been in use since 2008).

It would be great if you can work with Ted to implement an improved solution
for large xattrs for ext4.  Since the new version would be using a different
feature flag, with some small amount of compatibility effort there is no
reason why the two cannot exist at the same time, with filesystems migrated
at some point from the old xattr format to the new one via e2fsck or a
userspace tool.

Cheers, Andreas

>> On Apr 13, 2017, at 22:58, Andreas Dilger <adilger@...ger.ca> wrote:
>> 
>> Large xattr support is implemented for EXT4_FEATURE_INCOMPAT_EA_INODE.
>> 
>> If the size of an xattr value is larger than will fit in a single
>> external block, then the xattr value will be saved into the body
>> of an external xattr inode.
>> 
>> This also helps support a larger number of xattrs, since only the headers
>> will be stored in the in-inode space or the single external block.
>> 
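As a rough illustration of the sizing involved (not part of the patch), the sketch below rounds a value length up to whole blocks, which is the same arithmetic the patch uses when reserving journal credits, and applies the 1 MiB cap from EXT4_XATTR_MAX_LARGE_EA_SIZE. The names ea_inode_blocks(), ea_placement() and the inline_limit parameter are hypothetical; the patch's real threshold is EXT4_XATTR_MIN_LARGE_EA_SIZE(), which this sketch does not reproduce.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound on a large xattr value, mirroring EXT4_XATTR_MAX_LARGE_EA_SIZE. */
#define MAX_LARGE_EA_SIZE (1024 * 1024)

/* Number of filesystem blocks needed to hold value_len bytes, rounded up. */
static unsigned int ea_inode_blocks(size_t value_len, unsigned int blocksize_bits)
{
	size_t blocksize;

	/* ext4 block sizes range from 1 KiB (2^10) to 64 KiB (2^16). */
	assert(blocksize_bits >= 10 && blocksize_bits <= 16);
	blocksize = (size_t)1 << blocksize_bits;
	return (unsigned int)((value_len + blocksize - 1) >> blocksize_bits);
}

/*
 * Hypothetical placement rule (inline_limit is an assumed stand-in for the
 * real threshold): returns 1 if the value would go to an EA inode, 0 if it
 * would stay inline, and -1 if it exceeds the cap.
 */
static int ea_placement(size_t value_len, size_t inline_limit)
{
	if (value_len > MAX_LARGE_EA_SIZE)
		return -1;
	return value_len > inline_limit ? 1 : 0;
}
```

With 4 KiB blocks (blocksize_bits = 12), a 1 MiB value occupies 256 data blocks in the EA inode, and a 4097-byte value occupies 2.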
>> The inode is referenced from the xattr header via "e_value_inum",
>> which was formerly "e_value_block", but that field was never used.
>> The e_value_size still contains the xattr size so that listing
>> xattrs does not need to look up the inode if the data is not accessed.
>> 
>> struct ext4_xattr_entry {
>> 	__u8	e_name_len;	/* length of name */
>> 	__u8	e_name_index;	/* attribute name index */
>> 	__le16	e_value_offs;	/* offset in disk block of value */
>> 	__le32	e_value_inum;	/* inode in which value is stored */
>> 	__le32	e_value_size;	/* size of attribute value */
>> 	__le32	e_hash;		/* hash value of name and value */
>> 	char	e_name[0];	/* attribute name */
>> };
>> 
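To make the role of e_value_inum concrete, here is a minimal userspace sketch (an illustration under stated assumptions, not kernel code) that classifies where an entry's value lives. The struct mirrors only the fixed-size part of ext4_xattr_entry and holds host-endian integers for simplicity, whereas the on-disk fields are little-endian. The names xattr_entry_fixed, value_location and xattr_value_location() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-size part of the entry; the variable-length name is omitted. */
struct xattr_entry_fixed {
	uint8_t  e_name_len;	/* length of name */
	uint8_t  e_name_index;	/* attribute name index */
	uint16_t e_value_offs;	/* offset of value, used only when inline */
	uint32_t e_value_inum;	/* inode holding the value, 0 if inline */
	uint32_t e_value_size;	/* size of attribute value */
	uint32_t e_hash;	/* hash value of name and value */
};

enum value_location { VALUE_EMPTY, VALUE_INLINE, VALUE_IN_EA_INODE };

/* A nonzero e_value_inum means the value lives in a separate EA inode. */
static enum value_location
xattr_value_location(const struct xattr_entry_fixed *e)
{
	assert(e != NULL);
	if (e->e_value_inum != 0)
		return VALUE_IN_EA_INODE;
	if (e->e_value_size != 0)
		return VALUE_INLINE;
	return VALUE_EMPTY;
}
```

Because e_value_size is kept in the entry in both cases, a caller that only lists names and sizes never needs to read the EA inode.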
>> The xattr inode is marked with the EXT4_EA_INODE_FL flag and also
>> holds a back-reference to the owning inode in its i_mtime field,
>> allowing ext4 and e2fsck to verify that the correct inode is accessed.
>> 
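The verification described above can be sketched in userspace as follows. This is a simplified model of the checks in ext4_xattr_inode_iget() from the patch, not the kernel code itself: struct mini_inode, enum ea_check and check_ea_inode() are hypothetical, the mtime field stands in for the back-reference stored in i_mtime, and the numeric value given to EA_INODE_FL is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EA_INODE_FL 0x00200000u	/* assumed flag value, for illustration only */

/* Minimal stand-in for the inode fields the checks look at. */
struct mini_inode {
	uint32_t ino;
	uint32_t generation;
	uint32_t flags;
	uint32_t mtime;	/* on an EA inode: inode number of the owning inode */
};

enum ea_check { EA_OK = 0, EA_BAD_BACKPOINTER, EA_MISSING_FLAG };

/* Check that `ea` really is the xattr inode belonging to `parent`. */
static enum ea_check
check_ea_inode(const struct mini_inode *parent, const struct mini_inode *ea)
{
	assert(parent != NULL && ea != NULL);
	/* The back-reference and the generation must both match the owner. */
	if (ea->mtime != parent->ino || ea->generation != parent->generation)
		return EA_BAD_BACKPOINTER;
	/* The inode must be marked as an xattr inode. */
	if (!(ea->flags & EA_INODE_FL))
		return EA_MISSING_FLAG;
	return EA_OK;
}
```

The checks are ordered as in the patch: a wrong back-reference or generation is reported before a missing flag.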
>> Lustre-Jira: https://jira.hpdd.intel.com/browse/LU-80
>> Lustre-bugzilla: https://bugzilla.lustre.org/show_bug.cgi?id=4424
>> Signed-off-by: Kalpak Shah <kalpak.shah@....com>
>> Signed-off-by: James Simmons <uja.ornl@...il.com>
>> Signed-off-by: Andreas Dilger <andreas.dilger@...el.com>
>> ---
>> 
>> Per recent discussion, here is the latest version of the xattr-in-inode
>> patch.  This has just been freshly updated to the current kernel (from
>> 4.4) and has not even been compiled, so it is unlikely to work properly.
>> The functional parts of the feature and the on-disk format are unchanged,
>> and those are really what Ted is interested in.
>> 
>> Cheers, Andreas
>> --
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index fb69ee2..afe830b 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1797,6 +1797,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
>> 					 EXT4_FEATURE_INCOMPAT_EXTENTS| \
>> 					 EXT4_FEATURE_INCOMPAT_64BIT| \
>> 					 EXT4_FEATURE_INCOMPAT_FLEX_BG| \
>> +					 EXT4_FEATURE_INCOMPAT_EA_INODE| \
>> 					 EXT4_FEATURE_INCOMPAT_MMP | \
>> 					 EXT4_FEATURE_INCOMPAT_INLINE_DATA | \
>> 					 EXT4_FEATURE_INCOMPAT_ENCRYPT | \
>> @@ -2220,6 +2221,12 @@ struct mmpd_data {
>> #define EXT4_MMP_MAX_CHECK_INTERVAL	300UL
>> 
>> /*
>> + * Maximum size of xattr values for FEATURE_INCOMPAT_EA_INODE: 1 MiB.
>> + * This limit is arbitrary, but is reasonable for the xattr API.
>> + */
>> +#define EXT4_XATTR_MAX_LARGE_EA_SIZE    (1024 * 1024)
>> +
>> +/*
>> * Function prototypes
>> */
>> 
>> @@ -2231,6 +2238,10 @@ struct mmpd_data {
>> # define ATTRIB_NORET	__attribute__((noreturn))
>> # define NORET_AND	noreturn,
>> 
>> +struct ext4_xattr_ino_array {
>> +	unsigned int xia_count;		/* # of used items in the array */
>> +	unsigned int xia_inodes[0];
>> +};
>> /* bitmap.c */
>> extern unsigned int ext4_count_free(char *bitmap, unsigned numchars);
>> void ext4_inode_bitmap_csum_set(struct super_block *sb, ext4_group_t group,
>> @@ -2480,6 +2491,7 @@ int do_journal_get_write_access(handle_t *handle,
>> extern void ext4_get_inode_flags(struct ext4_inode_info *);
>> extern int ext4_alloc_da_blocks(struct inode *inode);
>> extern void ext4_set_aops(struct inode *inode);
>> +extern int ext4_meta_trans_blocks(struct inode *, int nrblocks, int chunk);
>> extern int ext4_writepage_trans_blocks(struct inode *);
>> extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
>> extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>> index 17bc043..01eaad6 100644
>> --- a/fs/ext4/ialloc.c
>> +++ b/fs/ext4/ialloc.c
>> @@ -294,7 +294,6 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)
>> 	 * as writing the quota to disk may need the lock as well.
>> 	 */
>> 	dquot_initialize(inode);
>> -	ext4_xattr_delete_inode(handle, inode);
>> 	dquot_free_inode(inode);
>> 	dquot_drop(inode);
>> 
>> diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
>> index 375fb1c..9601496 100644
>> --- a/fs/ext4/inline.c
>> +++ b/fs/ext4/inline.c
>> @@ -61,7 +61,7 @@ static int get_max_inline_xattr_value_size(struct inode *inode,
>> 
>> 	/* Compute min_offs. */
>> 	for (; !IS_LAST_ENTRY(entry); entry = EXT4_XATTR_NEXT(entry)) {
>> -		if (!entry->e_value_block && entry->e_value_size) {
>> +		if (!entry->e_value_inum && entry->e_value_size) {
>> 			size_t offs = le16_to_cpu(entry->e_value_offs);
>> 			if (offs < min_offs)
>> 				min_offs = offs;
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index b9ffa9f..70069e0 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -139,8 +139,6 @@ static void ext4_invalidatepage(struct page *page, unsigned int offset,
>> 				unsigned int length);
>> static int __ext4_journalled_writepage(struct page *page, unsigned int len);
>> static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh);
>> -static int ext4_meta_trans_blocks(struct inode *inode, int lblocks,
>> -				  int pextents);
>> 
>> /*
>> * Test whether an inode is a fast symlink.
>> @@ -189,6 +187,8 @@ void ext4_evict_inode(struct inode *inode)
>> {
>> 	handle_t *handle;
>> 	int err;
>> +	int extra_credits = 3;
>> +	struct ext4_xattr_ino_array *lea_ino_array = NULL;
>> 
>> 	trace_ext4_evict_inode(inode);
>> 
>> @@ -238,8 +238,8 @@ void ext4_evict_inode(struct inode *inode)
>> 	 * protection against it
>> 	 */
>> 	sb_start_intwrite(inode->i_sb);
>> -	handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE,
>> -				    ext4_blocks_for_truncate(inode)+3);
>> +
>> +	handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, extra_credits);
>> 	if (IS_ERR(handle)) {
>> 		ext4_std_error(inode->i_sb, PTR_ERR(handle));
>> 		/*
>> @@ -251,9 +251,36 @@ void ext4_evict_inode(struct inode *inode)
>> 		sb_end_intwrite(inode->i_sb);
>> 		goto no_delete;
>> 	}
>> -
>> 	if (IS_SYNC(inode))
>> 		ext4_handle_sync(handle);
>> +
>> +	/*
>> +	 * Delete xattr inode before deleting the main inode.
>> +	 */
>> +	err = ext4_xattr_delete_inode(handle, inode, &lea_ino_array);
>> +	if (err) {
>> +		ext4_warning(inode->i_sb,
>> +			     "couldn't delete inode's xattr (err %d)", err);
>> +		goto stop_handle;
>> +	}
>> +
>> +	if (!IS_NOQUOTA(inode))
>> +		extra_credits += 2 * EXT4_QUOTA_DEL_BLOCKS(inode->i_sb);
>> +
>> +	if (!ext4_handle_has_enough_credits(handle,
>> +			ext4_blocks_for_truncate(inode) + extra_credits)) {
>> +		err = ext4_journal_extend(handle,
>> +			ext4_blocks_for_truncate(inode) + extra_credits);
>> +		if (err > 0)
>> +			err = ext4_journal_restart(handle,
>> +			ext4_blocks_for_truncate(inode) + extra_credits);
>> +		if (err != 0) {
>> +			ext4_warning(inode->i_sb,
>> +				     "couldn't extend journal (err %d)", err);
>> +			goto stop_handle;
>> +		}
>> +	}
>> +
>> 	inode->i_size = 0;
>> 	err = ext4_mark_inode_dirty(handle, inode);
>> 	if (err) {
>> @@ -277,10 +304,10 @@ void ext4_evict_inode(struct inode *inode)
>> 	 * enough credits left in the handle to remove the inode from
>> 	 * the orphan list and set the dtime field.
>> 	 */
>> -	if (!ext4_handle_has_enough_credits(handle, 3)) {
>> -		err = ext4_journal_extend(handle, 3);
>> +	if (!ext4_handle_has_enough_credits(handle, extra_credits)) {
>> +		err = ext4_journal_extend(handle, extra_credits);
>> 		if (err > 0)
>> -			err = ext4_journal_restart(handle, 3);
>> +			err = ext4_journal_restart(handle, extra_credits);
>> 		if (err != 0) {
>> 			ext4_warning(inode->i_sb,
>> 				     "couldn't extend journal (err %d)", err);
>> @@ -315,8 +342,12 @@ void ext4_evict_inode(struct inode *inode)
>> 		ext4_clear_inode(inode);
>> 	else
>> 		ext4_free_inode(handle, inode);
>> +
>> 	ext4_journal_stop(handle);
>> 	sb_end_intwrite(inode->i_sb);
>> +
>> +	if (lea_ino_array != NULL)
>> +		ext4_xattr_inode_array_free(inode, lea_ino_array);
>> 	return;
>> no_delete:
>> 	ext4_clear_inode(inode);	/* We must guarantee clearing of inode... */
>> @@ -5475,7 +5506,7 @@ static int ext4_index_trans_blocks(struct inode *inode, int lblocks,
>> *
>> * Also account for superblock, inode, quota and xattr blocks
>> */
>> -static int ext4_meta_trans_blocks(struct inode *inode, int lblocks,
>> +int ext4_meta_trans_blocks(struct inode *inode, int lblocks,
>> 				  int pextents)
>> {
>> 	ext4_group_t groups, ngroups = ext4_get_groups_count(inode->i_sb);
>> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
>> index 996e790..f158798 100644
>> --- a/fs/ext4/xattr.c
>> +++ b/fs/ext4/xattr.c
>> @@ -190,9 +190,8 @@ static void ext4_xattr_block_csum_set(struct inode *inode,
>> 
>> 	/* Check the values */
>> 	while (!IS_LAST_ENTRY(entry)) {
>> -		if (entry->e_value_block != 0)
>> -			return -EFSCORRUPTED;
>> -		if (entry->e_value_size != 0) {
>> +		if (entry->e_value_size != 0 &&
>> +		    entry->e_value_inum == 0) {
>> 			u16 offs = le16_to_cpu(entry->e_value_offs);
>> 			u32 size = le32_to_cpu(entry->e_value_size);
>> 			void *value;
>> @@ -258,19 +257,26 @@ static void ext4_xattr_block_csum_set(struct inode *inode,
>> 	__xattr_check_inode((inode), (header), (end), __func__, __LINE__)
>> 
>> static inline int
>> -ext4_xattr_check_entry(struct ext4_xattr_entry *entry, size_t size)
>> +ext4_xattr_check_entry(struct ext4_xattr_entry *entry, size_t size,
>> +		       struct inode *inode)
>> {
>> 	size_t value_size = le32_to_cpu(entry->e_value_size);
>> 
>> -	if (entry->e_value_block != 0 || value_size > size ||
>> +	if (!entry->e_value_inum &&
>> 	    le16_to_cpu(entry->e_value_offs) + value_size > size)
>> 		return -EFSCORRUPTED;
>> +	if (entry->e_value_inum &&
>> +	    (le32_to_cpu(entry->e_value_inum) < EXT4_FIRST_INO(inode->i_sb) ||
>> +	     le32_to_cpu(entry->e_value_inum) >
>> +	     le32_to_cpu(EXT4_SB(inode->i_sb)->s_es->s_inodes_count)))
>> +		return -EFSCORRUPTED;
>> 	return 0;
>> }
>> 
>> static int
>> ext4_xattr_find_entry(struct ext4_xattr_entry **pentry, int name_index,
>> -		      const char *name, size_t size, int sorted)
>> +		      const char *name, size_t size, int sorted,
>> +		      struct inode *inode)
>> {
>> 	struct ext4_xattr_entry *entry;
>> 	size_t name_len;
>> @@ -290,11 +296,104 @@ static void ext4_xattr_block_csum_set(struct inode *inode,
>> 			break;
>> 	}
>> 	*pentry = entry;
>> -	if (!cmp && ext4_xattr_check_entry(entry, size))
>> +	if (!cmp && ext4_xattr_check_entry(entry, size, inode))
>> 		return -EFSCORRUPTED;
>> 	return cmp ? -ENODATA : 0;
>> }
>> 
>> +/*
>> + * Read the EA value from an inode.
>> + */
>> +static int
>> +ext4_xattr_inode_read(struct inode *ea_inode, void *buf, size_t *size)
>> +{
>> +	unsigned long block = 0;
>> +	struct buffer_head *bh = NULL;
>> +	int blocksize;
>> +	size_t csize, ret_size = 0;
>> +
>> +	if (*size == 0)
>> +		return 0;
>> +
>> +	blocksize = ea_inode->i_sb->s_blocksize;
>> +
>> +	while (ret_size < *size) {
>> +		csize = (*size - ret_size) > blocksize ? blocksize :
>> +							*size - ret_size;
>> +		bh = ext4_bread(NULL, ea_inode, block, 0);
>> +		if (IS_ERR(bh)) {
>> +			*size = ret_size;
>> +			return PTR_ERR(bh);
>> +		}
>> +		memcpy(buf, bh->b_data, csize);
>> +		brelse(bh);
>> +
>> +		buf += csize;
>> +		block += 1;
>> +		ret_size += csize;
>> +	}
>> +
>> +	*size = ret_size;
>> +
>> +	return 0;
>> +}
>> +
>> +struct inode *ext4_xattr_inode_iget(struct inode *parent, unsigned long ea_ino, int *err)
>> +{
>> +	struct inode *ea_inode = NULL;
>> +
>> +	ea_inode = ext4_iget(parent->i_sb, ea_ino);
>> +	if (IS_ERR(ea_inode) || is_bad_inode(ea_inode)) {
>> +		int rc = IS_ERR(ea_inode) ? PTR_ERR(ea_inode) : 0;
>> +		ext4_error(parent->i_sb, "error while reading EA inode %lu "
>> +			   "/ %d %d", ea_ino, rc, is_bad_inode(ea_inode));
>> +		*err = rc != 0 ? rc : -EIO;
>> +		return NULL;
>> +	}
>> +
>> +	if (EXT4_XATTR_INODE_GET_PARENT(ea_inode) != parent->i_ino ||
>> +	    ea_inode->i_generation != parent->i_generation) {
>> +		ext4_error(parent->i_sb, "Backpointer from EA inode %lu "
>> +			   "to parent invalid.", ea_ino);
>> +		*err = -EINVAL;
>> +		goto error;
>> +	}
>> +
>> +	if (!(EXT4_I(ea_inode)->i_flags & EXT4_EA_INODE_FL)) {
>> +		ext4_error(parent->i_sb, "EA inode %lu does not have "
>> +			   "EXT4_EA_INODE_FL flag set.\n", ea_ino);
>> +		*err = -EINVAL;
>> +		goto error;
>> +	}
>> +
>> +	*err = 0;
>> +	return ea_inode;
>> +
>> +error:
>> +	iput(ea_inode);
>> +	return NULL;
>> +}
>> +
>> +/*
>> + * Read the value from the EA inode.
>> + */
>> +static int
>> +ext4_xattr_inode_get(struct inode *inode, unsigned long ea_ino, void *buffer,
>> +		     size_t *size)
>> +{
>> +	struct inode *ea_inode = NULL;
>> +	int err;
>> +
>> +	ea_inode = ext4_xattr_inode_iget(inode, ea_ino, &err);
>> +	if (err)
>> +		return err;
>> +
>> +	err = ext4_xattr_inode_read(ea_inode, buffer, size);
>> +	iput(ea_inode);
>> +
>> +	return err;
>> +}
>> +
>> static int
>> ext4_xattr_block_get(struct inode *inode, int name_index, const char *name,
>> 		     void *buffer, size_t buffer_size)
>> @@ -327,7 +426,8 @@ static void ext4_xattr_block_csum_set(struct inode *inode,
>> 	}
>> 	ext4_xattr_cache_insert(ext4_mb_cache, bh);
>> 	entry = BFIRST(bh);
>> -	error = ext4_xattr_find_entry(&entry, name_index, name, bh->b_size, 1);
>> +	error = ext4_xattr_find_entry(&entry, name_index, name, bh->b_size, 1,
>> +				      inode);
>> 	if (error == -EFSCORRUPTED)
>> 		goto bad_block;
>> 	if (error)
>> @@ -337,8 +437,16 @@ static void ext4_xattr_block_csum_set(struct inode *inode,
>> 		error = -ERANGE;
>> 		if (size > buffer_size)
>> 			goto cleanup;
>> -		memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
>> -		       size);
>> +		if (entry->e_value_inum) {
>> +			error = ext4_xattr_inode_get(inode,
>> +					     le32_to_cpu(entry->e_value_inum),
>> +					     buffer, &size);
>> +			if (error)
>> +				goto cleanup;
>> +		} else {
>> +			memcpy(buffer, bh->b_data +
>> +			       le16_to_cpu(entry->e_value_offs), size);
>> +		}
>> 	}
>> 	error = size;
>> 
>> @@ -372,7 +480,7 @@ static void ext4_xattr_block_csum_set(struct inode *inode,
>> 	if (error)
>> 		goto cleanup;
>> 	error = ext4_xattr_find_entry(&entry, name_index, name,
>> -				      end - (void *)entry, 0);
>> +				      end - (void *)entry, 0, inode);
>> 	if (error)
>> 		goto cleanup;
>> 	size = le32_to_cpu(entry->e_value_size);
>> @@ -380,8 +488,16 @@ static void ext4_xattr_block_csum_set(struct inode *inode,
>> 		error = -ERANGE;
>> 		if (size > buffer_size)
>> 			goto cleanup;
>> -		memcpy(buffer, (void *)IFIRST(header) +
>> -		       le16_to_cpu(entry->e_value_offs), size);
>> +		if (entry->e_value_inum) {
>> +			error = ext4_xattr_inode_get(inode,
>> +					     le32_to_cpu(entry->e_value_inum),
>> +					     buffer, &size);
>> +			if (error)
>> +				goto cleanup;
>> +		} else {
>> +			memcpy(buffer, (void *)IFIRST(header) +
>> +			       le16_to_cpu(entry->e_value_offs), size);
>> +		}
>> 	}
>> 	error = size;
>> 
>> @@ -648,7 +764,7 @@ static size_t ext4_xattr_free_space(struct ext4_xattr_entry *last,
>> 				    size_t *min_offs, void *base, int *total)
>> {
>> 	for (; !IS_LAST_ENTRY(last); last = EXT4_XATTR_NEXT(last)) {
>> -		if (last->e_value_size) {
>> +		if (!last->e_value_inum && last->e_value_size) {
>> 			size_t offs = le16_to_cpu(last->e_value_offs);
>> 			if (offs < *min_offs)
>> 				*min_offs = offs;
>> @@ -659,16 +775,172 @@ static size_t ext4_xattr_free_space(struct ext4_xattr_entry *last,
>> 	return (*min_offs - ((void *)last - base) - sizeof(__u32));
>> }
>> 
>> -static int
>> -ext4_xattr_set_entry(struct ext4_xattr_info *i, struct ext4_xattr_search *s)
>> +/*
>> + * Write the value of the EA in an inode.
>> + */
>> +static int ext4_xattr_inode_write(handle_t *handle, struct inode *ea_inode,
>> +				  const void *buf, int bufsize)
>> +{
>> +	struct buffer_head *bh = NULL;
>> +	unsigned long block = 0;
>> +	unsigned blocksize = ea_inode->i_sb->s_blocksize;
>> +	unsigned max_blocks = (bufsize + blocksize - 1) >> ea_inode->i_blkbits;
>> +	int csize, wsize = 0;
>> +	int ret = 0;
>> +	int retries = 0;
>> +
>> +retry:
>> +	while (ret >= 0 && ret < max_blocks) {
>> +		struct ext4_map_blocks map;
>> +		map.m_lblk = block += ret;
>> +		map.m_len = max_blocks -= ret;
>> +
>> +		ret = ext4_map_blocks(handle, ea_inode, &map,
>> +				      EXT4_GET_BLOCKS_CREATE);
>> +		if (ret <= 0) {
>> +			ext4_mark_inode_dirty(handle, ea_inode);
>> +			if (ret == -ENOSPC &&
>> +			    ext4_should_retry_alloc(ea_inode->i_sb, &retries)) {
>> +				ret = 0;
>> +				goto retry;
>> +			}
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	block = 0;
>> +	while (wsize < bufsize) {
>> +		if (bh != NULL)
>> +			brelse(bh);
>> +		csize = (bufsize - wsize) > blocksize ? blocksize :
>> +								bufsize - wsize;
>> +		bh = ext4_getblk(handle, ea_inode, block, 0);
>> +		if (IS_ERR(bh)) {
>> +			ret = PTR_ERR(bh);
>> +			goto out;
>> +		}
>> +		ret = ext4_journal_get_write_access(handle, bh);
>> +		if (ret)
>> +			goto out;
>> +
>> +		memcpy(bh->b_data, buf, csize);
>> +		set_buffer_uptodate(bh);
>> +		ext4_handle_dirty_metadata(handle, ea_inode, bh);
>> +
>> +		buf += csize;
>> +		wsize += csize;
>> +		block += 1;
>> +	}
>> +
>> +	mutex_lock(&ea_inode->i_mutex);
>> +	i_size_write(ea_inode, wsize);
>> +	ext4_update_i_disksize(ea_inode, wsize);
>> +	mutex_unlock(&ea_inode->i_mutex);
>> +
>> +	ext4_mark_inode_dirty(handle, ea_inode);
>> +
>> +out:
>> +	brelse(bh);
>> +
>> +	return ret;
>> +}
>> +
>> +/*
>> + * Create an inode to store the value of a large EA.
>> + */
>> +static struct inode *ext4_xattr_inode_create(handle_t *handle,
>> +					     struct inode *inode)
>> +{
>> +	struct inode *ea_inode = NULL;
>> +
>> +	/*
>> +	 * Let the next inode be the goal, so we try and allocate the EA inode
>> +	 * in the same group, or nearby one.
>> +	 */
>> +	ea_inode = ext4_new_inode(handle, inode->i_sb->s_root->d_inode,
>> +				  S_IFREG | 0600, NULL, inode->i_ino + 1, NULL);
>> +	if (!IS_ERR(ea_inode)) {
>> +		ea_inode->i_op = &ext4_file_inode_operations;
>> +		ea_inode->i_fop = &ext4_file_operations;
>> +		ext4_set_aops(ea_inode);
>> +		ea_inode->i_generation = inode->i_generation;
>> +		EXT4_I(ea_inode)->i_flags |= EXT4_EA_INODE_FL;
>> +
>> +		/*
>> +		 * A back-pointer from EA inode to parent inode will be useful
>> +		 * for e2fsck.
>> +		 */
>> +		EXT4_XATTR_INODE_SET_PARENT(ea_inode, inode->i_ino);
>> +		unlock_new_inode(ea_inode);
>> +	}
>> +
>> +	return ea_inode;
>> +}
>> +
>> +/*
>> + * Unlink the inode storing the value of the EA.
>> + */
>> +int ext4_xattr_inode_unlink(struct inode *inode, unsigned long ea_ino)
>> +{
>> +	struct inode *ea_inode = NULL;
>> +	int err;
>> +
>> +	ea_inode = ext4_xattr_inode_iget(inode, ea_ino, &err);
>> +	if (err)
>> +		return err;
>> +
>> +	clear_nlink(ea_inode);
>> +	iput(ea_inode);
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Add value of the EA in an inode.
>> + */
>> +static int ext4_xattr_inode_set(handle_t *handle, struct inode *inode,
>> +				unsigned long *ea_ino, const void *value,
>> +				size_t value_len)
>> +{
>> +	struct inode *ea_inode;
>> +	int err;
>> +
>> +	/* Create an inode for the EA value */
>> +	ea_inode = ext4_xattr_inode_create(handle, inode);
>> +	if (IS_ERR(ea_inode))
>> +		return PTR_ERR(ea_inode);
>> +
>> +	err = ext4_xattr_inode_write(handle, ea_inode, value, value_len);
>> +	if (err)
>> +		clear_nlink(ea_inode);
>> +	else
>> +		*ea_ino = ea_inode->i_ino;
>> +
>> +	iput(ea_inode);
>> +
>> +	return err;
>> +}
>> +
>> +static int ext4_xattr_set_entry(struct ext4_xattr_info *i,
>> +				struct ext4_xattr_search *s,
>> +				handle_t *handle, struct inode *inode)
>> {
>> 	struct ext4_xattr_entry *last;
>> 	size_t free, min_offs = s->end - s->base, name_len = strlen(i->name);
>> +	int in_inode = i->in_inode, rc = 0;
>> +
>> +	if (ext4_feature_incompat(inode->i_sb, EA_INODE) &&
>> +	    (EXT4_XATTR_SIZE(i->value_len) >
>> +	     EXT4_XATTR_MIN_LARGE_EA_SIZE(inode->i_sb->s_blocksize)))
>> +		in_inode = 1;
>> 
>> 	/* Compute min_offs and last. */
>> 	last = s->first;
>> 	for (; !IS_LAST_ENTRY(last); last = EXT4_XATTR_NEXT(last)) {
>> -		if (last->e_value_size) {
>> +		if (!last->e_value_inum && last->e_value_size) {
>> 			size_t offs = le16_to_cpu(last->e_value_offs);
>> 			if (offs < min_offs)
>> 				min_offs = offs;
>> @@ -676,15 +948,20 @@ static size_t ext4_xattr_free_space(struct ext4_xattr_entry *last,
>> 	}
>> 	free = min_offs - ((void *)last - s->base) - sizeof(__u32);
>> 	if (!s->not_found) {
>> -		if (s->here->e_value_size) {
>> +		if (!in_inode &&
>> +		    !s->here->e_value_inum && s->here->e_value_size) {
>> 			size_t size = le32_to_cpu(s->here->e_value_size);
>> 			free += EXT4_XATTR_SIZE(size);
>> 		}
>> 		free += EXT4_XATTR_LEN(name_len);
>> 	}
>> 	if (i->value) {
>> -		if (free < EXT4_XATTR_LEN(name_len) +
>> -			   EXT4_XATTR_SIZE(i->value_len))
>> +		size_t value_len = EXT4_XATTR_SIZE(i->value_len);
>> +
>> +		if (in_inode)
>> +			value_len = 0;
>> +
>> +		if (free < EXT4_XATTR_LEN(name_len) + value_len)
>> 			return -ENOSPC;
>> 	}
>> 
>> @@ -698,7 +975,8 @@ static size_t ext4_xattr_free_space(struct ext4_xattr_entry *last,
>> 		s->here->e_name_len = name_len;
>> 		memcpy(s->here->e_name, i->name, name_len);
>> 	} else {
>> -		if (s->here->e_value_size) {
>> +		if (!s->here->e_value_inum && s->here->e_value_size &&
>> +		    s->here->e_value_offs > 0) {
>> 			void *first_val = s->base + min_offs;
>> 			size_t offs = le16_to_cpu(s->here->e_value_offs);
>> 			void *val = s->base + offs;
>> @@ -732,12 +1010,18 @@ static size_t ext4_xattr_free_space(struct ext4_xattr_entry *last,
>> 			last = s->first;
>> 			while (!IS_LAST_ENTRY(last)) {
>> 				size_t o = le16_to_cpu(last->e_value_offs);
>> -				if (last->e_value_size && o < offs)
>> +				if (!last->e_value_inum &&
>> +				    last->e_value_size && o < offs)
>> 					last->e_value_offs =
>> 						cpu_to_le16(o + size);
>> 				last = EXT4_XATTR_NEXT(last);
>> 			}
>> 		}
>> +		if (s->here->e_value_inum) {
>> +			ext4_xattr_inode_unlink(inode,
>> +					    le32_to_cpu(s->here->e_value_inum));
>> +			s->here->e_value_inum = 0;
>> +		}
>> 		if (!i->value) {
>> 			/* Remove the old name. */
>> 			size_t size = EXT4_XATTR_LEN(name_len);
>> @@ -750,11 +1034,20 @@ static size_t ext4_xattr_free_space(struct ext4_xattr_entry *last,
>> 
>> 	if (i->value) {
>> 		/* Insert the new value. */
>> -		s->here->e_value_size = cpu_to_le32(i->value_len);
>> -		if (i->value_len) {
>> +		if (in_inode) {
>> +			unsigned long ea_ino =
>> +				le32_to_cpu(s->here->e_value_inum);
>> +			rc = ext4_xattr_inode_set(handle, inode, &ea_ino,
>> +						  i->value, i->value_len);
>> +			if (rc)
>> +				goto out;
>> +			s->here->e_value_inum = cpu_to_le32(ea_ino);
>> +			s->here->e_value_offs = 0;
>> +		} else if (i->value_len) {
>> 			size_t size = EXT4_XATTR_SIZE(i->value_len);
>> 			void *val = s->base + min_offs - size;
>> 			s->here->e_value_offs = cpu_to_le16(min_offs - size);
>> +			s->here->e_value_inum = 0;
>> 			if (i->value == EXT4_ZERO_XATTR_VALUE) {
>> 				memset(val, 0, size);
>> 			} else {
>> @@ -764,8 +1057,11 @@ static size_t ext4_xattr_free_space(struct ext4_xattr_entry *last,
>> 				memcpy(val, i->value, i->value_len);
>> 			}
>> 		}
>> +		s->here->e_value_size = cpu_to_le32(i->value_len);
>> 	}
>> -	return 0;
>> +
>> +out:
>> +	return rc;
>> }
>> 
>> struct ext4_xattr_block_find {
>> @@ -804,7 +1100,7 @@ struct ext4_xattr_block_find {
>> 		bs->s.end = bs->bh->b_data + bs->bh->b_size;
>> 		bs->s.here = bs->s.first;
>> 		error = ext4_xattr_find_entry(&bs->s.here, i->name_index,
>> -					      i->name, bs->bh->b_size, 1);
>> +					     i->name, bs->bh->b_size, 1, inode);
>> 		if (error && error != -ENODATA)
>> 			goto cleanup;
>> 		bs->s.not_found = error;
>> @@ -829,8 +1125,6 @@ struct ext4_xattr_block_find {
>> 
>> #define header(x) ((struct ext4_xattr_header *)(x))
>> 
>> -	if (i->value && i->value_len > sb->s_blocksize)
>> -		return -ENOSPC;
>> 	if (s->base) {
>> 		BUFFER_TRACE(bs->bh, "get_write_access");
>> 		error = ext4_journal_get_write_access(handle, bs->bh);
>> @@ -849,7 +1143,7 @@ struct ext4_xattr_block_find {
>> 			mb_cache_entry_delete_block(ext4_mb_cache, hash,
>> 						    bs->bh->b_blocknr);
>> 			ea_bdebug(bs->bh, "modifying in-place");
>> -			error = ext4_xattr_set_entry(i, s);
>> +			error = ext4_xattr_set_entry(i, s, handle, inode);
>> 			if (!error) {
>> 				if (!IS_LAST_ENTRY(s->first))
>> 					ext4_xattr_rehash(header(s->base),
>> @@ -898,7 +1192,7 @@ struct ext4_xattr_block_find {
>> 		s->end = s->base + sb->s_blocksize;
>> 	}
>> 
>> -	error = ext4_xattr_set_entry(i, s);
>> +	error = ext4_xattr_set_entry(i, s, handle, inode);
>> 	if (error == -EFSCORRUPTED)
>> 		goto bad_block;
>> 	if (error)
>> @@ -1077,7 +1371,7 @@ int ext4_xattr_ibody_find(struct inode *inode, struct ext4_xattr_info *i,
>> 		/* Find the named attribute. */
>> 		error = ext4_xattr_find_entry(&is->s.here, i->name_index,
>> 					      i->name, is->s.end -
>> -					      (void *)is->s.base, 0);
>> +					      (void *)is->s.base, 0, inode);
>> 		if (error && error != -ENODATA)
>> 			return error;
>> 		is->s.not_found = error;
>> @@ -1095,7 +1389,7 @@ int ext4_xattr_ibody_inline_set(handle_t *handle, struct inode *inode,
>> 
>> 	if (EXT4_I(inode)->i_extra_isize == 0)
>> 		return -ENOSPC;
>> -	error = ext4_xattr_set_entry(i, s);
>> +	error = ext4_xattr_set_entry(i, s, handle, inode);
>> 	if (error) {
>> 		if (error == -ENOSPC &&
>> 		    ext4_has_inline_data(inode)) {
>> @@ -1107,7 +1401,7 @@ int ext4_xattr_ibody_inline_set(handle_t *handle, struct inode *inode,
>> 			error = ext4_xattr_ibody_find(inode, i, is);
>> 			if (error)
>> 				return error;
>> -			error = ext4_xattr_set_entry(i, s);
>> +			error = ext4_xattr_set_entry(i, s, handle, inode);
>> 		}
>> 		if (error)
>> 			return error;
>> @@ -1133,7 +1427,7 @@ static int ext4_xattr_ibody_set(struct inode *inode,
>> 
>> 	if (EXT4_I(inode)->i_extra_isize == 0)
>> 		return -ENOSPC;
>> -	error = ext4_xattr_set_entry(i, s);
>> +	error = ext4_xattr_set_entry(i, s, handle, inode);
>> 	if (error)
>> 		return error;
>> 	header = IHDR(inode, ext4_raw_inode(&is->iloc));
>> @@ -1180,7 +1474,7 @@ static int ext4_xattr_value_same(struct ext4_xattr_search *s,
>> 		.name = name,
>> 		.value = value,
>> 		.value_len = value_len,
>> -
>> +		.in_inode = 0,
>> 	};
>> 	struct ext4_xattr_ibody_find is = {
>> 		.s = { .not_found = -ENODATA, },
>> @@ -1250,6 +1544,15 @@ static int ext4_xattr_value_same(struct ext4_xattr_search *s,
>> 					goto cleanup;
>> 			}
>> 			error = ext4_xattr_block_set(handle, inode, &i, &bs);
>> +			if (EXT4_HAS_INCOMPAT_FEATURE(inode->i_sb,
>> +					EXT4_FEATURE_INCOMPAT_EA_INODE) &&
>> +			    error == -ENOSPC) {
>> +				/* xattr does not fit in the block; store it
>> +				 * in an external inode */
>> +				i.in_inode = 1;
>> +				error = ext4_xattr_ibody_set(handle, inode,
>> +							     &i, &is);
>> +			}
>> 			if (error)
>> 				goto cleanup;
>> 			if (!is.s.not_found) {
>> @@ -1293,9 +1596,22 @@ static int ext4_xattr_value_same(struct ext4_xattr_search *s,
>> 	       const void *value, size_t value_len, int flags)
>> {
>> 	handle_t *handle;
>> +	struct super_block *sb = inode->i_sb;
>> 	int error, retries = 0;
>> 	int credits = ext4_jbd2_credits_xattr(inode);
>> 
>> +	if ((value_len >= EXT4_XATTR_MIN_LARGE_EA_SIZE(sb->s_blocksize)) &&
>> +	    EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EA_INODE)) {
>> +		int nrblocks = (value_len + sb->s_blocksize - 1) >>
>> +					sb->s_blocksize_bits;
>> +
>> +		/* For new inode */
>> +		credits += EXT4_SINGLEDATA_TRANS_BLOCKS(sb) + 3;
>> +
>> +		/* For data blocks of EA inode */
>> +		credits += ext4_meta_trans_blocks(inode, nrblocks, 0);
>> +	}
>> +
>> retry:
>> 	handle = ext4_journal_start(inode, EXT4_HT_XATTR, credits);
>> 	if (IS_ERR(handle)) {
>> @@ -1307,7 +1623,7 @@ static int ext4_xattr_value_same(struct ext4_xattr_search *s,
>> 					      value, value_len, flags);
>> 		error2 = ext4_journal_stop(handle);
>> 		if (error == -ENOSPC &&
>> -		    ext4_should_retry_alloc(inode->i_sb, &retries))
>> +		    ext4_should_retry_alloc(sb, &retries))
>> 			goto retry;
>> 		if (error == 0)
>> 			error = error2;
>> @@ -1332,7 +1648,7 @@ static void ext4_xattr_shift_entries(struct ext4_xattr_entry *entry,
>> 
>> 	/* Adjust the value offsets of the entries */
>> 	for (; !IS_LAST_ENTRY(last); last = EXT4_XATTR_NEXT(last)) {
>> -		if (last->e_value_size) {
>> +		if (!last->e_value_inum && last->e_value_size) {
>> 			new_offs = le16_to_cpu(last->e_value_offs) +
>> 							value_offs_shift;
>> 			last->e_value_offs = cpu_to_le16(new_offs);
>> @@ -1593,21 +1909,135 @@ int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize,
>> }
>> 
>> 
>> +#define EIA_INCR 16 /* must be 2^n */
>> +#define EIA_MASK (EIA_INCR - 1)
>> +/* Add the large xattr @ino into @lea_ino_array for later deletion.
>> + * If @lea_ino_array is new or full it will be grown and the old
>> + * contents copied over.
>> + */
>> +static int
>> +ext4_expand_ino_array(struct ext4_xattr_ino_array **lea_ino_array, __u32 ino)
>> +{
>> +	if (*lea_ino_array == NULL) {
>> +		/*
>> +		 * Start with 15 inodes, so it fits into a power-of-two size.
>> +		 * If *lea_ino_array is NULL, this is essentially offsetof()
>> +		 */
>> +		(*lea_ino_array) =
>> +			kmalloc(offsetof(struct ext4_xattr_ino_array,
>> +					 xia_inodes[EIA_MASK]),
>> +				GFP_NOFS);
>> +		if (*lea_ino_array == NULL)
>> +			return -ENOMEM;
>> +		(*lea_ino_array)->xia_count = 0;
>> +	} else if (((*lea_ino_array)->xia_count & EIA_MASK) == EIA_MASK) {
>> +		/* expand the array once all 15 + n * 16 slots are full */
>> +		struct ext4_xattr_ino_array *new_array = NULL;
>> +		int count = (*lea_ino_array)->xia_count;
>> +
>> +		/* if new_array is NULL, this is essentially offsetof() */
>> +		new_array = kmalloc(
>> +				offsetof(struct ext4_xattr_ino_array,
>> +					 xia_inodes[count + EIA_INCR]),
>> +				GFP_NOFS);
>> +		if (new_array == NULL)
>> +			return -ENOMEM;
>> +		memcpy(new_array, *lea_ino_array,
>> +		       offsetof(struct ext4_xattr_ino_array,
>> +				xia_inodes[count]));
>> +		kfree(*lea_ino_array);
>> +		*lea_ino_array = new_array;
>> +	}
>> +	(*lea_ino_array)->xia_inodes[(*lea_ino_array)->xia_count++] = ino;
>> +	return 0;
>> +}
>> +
>> +/**
>> + * Add xattr inode to orphan list
>> + */
>> +static int
>> +ext4_xattr_inode_orphan_add(handle_t *handle, struct inode *inode,
>> +			int credits, struct ext4_xattr_ino_array *lea_ino_array)
>> +{
>> +	struct inode *ea_inode = NULL;
>> +	int idx = 0, error = 0;
>> +
>> +	if (lea_ino_array == NULL)
>> +		return 0;
>> +
>> +	for (; idx < lea_ino_array->xia_count; ++idx) {
>> +		if (!ext4_handle_has_enough_credits(handle, credits)) {
>> +			error = ext4_journal_extend(handle, credits);
>> +			if (error > 0)
>> +				error = ext4_journal_restart(handle, credits);
>> +
>> +			if (error != 0) {
>> +				ext4_warning(inode->i_sb,
>> +					"couldn't extend journal "
>> +					"(err %d)", error);
>> +				return error;
>> +			}
>> +		}
>> +		ea_inode = ext4_xattr_inode_iget(inode,
>> +				lea_ino_array->xia_inodes[idx], &error);
>> +		if (error)
>> +			continue;
>> +		ext4_orphan_add(handle, ea_inode);
>> +		/* the inode's i_count will be released by caller */
>> +	}
>> +
>> +	return 0;
>> +}
>> 
>> /*
>> * ext4_xattr_delete_inode()
>> *
>> - * Free extended attribute resources associated with this inode. This
>> + * Free extended attribute resources associated with this inode. Traverse
>> + * all entries and unlink any xattr inodes associated with this inode. This
>> * is called immediately before an inode is freed. We have exclusive
>> - * access to the inode.
>> + * access to the inode. If an orphan inode is deleted it will also delete any
>> + * xattr block and all xattr inodes. They are checked by ext4_xattr_inode_iget()
>> + * to ensure they belong to the parent inode and were not deleted already.
>> */
>> -void
>> -ext4_xattr_delete_inode(handle_t *handle, struct inode *inode)
>> +int
>> +ext4_xattr_delete_inode(handle_t *handle, struct inode *inode,
>> +			struct ext4_xattr_ino_array **lea_ino_array)
>> {
>> 	struct buffer_head *bh = NULL;
>> +	struct ext4_xattr_ibody_header *header;
>> +	struct ext4_inode *raw_inode;
>> +	struct ext4_iloc iloc;
>> +	struct ext4_xattr_entry *entry;
>> +	int credits = 3, error = 0;
>> 
>> -	if (!EXT4_I(inode)->i_file_acl)
>> +	if (!ext4_test_inode_state(inode, EXT4_STATE_XATTR))
>> +		goto delete_external_ea;
>> +
>> +	error = ext4_get_inode_loc(inode, &iloc);
>> +	if (error)
>> +		goto cleanup;
>> +	raw_inode = ext4_raw_inode(&iloc);
>> +	header = IHDR(inode, raw_inode);
>> +	for (entry = IFIRST(header); !IS_LAST_ENTRY(entry);
>> +	     entry = EXT4_XATTR_NEXT(entry)) {
>> +		if (!entry->e_value_inum)
>> +			continue;
>> +		if (ext4_expand_ino_array(lea_ino_array,
>> +					  entry->e_value_inum) != 0) {
>> +			brelse(iloc.bh);
>> +			goto cleanup;
>> +		}
>> +		entry->e_value_inum = 0;
>> +	}
>> +	brelse(iloc.bh);
>> +
>> +delete_external_ea:
>> +	if (!EXT4_I(inode)->i_file_acl) {
>> +		/* add xattr inode to orphan list */
>> +		ext4_xattr_inode_orphan_add(handle, inode, credits,
>> +						*lea_ino_array);
>> 		goto cleanup;
>> +	}
>> 	bh = sb_bread(inode->i_sb, EXT4_I(inode)->i_file_acl);
>> 	if (!bh) {
>> 		EXT4_ERROR_INODE(inode, "block %llu read error",
>> @@ -1620,11 +2050,69 @@ int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize,
>> 				 EXT4_I(inode)->i_file_acl);
>> 		goto cleanup;
>> 	}
>> +
>> +	for (entry = BFIRST(bh); !IS_LAST_ENTRY(entry);
>> +	     entry = EXT4_XATTR_NEXT(entry)) {
>> +		if (!entry->e_value_inum)
>> +			continue;
>> +		if (ext4_expand_ino_array(lea_ino_array,
>> +					  entry->e_value_inum) != 0)
>> +			goto cleanup;
>> +		entry->e_value_inum = 0;
>> +	}
>> +
>> +	/* add xattr inode to orphan list */
>> +	error = ext4_xattr_inode_orphan_add(handle, inode, credits,
>> +					*lea_ino_array);
>> +	if (error != 0)
>> +		goto cleanup;
>> +
>> +	if (!IS_NOQUOTA(inode))
>> +		credits += 2 * EXT4_QUOTA_DEL_BLOCKS(inode->i_sb);
>> +
>> +	if (!ext4_handle_has_enough_credits(handle, credits)) {
>> +		error = ext4_journal_extend(handle, credits);
>> +		if (error > 0)
>> +			error = ext4_journal_restart(handle, credits);
>> +		if (error != 0) {
>> +			ext4_warning(inode->i_sb,
>> +				"couldn't extend journal (err %d)", error);
>> +			goto cleanup;
>> +		}
>> +	}
>> +
>> 	ext4_xattr_release_block(handle, inode, bh);
>> 	EXT4_I(inode)->i_file_acl = 0;
>> 
>> cleanup:
>> 	brelse(bh);
>> +
>> +	return error;
>> +}
>> +
>> +void
>> +ext4_xattr_inode_array_free(struct inode *inode,
>> +			    struct ext4_xattr_ino_array *lea_ino_array)
>> +{
>> +	struct inode	*ea_inode = NULL;
>> +	int		idx = 0;
>> +	int		err;
>> +
>> +	if (lea_ino_array == NULL)
>> +		return;
>> +
>> +	for (; idx < lea_ino_array->xia_count; ++idx) {
>> +		ea_inode = ext4_xattr_inode_iget(inode,
>> +				lea_ino_array->xia_inodes[idx], &err);
>> +		if (err)
>> +			continue;
>> +		/* for inode's i_count get from ext4_xattr_delete_inode */
>> +		if (!list_empty(&EXT4_I(ea_inode)->i_orphan))
>> +			iput(ea_inode);
>> +		clear_nlink(ea_inode);
>> +		iput(ea_inode);
>> +	}
>> +	kfree(lea_ino_array);
>> }
>> 
>> /*
>> @@ -1676,10 +2164,9 @@ int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize,
>> 		    entry1->e_name_index != entry2->e_name_index ||
>> 		    entry1->e_name_len != entry2->e_name_len ||
>> 		    entry1->e_value_size != entry2->e_value_size ||
>> +		    entry1->e_value_inum != entry2->e_value_inum ||
>> 		    memcmp(entry1->e_name, entry2->e_name, entry1->e_name_len))
>> 			return 1;
>> -		if (entry1->e_value_block != 0 || entry2->e_value_block != 0)
>> -			return -EFSCORRUPTED;
>> 		if (memcmp((char *)header1 + le16_to_cpu(entry1->e_value_offs),
>> 			   (char *)header2 + le16_to_cpu(entry2->e_value_offs),
>> 			   le32_to_cpu(entry1->e_value_size)))
>> @@ -1751,7 +2238,7 @@ static inline void ext4_xattr_hash_entry(struct ext4_xattr_header *header,
>> 		       *name++;
>> 	}
>> 
>> -	if (entry->e_value_size != 0) {
>> +	if (!entry->e_value_inum && entry->e_value_size) {
>> 		__le32 *value = (__le32 *)((char *)header +
>> 			le16_to_cpu(entry->e_value_offs));
>> 		for (n = (le32_to_cpu(entry->e_value_size) +
>> diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h
>> index 099c8b6..6e10ff9 100644
>> --- a/fs/ext4/xattr.h
>> +++ b/fs/ext4/xattr.h
>> @@ -44,7 +44,7 @@ struct ext4_xattr_entry {
>> 	__u8	e_name_len;	/* length of name */
>> 	__u8	e_name_index;	/* attribute name index */
>> 	__le16	e_value_offs;	/* offset in disk block of value */
>> -	__le32	e_value_block;	/* disk block attribute is stored on (n/i) */
>> +	__le32	e_value_inum;	/* inode in which the value is stored */
>> 	__le32	e_value_size;	/* size of attribute value */
>> 	__le32	e_hash;		/* hash value of name and value */
>> 	char	e_name[0];	/* attribute name */
>> @@ -69,6 +69,26 @@ struct ext4_xattr_entry {
>> 		EXT4_I(inode)->i_extra_isize))
>> #define IFIRST(hdr) ((struct ext4_xattr_entry *)((hdr)+1))
>> 
>> +/*
>> + * Link EA inode back to parent one using i_mtime field.
>> + * Extra integer type conversion added to ignore higher
>> + * bits in i_mtime.tv_sec which might be set by ext4_get()
>> + */
>> +#define EXT4_XATTR_INODE_SET_PARENT(inode, inum)      \
>> +do {                                                  \
>> +      (inode)->i_mtime.tv_sec = inum;                 \
>> +} while(0)
>> +
>> +#define EXT4_XATTR_INODE_GET_PARENT(inode)            \
>> +((__u32)(inode)->i_mtime.tv_sec)
>> +
>> +/*
>> + * The minimum size of EA value when you start storing it in an external inode
>> + * size of block - size of header - size of 1 entry - 4 null bytes
>> +*/
>> +#define EXT4_XATTR_MIN_LARGE_EA_SIZE(b)					\
>> +	((b) - EXT4_XATTR_LEN(3) - sizeof(struct ext4_xattr_header) - 4)
>> +
>> #define BHDR(bh) ((struct ext4_xattr_header *)((bh)->b_data))
>> #define ENTRY(ptr) ((struct ext4_xattr_entry *)(ptr))
>> #define BFIRST(bh) ENTRY(BHDR(bh)+1)
>> @@ -77,10 +97,11 @@ struct ext4_xattr_entry {
>> #define EXT4_ZERO_XATTR_VALUE ((void *)-1)
>> 
>> struct ext4_xattr_info {
>> -	int name_index;
>> 	const char *name;
>> 	const void *value;
>> 	size_t value_len;
>> +	int name_index;
>> +	int in_inode;
>> };
>> 
>> struct ext4_xattr_search {
>> @@ -140,7 +161,13 @@ static inline void ext4_write_unlock_xattr(struct inode *inode, int *save)
>> extern int ext4_xattr_set(struct inode *, int, const char *, const void *, size_t, int);
>> extern int ext4_xattr_set_handle(handle_t *, struct inode *, int, const char *, const void *, size_t, int);
>> 
>> -extern void ext4_xattr_delete_inode(handle_t *, struct inode *);
>> +extern struct inode *ext4_xattr_inode_iget(struct inode *parent, unsigned long ea_ino,
>> +					   int *err);
>> +extern int ext4_xattr_inode_unlink(struct inode *inode, unsigned long ea_ino);
>> +extern int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode,
>> +				   struct ext4_xattr_ino_array **array);
>> +extern void ext4_xattr_inode_array_free(struct inode *inode,
>> +					struct ext4_xattr_ino_array *array);
>> 
>> extern int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize,
>> 			    struct ext4_inode *raw_inode, handle_t *handle);
>> diff --git a/include/uapi/linux/netfilter/xt_CONNMARK.h b/include/uapi/linux/netfilter/xt_CONNMARK.h
>> index 2f2e48e..efc17a8 100644
>> --- a/include/uapi/linux/netfilter/xt_CONNMARK.h
>> +++ b/include/uapi/linux/netfilter/xt_CONNMARK.h
>> @@ -1,6 +1,31 @@
>> -#ifndef _XT_CONNMARK_H_target
>> -#define _XT_CONNMARK_H_target
>> +#ifndef _XT_CONNMARK_H
>> +#define _XT_CONNMARK_H
>> 
>> -#include <linux/netfilter/xt_connmark.h>
>> +#include <linux/types.h>
>> 
>> -#endif /*_XT_CONNMARK_H_target*/
>> +/* Copyright (C) 2002,2004 MARA Systems AB <http://www.marasystems.com>
>> + * by Henrik Nordstrom <hno@...asystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + */
>> +
>> +enum {
>> +	XT_CONNMARK_SET = 0,
>> +	XT_CONNMARK_SAVE,
>> +	XT_CONNMARK_RESTORE
>> +};
>> +
>> +struct xt_connmark_tginfo1 {
>> +	__u32 ctmark, ctmask, nfmask;
>> +	__u8 mode;
>> +};
>> +
>> +struct xt_connmark_mtinfo1 {
>> +	__u32 mark, mask;
>> +	__u8 invert;
>> +};
>> +
>> +#endif /*_XT_CONNMARK_H*/
>> diff --git a/include/uapi/linux/netfilter/xt_DSCP.h b/include/uapi/linux/netfilter/xt_DSCP.h
>> index 648e0b3..15f8932 100644
>> --- a/include/uapi/linux/netfilter/xt_DSCP.h
>> +++ b/include/uapi/linux/netfilter/xt_DSCP.h
>> @@ -1,26 +1,31 @@
>> -/* x_tables module for setting the IPv4/IPv6 DSCP field
>> +/* x_tables module for matching the IPv4/IPv6 DSCP field
>> *
>> * (C) 2002 Harald Welte <laforge@...monks.org>
>> - * based on ipt_FTOS.c (C) 2000 by Matthew G. Marsh <mgm@...tronix.com>
>> * This software is distributed under GNU GPL v2, 1991
>> *
>> * See RFC2474 for a description of the DSCP field within the IP Header.
>> *
>> - * xt_DSCP.h,v 1.7 2002/03/14 12:03:13 laforge Exp
>> + * xt_dscp.h,v 1.3 2002/08/05 19:00:21 laforge Exp
>> */
>> -#ifndef _XT_DSCP_TARGET_H
>> -#define _XT_DSCP_TARGET_H
>> -#include <linux/netfilter/xt_dscp.h>
>> +#ifndef _XT_DSCP_H
>> +#define _XT_DSCP_H
>> +
>> #include <linux/types.h>
>> 
>> -/* target info */
>> -struct xt_DSCP_info {
>> +#define XT_DSCP_MASK	0xfc	/* 11111100 */
>> +#define XT_DSCP_SHIFT	2
>> +#define XT_DSCP_MAX	0x3f	/* 00111111 */
>> +
>> +/* match info */
>> +struct xt_dscp_info {
>> 	__u8 dscp;
>> +	__u8 invert;
>> };
>> 
>> -struct xt_tos_target_info {
>> -	__u8 tos_value;
>> +struct xt_tos_match_info {
>> 	__u8 tos_mask;
>> +	__u8 tos_value;
>> +	__u8 invert;
>> };
>> 
>> -#endif /* _XT_DSCP_TARGET_H */
>> +#endif /* _XT_DSCP_H */
>> diff --git a/include/uapi/linux/netfilter/xt_MARK.h b/include/uapi/linux/netfilter/xt_MARK.h
>> index 41c456d..ecadc40 100644
>> --- a/include/uapi/linux/netfilter/xt_MARK.h
>> +++ b/include/uapi/linux/netfilter/xt_MARK.h
>> @@ -1,6 +1,15 @@
>> -#ifndef _XT_MARK_H_target
>> -#define _XT_MARK_H_target
>> +#ifndef _XT_MARK_H
>> +#define _XT_MARK_H
>> 
>> -#include <linux/netfilter/xt_mark.h>
>> +#include <linux/types.h>
>> 
>> -#endif /*_XT_MARK_H_target */
>> +struct xt_mark_tginfo2 {
>> +	__u32 mark, mask;
>> +};
>> +
>> +struct xt_mark_mtinfo1 {
>> +	__u32 mark, mask;
>> +	__u8 invert;
>> +};
>> +
>> +#endif /*_XT_MARK_H*/
>> diff --git a/include/uapi/linux/netfilter/xt_TCPMSS.h b/include/uapi/linux/netfilter/xt_TCPMSS.h
>> index 9a6960a..fbac56b 100644
>> --- a/include/uapi/linux/netfilter/xt_TCPMSS.h
>> +++ b/include/uapi/linux/netfilter/xt_TCPMSS.h
>> @@ -1,12 +1,11 @@
>> -#ifndef _XT_TCPMSS_H
>> -#define _XT_TCPMSS_H
>> +#ifndef _XT_TCPMSS_MATCH_H
>> +#define _XT_TCPMSS_MATCH_H
>> 
>> #include <linux/types.h>
>> 
>> -struct xt_tcpmss_info {
>> -	__u16 mss;
>> +struct xt_tcpmss_match_info {
>> +    __u16 mss_min, mss_max;
>> +    __u8 invert;
>> };
>> 
>> -#define XT_TCPMSS_CLAMP_PMTU 0xffff
>> -
>> -#endif /* _XT_TCPMSS_H */
>> +#endif /*_XT_TCPMSS_MATCH_H*/
>> diff --git a/include/uapi/linux/netfilter/xt_rateest.h b/include/uapi/linux/netfilter/xt_rateest.h
>> index 13fe50d..ec1b570 100644
>> --- a/include/uapi/linux/netfilter/xt_rateest.h
>> +++ b/include/uapi/linux/netfilter/xt_rateest.h
>> @@ -1,38 +1,16 @@
>> -#ifndef _XT_RATEEST_MATCH_H
>> -#define _XT_RATEEST_MATCH_H
>> +#ifndef _XT_RATEEST_TARGET_H
>> +#define _XT_RATEEST_TARGET_H
>> 
>> #include <linux/types.h>
>> #include <linux/if.h>
>> 
>> -enum xt_rateest_match_flags {
>> -	XT_RATEEST_MATCH_INVERT	= 1<<0,
>> -	XT_RATEEST_MATCH_ABS	= 1<<1,
>> -	XT_RATEEST_MATCH_REL	= 1<<2,
>> -	XT_RATEEST_MATCH_DELTA	= 1<<3,
>> -	XT_RATEEST_MATCH_BPS	= 1<<4,
>> -	XT_RATEEST_MATCH_PPS	= 1<<5,
>> -};
>> -
>> -enum xt_rateest_match_mode {
>> -	XT_RATEEST_MATCH_NONE,
>> -	XT_RATEEST_MATCH_EQ,
>> -	XT_RATEEST_MATCH_LT,
>> -	XT_RATEEST_MATCH_GT,
>> -};
>> -
>> -struct xt_rateest_match_info {
>> -	char			name1[IFNAMSIZ];
>> -	char			name2[IFNAMSIZ];
>> -	__u16		flags;
>> -	__u16		mode;
>> -	__u32		bps1;
>> -	__u32		pps1;
>> -	__u32		bps2;
>> -	__u32		pps2;
>> +struct xt_rateest_target_info {
>> +	char			name[IFNAMSIZ];
>> +	__s8			interval;
>> +	__u8		ewma_log;
>> 
>> 	/* Used internally by the kernel */
>> -	struct xt_rateest	*est1 __attribute__((aligned(8)));
>> -	struct xt_rateest	*est2 __attribute__((aligned(8)));
>> +	struct xt_rateest	*est __attribute__((aligned(8)));
>> };
>> 
>> -#endif /* _XT_RATEEST_MATCH_H */
>> +#endif /* _XT_RATEEST_TARGET_H */
>> diff --git a/include/uapi/linux/netfilter_ipv4/ipt_ECN.h b/include/uapi/linux/netfilter_ipv4/ipt_ECN.h
>> index bb88d53..0e0c063 100644
>> --- a/include/uapi/linux/netfilter_ipv4/ipt_ECN.h
>> +++ b/include/uapi/linux/netfilter_ipv4/ipt_ECN.h
>> @@ -1,33 +1,15 @@
>> -/* Header file for iptables ipt_ECN target
>> - *
>> - * (C) 2002 by Harald Welte <laforge@...monks.org>
>> - *
>> - * This software is distributed under GNU GPL v2, 1991
>> - *
>> - * ipt_ECN.h,v 1.3 2002/05/29 12:17:40 laforge Exp
>> -*/
>> -#ifndef _IPT_ECN_TARGET_H
>> -#define _IPT_ECN_TARGET_H
>> -
>> -#include <linux/types.h>
>> -#include <linux/netfilter/xt_DSCP.h>
>> -
>> -#define IPT_ECN_IP_MASK	(~XT_DSCP_MASK)
>> -
>> -#define IPT_ECN_OP_SET_IP	0x01	/* set ECN bits of IPv4 header */
>> -#define IPT_ECN_OP_SET_ECE	0x10	/* set ECE bit of TCP header */
>> -#define IPT_ECN_OP_SET_CWR	0x20	/* set CWR bit of TCP header */
>> -
>> -#define IPT_ECN_OP_MASK		0xce
>> -
>> -struct ipt_ECN_info {
>> -	__u8 operation;	/* bitset of operations */
>> -	__u8 ip_ect;	/* ECT codepoint of IPv4 header, pre-shifted */
>> -	union {
>> -		struct {
>> -			__u8 ece:1, cwr:1; /* TCP ECT bits */
>> -		} tcp;
>> -	} proto;
>> +#ifndef _IPT_ECN_H
>> +#define _IPT_ECN_H
>> +
>> +#include <linux/netfilter/xt_ecn.h>
>> +#define ipt_ecn_info xt_ecn_info
>> +
>> +enum {
>> +	IPT_ECN_IP_MASK       = XT_ECN_IP_MASK,
>> +	IPT_ECN_OP_MATCH_IP   = XT_ECN_OP_MATCH_IP,
>> +	IPT_ECN_OP_MATCH_ECE  = XT_ECN_OP_MATCH_ECE,
>> +	IPT_ECN_OP_MATCH_CWR  = XT_ECN_OP_MATCH_CWR,
>> +	IPT_ECN_OP_MATCH_MASK = XT_ECN_OP_MATCH_MASK,
>> };
>> 
>> -#endif /* _IPT_ECN_TARGET_H */
>> +#endif /* IPT_ECN_H */
>> diff --git a/include/uapi/linux/netfilter_ipv4/ipt_TTL.h b/include/uapi/linux/netfilter_ipv4/ipt_TTL.h
>> index f6ac169..37bee44 100644
>> --- a/include/uapi/linux/netfilter_ipv4/ipt_TTL.h
>> +++ b/include/uapi/linux/netfilter_ipv4/ipt_TTL.h
>> @@ -1,5 +1,5 @@
>> -/* TTL modification module for IP tables
>> - * (C) 2000 by Harald Welte <laforge@...filter.org> */
>> +/* IP tables module for matching the value of the TTL
>> + * (C) 2000 by Harald Welte <laforge@...monks.org> */
>> 
>> #ifndef _IPT_TTL_H
>> #define _IPT_TTL_H
>> @@ -7,14 +7,14 @@
>> #include <linux/types.h>
>> 
>> enum {
>> -	IPT_TTL_SET = 0,
>> -	IPT_TTL_INC,
>> -	IPT_TTL_DEC
>> +	IPT_TTL_EQ = 0,		/* equals */
>> +	IPT_TTL_NE,		/* not equals */
>> +	IPT_TTL_LT,		/* less than */
>> +	IPT_TTL_GT,		/* greater than */
>> };
>> 
>> -#define IPT_TTL_MAXMODE	IPT_TTL_DEC
>> 
>> -struct ipt_TTL_info {
>> +struct ipt_ttl_info {
>> 	__u8	mode;
>> 	__u8	ttl;
>> };
>> diff --git a/include/uapi/linux/netfilter_ipv6/ip6t_HL.h b/include/uapi/linux/netfilter_ipv6/ip6t_HL.h
>> index ebd8ead..6e76dbc 100644
>> --- a/include/uapi/linux/netfilter_ipv6/ip6t_HL.h
>> +++ b/include/uapi/linux/netfilter_ipv6/ip6t_HL.h
>> @@ -1,6 +1,6 @@
>> -/* Hop Limit modification module for ip6tables
>> +/* ip6tables module for matching the Hop Limit value
>> * Maciej Soltysiak <solt@....toxicfilms.tv>
>> - * Based on HW's TTL module */
>> + * Based on HW's ttl module */
>> 
>> #ifndef _IP6T_HL_H
>> #define _IP6T_HL_H
>> @@ -8,14 +8,14 @@
>> #include <linux/types.h>
>> 
>> enum {
>> -	IP6T_HL_SET = 0,
>> -	IP6T_HL_INC,
>> -	IP6T_HL_DEC
>> +	IP6T_HL_EQ = 0,		/* equals */
>> +	IP6T_HL_NE,		/* not equals */
>> +	IP6T_HL_LT,		/* less than */
>> +	IP6T_HL_GT,		/* greater than */
>> };
>> 
>> -#define IP6T_HL_MAXMODE	IP6T_HL_DEC
>> 
>> -struct ip6t_HL_info {
>> +struct ip6t_hl_info {
>> 	__u8	mode;
>> 	__u8	hop_limit;
>> };
>> diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
>> index 498b54f..755d2f6 100644
>> --- a/net/netfilter/xt_RATEEST.c
>> +++ b/net/netfilter/xt_RATEEST.c
>> @@ -8,184 +8,149 @@
>> #include <linux/module.h>
>> #include <linux/skbuff.h>
>> #include <linux/gen_stats.h>
>> -#include <linux/jhash.h>
>> -#include <linux/rtnetlink.h>
>> -#include <linux/random.h>
>> -#include <linux/slab.h>
>> -#include <net/gen_stats.h>
>> -#include <net/netlink.h>
>> 
>> #include <linux/netfilter/x_tables.h>
>> -#include <linux/netfilter/xt_RATEEST.h>
>> +#include <linux/netfilter/xt_rateest.h>
>> #include <net/netfilter/xt_rateest.h>
>> 
>> -static DEFINE_MUTEX(xt_rateest_mutex);
>> 
>> -#define RATEEST_HSIZE	16
>> -static struct hlist_head rateest_hash[RATEEST_HSIZE] __read_mostly;
>> -static unsigned int jhash_rnd __read_mostly;
>> -
>> -static unsigned int xt_rateest_hash(const char *name)
>> -{
>> -	return jhash(name, FIELD_SIZEOF(struct xt_rateest, name), jhash_rnd) &
>> -	       (RATEEST_HSIZE - 1);
>> -}
>> -
>> -static void xt_rateest_hash_insert(struct xt_rateest *est)
>> +static bool
>> +xt_rateest_mt(const struct sk_buff *skb, struct xt_action_param *par)
>> {
>> -	unsigned int h;
>> -
>> -	h = xt_rateest_hash(est->name);
>> -	hlist_add_head(&est->list, &rateest_hash[h]);
>> -}
>> +	const struct xt_rateest_match_info *info = par->matchinfo;
>> +	struct gnet_stats_rate_est64 sample = {0};
>> +	u_int32_t bps1, bps2, pps1, pps2;
>> +	bool ret = true;
>> +
>> +	gen_estimator_read(&info->est1->rate_est, &sample);
>> +
>> +	if (info->flags & XT_RATEEST_MATCH_DELTA) {
>> +		bps1 = info->bps1 >= sample.bps ? info->bps1 - sample.bps : 0;
>> +		pps1 = info->pps1 >= sample.pps ? info->pps1 - sample.pps : 0;
>> +	} else {
>> +		bps1 = sample.bps;
>> +		pps1 = sample.pps;
>> +	}
>> 
>> -struct xt_rateest *xt_rateest_lookup(const char *name)
>> -{
>> -	struct xt_rateest *est;
>> -	unsigned int h;
>> -
>> -	h = xt_rateest_hash(name);
>> -	mutex_lock(&xt_rateest_mutex);
>> -	hlist_for_each_entry(est, &rateest_hash[h], list) {
>> -		if (strcmp(est->name, name) == 0) {
>> -			est->refcnt++;
>> -			mutex_unlock(&xt_rateest_mutex);
>> -			return est;
>> +	if (info->flags & XT_RATEEST_MATCH_ABS) {
>> +		bps2 = info->bps2;
>> +		pps2 = info->pps2;
>> +	} else {
>> +		gen_estimator_read(&info->est2->rate_est, &sample);
>> +
>> +		if (info->flags & XT_RATEEST_MATCH_DELTA) {
>> +			bps2 = info->bps2 >= sample.bps ? info->bps2 - sample.bps : 0;
>> +			pps2 = info->pps2 >= sample.pps ? info->pps2 - sample.pps : 0;
>> +		} else {
>> +			bps2 = sample.bps;
>> +			pps2 = sample.pps;
>> 		}
>> 	}
>> -	mutex_unlock(&xt_rateest_mutex);
>> -	return NULL;
>> -}
>> -EXPORT_SYMBOL_GPL(xt_rateest_lookup);
>> 
>> -void xt_rateest_put(struct xt_rateest *est)
>> -{
>> -	mutex_lock(&xt_rateest_mutex);
>> -	if (--est->refcnt == 0) {
>> -		hlist_del(&est->list);
>> -		gen_kill_estimator(&est->rate_est);
>> -		/*
>> -		 * gen_estimator est_timer() might access est->lock or bstats,
>> -		 * wait a RCU grace period before freeing 'est'
>> -		 */
>> -		kfree_rcu(est, rcu);
>> +	switch (info->mode) {
>> +	case XT_RATEEST_MATCH_LT:
>> +		if (info->flags & XT_RATEEST_MATCH_BPS)
>> +			ret &= bps1 < bps2;
>> +		if (info->flags & XT_RATEEST_MATCH_PPS)
>> +			ret &= pps1 < pps2;
>> +		break;
>> +	case XT_RATEEST_MATCH_GT:
>> +		if (info->flags & XT_RATEEST_MATCH_BPS)
>> +			ret &= bps1 > bps2;
>> +		if (info->flags & XT_RATEEST_MATCH_PPS)
>> +			ret &= pps1 > pps2;
>> +		break;
>> +	case XT_RATEEST_MATCH_EQ:
>> +		if (info->flags & XT_RATEEST_MATCH_BPS)
>> +			ret &= bps1 == bps2;
>> +		if (info->flags & XT_RATEEST_MATCH_PPS)
>> +			ret &= pps1 == pps2;
>> +		break;
>> 	}
>> -	mutex_unlock(&xt_rateest_mutex);
>> +
>> +	ret ^= info->flags & XT_RATEEST_MATCH_INVERT ? true : false;
>> +	return ret;
>> }
>> -EXPORT_SYMBOL_GPL(xt_rateest_put);
>> 
>> -static unsigned int
>> -xt_rateest_tg(struct sk_buff *skb, const struct xt_action_param *par)
>> +static int xt_rateest_mt_checkentry(const struct xt_mtchk_param *par)
>> {
>> -	const struct xt_rateest_target_info *info = par->targinfo;
>> -	struct gnet_stats_basic_packed *stats = &info->est->bstats;
>> +	struct xt_rateest_match_info *info = par->matchinfo;
>> +	struct xt_rateest *est1, *est2;
>> +	int ret = -EINVAL;
>> 
>> -	spin_lock_bh(&info->est->lock);
>> -	stats->bytes += skb->len;
>> -	stats->packets++;
>> -	spin_unlock_bh(&info->est->lock);
>> +	if (hweight32(info->flags & (XT_RATEEST_MATCH_ABS |
>> +				     XT_RATEEST_MATCH_REL)) != 1)
>> +		goto err1;
>> 
>> -	return XT_CONTINUE;
>> -}
>> +	if (!(info->flags & (XT_RATEEST_MATCH_BPS | XT_RATEEST_MATCH_PPS)))
>> +		goto err1;
>> 
>> -static int xt_rateest_tg_checkentry(const struct xt_tgchk_param *par)
>> -{
>> -	struct xt_rateest_target_info *info = par->targinfo;
>> -	struct xt_rateest *est;
>> -	struct {
>> -		struct nlattr		opt;
>> -		struct gnet_estimator	est;
>> -	} cfg;
>> -	int ret;
>> -
>> -	net_get_random_once(&jhash_rnd, sizeof(jhash_rnd));
>> -
>> -	est = xt_rateest_lookup(info->name);
>> -	if (est) {
>> -		/*
>> -		 * If estimator parameters are specified, they must match the
>> -		 * existing estimator.
>> -		 */
>> -		if ((!info->interval && !info->ewma_log) ||
>> -		    (info->interval != est->params.interval ||
>> -		     info->ewma_log != est->params.ewma_log)) {
>> -			xt_rateest_put(est);
>> -			return -EINVAL;
>> -		}
>> -		info->est = est;
>> -		return 0;
>> +	switch (info->mode) {
>> +	case XT_RATEEST_MATCH_EQ:
>> +	case XT_RATEEST_MATCH_LT:
>> +	case XT_RATEEST_MATCH_GT:
>> +		break;
>> +	default:
>> +		goto err1;
>> 	}
>> 
>> -	ret = -ENOMEM;
>> -	est = kzalloc(sizeof(*est), GFP_KERNEL);
>> -	if (!est)
>> +	ret  = -ENOENT;
>> +	est1 = xt_rateest_lookup(info->name1);
>> +	if (!est1)
>> 		goto err1;
>> 
>> -	strlcpy(est->name, info->name, sizeof(est->name));
>> -	spin_lock_init(&est->lock);
>> -	est->refcnt		= 1;
>> -	est->params.interval	= info->interval;
>> -	est->params.ewma_log	= info->ewma_log;
>> -
>> -	cfg.opt.nla_len		= nla_attr_size(sizeof(cfg.est));
>> -	cfg.opt.nla_type	= TCA_STATS_RATE_EST;
>> -	cfg.est.interval	= info->interval;
>> -	cfg.est.ewma_log	= info->ewma_log;
>> -
>> -	ret = gen_new_estimator(&est->bstats, NULL, &est->rate_est,
>> -				&est->lock, NULL, &cfg.opt);
>> -	if (ret < 0)
>> -		goto err2;
>> +	est2 = NULL;
>> +	if (info->flags & XT_RATEEST_MATCH_REL) {
>> +		est2 = xt_rateest_lookup(info->name2);
>> +		if (!est2)
>> +			goto err2;
>> +	}
>> 
>> -	info->est = est;
>> -	xt_rateest_hash_insert(est);
>> +	info->est1 = est1;
>> +	info->est2 = est2;
>> 	return 0;
>> 
>> err2:
>> -	kfree(est);
>> +	xt_rateest_put(est1);
>> err1:
>> 	return ret;
>> }
>> 
>> -static void xt_rateest_tg_destroy(const struct xt_tgdtor_param *par)
>> +static void xt_rateest_mt_destroy(const struct xt_mtdtor_param *par)
>> {
>> -	struct xt_rateest_target_info *info = par->targinfo;
>> +	struct xt_rateest_match_info *info = par->matchinfo;
>> 
>> -	xt_rateest_put(info->est);
>> +	xt_rateest_put(info->est1);
>> +	if (info->est2)
>> +		xt_rateest_put(info->est2);
>> }
>> 
>> -static struct xt_target xt_rateest_tg_reg __read_mostly = {
>> -	.name       = "RATEEST",
>> +static struct xt_match xt_rateest_mt_reg __read_mostly = {
>> +	.name       = "rateest",
>> 	.revision   = 0,
>> 	.family     = NFPROTO_UNSPEC,
>> -	.target     = xt_rateest_tg,
>> -	.checkentry = xt_rateest_tg_checkentry,
>> -	.destroy    = xt_rateest_tg_destroy,
>> -	.targetsize = sizeof(struct xt_rateest_target_info),
>> -	.usersize   = offsetof(struct xt_rateest_target_info, est),
>> +	.match      = xt_rateest_mt,
>> +	.checkentry = xt_rateest_mt_checkentry,
>> +	.destroy    = xt_rateest_mt_destroy,
>> +	.matchsize  = sizeof(struct xt_rateest_match_info),
>> +	.usersize   = offsetof(struct xt_rateest_match_info, est1),
>> 	.me         = THIS_MODULE,
>> };
>> 
>> -static int __init xt_rateest_tg_init(void)
>> +static int __init xt_rateest_mt_init(void)
>> {
>> -	unsigned int i;
>> -
>> -	for (i = 0; i < ARRAY_SIZE(rateest_hash); i++)
>> -		INIT_HLIST_HEAD(&rateest_hash[i]);
>> -
>> -	return xt_register_target(&xt_rateest_tg_reg);
>> +	return xt_register_match(&xt_rateest_mt_reg);
>> }
>> 
>> -static void __exit xt_rateest_tg_fini(void)
>> +static void __exit xt_rateest_mt_fini(void)
>> {
>> -	xt_unregister_target(&xt_rateest_tg_reg);
>> +	xt_unregister_match(&xt_rateest_mt_reg);
>> }
>> 
>> -
>> MODULE_AUTHOR("Patrick McHardy <kaber@...sh.net>");
>> MODULE_LICENSE("GPL");
>> -MODULE_DESCRIPTION("Xtables: packet rate estimator");
>> -MODULE_ALIAS("ipt_RATEEST");
>> -MODULE_ALIAS("ip6t_RATEEST");
>> -module_init(xt_rateest_tg_init);
>> -module_exit(xt_rateest_tg_fini);
>> +MODULE_DESCRIPTION("xtables rate estimator match");
>> +MODULE_ALIAS("ipt_rateest");
>> +MODULE_ALIAS("ip6t_rateest");
>> +module_init(xt_rateest_mt_init);
>> +module_exit(xt_rateest_mt_fini);
>> diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c
>> index 27241a7..c53d4d1 100644
>> --- a/net/netfilter/xt_TCPMSS.c
>> +++ b/net/netfilter/xt_TCPMSS.c
>> @@ -1,351 +1,110 @@
>> -/*
>> - * This is a module which is used for setting the MSS option in TCP packets.
>> - *
>> - * Copyright (C) 2000 Marc Boucher <marc@...i.ca>
>> - * Copyright (C) 2007 Patrick McHardy <kaber@...sh.net>
>> +/* Kernel module to match TCP MSS values. */
>> +
>> +/* Copyright (C) 2000 Marc Boucher <marc@...i.ca>
>> + * Portions (C) 2005 by Harald Welte <laforge@...filter.org>
>> *
>> * This program is free software; you can redistribute it and/or modify
>> * it under the terms of the GNU General Public License version 2 as
>> * published by the Free Software Foundation.
>> */
>> -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> +
>> #include <linux/module.h>
>> #include <linux/skbuff.h>
>> -#include <linux/ip.h>
>> -#include <linux/gfp.h>
>> -#include <linux/ipv6.h>
>> -#include <linux/tcp.h>
>> -#include <net/dst.h>
>> -#include <net/flow.h>
>> -#include <net/ipv6.h>
>> -#include <net/route.h>
>> #include <net/tcp.h>
>> 
>> +#include <linux/netfilter/xt_tcpmss.h>
>> +#include <linux/netfilter/x_tables.h>
>> +
>> #include <linux/netfilter_ipv4/ip_tables.h>
>> #include <linux/netfilter_ipv6/ip6_tables.h>
>> -#include <linux/netfilter/x_tables.h>
>> -#include <linux/netfilter/xt_tcpudp.h>
>> -#include <linux/netfilter/xt_TCPMSS.h>
>> 
>> MODULE_LICENSE("GPL");
>> MODULE_AUTHOR("Marc Boucher <marc@...i.ca>");
>> -MODULE_DESCRIPTION("Xtables: TCP Maximum Segment Size (MSS) adjustment");
>> -MODULE_ALIAS("ipt_TCPMSS");
>> -MODULE_ALIAS("ip6t_TCPMSS");
>> -
>> -static inline unsigned int
>> -optlen(const u_int8_t *opt, unsigned int offset)
>> -{
>> -	/* Beware zero-length options: make finite progress */
>> -	if (opt[offset] <= TCPOPT_NOP || opt[offset+1] == 0)
>> -		return 1;
>> -	else
>> -		return opt[offset+1];
>> -}
>> -
>> -static u_int32_t tcpmss_reverse_mtu(struct net *net,
>> -				    const struct sk_buff *skb,
>> -				    unsigned int family)
>> -{
>> -	struct flowi fl;
>> -	const struct nf_afinfo *ai;
>> -	struct rtable *rt = NULL;
>> -	u_int32_t mtu     = ~0U;
>> -
>> -	if (family == PF_INET) {
>> -		struct flowi4 *fl4 = &fl.u.ip4;
>> -		memset(fl4, 0, sizeof(*fl4));
>> -		fl4->daddr = ip_hdr(skb)->saddr;
>> -	} else {
>> -		struct flowi6 *fl6 = &fl.u.ip6;
>> -
>> -		memset(fl6, 0, sizeof(*fl6));
>> -		fl6->daddr = ipv6_hdr(skb)->saddr;
>> -	}
>> -	rcu_read_lock();
>> -	ai = nf_get_afinfo(family);
>> -	if (ai != NULL)
>> -		ai->route(net, (struct dst_entry **)&rt, &fl, false);
>> -	rcu_read_unlock();
>> -
>> -	if (rt != NULL) {
>> -		mtu = dst_mtu(&rt->dst);
>> -		dst_release(&rt->dst);
>> -	}
>> -	return mtu;
>> -}
>> +MODULE_DESCRIPTION("Xtables: TCP MSS match");
>> +MODULE_ALIAS("ipt_tcpmss");
>> +MODULE_ALIAS("ip6t_tcpmss");
>> 
>> -static int
>> -tcpmss_mangle_packet(struct sk_buff *skb,
>> -		     const struct xt_action_param *par,
>> -		     unsigned int family,
>> -		     unsigned int tcphoff,
>> -		     unsigned int minlen)
>> +static bool
>> +tcpmss_mt(const struct sk_buff *skb, struct xt_action_param *par)
>> {
>> -	const struct xt_tcpmss_info *info = par->targinfo;
>> -	struct tcphdr *tcph;
>> -	int len, tcp_hdrlen;
>> -	unsigned int i;
>> -	__be16 oldval;
>> -	u16 newmss;
>> -	u8 *opt;
>> -
>> -	/* This is a fragment, no TCP header is available */
>> -	if (par->fragoff != 0)
>> -		return 0;
>> -
>> -	if (!skb_make_writable(skb, skb->len))
>> -		return -1;
>> -
>> -	len = skb->len - tcphoff;
>> -	if (len < (int)sizeof(struct tcphdr))
>> -		return -1;
>> -
>> -	tcph = (struct tcphdr *)(skb_network_header(skb) + tcphoff);
>> -	tcp_hdrlen = tcph->doff * 4;
>> -
>> -	if (len < tcp_hdrlen)
>> -		return -1;
>> -
>> -	if (info->mss == XT_TCPMSS_CLAMP_PMTU) {
>> -		struct net *net = xt_net(par);
>> -		unsigned int in_mtu = tcpmss_reverse_mtu(net, skb, family);
>> -		unsigned int min_mtu = min(dst_mtu(skb_dst(skb)), in_mtu);
>> -
>> -		if (min_mtu <= minlen) {
>> -			net_err_ratelimited("unknown or invalid path-MTU (%u)\n",
>> -					    min_mtu);
>> -			return -1;
>> -		}
>> -		newmss = min_mtu - minlen;
>> -	} else
>> -		newmss = info->mss;
>> -
>> -	opt = (u_int8_t *)tcph;
>> -	for (i = sizeof(struct tcphdr); i <= tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
>> -		if (opt[i] == TCPOPT_MSS && opt[i+1] == TCPOLEN_MSS) {
>> -			u_int16_t oldmss;
>> -
>> -			oldmss = (opt[i+2] << 8) | opt[i+3];
>> -
>> -			/* Never increase MSS, even when setting it, as
>> -			 * doing so results in problems for hosts that rely
>> -			 * on MSS being set correctly.
>> -			 */
>> -			if (oldmss <= newmss)
>> -				return 0;
>> -
>> -			opt[i+2] = (newmss & 0xff00) >> 8;
>> -			opt[i+3] = newmss & 0x00ff;
>> -
>> -			inet_proto_csum_replace2(&tcph->check, skb,
>> -						 htons(oldmss), htons(newmss),
>> -						 false);
>> -			return 0;
>> +	const struct xt_tcpmss_match_info *info = par->matchinfo;
>> +	const struct tcphdr *th;
>> +	struct tcphdr _tcph;
>> +	/* tcp.doff is only 4 bits, ie. max 15 * 4 bytes */
>> +	const u_int8_t *op;
>> +	u8 _opt[15 * 4 - sizeof(_tcph)];
>> +	unsigned int i, optlen;
>> +
>> +	/* If we don't have the whole header, drop packet. */
>> +	th = skb_header_pointer(skb, par->thoff, sizeof(_tcph), &_tcph);
>> +	if (th == NULL)
>> +		goto dropit;
>> +
>> +	/* Malformed. */
>> +	if (th->doff*4 < sizeof(*th))
>> +		goto dropit;
>> +
>> +	optlen = th->doff*4 - sizeof(*th);
>> +	if (!optlen)
>> +		goto out;
>> +
>> +	/* Truncated options. */
>> +	op = skb_header_pointer(skb, par->thoff + sizeof(*th), optlen, _opt);
>> +	if (op == NULL)
>> +		goto dropit;
>> +
>> +	for (i = 0; i < optlen; ) {
>> +		if (op[i] == TCPOPT_MSS
>> +		    && (optlen - i) >= TCPOLEN_MSS
>> +		    && op[i+1] == TCPOLEN_MSS) {
>> +			u_int16_t mssval;
>> +
>> +			mssval = (op[i+2] << 8) | op[i+3];
>> +
>> +			return (mssval >= info->mss_min &&
>> +				mssval <= info->mss_max) ^ info->invert;
>> 		}
>> +		if (op[i] < 2)
>> +			i++;
>> +		else
>> +			i += op[i+1] ? : 1;
>> 	}
>> +out:
>> +	return info->invert;
>> 
>> -	/* There is data after the header so the option can't be added
>> -	 * without moving it, and doing so may make the SYN packet
>> -	 * itself too large. Accept the packet unmodified instead.
>> -	 */
>> -	if (len > tcp_hdrlen)
>> -		return 0;
>> -
>> -	/*
>> -	 * MSS Option not found ?! add it..
>> -	 */
>> -	if (skb_tailroom(skb) < TCPOLEN_MSS) {
>> -		if (pskb_expand_head(skb, 0,
>> -				     TCPOLEN_MSS - skb_tailroom(skb),
>> -				     GFP_ATOMIC))
>> -			return -1;
>> -		tcph = (struct tcphdr *)(skb_network_header(skb) + tcphoff);
>> -	}
>> -
>> -	skb_put(skb, TCPOLEN_MSS);
>> -
>> -	/*
>> -	 * IPv4: RFC 1122 states "If an MSS option is not received at
>> -	 * connection setup, TCP MUST assume a default send MSS of 536".
>> -	 * IPv6: RFC 2460 states IPv6 has a minimum MTU of 1280 and a minimum
>> -	 * length IPv6 header of 60, ergo the default MSS value is 1220
>> -	 * Since no MSS was provided, we must use the default values
>> -	 */
>> -	if (xt_family(par) == NFPROTO_IPV4)
>> -		newmss = min(newmss, (u16)536);
>> -	else
>> -		newmss = min(newmss, (u16)1220);
>> -
>> -	opt = (u_int8_t *)tcph + sizeof(struct tcphdr);
>> -	memmove(opt + TCPOLEN_MSS, opt, len - sizeof(struct tcphdr));
>> -
>> -	inet_proto_csum_replace2(&tcph->check, skb,
>> -				 htons(len), htons(len + TCPOLEN_MSS), true);
>> -	opt[0] = TCPOPT_MSS;
>> -	opt[1] = TCPOLEN_MSS;
>> -	opt[2] = (newmss & 0xff00) >> 8;
>> -	opt[3] = newmss & 0x00ff;
>> -
>> -	inet_proto_csum_replace4(&tcph->check, skb, 0, *((__be32 *)opt), false);
>> -
>> -	oldval = ((__be16 *)tcph)[6];
>> -	tcph->doff += TCPOLEN_MSS/4;
>> -	inet_proto_csum_replace2(&tcph->check, skb,
>> -				 oldval, ((__be16 *)tcph)[6], false);
>> -	return TCPOLEN_MSS;
>> -}
>> -
>> -static unsigned int
>> -tcpmss_tg4(struct sk_buff *skb, const struct xt_action_param *par)
>> -{
>> -	struct iphdr *iph = ip_hdr(skb);
>> -	__be16 newlen;
>> -	int ret;
>> -
>> -	ret = tcpmss_mangle_packet(skb, par,
>> -				   PF_INET,
>> -				   iph->ihl * 4,
>> -				   sizeof(*iph) + sizeof(struct tcphdr));
>> -	if (ret < 0)
>> -		return NF_DROP;
>> -	if (ret > 0) {
>> -		iph = ip_hdr(skb);
>> -		newlen = htons(ntohs(iph->tot_len) + ret);
>> -		csum_replace2(&iph->check, iph->tot_len, newlen);
>> -		iph->tot_len = newlen;
>> -	}
>> -	return XT_CONTINUE;
>> -}
>> -
>> -#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
>> -static unsigned int
>> -tcpmss_tg6(struct sk_buff *skb, const struct xt_action_param *par)
>> -{
>> -	struct ipv6hdr *ipv6h = ipv6_hdr(skb);
>> -	u8 nexthdr;
>> -	__be16 frag_off, oldlen, newlen;
>> -	int tcphoff;
>> -	int ret;
>> -
>> -	nexthdr = ipv6h->nexthdr;
>> -	tcphoff = ipv6_skip_exthdr(skb, sizeof(*ipv6h), &nexthdr, &frag_off);
>> -	if (tcphoff < 0)
>> -		return NF_DROP;
>> -	ret = tcpmss_mangle_packet(skb, par,
>> -				   PF_INET6,
>> -				   tcphoff,
>> -				   sizeof(*ipv6h) + sizeof(struct tcphdr));
>> -	if (ret < 0)
>> -		return NF_DROP;
>> -	if (ret > 0) {
>> -		ipv6h = ipv6_hdr(skb);
>> -		oldlen = ipv6h->payload_len;
>> -		newlen = htons(ntohs(oldlen) + ret);
>> -		if (skb->ip_summed == CHECKSUM_COMPLETE)
>> -			skb->csum = csum_add(csum_sub(skb->csum, oldlen),
>> -					     newlen);
>> -		ipv6h->payload_len = newlen;
>> -	}
>> -	return XT_CONTINUE;
>> -}
>> -#endif
>> -
>> -/* Must specify -p tcp --syn */
>> -static inline bool find_syn_match(const struct xt_entry_match *m)
>> -{
>> -	const struct xt_tcp *tcpinfo = (const struct xt_tcp *)m->data;
>> -
>> -	if (strcmp(m->u.kernel.match->name, "tcp") == 0 &&
>> -	    tcpinfo->flg_cmp & TCPHDR_SYN &&
>> -	    !(tcpinfo->invflags & XT_TCP_INV_FLAGS))
>> -		return true;
>> -
>> +dropit:
>> +	par->hotdrop = true;
>> 	return false;
>> }
>> 
>> -static int tcpmss_tg4_check(const struct xt_tgchk_param *par)
>> -{
>> -	const struct xt_tcpmss_info *info = par->targinfo;
>> -	const struct ipt_entry *e = par->entryinfo;
>> -	const struct xt_entry_match *ematch;
>> -
>> -	if (info->mss == XT_TCPMSS_CLAMP_PMTU &&
>> -	    (par->hook_mask & ~((1 << NF_INET_FORWARD) |
>> -			   (1 << NF_INET_LOCAL_OUT) |
>> -			   (1 << NF_INET_POST_ROUTING))) != 0) {
>> -		pr_info("path-MTU clamping only supported in "
>> -			"FORWARD, OUTPUT and POSTROUTING hooks\n");
>> -		return -EINVAL;
>> -	}
>> -	if (par->nft_compat)
>> -		return 0;
>> -
>> -	xt_ematch_foreach(ematch, e)
>> -		if (find_syn_match(ematch))
>> -			return 0;
>> -	pr_info("Only works on TCP SYN packets\n");
>> -	return -EINVAL;
>> -}
>> -
>> -#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
>> -static int tcpmss_tg6_check(const struct xt_tgchk_param *par)
>> -{
>> -	const struct xt_tcpmss_info *info = par->targinfo;
>> -	const struct ip6t_entry *e = par->entryinfo;
>> -	const struct xt_entry_match *ematch;
>> -
>> -	if (info->mss == XT_TCPMSS_CLAMP_PMTU &&
>> -	    (par->hook_mask & ~((1 << NF_INET_FORWARD) |
>> -			   (1 << NF_INET_LOCAL_OUT) |
>> -			   (1 << NF_INET_POST_ROUTING))) != 0) {
>> -		pr_info("path-MTU clamping only supported in "
>> -			"FORWARD, OUTPUT and POSTROUTING hooks\n");
>> -		return -EINVAL;
>> -	}
>> -	if (par->nft_compat)
>> -		return 0;
>> -
>> -	xt_ematch_foreach(ematch, e)
>> -		if (find_syn_match(ematch))
>> -			return 0;
>> -	pr_info("Only works on TCP SYN packets\n");
>> -	return -EINVAL;
>> -}
>> -#endif
>> -
>> -static struct xt_target tcpmss_tg_reg[] __read_mostly = {
>> +static struct xt_match tcpmss_mt_reg[] __read_mostly = {
>> 	{
>> +		.name		= "tcpmss",
>> 		.family		= NFPROTO_IPV4,
>> -		.name		= "TCPMSS",
>> -		.checkentry	= tcpmss_tg4_check,
>> -		.target		= tcpmss_tg4,
>> -		.targetsize	= sizeof(struct xt_tcpmss_info),
>> +		.match		= tcpmss_mt,
>> +		.matchsize	= sizeof(struct xt_tcpmss_match_info),
>> 		.proto		= IPPROTO_TCP,
>> 		.me		= THIS_MODULE,
>> 	},
>> -#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
>> 	{
>> +		.name		= "tcpmss",
>> 		.family		= NFPROTO_IPV6,
>> -		.name		= "TCPMSS",
>> -		.checkentry	= tcpmss_tg6_check,
>> -		.target		= tcpmss_tg6,
>> -		.targetsize	= sizeof(struct xt_tcpmss_info),
>> +		.match		= tcpmss_mt,
>> +		.matchsize	= sizeof(struct xt_tcpmss_match_info),
>> 		.proto		= IPPROTO_TCP,
>> 		.me		= THIS_MODULE,
>> 	},
>> -#endif
>> };
>> 
>> -static int __init tcpmss_tg_init(void)
>> +static int __init tcpmss_mt_init(void)
>> {
>> -	return xt_register_targets(tcpmss_tg_reg, ARRAY_SIZE(tcpmss_tg_reg));
>> +	return xt_register_matches(tcpmss_mt_reg, ARRAY_SIZE(tcpmss_mt_reg));
>> }
>> 
>> -static void __exit tcpmss_tg_exit(void)
>> +static void __exit tcpmss_mt_exit(void)
>> {
>> -	xt_unregister_targets(tcpmss_tg_reg, ARRAY_SIZE(tcpmss_tg_reg));
>> +	xt_unregister_matches(tcpmss_mt_reg, ARRAY_SIZE(tcpmss_mt_reg));
>> }
>> 
>> -module_init(tcpmss_tg_init);
>> -module_exit(tcpmss_tg_exit);
>> +module_init(tcpmss_mt_init);
>> +module_exit(tcpmss_mt_exit);
>> diff --git a/net/netfilter/xt_dscp.c b/net/netfilter/xt_dscp.c
>> index 236ac80..3f83d38 100644
>> --- a/net/netfilter/xt_dscp.c
>> +++ b/net/netfilter/xt_dscp.c
>> @@ -1,11 +1,14 @@
>> -/* IP tables module for matching the value of the IPv4/IPv6 DSCP field
>> +/* x_tables module for setting the IPv4/IPv6 DSCP field, Version 1.8
>> *
>> * (C) 2002 by Harald Welte <laforge@...filter.org>
>> + * based on ipt_FTOS.c (C) 2000 by Matthew G. Marsh <mgm@...tronix.com>
>> *
>> * This program is free software; you can redistribute it and/or modify
>> * it under the terms of the GNU General Public License version 2 as
>> * published by the Free Software Foundation.
>> - */
>> + *
>> + * See RFC2474 for a description of the DSCP field within the IP Header.
>> +*/
>> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> #include <linux/module.h>
>> #include <linux/skbuff.h>
>> @@ -14,102 +17,150 @@
>> #include <net/dsfield.h>
>> 
>> #include <linux/netfilter/x_tables.h>
>> -#include <linux/netfilter/xt_dscp.h>
>> +#include <linux/netfilter/xt_DSCP.h>
>> 
>> MODULE_AUTHOR("Harald Welte <laforge@...filter.org>");
>> -MODULE_DESCRIPTION("Xtables: DSCP/TOS field match");
>> +MODULE_DESCRIPTION("Xtables: DSCP/TOS field modification");
>> MODULE_LICENSE("GPL");
>> -MODULE_ALIAS("ipt_dscp");
>> -MODULE_ALIAS("ip6t_dscp");
>> -MODULE_ALIAS("ipt_tos");
>> -MODULE_ALIAS("ip6t_tos");
>> +MODULE_ALIAS("ipt_DSCP");
>> +MODULE_ALIAS("ip6t_DSCP");
>> +MODULE_ALIAS("ipt_TOS");
>> +MODULE_ALIAS("ip6t_TOS");
>> 
>> -static bool
>> -dscp_mt(const struct sk_buff *skb, struct xt_action_param *par)
>> +static unsigned int
>> +dscp_tg(struct sk_buff *skb, const struct xt_action_param *par)
>> {
>> -	const struct xt_dscp_info *info = par->matchinfo;
>> +	const struct xt_DSCP_info *dinfo = par->targinfo;
>> 	u_int8_t dscp = ipv4_get_dsfield(ip_hdr(skb)) >> XT_DSCP_SHIFT;
>> 
>> -	return (dscp == info->dscp) ^ !!info->invert;
>> +	if (dscp != dinfo->dscp) {
>> +		if (!skb_make_writable(skb, sizeof(struct iphdr)))
>> +			return NF_DROP;
>> +
>> +		ipv4_change_dsfield(ip_hdr(skb),
>> +				    (__force __u8)(~XT_DSCP_MASK),
>> +				    dinfo->dscp << XT_DSCP_SHIFT);
>> +
>> +	}
>> +	return XT_CONTINUE;
>> }
>> 
>> -static bool
>> -dscp_mt6(const struct sk_buff *skb, struct xt_action_param *par)
>> +static unsigned int
>> +dscp_tg6(struct sk_buff *skb, const struct xt_action_param *par)
>> {
>> -	const struct xt_dscp_info *info = par->matchinfo;
>> +	const struct xt_DSCP_info *dinfo = par->targinfo;
>> 	u_int8_t dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> XT_DSCP_SHIFT;
>> 
>> -	return (dscp == info->dscp) ^ !!info->invert;
>> +	if (dscp != dinfo->dscp) {
>> +		if (!skb_make_writable(skb, sizeof(struct ipv6hdr)))
>> +			return NF_DROP;
>> +
>> +		ipv6_change_dsfield(ipv6_hdr(skb),
>> +				    (__force __u8)(~XT_DSCP_MASK),
>> +				    dinfo->dscp << XT_DSCP_SHIFT);
>> +	}
>> +	return XT_CONTINUE;
>> }
>> 
>> -static int dscp_mt_check(const struct xt_mtchk_param *par)
>> +static int dscp_tg_check(const struct xt_tgchk_param *par)
>> {
>> -	const struct xt_dscp_info *info = par->matchinfo;
>> +	const struct xt_DSCP_info *info = par->targinfo;
>> 
>> 	if (info->dscp > XT_DSCP_MAX) {
>> 		pr_info("dscp %x out of range\n", info->dscp);
>> 		return -EDOM;
>> 	}
>> -
>> 	return 0;
>> }
>> 
>> -static bool tos_mt(const struct sk_buff *skb, struct xt_action_param *par)
>> +static unsigned int
>> +tos_tg(struct sk_buff *skb, const struct xt_action_param *par)
>> +{
>> +	const struct xt_tos_target_info *info = par->targinfo;
>> +	struct iphdr *iph = ip_hdr(skb);
>> +	u_int8_t orig, nv;
>> +
>> +	orig = ipv4_get_dsfield(iph);
>> +	nv   = (orig & ~info->tos_mask) ^ info->tos_value;
>> +
>> +	if (orig != nv) {
>> +		if (!skb_make_writable(skb, sizeof(struct iphdr)))
>> +			return NF_DROP;
>> +		iph = ip_hdr(skb);
>> +		ipv4_change_dsfield(iph, 0, nv);
>> +	}
>> +
>> +	return XT_CONTINUE;
>> +}
>> +
>> +static unsigned int
>> +tos_tg6(struct sk_buff *skb, const struct xt_action_param *par)
>> {
>> -	const struct xt_tos_match_info *info = par->matchinfo;
>> -
>> -	if (xt_family(par) == NFPROTO_IPV4)
>> -		return ((ip_hdr(skb)->tos & info->tos_mask) ==
>> -		       info->tos_value) ^ !!info->invert;
>> -	else
>> -		return ((ipv6_get_dsfield(ipv6_hdr(skb)) & info->tos_mask) ==
>> -		       info->tos_value) ^ !!info->invert;
>> +	const struct xt_tos_target_info *info = par->targinfo;
>> +	struct ipv6hdr *iph = ipv6_hdr(skb);
>> +	u_int8_t orig, nv;
>> +
>> +	orig = ipv6_get_dsfield(iph);
>> +	nv   = (orig & ~info->tos_mask) ^ info->tos_value;
>> +
>> +	if (orig != nv) {
>> +		if (!skb_make_writable(skb, sizeof(struct iphdr)))
>> +			return NF_DROP;
>> +		iph = ipv6_hdr(skb);
>> +		ipv6_change_dsfield(iph, 0, nv);
>> +	}
>> +
>> +	return XT_CONTINUE;
>> }
>> 
>> -static struct xt_match dscp_mt_reg[] __read_mostly = {
>> +static struct xt_target dscp_tg_reg[] __read_mostly = {
>> 	{
>> -		.name		= "dscp",
>> +		.name		= "DSCP",
>> 		.family		= NFPROTO_IPV4,
>> -		.checkentry	= dscp_mt_check,
>> -		.match		= dscp_mt,
>> -		.matchsize	= sizeof(struct xt_dscp_info),
>> +		.checkentry	= dscp_tg_check,
>> +		.target		= dscp_tg,
>> +		.targetsize	= sizeof(struct xt_DSCP_info),
>> +		.table		= "mangle",
>> 		.me		= THIS_MODULE,
>> 	},
>> 	{
>> -		.name		= "dscp",
>> +		.name		= "DSCP",
>> 		.family		= NFPROTO_IPV6,
>> -		.checkentry	= dscp_mt_check,
>> -		.match		= dscp_mt6,
>> -		.matchsize	= sizeof(struct xt_dscp_info),
>> +		.checkentry	= dscp_tg_check,
>> +		.target		= dscp_tg6,
>> +		.targetsize	= sizeof(struct xt_DSCP_info),
>> +		.table		= "mangle",
>> 		.me		= THIS_MODULE,
>> 	},
>> 	{
>> -		.name		= "tos",
>> +		.name		= "TOS",
>> 		.revision	= 1,
>> 		.family		= NFPROTO_IPV4,
>> -		.match		= tos_mt,
>> -		.matchsize	= sizeof(struct xt_tos_match_info),
>> +		.table		= "mangle",
>> +		.target		= tos_tg,
>> +		.targetsize	= sizeof(struct xt_tos_target_info),
>> 		.me		= THIS_MODULE,
>> 	},
>> 	{
>> -		.name		= "tos",
>> +		.name		= "TOS",
>> 		.revision	= 1,
>> 		.family		= NFPROTO_IPV6,
>> -		.match		= tos_mt,
>> -		.matchsize	= sizeof(struct xt_tos_match_info),
>> +		.table		= "mangle",
>> +		.target		= tos_tg6,
>> +		.targetsize	= sizeof(struct xt_tos_target_info),
>> 		.me		= THIS_MODULE,
>> 	},
>> };
>> 
>> -static int __init dscp_mt_init(void)
>> +static int __init dscp_tg_init(void)
>> {
>> -	return xt_register_matches(dscp_mt_reg, ARRAY_SIZE(dscp_mt_reg));
>> +	return xt_register_targets(dscp_tg_reg, ARRAY_SIZE(dscp_tg_reg));
>> }
>> 
>> -static void __exit dscp_mt_exit(void)
>> +static void __exit dscp_tg_exit(void)
>> {
>> -	xt_unregister_matches(dscp_mt_reg, ARRAY_SIZE(dscp_mt_reg));
>> +	xt_unregister_targets(dscp_tg_reg, ARRAY_SIZE(dscp_tg_reg));
>> }
>> 
>> -module_init(dscp_mt_init);
>> -module_exit(dscp_mt_exit);
>> +module_init(dscp_tg_init);
>> +module_exit(dscp_tg_exit);
>> diff --git a/net/netfilter/xt_hl.c b/net/netfilter/xt_hl.c
>> index 0039511..1535e87 100644
>> --- a/net/netfilter/xt_hl.c
>> +++ b/net/netfilter/xt_hl.c
>> @@ -1,96 +1,169 @@
>> /*
>> - * IP tables module for matching the value of the TTL
>> - * (C) 2000,2001 by Harald Welte <laforge@...filter.org>
>> + * TTL modification target for IP tables
>> + * (C) 2000,2005 by Harald Welte <laforge@...filter.org>
>> *
>> - * Hop Limit matching module
>> - * (C) 2001-2002 Maciej Soltysiak <solt@....toxicfilms.tv>
>> + * Hop Limit modification target for ip6tables
>> + * Maciej Soltysiak <solt@....toxicfilms.tv>
>> *
>> * This program is free software; you can redistribute it and/or modify
>> * it under the terms of the GNU General Public License version 2 as
>> * published by the Free Software Foundation.
>> */
>> -
>> -#include <linux/ip.h>
>> -#include <linux/ipv6.h>
>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> #include <linux/module.h>
>> #include <linux/skbuff.h>
>> +#include <linux/ip.h>
>> +#include <linux/ipv6.h>
>> +#include <net/checksum.h>
>> 
>> #include <linux/netfilter/x_tables.h>
>> -#include <linux/netfilter_ipv4/ipt_ttl.h>
>> -#include <linux/netfilter_ipv6/ip6t_hl.h>
>> +#include <linux/netfilter_ipv4/ipt_TTL.h>
>> +#include <linux/netfilter_ipv6/ip6t_HL.h>
>> 
>> +MODULE_AUTHOR("Harald Welte <laforge@...filter.org>");
>> MODULE_AUTHOR("Maciej Soltysiak <solt@....toxicfilms.tv>");
>> -MODULE_DESCRIPTION("Xtables: Hoplimit/TTL field match");
>> +MODULE_DESCRIPTION("Xtables: Hoplimit/TTL Limit field modification target");
>> MODULE_LICENSE("GPL");
>> -MODULE_ALIAS("ipt_ttl");
>> -MODULE_ALIAS("ip6t_hl");
>> 
>> -static bool ttl_mt(const struct sk_buff *skb, struct xt_action_param *par)
>> +static unsigned int
>> +ttl_tg(struct sk_buff *skb, const struct xt_action_param *par)
>> {
>> -	const struct ipt_ttl_info *info = par->matchinfo;
>> -	const u8 ttl = ip_hdr(skb)->ttl;
>> +	struct iphdr *iph;
>> +	const struct ipt_TTL_info *info = par->targinfo;
>> +	int new_ttl;
>> +
>> +	if (!skb_make_writable(skb, skb->len))
>> +		return NF_DROP;
>> +
>> +	iph = ip_hdr(skb);
>> 
>> 	switch (info->mode) {
>> -	case IPT_TTL_EQ:
>> -		return ttl == info->ttl;
>> -	case IPT_TTL_NE:
>> -		return ttl != info->ttl;
>> -	case IPT_TTL_LT:
>> -		return ttl < info->ttl;
>> -	case IPT_TTL_GT:
>> -		return ttl > info->ttl;
>> +	case IPT_TTL_SET:
>> +		new_ttl = info->ttl;
>> +		break;
>> +	case IPT_TTL_INC:
>> +		new_ttl = iph->ttl + info->ttl;
>> +		if (new_ttl > 255)
>> +			new_ttl = 255;
>> +		break;
>> +	case IPT_TTL_DEC:
>> +		new_ttl = iph->ttl - info->ttl;
>> +		if (new_ttl < 0)
>> +			new_ttl = 0;
>> +		break;
>> +	default:
>> +		new_ttl = iph->ttl;
>> +		break;
>> +	}
>> +
>> +	if (new_ttl != iph->ttl) {
>> +		csum_replace2(&iph->check, htons(iph->ttl << 8),
>> +					   htons(new_ttl << 8));
>> +		iph->ttl = new_ttl;
>> 	}
>> 
>> -	return false;
>> +	return XT_CONTINUE;
>> }
>> 
>> -static bool hl_mt6(const struct sk_buff *skb, struct xt_action_param *par)
>> +static unsigned int
>> +hl_tg6(struct sk_buff *skb, const struct xt_action_param *par)
>> {
>> -	const struct ip6t_hl_info *info = par->matchinfo;
>> -	const struct ipv6hdr *ip6h = ipv6_hdr(skb);
>> +	struct ipv6hdr *ip6h;
>> +	const struct ip6t_HL_info *info = par->targinfo;
>> +	int new_hl;
>> +
>> +	if (!skb_make_writable(skb, skb->len))
>> +		return NF_DROP;
>> +
>> +	ip6h = ipv6_hdr(skb);
>> 
>> 	switch (info->mode) {
>> -	case IP6T_HL_EQ:
>> -		return ip6h->hop_limit == info->hop_limit;
>> -	case IP6T_HL_NE:
>> -		return ip6h->hop_limit != info->hop_limit;
>> -	case IP6T_HL_LT:
>> -		return ip6h->hop_limit < info->hop_limit;
>> -	case IP6T_HL_GT:
>> -		return ip6h->hop_limit > info->hop_limit;
>> +	case IP6T_HL_SET:
>> +		new_hl = info->hop_limit;
>> +		break;
>> +	case IP6T_HL_INC:
>> +		new_hl = ip6h->hop_limit + info->hop_limit;
>> +		if (new_hl > 255)
>> +			new_hl = 255;
>> +		break;
>> +	case IP6T_HL_DEC:
>> +		new_hl = ip6h->hop_limit - info->hop_limit;
>> +		if (new_hl < 0)
>> +			new_hl = 0;
>> +		break;
>> +	default:
>> +		new_hl = ip6h->hop_limit;
>> +		break;
>> 	}
>> 
>> -	return false;
>> +	ip6h->hop_limit = new_hl;
>> +
>> +	return XT_CONTINUE;
>> +}
>> +
>> +static int ttl_tg_check(const struct xt_tgchk_param *par)
>> +{
>> +	const struct ipt_TTL_info *info = par->targinfo;
>> +
>> +	if (info->mode > IPT_TTL_MAXMODE) {
>> +		pr_info("TTL: invalid or unknown mode %u\n", info->mode);
>> +		return -EINVAL;
>> +	}
>> +	if (info->mode != IPT_TTL_SET && info->ttl == 0)
>> +		return -EINVAL;
>> +	return 0;
>> +}
>> +
>> +static int hl_tg6_check(const struct xt_tgchk_param *par)
>> +{
>> +	const struct ip6t_HL_info *info = par->targinfo;
>> +
>> +	if (info->mode > IP6T_HL_MAXMODE) {
>> +		pr_info("invalid or unknown mode %u\n", info->mode);
>> +		return -EINVAL;
>> +	}
>> +	if (info->mode != IP6T_HL_SET && info->hop_limit == 0) {
>> +		pr_info("increment/decrement does not "
>> +			"make sense with value 0\n");
>> +		return -EINVAL;
>> +	}
>> +	return 0;
>> }
>> 
>> -static struct xt_match hl_mt_reg[] __read_mostly = {
>> +static struct xt_target hl_tg_reg[] __read_mostly = {
>> 	{
>> -		.name       = "ttl",
>> +		.name       = "TTL",
>> 		.revision   = 0,
>> 		.family     = NFPROTO_IPV4,
>> -		.match      = ttl_mt,
>> -		.matchsize  = sizeof(struct ipt_ttl_info),
>> +		.target     = ttl_tg,
>> +		.targetsize = sizeof(struct ipt_TTL_info),
>> +		.table      = "mangle",
>> +		.checkentry = ttl_tg_check,
>> 		.me         = THIS_MODULE,
>> 	},
>> 	{
>> -		.name       = "hl",
>> +		.name       = "HL",
>> 		.revision   = 0,
>> 		.family     = NFPROTO_IPV6,
>> -		.match      = hl_mt6,
>> -		.matchsize  = sizeof(struct ip6t_hl_info),
>> +		.target     = hl_tg6,
>> +		.targetsize = sizeof(struct ip6t_HL_info),
>> +		.table      = "mangle",
>> +		.checkentry = hl_tg6_check,
>> 		.me         = THIS_MODULE,
>> 	},
>> };
>> 
>> -static int __init hl_mt_init(void)
>> +static int __init hl_tg_init(void)
>> {
>> -	return xt_register_matches(hl_mt_reg, ARRAY_SIZE(hl_mt_reg));
>> +	return xt_register_targets(hl_tg_reg, ARRAY_SIZE(hl_tg_reg));
>> }
>> 
>> -static void __exit hl_mt_exit(void)
>> +static void __exit hl_tg_exit(void)
>> {
>> -	xt_unregister_matches(hl_mt_reg, ARRAY_SIZE(hl_mt_reg));
>> +	xt_unregister_targets(hl_tg_reg, ARRAY_SIZE(hl_tg_reg));
>> }
>> 
>> -module_init(hl_mt_init);
>> -module_exit(hl_mt_exit);
>> +module_init(hl_tg_init);
>> +module_exit(hl_tg_exit);
>> +MODULE_ALIAS("ipt_TTL");
>> +MODULE_ALIAS("ip6t_HL");
>> 
>> 
>> 
>> 
> 


Cheers, Andreas






