Open Source and information security mailing list archives
 
Date:	Tue, 26 Jun 2012 09:13:35 -0400
From:	Ric Wheeler <rwheeler@...hat.com>
To:	"Theodore Ts'o" <tytso@....edu>
CC:	Fredrick <fjohnber@...o.com>, linux-ext4@...r.kernel.org,
	Andreas Dilger <adilger@...ger.ca>, wenqing.lz@...bao.com
Subject: Re: ext4_fallocate

On 06/25/2012 03:17 PM, Theodore Ts'o wrote:
> On Mon, Jun 25, 2012 at 04:51:59PM +0800, Zheng Liu wrote:
>> Actually, I wanted to send you a URL from the Linux mailing list archive,
>> but I cannot find it.  After applying this patch, you can call ioctl(2) to
>> enable the expose_stale_data flag; then, when you call fallocate(2), ext4
>> creates initialized extents for you.  This patch cannot be merged into the
>> upstream kernel because it opens a huge security hole.
> This is what we're using internally inside Google.... this allows the
> security exposure to be restricted to those programs running with a
> specific group id (which is better than giving programs access to
> CAP_SYS_RAWIO).  We also require the use of a specific fallocate flag
> so that programs have to explicitly ask for this feature.
>
> Also note that I restrict the combination of NO_HIDE_STALE &&
> KEEP_SIZE since it causes e2fsck to complain --- and if you're trying
> to avoid fs metadata I/O, you want to avoid the extra i_size update
> anyway, so it's not worth trying to make this work w/o causing e2fsck
> complaints.
>
> This patch is against the v3.3 kernel (as it happens, I was just in the
> middle of rebasing this patch from 2.6.34 :-) ).
>
> 					- Ted

Hi Ted,

Has anyone made progress digging into the performance impact of running without
this patch? We should definitely see if there is some low-hanging fruit there,
especially given that XFS does not seem to suffer such a huge hit.

I think that we need to get a good reproducer for the workload that causes the 
pain and start to dig into this.
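
As a starting point, a minimal sketch of such a reproducer might look like the
following. It assumes the painful pattern is the one named in the patch
description (preallocate a large range, then fill it with small writes). The
helper name time_small_writes(), the chunk size, and the fdatasync() after each
write are assumptions for illustration, not something measured in this thread.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical reproducer helper (not from the patch): preallocate `total`
 * bytes with plain fallocate(2), then fill the range with `chunk`-sized
 * writes, syncing after each one.  Returns the elapsed seconds of the
 * write phase, or -1.0 on any error.
 */
double time_small_writes(const char *path, off_t total, size_t chunk)
{
	struct timespec t0, t1;
	double elapsed = -1.0;
	char *buf;
	off_t off;
	int fd;

	if (total <= 0 || chunk == 0)
		return -1.0;

	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
	if (fd < 0)
		return -1.0;

	buf = malloc(chunk);
	if (!buf)
		goto out_close;
	memset(buf, 0xab, chunk);

	/* Mode 0: ordinary preallocation, stale data stays hidden. */
	if (fallocate(fd, 0, 0, total) != 0)
		goto out_free;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (off = 0; off < total; off += (off_t)chunk) {
		size_t n = chunk;

		if ((off_t)n > total - off)
			n = (size_t)(total - off);
		if (pwrite(fd, buf, n, off) != (ssize_t)n)
			goto out_free;
		/* Assumption: the workload syncs after every small write. */
		if (fdatasync(fd) != 0)
			goto out_free;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	elapsed = (double)(t1.tv_sec - t0.tv_sec) +
		  (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

out_free:
	free(buf);
out_close:
	close(fd);
	return elapsed;
}
```

Calling it as time_small_writes("/mnt/test/file", 1 << 30, 4096) on an ext4
mount and again on an XFS mount (the path is a placeholder) would give a first
side-by-side number to dig into.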

Opening this security exposure is still something that is clearly a hack and 
best avoided if we can fix the root cause :)

Ric

>
> P.S.  It just occurred to me that there are some patches being
> discussed that assign new fallocate flags for volatile data handling.
> So it would probably be a good idea to move the fallocate flag
> codepoint assignment up out of the way to avoid future conflicts.
>
> commit 5f12f1bc2b0fb0866d52763a611b022780780f05
> Author: Theodore Ts'o <tytso@...gle.com>
> Date:   Fri Jun 22 17:19:53 2012 -0400
>
>      ext4: add an fallocate flag to mark newly allocated extents initialized
>      
>      This commit adds a new flag to ext4's fallocate that allows new,
>      uninitialized extents to be marked as initialized. This flag,
>      FALLOC_FL_NO_HIDE_STALE requires that the nohide_stale_gid=<gid> mount
>      option be used when the file system is mounted, and that the user is
>      in the group <gid>.
>      
>      The benefit is to a program that fallocates a large space, but then
>      writes to that space in small increments.  This option prevents ext4
>      from having to split the uninitialized extent and merge the newly
>      initialized extent with the extent to its left.  Even though this
>      usually happens in-memory, this option is useful for tight memory
>      situations and for ext4 on flash.  Note: This allows an application
>      in the nohide_stale group to see stale data on the filesystem.
>      
>      Tested: Updated xfstests g002 to test a case where
>        fallocate:no-hide-stale is not allowed.  The existing tests now pass
>        because I added a remount with a group that user root is in.
>      Rebase-Tested-v3.3: same
>      
>      Effort: fs/nohide-stale
>      Origin-2.6.34-SHA1: c3099bf61be1baf94bc91c481995bb0d77f05786
>      Origin-2.6.34-SHA1: 004dd33b9ebc5d860781c3435526658cc8aa8ccb
>      Change-Id: I0d2a7f2a4cf34443269acbcedb7b7074e0055e69
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index aaaece6..ac7aa42 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1240,6 +1240,9 @@ struct ext4_sb_info {
>   	unsigned long s_mb_last_group;
>   	unsigned long s_mb_last_start;
>   
> +	/* gid that's allowed to see stale data via falloc flag. */
> +	gid_t no_hide_stale_gid;
> +
>   	/* stats for buddy allocator */
>   	atomic_t s_bal_reqs;	/* number of reqs with len > 1 */
>   	atomic_t s_bal_success;	/* we found long enough chunks */
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index cb99346..cc57c85 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4375,6 +4375,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>   	int retries = 0;
>   	int flags;
>   	struct ext4_map_blocks map;
> +	struct ext4_sb_info *sbi;
>   	unsigned int credits, blkbits = inode->i_blkbits;
>   
>   	/*
> @@ -4385,12 +4386,28 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>   		return -EOPNOTSUPP;
>   
>   	/* Return error if mode is not supported */
> -	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
> +	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
> +		     FALLOC_FL_NO_HIDE_STALE))
> +		return -EOPNOTSUPP;
> +
> +	/* The combination of NO_HIDE_STALE and KEEP_SIZE is not supported */
> +	if ((mode & FALLOC_FL_NO_HIDE_STALE) &&
> +	    (mode & FALLOC_FL_KEEP_SIZE))
>   		return -EOPNOTSUPP;
>   
>   	if (mode & FALLOC_FL_PUNCH_HOLE)
>   		return ext4_punch_hole(file, offset, len);
>   
> +	sbi = EXT4_SB(inode->i_sb);
> +	/* Must be in the no_hide_stale group to see stale data. */
> +	if ((mode & FALLOC_FL_NO_HIDE_STALE) &&
> +	    !in_egroup_p(sbi->no_hide_stale_gid))
> +		return -EACCES;
> +
> +	/* preallocation to directories is currently not supported */
> +	if (S_ISDIR(inode->i_mode))
> +		return -ENODEV;
> +
>   	trace_ext4_fallocate_enter(inode, offset, len, mode);
>   	map.m_lblk = offset >> blkbits;
>   	/*
> @@ -4429,6 +4446,8 @@ retry:
>   			ret = PTR_ERR(handle);
>   			break;
>   		}
> +		if (mode & FALLOC_FL_NO_HIDE_STALE)
> +			flags &= ~EXT4_GET_BLOCKS_UNINIT_EXT;
>   		ret = ext4_map_blocks(handle, inode, &map, flags);
>   		if (ret <= 0) {
>   #ifdef EXT4FS_DEBUG
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 5b443a8..d976ec1 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1175,6 +1175,8 @@ static int ext4_show_options(struct seq_file *seq, struct dentry *root)
>   	if (test_opt2(sb, BIG_EXT))
>   		seq_puts(seq, ",big_extent");
>   #endif
> +	if (sbi->no_hide_stale_gid != -1)
> +		seq_printf(seq, ",nohide_stale_gid=%u", sbi->no_hide_stale_gid);
>   
>   	ext4_show_quota_options(seq, sb);
>   
> @@ -1353,6 +1355,7 @@ enum {
>   #ifdef CONFIG_EXT4_BIG_EXTENT
>   	Opt_big_extent, Opt_nobig_extent,
>   #endif
> +	Opt_nohide_stale_gid,
>   };
>   
>   static const match_table_t tokens = {
> @@ -1432,6 +1435,7 @@ static const match_table_t tokens = {
>   	{Opt_big_extent, "big_extent"},
>   	{Opt_nobig_extent, "nobig_extent"},
>   #endif
> +	{Opt_nohide_stale_gid, "nohide_stale_gid=%u"},
>   	{Opt_err, NULL},
>   };
>   
> @@ -1931,6 +1935,12 @@ set_qf_format:
>   				return 0;
>   			sbi->s_li_wait_mult = option;
>   			break;
> +		case Opt_nohide_stale_gid:
> +			if (match_int(&args[0], &option))
> +				return 0;
> +			/* -1 for disabled, otherwise it's valid. */
> +			sbi->no_hide_stale_gid = option;
> +			break;
>   		case Opt_noinit_itable:
>   			clear_opt(sb, INIT_INODE_TABLE);
>   			break;
> @@ -3274,6 +3284,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>   #ifdef CONFIG_EXT4_BIG_EXTENT
>   	sbi->s_min_big_ext_size = EXT4_DEFAULT_MIN_BIG_EXT_SIZE;
>   #endif
> +	/* Default to having no-hide-stale disabled. */
> +	sbi->no_hide_stale_gid = -1;
>   
>   	if ((def_mount_opts & EXT4_DEFM_NOBARRIER) == 0)
>   		set_opt(sb, BARRIER);
> diff --git a/fs/open.c b/fs/open.c
> index 201431a..4edc0cd 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -224,7 +224,9 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>   		return -EINVAL;
>   
>   	/* Return error if mode is not supported */
> -	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
> +	if (mode & ~(FALLOC_FL_KEEP_SIZE |
> +		     FALLOC_FL_PUNCH_HOLE |
> +		     FALLOC_FL_NO_HIDE_STALE))
>   		return -EOPNOTSUPP;
>   
>   	/* Punch hole must have keep size set */
> diff --git a/include/linux/falloc.h b/include/linux/falloc.h
> index 73e0b62..a2489ac 100644
> --- a/include/linux/falloc.h
> +++ b/include/linux/falloc.h
> @@ -3,6 +3,7 @@
>   
>   #define FALLOC_FL_KEEP_SIZE	0x01 /* default is extend size */
>   #define FALLOC_FL_PUNCH_HOLE	0x02 /* de-allocates range */
> +#define FALLOC_FL_NO_HIDE_STALE	0x04 /* default is hide stale data */
>   
>   #ifdef __KERNEL__
>   
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
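
For readers following the quoted patch, the order of the mode checks it adds to
ext4_fallocate() can be modelled in userspace roughly as below. This is only a
sketch: the function name model_fallocate_mode_check() and the MODEL_FL_*
names are hypothetical, the flag values are copied from the patch's falloc.h
hunk, and the two boolean parameters stand in for the kernel's
in_egroup_p(sbi->no_hide_stale_gid) and S_ISDIR(inode->i_mode) tests.

```c
#include <errno.h>
#include <stdbool.h>

/* Values mirror the quoted include/linux/falloc.h hunk. */
#define MODEL_FL_KEEP_SIZE	0x01
#define MODEL_FL_PUNCH_HOLE	0x02
#define MODEL_FL_NO_HIDE_STALE	0x04

/*
 * Hypothetical userspace model of the checks the patch adds to
 * ext4_fallocate().  Returns 0 if the request would proceed, or a
 * negative errno value matching the quoted kernel code.
 */
int model_fallocate_mode_check(int mode, bool in_stale_group, bool is_dir)
{
	/* Unknown flag bits are rejected outright. */
	if (mode & ~(MODEL_FL_KEEP_SIZE | MODEL_FL_PUNCH_HOLE |
		     MODEL_FL_NO_HIDE_STALE))
		return -EOPNOTSUPP;

	/* NO_HIDE_STALE combined with KEEP_SIZE is not supported. */
	if ((mode & MODEL_FL_NO_HIDE_STALE) && (mode & MODEL_FL_KEEP_SIZE))
		return -EOPNOTSUPP;

	/* In the patch, hole punching is handed to ext4_punch_hole() here. */
	if (mode & MODEL_FL_PUNCH_HOLE)
		return 0;

	/* Only members of the mount's nohide_stale_gid group may see stale data. */
	if ((mode & MODEL_FL_NO_HIDE_STALE) && !in_stale_group)
		return -EACCES;

	/* Preallocation to directories is not supported. */
	if (is_dir)
		return -ENODEV;

	return 0;
}
```

With this model, a NO_HIDE_STALE request from a caller outside the group yields
-EACCES, while the same request combined with KEEP_SIZE yields -EOPNOTSUPP
whether or not the caller is in the group.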

