Message-ID: <100D68C7BA14664A8938383216E40DE040853440@FMSMSX114.amr.corp.intel.com>
Date:	Fri, 16 Jan 2015 21:16:03 +0000
From:	"Wilcox, Matthew R" <matthew.r.wilcox@...el.com>
To:	Jan Kara <jack@...e.cz>,
	"ross.zwisler@...ux.intel.com" <ross.zwisler@...ux.intel.com>
CC:	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"Dilger, Andreas" <andreas.dilger@...el.com>,
	"axboe@...nel.dk" <axboe@...nel.dk>,
	"boaz@...xistor.com" <boaz@...xistor.com>,
	"david@...morbit.com" <david@...morbit.com>,
	"hch@....de" <hch@....de>,
	"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
	"mathieu.desnoyers@...icios.com" <mathieu.desnoyers@...icios.com>,
	"rdunlap@...radead.org" <rdunlap@...radead.org>,
	"tytso@....edu" <tytso@....edu>,
	"mm-commits@...r.kernel.org" <mm-commits@...r.kernel.org>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	Matthew Wilcox <willy@...ux.intel.com>
Subject: RE: + ext4-add-dax-functionality.patch added to -mm tree

-----Original Message-----
From: Jan Kara [mailto:jack@...e.cz] 
Sent: Thursday, January 15, 2015 4:41 AM
To: ross.zwisler@...ux.intel.com
Cc: akpm@...ux-foundation.org; Dilger, Andreas; axboe@...nel.dk; boaz@...xistor.com; david@...morbit.com; hch@....de; jack@...e.cz; kirill.shutemov@...ux.intel.com; mathieu.desnoyers@...icios.com; Wilcox, Matthew R; rdunlap@...radead.org; tytso@....edu; mm-commits@...r.kernel.org; linux-ext4@...r.kernel.org
Subject: Re: + ext4-add-dax-functionality.patch added to -mm tree

On Mon 12-01-15 15:11:17, Andrew Morton wrote:
> +#ifdef CONFIG_FS_DAX
> +static int ext4_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> +{
> +	return dax_fault(vma, vmf, ext4_get_block);
> +					/* Is this the right get_block? */
  You can remove the comment. It is the right get_block function.

Are you sure it shouldn't be ext4_get_block_write, or _write_nolock?  According to the comments, ext4_get_block() doesn't allocate uninitialized extents, which we do want it to do.

> diff -puN fs/ext4/inode.c~ext4-add-dax-functionality fs/ext4/inode.c
> --- a/fs/ext4/inode.c~ext4-add-dax-functionality
> +++ a/fs/ext4/inode.c
> @@ -657,6 +657,18 @@ has_zeroout:
>  	return retval;
>  }
>  
> +static void ext4_end_io_unwritten(struct buffer_head *bh, int uptodate)
> +{
> +	struct inode *inode = bh->b_assoc_map->host;
> +	/* XXX: breaks on 32-bit > 16GB. Is that even supported? */
  That should be 16 TB if I'm doing the math right - 32-bit block number *
block size (4k) = 16 TB. And that's the max limit of ext4 (as logical file
offset in blocks has to fit in 32-bits for ext4). So I think you can just
remove the comment. But also see comment below.

Blargh, yes, you're right.
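
For reference, the arithmetic above can be checked with a small userspace sketch. This is not kernel code: the helper names are hypothetical, and the second function only models what the (void *)(unsigned long)iblock cast would do where unsigned long is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes addressable when the logical block number must fit in 32 bits.
 * With 4 KiB blocks this is 2^32 * 2^12 = 2^44 bytes, i.e. 16 TiB.
 */
static uint64_t max_bytes_for_32bit_blocks(uint32_t block_size)
{
	assert(block_size != 0);
	return ((uint64_t)1 << 32) * block_size;
}

/*
 * Hypothetical model of stashing a block number in a pointer-sized field
 * on a 32-bit machine: only the low 32 bits survive the round trip.
 */
static uint64_t roundtrip_through_32bit_long(uint64_t iblock)
{
	uint32_t stored = (uint32_t)iblock;	/* what b_private could hold */

	return stored;
}
```

Any block number below 2^32 survives the round trip unchanged, so the truncation only matters past the 16 TB point, which is beyond what ext4 allows for a single file anyway.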

> @@ -694,6 +706,11 @@ static int _ext4_get_block(struct inode
>  
>  		map_bh(bh, inode->i_sb, map.m_pblk);
>  		bh->b_state = (bh->b_state & ~EXT4_MAP_FLAGS) | map.m_flags;
> +		if (IS_DAX(inode) && buffer_unwritten(bh) && !io_end) {
> +			bh->b_assoc_map = inode->i_mapping;
> +			bh->b_private = (void *)(unsigned long)iblock;
> +			bh->b_end_io = ext4_end_io_unwritten;
> +		}
  So why is this needed? It would deserve a comment. It confuses me in
particular because:
1) This is often a phony bh used just as a container for passed data and
   b_end_io is just ignored.
2) Even if it was real bh attached to a page, for DAX we don't do any
   writeback and thus ->b_end_io will never get called?
3) And if it does get called, you certainly cannot call
   ext4_convert_unwritten_extents() from softirq context where ->b_end_io
   gets called.

This got added to fix a problem that Dave Chinner pointed out.  We need the allocated extent to either be zeroed (as ext2 does), or marked as unwritten (ext4, XFS) so that a racing read/page fault doesn't return uninitialized data.  If it's marked as unwritten, we need to convert it to a written extent after we've initialised the contents.  We use the b_end_io() callback to do this, and it's called from the DAX code, not in softirq context.
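
A rough userspace model of that sequence, for illustration only. None of these names exist in the kernel; toy_extent, toy_bh and the toy_* functions are made-up stand-ins, and the real DAX and ext4 paths do considerably more (locking, journalling, error handling).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLOCK_SIZE 4096

/* Hypothetical stand-in for an extent covering a single block. */
struct toy_extent {
	bool unwritten;				/* allocated, not yet initialised */
	unsigned char data[TOY_BLOCK_SIZE];	/* may still hold stale bytes */
};

/* Hypothetical stand-in for the few buffer_head fields used here. */
struct toy_bh {
	struct toy_extent *extent;
	void (*end_io)(struct toy_bh *bh, int uptodate);
};

/* Completion callback: mark the extent written once its contents are valid. */
static void toy_end_io_unwritten(struct toy_bh *bh, int uptodate)
{
	if (uptodate)
		bh->extent->unwritten = false;
}

/* A racing reader sees zeroes, never stale data, while the extent is unwritten. */
static unsigned char toy_read_byte(const struct toy_extent *ext, size_t off)
{
	return ext->unwritten ? 0 : ext->data[off];
}

/*
 * Fault path: initialise the block first, then invoke the callback directly
 * from the caller's (process) context rather than from I/O completion.
 */
static void toy_dax_fault(struct toy_bh *bh)
{
	memset(bh->extent->data, 0, TOY_BLOCK_SIZE);
	if (bh->end_io)
		bh->end_io(bh, 1);
}
```

The ordering is the point of the sketch: the extent stays unwritten until the contents have been initialised, so a read that races with the fault gets zeroes either way.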

>  		if (io_end && io_end->flag & EXT4_IO_END_UNWRITTEN)
>  			set_buffer_defer_completion(bh);
>  		bh->b_size = inode->i_sb->s_blocksize * map.m_len;

								Honza
