Message-ID: <1448055854.29114.9.camel@intel.com>
Date:	Fri, 20 Nov 2015 21:44:16 +0000
From:	"Williams, Dan J" <dan.j.williams@...el.com>
To:	"torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>
CC:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
	"Zwisler, Ross" <ross.zwisler@...el.com>
Subject: [GIT PULL] libnvdimm fixes for 4.4-rc2

Hi Linus, please pull from...

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm libnvdimm-fixes

...to receive:

1/ A collection of crash and deadlock fixes for DAX that are also
tagged for -stable.  We will look to re-enable DAX pmd mappings in 4.5,
but for now 4.4 and -stable should disable it by default.

2/ A fixup to ext2 and ext4 to mirror the warning that XFS emits when
mounting with "-o dax".

This set has received a build success notification from the kbuild
robot.

The following changes since commit 8005c49d9aea74d382f474ce11afbbc7d7130bec:

  Linux 4.4-rc1 (2015-11-15 17:00:27 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm libnvdimm-fixes

for you to fetch changes up to 2e6edc95382cc36423aff18a237173ad62d5ab52:

  block: protect rw_page against device teardown (2015-11-19 13:47:10 -0800)

----------------------------------------------------------------
Dan Williams (3):
      ext2, ext4: warn when mounting with dax enabled
      dax: disable pmd mappings
      block: protect rw_page against device teardown

Yigal Korman (1):
      mm, dax: fix DAX deadlocks (COW fault)

 block/blk.h            |  2 --
 fs/Kconfig             |  6 ++++++
 fs/block_dev.c         | 18 ++++++++++++++++--
 fs/dax.c               |  4 ++++
 fs/ext2/super.c        |  2 ++
 fs/ext4/super.c        |  6 +++++-
 include/linux/blkdev.h |  2 ++
 mm/memory.c            |  8 ++++----
 8 files changed, 39 insertions(+), 9 deletions(-)

commit 2e6edc95382cc36423aff18a237173ad62d5ab52
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Thu Nov 19 13:29:28 2015 -0800

    block: protect rw_page against device teardown
    
    Fix use after free crashes like the following:
    
     general protection fault: 0000 [#1] SMP
     Call Trace:
      [<ffffffffa0050216>] ? pmem_do_bvec.isra.12+0xa6/0xf0 [nd_pmem]
      [<ffffffffa0050ba2>] pmem_rw_page+0x42/0x80 [nd_pmem]
      [<ffffffff8128fd90>] bdev_read_page+0x50/0x60
      [<ffffffff812972f0>] do_mpage_readpage+0x510/0x770
      [<ffffffff8128fd20>] ? I_BDEV+0x20/0x20
      [<ffffffff811d86dc>] ? lru_cache_add+0x1c/0x50
      [<ffffffff81297657>] mpage_readpages+0x107/0x170
      [<ffffffff8128fd20>] ? I_BDEV+0x20/0x20
      [<ffffffff8128fd20>] ? I_BDEV+0x20/0x20
      [<ffffffff8129058d>] blkdev_readpages+0x1d/0x20
      [<ffffffff811d615f>] __do_page_cache_readahead+0x28f/0x310
      [<ffffffff811d6039>] ? __do_page_cache_readahead+0x169/0x310
      [<ffffffff811c5abd>] ? pagecache_get_page+0x2d/0x1d0
      [<ffffffff811c76f6>] filemap_fault+0x396/0x530
      [<ffffffff811f816e>] __do_fault+0x4e/0xf0
      [<ffffffff811fce7d>] handle_mm_fault+0x11bd/0x1b50
    
    Cc: <stable@...r.kernel.org>
    Cc: Jens Axboe <axboe@...com>
    Cc: Alexander Viro <viro@...iv.linux.org.uk>
    Reported-by: kbuild test robot <lkp@...el.com>
    Acked-by: Matthew Wilcox <willy@...ux.intel.com>
    [willy: symmetry fixups]
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>
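
The fix above brackets the driver's rw_page call with a queue reference so
that teardown cannot free driver state underneath it. A minimal user-space
sketch of that enter/call/exit shape follows. Everything prefixed toy_ is
hypothetical and only models the idea; in particular, returning -ENODEV for
a dying queue is an assumption of this model, not a statement about what
blk_queue_enter() returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request queue; not the kernel struct. */
struct toy_queue {
	bool dying;	/* set once teardown has started */
	int users;	/* callers currently inside the queue */
};

/* Models the enter step: refuse new users once teardown has begun. */
int toy_queue_enter(struct toy_queue *q)
{
	if (q->dying)
		return -ENODEV;	/* assumed error code for this sketch */
	q->users++;
	return 0;
}

/* Models the exit step: drop the reference taken by toy_queue_enter(). */
void toy_queue_exit(struct toy_queue *q)
{
	assert(q->users > 0);
	q->users--;
}

/* Hypothetical driver callback standing in for ops->rw_page(). */
int toy_rw_page(struct toy_queue *q)
{
	/* Driver state is only safe to touch while a reference is held. */
	return q->users > 0 ? 0 : -EIO;
}

/* Same shape as the patched read path: enter, call the driver, exit. */
int toy_read_page(struct toy_queue *q)
{
	int result = toy_queue_enter(q);

	if (result)
		return result;
	result = toy_rw_page(q);
	toy_queue_exit(q);
	return result;
}
```

Calling toy_read_page() on a live queue returns 0 and leaves users at 0;
once dying is set, the call fails before the driver callback is reached.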

commit 0df9d41ab5d43dc5b20abc8b22a6b6d098b03994
Author: Yigal Korman <yigal@...xistor.com>
Date:   Mon Nov 16 14:09:15 2015 +0200

    mm, dax: fix DAX deadlocks (COW fault)
    
    DAX handling of COW faults has wrong locking sequence:
    	dax_fault does i_mmap_lock_read
    	do_cow_fault does i_mmap_unlock_write
    
    Ross's commit[1] missed a fix[2] that Kirill added to Matthew's
    commit[3].
    
    Original COW locking logic was introduced by Matthew here[4].
    
    This should be applied to v4.3 as well.
    
    [1] 0f90cc6609c7 mm, dax: fix DAX deadlocks
    [2] 52a2b53ffde6 mm, dax: use i_mmap_unlock_write() in do_cow_fault()
    [3] 843172978bb9 dax: fix race between simultaneous faults
    [4] 2e4cdab0584f mm: allow page fault handlers to perform the COW
    
    Cc: <stable@...r.kernel.org>
    Cc: Boaz Harrosh <boaz@...xistor.com>
    Cc: Alexander Viro <viro@...iv.linux.org.uk>
    Cc: Dave Chinner <dchinner@...hat.com>
    Cc: Jan Kara <jack@...e.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
    Cc: Matthew Wilcox <matthew.r.wilcox@...el.com>
    Acked-by: Ross Zwisler <ross.zwisler@...ux.intel.com>
    Signed-off-by: Yigal Korman <yigal@...xistor.com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>
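
The mismatch described above (lock taken for read, released as if held for
write) can be illustrated with a toy reader/writer lock that only tracks
state and never blocks. This is a sketch under stated assumptions: the
toy_ names are hypothetical and the model is far simpler than the kernel's
i_mmap lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy reader/writer lock; tracks state only, never blocks. */
struct toy_rwsem {
	int readers;	/* number of read holders */
	bool writer;	/* true while held for write */
};

/* Each helper returns true when the call is valid for the current state. */
bool toy_lock_read(struct toy_rwsem *s)
{
	assert(s->readers >= 0);
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

bool toy_unlock_read(struct toy_rwsem *s)
{
	if (s->readers == 0)
		return false;
	s->readers--;
	return true;
}

bool toy_unlock_write(struct toy_rwsem *s)
{
	if (!s->writer)
		return false;	/* releasing a write lock nobody holds */
	s->writer = false;
	return true;
}

/*
 * Models the fault handler taking the lock for read and the COW path
 * releasing it. With fixed == false the release uses the write variant,
 * which is the mismatch the commit corrects.
 */
bool toy_cow_fault(struct toy_rwsem *s, bool fixed)
{
	if (!toy_lock_read(s))
		return false;
	return fixed ? toy_unlock_read(s) : toy_unlock_write(s);
}
```

In this model the mismatched path reports failure and leaves one reader
recorded, so a later writer would wait on a holder that never releases.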

commit ee82c9ed41e896bd47e121d87e4628de0f2656a3
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Sun Nov 15 16:06:32 2015 -0800

    dax: disable pmd mappings
    
    While dax pmd mappings are functional in the nominal path they trigger
    kernel crashes in the following paths:
    
     BUG: unable to handle kernel paging request at ffffea0004098000
     IP: [<ffffffff812362f7>] follow_trans_huge_pmd+0x117/0x3b0
     [..]
     Call Trace:
      [<ffffffff811f6573>] follow_page_mask+0x2d3/0x380
      [<ffffffff811f6708>] __get_user_pages+0xe8/0x6f0
      [<ffffffff811f7045>] get_user_pages_unlocked+0x165/0x1e0
      [<ffffffff8106f5b1>] get_user_pages_fast+0xa1/0x1b0
    
     kernel BUG at arch/x86/mm/gup.c:131!
     [..]
     Call Trace:
      [<ffffffff8106f34c>] gup_pud_range+0x1bc/0x220
      [<ffffffff8106f634>] get_user_pages_fast+0x124/0x1b0
    
     BUG: unable to handle kernel paging request at ffffea0004088000
     IP: [<ffffffff81235f49>] copy_huge_pmd+0x159/0x350
     [..]
     Call Trace:
      [<ffffffff811fad3c>] copy_page_range+0x34c/0x9f0
      [<ffffffff810a0daf>] copy_process+0x1b7f/0x1e10
      [<ffffffff810a11c1>] _do_fork+0x91/0x590
    
    All of these paths are interpreting a dax pmd mapping as a transparent
    huge page and making the assumption that the pfn is covered by the
    memmap, i.e. that the pfn has an associated struct page.  PTE mappings
    do not suffer the same fate since they have the _PAGE_SPECIAL flag to
    cause the gup path to fault.  We can do something similar for the PMD
    path, or otherwise defer pmd support for cases where a struct page is
    available.  For now, 4.4-rc and -stable need to disable dax pmd support
    by default.
    
    For development the "depends on BROKEN" line can be removed from
    CONFIG_FS_DAX_PMD.
    
    Cc: <stable@...r.kernel.org>
    Cc: Jan Kara <jack@...e.com>
    Cc: Dave Chinner <david@...morbit.com>
    Cc: Matthew Wilcox <willy@...ux.intel.com>
    Cc: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
    Reported-by: Ross Zwisler <ross.zwisler@...ux.intel.com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>
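
The effect of gating pmd support on a config symbol can be sketched in
user space as follows. TOY_FS_DAX_PMD is a hypothetical stand-in for
CONFIG_FS_DAX_PMD, and the toy_ result codes are illustrative only; they
are not the kernel's VM_FAULT_* values.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the config symbol; 0 mirrors the "depends on BROKEN" default. */
#ifndef TOY_FS_DAX_PMD
#define TOY_FS_DAX_PMD 0
#endif

/* Hypothetical result codes for this sketch. */
enum toy_fault { TOY_FAULT_MAPPED_PMD, TOY_FAULT_FALLBACK };

enum toy_fault toy_pmd_fault(bool write, bool shared)
{
	/* The stand-in symbol is boolean, like a Kconfig bool. */
	assert(TOY_FS_DAX_PMD == 0 || TOY_FS_DAX_PMD == 1);

	/* pmd support compiled out: always let the caller retry with PTEs */
	if (!TOY_FS_DAX_PMD)
		return TOY_FAULT_FALLBACK;

	/* Fall back to PTEs if the fault is going to COW */
	if (write && !shared)
		return TOY_FAULT_FALLBACK;

	return TOY_FAULT_MAPPED_PMD;
}
```

With the default of 0 every call falls back; building the sketch with
-DTOY_FS_DAX_PMD=1 plays the role of removing the "depends on BROKEN" line
for development.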

commit ef83b6e8f40bb24b92ad73b5889732346e54a793
Author: Dan Williams <dan.j.williams@...el.com>
Date:   Tue Sep 29 15:48:11 2015 -0400

    ext2, ext4: warn when mounting with dax enabled
    
    Similar to XFS, warn when mounting with DAX while it is still considered
    under development.  Also, aspects of the DAX implementation, for example
    synchronization against multiple faults and faults causing block
    allocation, depend on the correct implementation in the filesystem.  The
    maturity of a given DAX implementation is filesystem specific.
    
    Cc: <stable@...r.kernel.org>
    Cc: "Theodore Ts'o" <tytso@....edu>
    Cc: Matthew Wilcox <willy@...ux.intel.com>
    Cc: linux-ext4@...r.kernel.org
    Cc: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
    Reported-by: Dave Chinner <david@...morbit.com>
    Acked-by: Jan Kara <jack@...e.com>
    Signed-off-by: Dan Williams <dan.j.williams@...el.com>

diff --git a/block/blk.h b/block/blk.h
index da722eb786df..c43926d3d74d 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -72,8 +72,6 @@ void blk_dequeue_request(struct request *rq);
 void __blk_queue_free_tags(struct request_queue *q);
 bool __blk_end_bidi_request(struct request *rq, int error,
 			    unsigned int nr_bytes, unsigned int bidi_bytes);
-int blk_queue_enter(struct request_queue *q, gfp_t gfp);
-void blk_queue_exit(struct request_queue *q);
 void blk_freeze_queue(struct request_queue *q);
 
 static inline void blk_queue_enter_live(struct request_queue *q)
diff --git a/fs/Kconfig b/fs/Kconfig
index da3f32f1a4e4..6ce72d8d1ee1 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -46,6 +46,12 @@ config FS_DAX
 	  or if unsure, say N.  Saying Y will increase the size of the kernel
 	  by about 5kB.
 
+config FS_DAX_PMD
+	bool
+	default FS_DAX
+	depends on FS_DAX
+	depends on BROKEN
+
 endif # BLOCK
 
 # Posix ACL utility routines
diff --git a/fs/block_dev.c b/fs/block_dev.c
index bb0dfb1c7af1..c25639e907bd 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -390,9 +390,17 @@ int bdev_read_page(struct block_device *bdev, sector_t sector,
 			struct page *page)
 {
 	const struct block_device_operations *ops = bdev->bd_disk->fops;
+	int result = -EOPNOTSUPP;
+
 	if (!ops->rw_page || bdev_get_integrity(bdev))
-		return -EOPNOTSUPP;
-	return ops->rw_page(bdev, sector + get_start_sect(bdev), page, READ);
+		return result;
+
+	result = blk_queue_enter(bdev->bd_queue, GFP_KERNEL);
+	if (result)
+		return result;
+	result = ops->rw_page(bdev, sector + get_start_sect(bdev), page, READ);
+	blk_queue_exit(bdev->bd_queue);
+	return result;
 }
 EXPORT_SYMBOL_GPL(bdev_read_page);
 
@@ -421,14 +429,20 @@ int bdev_write_page(struct block_device *bdev, sector_t sector,
 	int result;
 	int rw = (wbc->sync_mode == WB_SYNC_ALL) ? WRITE_SYNC : WRITE;
 	const struct block_device_operations *ops = bdev->bd_disk->fops;
+
 	if (!ops->rw_page || bdev_get_integrity(bdev))
 		return -EOPNOTSUPP;
+	result = blk_queue_enter(bdev->bd_queue, GFP_KERNEL);
+	if (result)
+		return result;
+
 	set_page_writeback(page);
 	result = ops->rw_page(bdev, sector + get_start_sect(bdev), page, rw);
 	if (result)
 		end_page_writeback(page);
 	else
 		unlock_page(page);
+	blk_queue_exit(bdev->bd_queue);
 	return result;
 }
 EXPORT_SYMBOL_GPL(bdev_write_page);
diff --git a/fs/dax.c b/fs/dax.c
index d1e5cb7311a1..43671b68220e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -541,6 +541,10 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 	unsigned long pfn;
 	int result = 0;
 
+	/* dax pmd mappings are broken wrt gup and fork */
+	if (!IS_ENABLED(CONFIG_FS_DAX_PMD))
+		return VM_FAULT_FALLBACK;
+
 	/* Fall back to PTEs if we're going to COW */
 	if (write && !(vma->vm_flags & VM_SHARED))
 		return VM_FAULT_FALLBACK;
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 3a71cea68420..748d35afc902 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -569,6 +569,8 @@ static int parse_options(char *options, struct super_block *sb)
 			/* Fall through */
 		case Opt_dax:
 #ifdef CONFIG_FS_DAX
+			ext2_msg(sb, KERN_WARNING,
+		"DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
 			set_opt(sbi->s_mount_opt, DAX);
 #else
 			ext2_msg(sb, KERN_INFO, "dax option not supported");
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 753f4e68b820..c9ab67da6e5a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1664,8 +1664,12 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 		}
 		sbi->s_jquota_fmt = m->mount_opt;
 #endif
-#ifndef CONFIG_FS_DAX
 	} else if (token == Opt_dax) {
+#ifdef CONFIG_FS_DAX
+		ext4_msg(sb, KERN_WARNING,
+		"DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
+			sbi->s_mount_opt |= m->mount_opt;
+#else
 		ext4_msg(sb, KERN_INFO, "dax option not supported");
 		return -1;
 #endif
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3fe27f8d91f0..c0d2b7927c1f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -794,6 +794,8 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 			 struct scsi_ioctl_command __user *);
 
+extern int blk_queue_enter(struct request_queue *q, gfp_t gfp);
+extern void blk_queue_exit(struct request_queue *q);
 extern void blk_start_queue(struct request_queue *q);
 extern void blk_stop_queue(struct request_queue *q);
 extern void blk_sync_queue(struct request_queue *q);
diff --git a/mm/memory.c b/mm/memory.c
index deb679c31f2a..c387430f06c3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3015,9 +3015,9 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		} else {
 			/*
 			 * The fault handler has no page to lock, so it holds
-			 * i_mmap_lock for write to protect against truncate.
+			 * i_mmap_lock for read to protect against truncate.
 			 */
-			i_mmap_unlock_write(vma->vm_file->f_mapping);
+			i_mmap_unlock_read(vma->vm_file->f_mapping);
 		}
 		goto uncharge_out;
 	}
@@ -3031,9 +3031,9 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	} else {
 		/*
 		 * The fault handler has no page to lock, so it holds
-		 * i_mmap_lock for write to protect against truncate.
+		 * i_mmap_lock for read to protect against truncate.
 		 */
-		i_mmap_unlock_write(vma->vm_file->f_mapping);
+		i_mmap_unlock_read(vma->vm_file->f_mapping);
 	}
 	return ret;
 uncharge_out: