lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171016105857.GQ3323@X58A-UD3R>
Date:   Mon, 16 Oct 2017 19:58:57 +0900
From:   Byungchul Park <byungchul.park@....com>
To:     Peter Zijlstra <peterz@...radead.org>, david@...morbit.com,
        darrick.wong@...cle.com
Cc:     Tejun Heo <tj@...nel.org>, johannes.berg@...el.com,
        mingo@...nel.org, tglx@...utronix.de, oleg@...hat.com,
        linux-kernel@...r.kernel.org, kernel-team@....com
Subject: Re: [RESEND PATCH v2 2/2] lockdep: Remove unnecessary acquisitions
 wrt workqueue flush

On Fri, Oct 13, 2017 at 10:27:50AM +0200, Peter Zijlstra wrote:
> On Fri, Oct 13, 2017 at 04:56:33PM +0900, Byungchul Park wrote:
> > On Thu, Oct 12, 2017 at 05:56:35PM +0200, Peter Zijlstra wrote:
> > > On Thu, Oct 12, 2017 at 08:38:17AM -0700, Tejun Heo wrote:
> > > > 
> > > > As long as we have the same level of protection, simpler code is of
> > > > course preferable.  That said, I haven't followed the discussion
> > > > closely and don't want to apply it without Peter's ack.  Peter?
> > > 
> > > I'm really tied up atm; and feel we should be addressing the false
> > > positives generated by the current code before we start doing new stuff
> > > on top.
> > 
> > We can never avoid adding false dependencies as long as we use
> > acquisitions in that way the workqueue code does, even though you
> > successfully replace write acquisitions with recursive-read ones after
> > making them work, as you know.
> 
> Not the point; they still need to get annotated away. The block layer
> and xfs are now fairly consistently triggering lockdep splats, that
> needs to get sorted.

I believe that the following patch is helpful for that issue. Could you
check if it works, and give your opinion? I will add more comments on it
if you let me know if this patch is acceptable.

----->8-----
>From 340846e59c28457faf4d8051fb0c8a4c9da8c9b1 Mon Sep 17 00:00:00 2001
From: Byungchul Park <byungchul.park@....com>
Date: Mon, 16 Oct 2017 18:51:51 +0900
Subject: [RFC] lockdep: Assign a lock_class per gendisk used for
 wait_for_completion()

Darrick and Dave Chinner posted the following warning:

> ======================================================
> WARNING: possible circular locking dependency detected
> 4.14.0-rc1-fixes #1 Tainted: G        W
> ------------------------------------------------------
> loop0/31693 is trying to acquire lock:
>  (&(&ip->i_mmaplock)->mr_lock){++++}, at: [<ffffffffa00f1b0c>] xfs_ilock+0x23c/0x330 [xfs]
>
> but now in release context of a crosslock acquired at the following:
>  ((complete)&ret.event){+.+.}, at: [<ffffffff81326c1f>] submit_bio_wait+0x7f/0xb0
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 ((complete)&ret.event){+.+.}:
>        lock_acquire+0xab/0x200
>        wait_for_completion_io+0x4e/0x1a0
>        submit_bio_wait+0x7f/0xb0
>        blkdev_issue_zeroout+0x71/0xa0
>        xfs_bmapi_convert_unwritten+0x11f/0x1d0 [xfs]
>        xfs_bmapi_write+0x374/0x11f0 [xfs]
>        xfs_iomap_write_direct+0x2ac/0x430 [xfs]
>        xfs_file_iomap_begin+0x20d/0xd50 [xfs]
>        iomap_apply+0x43/0xe0
>        dax_iomap_rw+0x89/0xf0
>        xfs_file_dax_write+0xcc/0x220 [xfs]
>        xfs_file_write_iter+0xf0/0x130 [xfs]
>        __vfs_write+0xd9/0x150
>        vfs_write+0xc8/0x1c0
>        SyS_write+0x45/0xa0
>        entry_SYSCALL_64_fastpath+0x1f/0xbe
>
> -> #1 (&xfs_nondir_ilock_class){++++}:
>        lock_acquire+0xab/0x200
>        down_write_nested+0x4a/0xb0
>        xfs_ilock+0x263/0x330 [xfs]
>        xfs_setattr_size+0x152/0x370 [xfs]
>        xfs_vn_setattr+0x6b/0x90 [xfs]
>        notify_change+0x27d/0x3f0
>        do_truncate+0x5b/0x90
>        path_openat+0x237/0xa90
>        do_filp_open+0x8a/0xf0
>        do_sys_open+0x11c/0x1f0
>        entry_SYSCALL_64_fastpath+0x1f/0xbe
>
> -> #0 (&(&ip->i_mmaplock)->mr_lock){++++}:
>        up_write+0x1c/0x40
>        xfs_iunlock+0x1d0/0x310 [xfs]
>        xfs_file_fallocate+0x8a/0x310 [xfs]
>        loop_queue_work+0xb7/0x8d0
>        kthread_worker_fn+0xb9/0x1f0
>
> Chain exists of:
>   &(&ip->i_mmaplock)->mr_lock --> &xfs_nondir_ilock_class --> (complete)&ret.event
>
>  Possible unsafe locking scenario by crosslock:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&xfs_nondir_ilock_class);
>   lock((complete)&ret.event);
>                                lock(&(&ip->i_mmaplock)->mr_lock);
>                                unlock((complete)&ret.event);
>
>                *** DEADLOCK ***

The warning is a false positive, caused by the fact that all
wait_for_completion()s in submit_bio_wait() are waiting with the same
lock class.

However, some bios have nothing to do with others, for example, the case
might happen while using loop devices, between bios of an upper device
and a lower device(=loop device).

The safest way to assign different lock classes to different devices is
to do it for each gendisk. In other words, this patch assigns a
lockdep_map per gendisk and uses it when initializing completion in
submit_bio_wait().

Of course, it might be too conservative. But, making it safest for now
and extended by block layer experts later is good, atm.

Signed-off-by: Byungchul Park <byungchul.park@....com>
---
 block/bio.c           |  4 ++--
 block/genhd.c         | 13 +++++--------
 include/linux/genhd.h | 26 ++++++++++++++++++++++----
 3 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 9a63597..0d4d6c0 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -941,7 +941,7 @@ int submit_bio_wait(struct bio *bio)
 {
 	struct submit_bio_ret ret;
 
-	init_completion(&ret.event);
+	init_completion_with_map(&ret.event, &bio->bi_disk->lockdep_map);
 	bio->bi_private = &ret;
 	bio->bi_end_io = submit_bio_wait_endio;
 	bio->bi_opf |= REQ_SYNC;
@@ -1382,7 +1382,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
 
 			if (len <= 0)
 				break;
-			
+
 			if (bytes > len)
 				bytes = len;
 
diff --git a/block/genhd.c b/block/genhd.c
index 7f520fa..676c245 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1304,13 +1304,7 @@ dev_t blk_lookup_devt(const char *name, int partno)
 }
 EXPORT_SYMBOL(blk_lookup_devt);
 
-struct gendisk *alloc_disk(int minors)
-{
-	return alloc_disk_node(minors, NUMA_NO_NODE);
-}
-EXPORT_SYMBOL(alloc_disk);
-
-struct gendisk *alloc_disk_node(int minors, int node_id)
+struct gendisk *__alloc_disk_node(int minors, int node_id, struct lock_class_key *key, const char *lock_name)
 {
 	struct gendisk *disk;
 
@@ -1350,9 +1344,12 @@ struct gendisk *alloc_disk_node(int minors, int node_id)
 		disk_to_dev(disk)->type = &disk_type;
 		device_initialize(disk_to_dev(disk));
 	}
+
+	lockdep_init_map(&disk->lockdep_map, lock_name, key, 0);
+
 	return disk;
 }
-EXPORT_SYMBOL(alloc_disk_node);
+EXPORT_SYMBOL(__alloc_disk_node);
 
 struct kobject *get_disk(struct gendisk *disk)
 {
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index e619fae..5225efc 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -3,7 +3,7 @@
 
 /*
  * 	genhd.h Copyright (C) 1992 Drew Eckhardt
- *	Generic hard disk header file by  
+ *	Generic hard disk header file by
  * 		Drew Eckhardt
  *
  *		<drew@...orado.edu>
@@ -206,6 +206,9 @@ struct gendisk {
 #endif	/* CONFIG_BLK_DEV_INTEGRITY */
 	int node_id;
 	struct badblocks *bb;
+#ifdef CONFIG_LOCKDEP_COMPLETIONS
+	struct lockdep_map lockdep_map;
+#endif
 };
 
 static inline struct gendisk *part_to_disk(struct hd_struct *part)
@@ -483,7 +486,7 @@ struct bsd_disklabel {
 	__s16	d_type;			/* drive type */
 	__s16	d_subtype;		/* controller/d_type specific */
 	char	d_typename[16];		/* type name, e.g. "eagle" */
-	char	d_packname[16];			/* pack identifier */ 
+	char	d_packname[16];			/* pack identifier */
 	__u32	d_secsize;		/* # of bytes per sector */
 	__u32	d_nsectors;		/* # of data sectors per track */
 	__u32	d_ntracks;		/* # of tracks per cylinder */
@@ -602,8 +605,7 @@ extern struct hd_struct * __must_check add_partition(struct gendisk *disk,
 extern void delete_partition(struct gendisk *, int);
 extern void printk_all_partitions(void);
 
-extern struct gendisk *alloc_disk_node(int minors, int node_id);
-extern struct gendisk *alloc_disk(int minors);
+extern struct gendisk *__alloc_disk_node(int minors, int node_id, struct lock_class_key *key, const char *lock_name);
 extern struct kobject *get_disk(struct gendisk *disk);
 extern void put_disk(struct gendisk *disk);
 extern void blk_register_region(dev_t devt, unsigned long range,
@@ -627,6 +629,22 @@ extern ssize_t part_fail_store(struct device *dev,
 			       const char *buf, size_t count);
 #endif /* CONFIG_FAIL_MAKE_REQUEST */
 
+#ifdef CONFIG_LOCKDEP_COMPLETIONS
+#define alloc_disk_node(m, id) \
+({									\
+	static struct lock_class_key __key;				\
+	const char *__lock_name;					\
+									\
+	__lock_name = "(complete)"#m"("#id")";				\
+									\
+	__alloc_disk_node(m, id, &__key, __lock_name);			\
+})
+#else
+#define alloc_disk_node(m, id)	__alloc_disk_node(m, id, NULL, NULL)
+#endif
+
+#define alloc_disk(m)		alloc_disk_node(m, NUMA_NO_NODE)
+
 static inline int hd_ref_init(struct hd_struct *part)
 {
 	if (percpu_ref_init(&part->ref, __delete_partition, 0,
-- 
1.9.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ