lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Mon, 16 Oct 2017 20:06:28 +0900
From:   Byungchul Park <byungchul.park@....com>
To:     Peter Zijlstra <peterz@...radead.org>, david@...morbit.com,
        darrick.wong@...cle.com
Cc:     Tejun Heo <tj@...nel.org>, johannes.berg@...el.com,
        mingo@...nel.org, tglx@...utronix.de, oleg@...hat.com,
        linux-kernel@...r.kernel.org, kernel-team@....com
Subject: Re: [RESEND PATCH v2 2/2] lockdep: Remove unnecessary acquisitions
 wrt workqueue flush

On Mon, Oct 16, 2017 at 07:58:57PM +0900, Byungchul Park wrote:
> On Fri, Oct 13, 2017 at 10:27:50AM +0200, Peter Zijlstra wrote:
> > On Fri, Oct 13, 2017 at 04:56:33PM +0900, Byungchul Park wrote:
> > > On Thu, Oct 12, 2017 at 05:56:35PM +0200, Peter Zijlstra wrote:
> > > > On Thu, Oct 12, 2017 at 08:38:17AM -0700, Tejun Heo wrote:
> > > > > 
> > > > > As long as we have the same level of protection, simpler code is of
> > > > > course preferable.  That said, I haven't followed the discussion
> > > > > closely and don't want to apply it without Peter's ack.  Peter?
> > > > 
> > > > I'm really tied up atm; and feel we should be addressing the false
> > > > positives generated by the current code before we start doing new stuff
> > > > on top.
> > > 
> > > We can never avoid adding false dependencies as long as we use
> > > acquisitions in that way the workqueue code does, even though you
> > > successfully replace write acquisitions with recursive-read ones after
> > > making them work, as you know.
> > 
> > Not the point; they still need to get annotated away. The block layer
> > and xfs are now fairly consistently triggering lockdep splats, that
> > needs to get sorted.
> 
> I believe that the following patch is helpful for that issue. Could you
> check if it works, and give your opinion? I will add more comments on it
> if you let me know if this patch is acceptable.

I missed something to tell you, the following patch should be applied on
top of the first patch of this patchset, "completion: Add support for
initializing completion with lockdep_map".

> ----->8-----
> >From 340846e59c28457faf4d8051fb0c8a4c9da8c9b1 Mon Sep 17 00:00:00 2001
> From: Byungchul Park <byungchul.park@....com>
> Date: Mon, 16 Oct 2017 18:51:51 +0900
> Subject: [RFC] lockdep: Assign a lock_class per gendisk used for
>  wait_for_completion()
> 
> Darrick and Dave Chinner posted the following warning:
> 
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 4.14.0-rc1-fixes #1 Tainted: G        W
> > ------------------------------------------------------
> > loop0/31693 is trying to acquire lock:
> >  (&(&ip->i_mmaplock)->mr_lock){++++}, at: [<ffffffffa00f1b0c>] xfs_ilock+0x23c/0x330 [xfs]
> >
> > but now in release context of a crosslock acquired at the following:
> >  ((complete)&ret.event){+.+.}, at: [<ffffffff81326c1f>] submit_bio_wait+0x7f/0xb0
> >
> > which lock already depends on the new lock.
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #2 ((complete)&ret.event){+.+.}:
> >        lock_acquire+0xab/0x200
> >        wait_for_completion_io+0x4e/0x1a0
> >        submit_bio_wait+0x7f/0xb0
> >        blkdev_issue_zeroout+0x71/0xa0
> >        xfs_bmapi_convert_unwritten+0x11f/0x1d0 [xfs]
> >        xfs_bmapi_write+0x374/0x11f0 [xfs]
> >        xfs_iomap_write_direct+0x2ac/0x430 [xfs]
> >        xfs_file_iomap_begin+0x20d/0xd50 [xfs]
> >        iomap_apply+0x43/0xe0
> >        dax_iomap_rw+0x89/0xf0
> >        xfs_file_dax_write+0xcc/0x220 [xfs]
> >        xfs_file_write_iter+0xf0/0x130 [xfs]
> >        __vfs_write+0xd9/0x150
> >        vfs_write+0xc8/0x1c0
> >        SyS_write+0x45/0xa0
> >        entry_SYSCALL_64_fastpath+0x1f/0xbe
> >
> > -> #1 (&xfs_nondir_ilock_class){++++}:
> >        lock_acquire+0xab/0x200
> >        down_write_nested+0x4a/0xb0
> >        xfs_ilock+0x263/0x330 [xfs]
> >        xfs_setattr_size+0x152/0x370 [xfs]
> >        xfs_vn_setattr+0x6b/0x90 [xfs]
> >        notify_change+0x27d/0x3f0
> >        do_truncate+0x5b/0x90
> >        path_openat+0x237/0xa90
> >        do_filp_open+0x8a/0xf0
> >        do_sys_open+0x11c/0x1f0
> >        entry_SYSCALL_64_fastpath+0x1f/0xbe
> >
> > -> #0 (&(&ip->i_mmaplock)->mr_lock){++++}:
> >        up_write+0x1c/0x40
> >        xfs_iunlock+0x1d0/0x310 [xfs]
> >        xfs_file_fallocate+0x8a/0x310 [xfs]
> >        loop_queue_work+0xb7/0x8d0
> >        kthread_worker_fn+0xb9/0x1f0
> >
> > Chain exists of:
> >   &(&ip->i_mmaplock)->mr_lock --> &xfs_nondir_ilock_class --> (complete)&ret.event
> >
> >  Possible unsafe locking scenario by crosslock:
> >
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&xfs_nondir_ilock_class);
> >   lock((complete)&ret.event);
> >                                lock(&(&ip->i_mmaplock)->mr_lock);
> >                                unlock((complete)&ret.event);
> >
> >                *** DEADLOCK ***
> 
> The warning is a false positive, caused by the fact that all
> wait_for_completion()s in submit_bio_wait() are waiting with the same
> lock class.
> 
> However, some bios have nothing to do with others, for example, the case
> might happen while using loop devices, between bios of an upper device
> and a lower device(=loop device).
> 
> The safest way to assign different lock classes to different devices is
> to do it for each gendisk. In other words, this patch assigns a
> lockdep_map per gendisk and uses it when initializing completion in
> submit_bio_wait().
> 
> Of course, it might be too conservative. But, making it safest for now
> and extended by block layer experts later is good, atm.
> 
> Signed-off-by: Byungchul Park <byungchul.park@....com>
> ---
>  block/bio.c           |  4 ++--
>  block/genhd.c         | 13 +++++--------
>  include/linux/genhd.h | 26 ++++++++++++++++++++++----
>  3 files changed, 29 insertions(+), 14 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 9a63597..0d4d6c0 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -941,7 +941,7 @@ int submit_bio_wait(struct bio *bio)
>  {
>  	struct submit_bio_ret ret;
>  
> -	init_completion(&ret.event);
> +	init_completion_with_map(&ret.event, &bio->bi_disk->lockdep_map);
>  	bio->bi_private = &ret;
>  	bio->bi_end_io = submit_bio_wait_endio;
>  	bio->bi_opf |= REQ_SYNC;
> @@ -1382,7 +1382,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
>  
>  			if (len <= 0)
>  				break;
> -			
> +
>  			if (bytes > len)
>  				bytes = len;
>  
> diff --git a/block/genhd.c b/block/genhd.c
> index 7f520fa..676c245 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -1304,13 +1304,7 @@ dev_t blk_lookup_devt(const char *name, int partno)
>  }
>  EXPORT_SYMBOL(blk_lookup_devt);
>  
> -struct gendisk *alloc_disk(int minors)
> -{
> -	return alloc_disk_node(minors, NUMA_NO_NODE);
> -}
> -EXPORT_SYMBOL(alloc_disk);
> -
> -struct gendisk *alloc_disk_node(int minors, int node_id)
> +struct gendisk *__alloc_disk_node(int minors, int node_id, struct lock_class_key *key, const char *lock_name)
>  {
>  	struct gendisk *disk;
>  
> @@ -1350,9 +1344,12 @@ struct gendisk *alloc_disk_node(int minors, int node_id)
>  		disk_to_dev(disk)->type = &disk_type;
>  		device_initialize(disk_to_dev(disk));
>  	}
> +
> +	lockdep_init_map(&disk->lockdep_map, lock_name, key, 0);
> +
>  	return disk;
>  }
> -EXPORT_SYMBOL(alloc_disk_node);
> +EXPORT_SYMBOL(__alloc_disk_node);
>  
>  struct kobject *get_disk(struct gendisk *disk)
>  {
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index e619fae..5225efc 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -3,7 +3,7 @@
>  
>  /*
>   * 	genhd.h Copyright (C) 1992 Drew Eckhardt
> - *	Generic hard disk header file by  
> + *	Generic hard disk header file by
>   * 		Drew Eckhardt
>   *
>   *		<drew@...orado.edu>
> @@ -206,6 +206,9 @@ struct gendisk {
>  #endif	/* CONFIG_BLK_DEV_INTEGRITY */
>  	int node_id;
>  	struct badblocks *bb;
> +#ifdef CONFIG_LOCKDEP_COMPLETIONS
> +	struct lockdep_map lockdep_map;
> +#endif
>  };
>  
>  static inline struct gendisk *part_to_disk(struct hd_struct *part)
> @@ -483,7 +486,7 @@ struct bsd_disklabel {
>  	__s16	d_type;			/* drive type */
>  	__s16	d_subtype;		/* controller/d_type specific */
>  	char	d_typename[16];		/* type name, e.g. "eagle" */
> -	char	d_packname[16];			/* pack identifier */ 
> +	char	d_packname[16];			/* pack identifier */
>  	__u32	d_secsize;		/* # of bytes per sector */
>  	__u32	d_nsectors;		/* # of data sectors per track */
>  	__u32	d_ntracks;		/* # of tracks per cylinder */
> @@ -602,8 +605,7 @@ extern struct hd_struct * __must_check add_partition(struct gendisk *disk,
>  extern void delete_partition(struct gendisk *, int);
>  extern void printk_all_partitions(void);
>  
> -extern struct gendisk *alloc_disk_node(int minors, int node_id);
> -extern struct gendisk *alloc_disk(int minors);
> +extern struct gendisk *__alloc_disk_node(int minors, int node_id, struct lock_class_key *key, const char *lock_name);
>  extern struct kobject *get_disk(struct gendisk *disk);
>  extern void put_disk(struct gendisk *disk);
>  extern void blk_register_region(dev_t devt, unsigned long range,
> @@ -627,6 +629,22 @@ extern ssize_t part_fail_store(struct device *dev,
>  			       const char *buf, size_t count);
>  #endif /* CONFIG_FAIL_MAKE_REQUEST */
>  
> +#ifdef CONFIG_LOCKDEP_COMPLETIONS
> +#define alloc_disk_node(m, id) \
> +({									\
> +	static struct lock_class_key __key;				\
> +	const char *__lock_name;					\
> +									\
> +	__lock_name = "(complete)"#m"("#id")";				\
> +									\
> +	__alloc_disk_node(m, id, &__key, __lock_name);			\
> +})
> +#else
> +#define alloc_disk_node(m, id)	__alloc_disk_node(m, id, NULL, NULL)
> +#endif
> +
> +#define alloc_disk(m)		alloc_disk_node(m, NUMA_NO_NODE)
> +
>  static inline int hd_ref_init(struct hd_struct *part)
>  {
>  	if (percpu_ref_init(&part->ref, __delete_partition, 0,
> -- 
> 1.9.1

Powered by blists - more mailing lists