lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 1 Nov 2017 11:48:44 -0400
From:   Mike Snitzer <snitzer@...hat.com>
To:     Mikulas Patocka <mpatocka@...hat.com>
Cc:     Zdenek Kabelac <zkabelac@...hat.com>,
        "Alasdair G. Kergon" <agk@...hat.com>, dm-devel@...hat.com,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        linux-kernel@...r.kernel.org
Subject: SRCU's apparent use of NR_CPUS? [was: re: dm: allocate struct
 mapped_device with kvzalloc]

[cc'ing Paul, and LKML, to get his/others' take on SRCU cpu scaling]

On Tue, Oct 31 2017 at  7:33pm -0400,
Mikulas Patocka <mpatocka@...hat.com> wrote:

> The structure srcu_struct can be very big, its size is proportional to the 
> value CONFIG_NR_CPUS. The Fedora kernel has CONFIG_NR_CPUS 8192, the field 
> io_barrier in the struct mapped_device has 84kB in the debugging kernel 
> and 50kB in the non-debugging kernel. The large size may result in failure 
> of the function kzalloc_node.
> 
> In order to avoid the allocation failure, we use the function
> kvzalloc_node, this function falls back to vmalloc if a large contiguous
> chunk of memory is not available. This patch also moves the field
> io_barrier to the last position of struct mapped_device - the reason is
> that on many processor architectures, short memory offsets result in
> smaller code than long memory offsets - on x86-64 it reduces code size by
> 320 bytes.
> 
> Note to stable kernel maintainers - the kernels 4.11 and older don't have
> the function kvzalloc_node, you can use the function vzalloc_node instead.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@...hat.com>
> Cc: stable@...r.kernel.org

This looks reasonable as a near-term workaround.. BUT:
Paul has there been any discussion about how to make SRCU support
dynamically scaling up to NR_CPUS maximum as 'nr_cpus' changes (rather
than accounting for worst case of NR_CPUS up-front)?

(But I had a quick look at scrutree.h and I'm not seeing explicit use of
NR_CPUS, so it is likely occuring via implicit percpu through some
member of 'struct srcu_struct', e.g. 'sda'?)

Thanks,
Mike

> 
> ---
>  drivers/md/dm-core.h |    3 ++-
>  drivers/md/dm.c      |    6 +++---
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> Index: linux-2.6/drivers/md/dm-core.h
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-core.h
> +++ linux-2.6/drivers/md/dm-core.h
> @@ -29,7 +29,6 @@ struct dm_kobject_holder {
>   * DM targets must _not_ deference a mapped_device to directly access its members!
>   */
>  struct mapped_device {
> -	struct srcu_struct io_barrier;
>  	struct mutex suspend_lock;
>  
>  	/*
> @@ -127,6 +126,8 @@ struct mapped_device {
>  	struct blk_mq_tag_set *tag_set;
>  	bool use_blk_mq:1;
>  	bool init_tio_pdu:1;
> +
> +	struct srcu_struct io_barrier;
>  };
>  
>  void dm_init_md_queue(struct mapped_device *md);
> Index: linux-2.6/drivers/md/dm.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm.c
> +++ linux-2.6/drivers/md/dm.c
> @@ -1695,7 +1695,7 @@ static struct mapped_device *alloc_dev(i
>  	struct mapped_device *md;
>  	void *old_md;
>  
> -	md = kzalloc_node(sizeof(*md), GFP_KERNEL, numa_node_id);
> +	md = kvzalloc_node(sizeof(*md), GFP_KERNEL, numa_node_id);
>  	if (!md) {
>  		DMWARN("unable to allocate device, out of memory.");
>  		return NULL;
> @@ -1795,7 +1795,7 @@ bad_io_barrier:
>  bad_minor:
>  	module_put(THIS_MODULE);
>  bad_module_get:
> -	kfree(md);
> +	kvfree(md);
>  	return NULL;
>  }
>  
> @@ -1814,7 +1814,7 @@ static void free_dev(struct mapped_devic
>  	free_minor(minor);
>  
>  	module_put(THIS_MODULE);
> -	kfree(md);
> +	kvfree(md);
>  }
>  
>  static void __bind_mempools(struct mapped_device *md, struct dm_table *t)

Powered by blists - more mailing lists