[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CY1PR0401MB1536AD7F70C960D62DCA6983810F0@CY1PR0401MB1536.namprd04.prod.outlook.com>
Date: Sat, 8 Apr 2017 20:56:10 +0000
From: Bart Van Assche <Bart.VanAssche@...disk.com>
To: Javier González <jg@...htnvm.io>,
"mb@...htnvm.io" <mb@...htnvm.io>
CC: "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Javier González <javier@...xlabs.com>
Subject: Re: [PATCH v3] lightnvm: physical block device (pblk) target
On 04/07/17 11:50, Javier González wrote:
> Documentation/lightnvm/pblk.txt | 21 +
> drivers/lightnvm/Kconfig | 19 +
> drivers/lightnvm/Makefile | 5 +
> drivers/lightnvm/pblk-cache.c | 112 +++
> drivers/lightnvm/pblk-core.c | 1641 ++++++++++++++++++++++++++++++++++++++
> drivers/lightnvm/pblk-gc.c | 542 +++++++++++++
> drivers/lightnvm/pblk-init.c | 942 ++++++++++++++++++++++
> drivers/lightnvm/pblk-map.c | 135 ++++
> drivers/lightnvm/pblk-rb.c | 859 ++++++++++++++++++++
> drivers/lightnvm/pblk-read.c | 513 ++++++++++++
> drivers/lightnvm/pblk-recovery.c | 1007 +++++++++++++++++++++++
> drivers/lightnvm/pblk-rl.c | 182 +++++
> drivers/lightnvm/pblk-sysfs.c | 500 ++++++++++++
> drivers/lightnvm/pblk-write.c | 408 ++++++++++
> drivers/lightnvm/pblk.h | 1127 ++++++++++++++++++++++++++
> include/linux/lightnvm.h | 57 +-
> pblk-sysfs.c | 581 ++++++++++++++
This patch introduces two slightly different versions of pblk-sysfs.c -
one at the top level and one in drivers/lightnvm. Please remove the file
at the top level.
> +config NVM_PBLK_L2P_OPT
> + bool "PBLK with optimized L2P table for devices up to 8TB"
> + depends on NVM_PBLK
> + ---help---
> + Physical device addresses are typical 64-bit integers. Since we store
> + the logical to physical (L2P) table on the host, this takes 1/500 of
> + host memory (e.g., 2GB per 1TB of storage media). On drives under 8TB,
> + it is possible to reduce this to 1/1000 (e.g., 1GB per 1TB). This
> + option allows to do this optimization on the host L2P table.
Why is NVM_PBLK_L2P_OPT a compile-time option instead of a run-time
option? Since this define does not affect the definition of the ppa_addr
I don't see why this has to be a compile-time option. For e.g. Linux
distributors the only choice would be to disable NVM_PBLK_L2P_OPT. I
think it would be unfortunate if no Linux distribution ever would be
able to benefit from this optimization.
> +#ifdef CONFIG_NVM_DEBUG
> + atomic_add(nr_entries, &pblk->inflight_writes);
> + atomic_add(nr_entries, &pblk->req_writes);
> +#endif
Has it been considered to use the "static key" feature such that
consistency checks can be enabled at run-time instead of having to
rebuild the kernel to enable CONFIG_NVM_DEBUG?
> +#ifdef CONFIG_NVM_DEBUG
> + BUG_ON(nr_rec_entries != valid_entries);
> + atomic_add(valid_entries, &pblk->inflight_writes);
> + atomic_add(valid_entries, &pblk->recov_gc_writes);
> +#endif
Are you aware that Linus is strongly opposed against using BUG_ON()?
> +#ifdef CONFIG_NVM_DEBUG
> + lockdep_assert_held(&l_mg->free_lock);
> +#endif
Why is lockdep_assert_held() surrounded with #ifdef CONFIG_NVM_DEBUG /
#endif? Are you aware that lockdep_assert_held() do not generate any
code with CONFIG_PROVE_LOCKING=n?
> +static const struct block_device_operations pblk_fops = {
> + .owner = THIS_MODULE,
> +};
Is this data structure useful? If so, where is pblk_fops used?
> +static void pblk_l2p_free(struct pblk *pblk)
> +{
> + vfree(pblk->trans_map);
> +}
> +
> +static int pblk_l2p_init(struct pblk *pblk)
> +{
> + sector_t i;
> +
> + pblk->trans_map = vmalloc(sizeof(pblk_addr) * pblk->rl.nr_secs);
> + if (!pblk->trans_map)
> + return -ENOMEM;
> +
> + for (i = 0; i < pblk->rl.nr_secs; i++)
> + pblk_ppa_set_empty(&pblk->trans_map[i]);
> +
> + return 0;
> +}
Has it been considered to add support for keeping a subset of the L2P
translation table in memory instead of keeping it in memory in its entirety?
> + sprintf(cache_name, "pblk_line_m_%s", pblk->disk->disk_name);
Please use snprintf() or kasprintf() instead of printf(). That makes it
easier for humans to verify that no buffer overflow is triggered.
> +/* physical block device target */
> +static struct nvm_tgt_type tt_pblk = {
> + .name = "pblk",
> + .version = {1, 0, 0},
Are version numbers useful inside the kernel tree?
> +void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
> + unsigned long *lun_bitmap, unsigned int valid_secs,
> + unsigned int off)
> +{
> + struct pblk_sec_meta *meta_list = rqd->meta_list;
> + unsigned int map_secs;
> + int min = pblk->min_write_pgs;
> + int i;
> +
> + for (i = off; i < rqd->nr_ppas; i += min) {
> + map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
> + pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
> + lun_bitmap, &meta_list[i], map_secs);
> + }
> +}
Has it been considered to implement the above code such that no modulo
(%) computation is needed, which is a relatively expensive operation? I
think for this loop that can be done easily.
> +static DECLARE_RWSEM(pblk_rb_lock);
> +
> +void pblk_rb_data_free(struct pblk_rb *rb)
> +{
> + struct pblk_rb_pages *p, *t;
> +
> + down_write(&pblk_rb_lock);
> + list_for_each_entry_safe(p, t, &rb->pages, list) {
> + free_pages((unsigned long)page_address(p->pages), p->order);
> + list_del(&p->list);
> + kfree(p);
> + }
> + up_write(&pblk_rb_lock);
> +}
Global locks like pblk_rb_lock are a performance bottleneck on
multi-socket systems. Why is that lock global?
Bart.
Powered by blists - more mailing lists