Message-ID: <20100210110551.GA1323@localhost>
Date: Wed, 10 Feb 2010 19:05:51 +0800
From: Wu Fengguang <fengguang.wu@...el.com>
To: Nikanth Karthikesan <knikanth@...e.de>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
"balbir@...ux.vnet.ibm.com" <balbir@...ux.vnet.ibm.com>,
Jens Axboe <jens.axboe@...cle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH v2] Make vm_max_readahead configurable at run-time
Nikanth,
> Make vm_max_readahead configurable at run-time. Expose a sysctl knob
> in procfs to change it. This would ensure that new disks added would
> use this value as their default read_ahead_kb.
Do you have a use case or customer demand for it?
> Also filesystems which use default_backing_dev_info would also
> use this new value, even if they were already mounted.
>
> Currently xfs, btrfs, nilfs, raw, mtd use the default_backing_dev_info.
This sounds like a bad interface: users will be confused by the
tricky details of "works for new devices" and "works for some fs".
Another tricky point is that btrfs/md/dm readahead size may not be
affected when some of the component disks are hot-added.
So this patch will only work for hot-plugged disks that
contain a _standalone_ filesystem. Is this a typical use case on servers?
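To make the "works for new devices" concern concrete, here is a minimal userspace sketch of the semantics as I read the patch. It is not kernel code: every sketch_* name is hypothetical, and 4 KiB pages are assumed. A disk copies the value once when its queue is allocated, while the sysctl write only refreshes the global and the shared default bdi.

```c
#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct backing_dev_info. */
struct sketch_bdi {
	unsigned long ra_pages;
};

/* Hypothetical stand-ins for vm_max_readahead_kb and default_backing_dev_info. */
static unsigned long sketch_max_readahead_kb = 128;
static struct sketch_bdi sketch_default_bdi = {
	.ra_pages = 128 >> (SKETCH_PAGE_SHIFT - 10),
};

/* A new disk snapshots the current value once, at queue allocation time. */
static void sketch_add_disk(struct sketch_bdi *disk)
{
	disk->ra_pages = sketch_max_readahead_kb >> (SKETCH_PAGE_SHIFT - 10);
}

/* Writing the sysctl updates the global and the shared default bdi only. */
static void sketch_write_sysctl(unsigned long kb)
{
	sketch_max_readahead_kb = kb;
	sketch_default_bdi.ra_pages = kb >> (SKETCH_PAGE_SHIFT - 10);
}
```

In this model a disk added before sketch_write_sysctl(512) keeps its old ra_pages, a disk added afterwards gets the new one, and anything sharing sketch_default_bdi changes immediately. That gives three different behaviours from one knob.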
Thanks,
Fengguang
>
> Signed-off-by: Nikanth Karthikesan <knikanth@...e.de>
>
> ---
>
> Index: linux-2.6/block/blk-core.c
> ===================================================================
> --- linux-2.6.orig/block/blk-core.c
> +++ linux-2.6/block/blk-core.c
> @@ -499,7 +499,7 @@ struct request_queue *blk_alloc_queue_no
> q->backing_dev_info.unplug_io_fn = blk_backing_dev_unplug;
> q->backing_dev_info.unplug_io_data = q;
> q->backing_dev_info.ra_pages =
> - (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
> + (vm_max_readahead_kb * 1024) / PAGE_CACHE_SIZE;
> q->backing_dev_info.state = 0;
> q->backing_dev_info.capabilities = BDI_CAP_MAP_COPY;
> q->backing_dev_info.name = "block";
> Index: linux-2.6/fs/fuse/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/fuse/inode.c
> +++ linux-2.6/fs/fuse/inode.c
> @@ -870,7 +870,7 @@ static int fuse_bdi_init(struct fuse_con
> int err;
>
> fc->bdi.name = "fuse";
> - fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
> + fc->bdi.ra_pages = (vm_max_readahead_kb * 1024) / PAGE_CACHE_SIZE;
> fc->bdi.unplug_io_fn = default_unplug_io_fn;
> /* fuse does it's own writeback accounting */
> fc->bdi.capabilities = BDI_CAP_NO_ACCT_WB;
> Index: linux-2.6/include/linux/mm.h
> ===================================================================
> --- linux-2.6.orig/include/linux/mm.h
> +++ linux-2.6/include/linux/mm.h
> @@ -1188,7 +1188,11 @@ int write_one_page(struct page *page, in
> void task_dirty_inc(struct task_struct *tsk);
>
> /* readahead.c */
> -#define VM_MAX_READAHEAD 128 /* kbytes */
> +#define INITIAL_VM_MAX_READAHEAD_KB 128
> +extern unsigned long vm_max_readahead_kb;
> +
> +int sysctl_vm_max_readahead_kb_handler(struct ctl_table *table, int write,
> + void __user *buffer, size_t *length, loff_t *ppos);
>
> int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
> pgoff_t offset, unsigned long nr_to_read);
> Index: linux-2.6/mm/backing-dev.c
> ===================================================================
> --- linux-2.6.orig/mm/backing-dev.c
> +++ linux-2.6/mm/backing-dev.c
> @@ -18,7 +18,8 @@ EXPORT_SYMBOL(default_unplug_io_fn);
>
> struct backing_dev_info default_backing_dev_info = {
> .name = "default",
> - .ra_pages = VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE,
> + .ra_pages = INITIAL_VM_MAX_READAHEAD_KB
> + >> (PAGE_CACHE_SHIFT - 10),
> .state = 0,
> .capabilities = BDI_CAP_MAP_COPY,
> .unplug_io_fn = default_unplug_io_fn,
> Index: linux-2.6/mm/readahead.c
> ===================================================================
> --- linux-2.6.orig/mm/readahead.c
> +++ linux-2.6/mm/readahead.c
> @@ -17,6 +17,19 @@
> #include <linux/pagevec.h>
> #include <linux/pagemap.h>
>
> +unsigned long vm_max_readahead_kb = INITIAL_VM_MAX_READAHEAD_KB;
> +
> +int sysctl_vm_max_readahead_kb_handler(struct ctl_table *table, int write,
> + void __user *buffer, size_t *length, loff_t *ppos)
> +{
> + proc_doulongvec_minmax(table, write, buffer, length, ppos);
> +
> + default_backing_dev_info.ra_pages =
> + vm_max_readahead_kb >> (PAGE_CACHE_SHIFT - 10);
> +
> + return 0;
> +}
> +
> /*
> * Initialise a struct file's readahead state. Assumes that the caller has
> * memset *ra to zero.
> Index: linux-2.6/kernel/sysctl.c
> ===================================================================
> --- linux-2.6.orig/kernel/sysctl.c
> +++ linux-2.6/kernel/sysctl.c
> @@ -1273,7 +1273,13 @@ static struct ctl_table vm_table[] = {
> .extra2 = &one,
> },
> #endif
> -
> + {
> + .procname = "max_readahead_kb",
> + .data = &vm_max_readahead_kb,
> + .maxlen = sizeof(vm_max_readahead_kb),
> + .mode = 0644,
> + .proc_handler = sysctl_vm_max_readahead_kb_handler,
> + },
> { }
> };
>
> Index: linux-2.6/Documentation/sysctl/vm.txt
> ===================================================================
> --- linux-2.6.orig/Documentation/sysctl/vm.txt
> +++ linux-2.6/Documentation/sysctl/vm.txt
> @@ -31,6 +31,7 @@ Currently, these files are in /proc/sys/
> - laptop_mode
> - legacy_va_layout
> - lowmem_reserve_ratio
> +- max_readahead_kb
> - max_map_count
> - memory_failure_early_kill
> - memory_failure_recovery
> @@ -263,6 +264,18 @@ The minimum value is 1 (1/1 -> 100%).
>
> ==============================================================
>
> +max_readahead_kb:
> +
> +This file contains the default maximum readahead that would be
> +used, when new disks would be added to the system.
> +
> +Also filesystems which use default_backing_dev_info would also
> +use this new value, even if they were already mounted.
> +
> +xfs, btrfs, nilfs, raw, mtd use the default_backing_dev_info.
> +
> +==============================================================
> +
> max_map_count:
>
> This file contains the maximum number of memory map areas a process
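
The patch converts kilobytes to pages in two spellings: (kb * 1024) / PAGE_CACHE_SIZE in blk-core.c and fuse, and kb >> (PAGE_CACHE_SHIFT - 10) in backing-dev.c and readahead.c. A small userspace sketch of the two forms follows, assuming 4 KiB pages. The SKETCH_* macros and kb_to_pages_* helpers are hypothetical names and not part of the patch.

```c
#define SKETCH_PAGE_SHIFT 12UL	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Multiply-then-divide form, as in the blk-core.c and fuse hunks. */
static unsigned long kb_to_pages_div(unsigned long kb)
{
	return (kb * 1024) / SKETCH_PAGE_SIZE;
}

/* Shift form, as in the backing-dev.c and readahead.c hunks. */
static unsigned long kb_to_pages_shift(unsigned long kb)
{
	return kb >> (SKETCH_PAGE_SHIFT - 10);
}
```

Both forms round down and agree for ordinary values. They can differ only if kb * 1024 overflows unsigned long, which the shift form avoids, so using one spelling everywhere might be the simpler choice.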