Message-ID: <CAEH94LgeOssJ++tLZzo6ZAUmjFUc1BWjLuszmAeSz2YbxmU3SA@mail.gmail.com>
Date: Mon, 5 Nov 2012 20:20:27 +0800
From: Zhi Yong Wu <zwu.kernel@...il.com>
To: Steven Whitehouse <swhiteho@...hat.com>
Cc: linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org,
linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org,
linuxram@...ux.vnet.ibm.com, viro@...iv.linux.org.uk,
david@...morbit.com, tytso@....edu, cmm@...ibm.com,
wuzhy@...ux.vnet.ibm.com, wenqing.lz@...bao.com
Subject: Re: [RFC v4+ hot_track 09/19] vfs: add one work queue
On Mon, Nov 5, 2012 at 8:07 PM, Steven Whitehouse <swhiteho@...hat.com> wrote:
> Hi,
>
> On Mon, 2012-11-05 at 19:55 +0800, Zhi Yong Wu wrote:
>> On Mon, Nov 5, 2012 at 7:21 PM, Steven Whitehouse <swhiteho@...hat.com> wrote:
>> > Hi,
>> >
>> > On Mon, 2012-10-29 at 12:30 +0800, zwu.kernel@...il.com wrote:
>> >> From: Zhi Yong Wu <wuzhy@...ux.vnet.ibm.com>
>> >>
>> >> Add a per-superblock workqueue and a delayed_work
>> >> to run periodic work to update map info on each superblock.
>> >>
>> >> Signed-off-by: Zhi Yong Wu <wuzhy@...ux.vnet.ibm.com>
>> >> ---
>> >> fs/hot_tracking.c | 85 ++++++++++++++++++++++++++++++++++++++++++
>> >> fs/hot_tracking.h | 3 +
>> >> include/linux/hot_tracking.h | 3 +
>> >> 3 files changed, 91 insertions(+), 0 deletions(-)
>> >>
>> >> diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
>> >> index fff0038..0ef9cad 100644
>> >> --- a/fs/hot_tracking.c
>> >> +++ b/fs/hot_tracking.c
>> >> @@ -15,9 +15,12 @@
>> >> #include <linux/module.h>
>> >> #include <linux/spinlock.h>
>> >> #include <linux/hardirq.h>
>> >> +#include <linux/kthread.h>
>> >> +#include <linux/freezer.h>
>> >> #include <linux/fs.h>
>> >> #include <linux/blkdev.h>
>> >> #include <linux/types.h>
>> >> +#include <linux/list_sort.h>
>> >> #include <linux/limits.h>
>> >> #include "hot_tracking.h"
>> >>
>> >> @@ -557,6 +560,67 @@ static void hot_map_array_exit(struct hot_info *root)
>> >> }
>> >> }
>> >>
>> >> +/* Temperature comparison function */
>> >> +static int hot_temp_cmp(void *priv, struct list_head *a,
>> >> + struct list_head *b)
>> >> +{
>> >> + struct hot_comm_item *ap =
>> >> + container_of(a, struct hot_comm_item, n_list);
>> >> + struct hot_comm_item *bp =
>> >> + container_of(b, struct hot_comm_item, n_list);
>> >> +
>> >> + int diff = ap->hot_freq_data.last_temp
>> >> + - bp->hot_freq_data.last_temp;
>> >> + if (diff > 0)
>> >> + return -1;
>> >> + if (diff < 0)
>> >> + return 1;
>> >> + return 0;
>> >> +}
>> >> +
>> >> +/*
>> >> + * Every sync period we update temperatures for
>> >> + * each hot inode item and hot range item for aging
>> >> + * purposes.
>> >> + */
>> >> +static void hot_update_worker(struct work_struct *work)
>> >> +{
>> >> + struct hot_info *root = container_of(to_delayed_work(work),
>> >> + struct hot_info, update_work);
>> >> + struct hot_inode_item *hi_nodes[8];
>> >> + u64 ino = 0;
>> >> + int i, n;
>> >> +
>> >> + while (1) {
>> >> + n = radix_tree_gang_lookup(&root->hot_inode_tree,
>> >> + (void **)hi_nodes, ino,
>> >> + ARRAY_SIZE(hi_nodes));
>> >> + if (!n)
>> >> + break;
>> >> +
>> >> + ino = hi_nodes[n - 1]->i_ino + 1;
>> >> + for (i = 0; i < n; i++) {
>> >> + kref_get(&hi_nodes[i]->hot_inode.refs);
>> >> + hot_map_array_update(
>> >> + &hi_nodes[i]->hot_inode.hot_freq_data, root);
>> >> + hot_range_update(hi_nodes[i], root);
>> >> + hot_inode_item_put(hi_nodes[i]);
>> >> + }
>> >> + }
>> >> +
>> >> + /* Sort temperature map info */
>> >> + for (i = 0; i < HEAT_MAP_SIZE; i++) {
>> >> + list_sort(NULL, &root->heat_inode_map[i].node_list,
>> >> + hot_temp_cmp);
>> >> + list_sort(NULL, &root->heat_range_map[i].node_list,
>> >> + hot_temp_cmp);
>> >> + }
>> >> +
>> >
>> > If this list can potentially have one (or more) entries per inode, then
>> There is only one hot_inode_item per inode, but there may be multiple
>> hot_range_items per inode.
>> > filesystems with a lot of inodes (millions) may potentially exceed the
>> > max size of list which list_sort() can handle. If that happens it still
>> > works, but you'll get a warning message and it won't be as efficient.
>> I haven't done such a large scale test yet. Hitting that issue would
>> require a large scale performance test; before that, I want to make
>> sure the code change is correct.
>> To be honest, I share your concern about the issue you pointed out,
>> but list_sort() performance looks good from the test results at the
>> following URL:
>> https://lkml.org/lkml/2010/1/20/485
>>
> Yes, I think it is good. Also, even when it reports that its performance
> is poor (via the warning message), it is still much better than the
> alternative (of not sorting) in the GFS2 case. So currently our
> workaround is to ignore the warning. Due to what we are using it for
> (sorting the data blocks for ordered writeback), we only see it very
> occasionally, when there has been lots of data write activity with little
> journal activity on a node with lots of RAM.
OK.
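Just for illustration, one way to stay clear of that limit would be to
count a bucket's entries before handing it to list_sort() and skip (or
split) the sort when the list gets too long. A rough sketch only, not
part of the patch set: hot_list_short_enough() and HOT_SORT_MAX are
made-up names, and the point at which list_sort() starts warning may
differ between kernel versions.

#include <linux/types.h>
#include <linux/list.h>
#include <linux/list_sort.h>

/* Purely illustrative cap on how many entries we are willing to sort. */
#define HOT_SORT_MAX	(1UL << 20)

/* Return true if the list is short enough to sort cheaply. */
static bool hot_list_short_enough(struct list_head *head)
{
	struct list_head *pos;
	unsigned long n = 0;

	list_for_each(pos, head) {
		if (++n > HOT_SORT_MAX)
			return false;
	}
	return true;
}

hot_update_worker() would then only call list_sort() on a bucket when
hot_list_short_enough() returns true for its node_list.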
>
>> >
>> > It is something that we've run into with list_sort() and GFS2, but it
>> > only happens very rarely,
>> Besides list_sort(), do you have any other approach to share? How does
>> GFS2 deal with this concern?
>>
> That is an ongoing investigation :-)
>
> I've pondered various options... increase temp variable space in
> list_sort(), not using list_sort() and insertion sorting the blocks
> instead, flushing the ordered write data early if the list gets too
> long, figuring out how to remove blocks written back by the VM from the
> list before the sort, and various other possible solutions. So far I'm
> not sure which will be the best to choose, and since your situation is a
> bit different it might not make sense to use the same solution.
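For reference, a minimal sketch of the "insertion sorting instead of a
periodic list_sort()" idea, adapted to the hot_comm_item lists quoted
above. The helper name is invented here; this is only meant to show the
shape of that option, not a proposed change.

#include <linux/list.h>

/*
 * Keep a node_list ordered hottest-first by inserting each item at its
 * place up front, so the update worker no longer needs list_sort().
 * hot_list_insert_sorted() is a made-up name; hot_comm_item and n_list
 * are the structures from the patch above.
 */
static void hot_list_insert_sorted(struct list_head *head,
				   struct hot_comm_item *item)
{
	struct hot_comm_item *pos;

	list_for_each_entry(pos, head, n_list) {
		if (pos->hot_freq_data.last_temp <
		    item->hot_freq_data.last_temp) {
			/* Insert in front of the first colder item. */
			list_add_tail(&item->n_list, &pos->n_list);
			return;
		}
	}
	/* Empty list, or every existing item is hotter: append. */
	list_add_tail(&item->n_list, head);
}

That trades the periodic sort for O(n) work per insertion, so it only
pays off when updates are spread out over time; with millions of items
per bucket it would likely be worse than just ignoring the list_sort()
warning.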
>
> I just thought it was worth mentioning though since it was something
> that we'd run across,
Thanks for sharing your experience.
By the way, it would be appreciated if you could comment on the other patches.
>
> Steve.
>
>
--
Regards,
Zhi Yong Wu
--