Message-Id: <20140515142414.16c47315a03160c58ceb9066@linux-foundation.org>
Date: Thu, 15 May 2014 14:24:14 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Mel Gorman <mgorman@...e.de>
Cc: Peter Zijlstra <peterz@...radead.org>,
Oleg Nesterov <oleg@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
Vlastimil Babka <vbabka@...e.cz>, Jan Kara <jack@...e.cz>,
Michal Hocko <mhocko@...e.cz>, Hugh Dickins <hughd@...gle.com>,
Dave Hansen <dave.hansen@...el.com>,
Linux Kernel <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>,
Linux-FSDevel <linux-fsdevel@...r.kernel.org>,
Paul McKenney <paulmck@...ux.vnet.ibm.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
David Howells <dhowells@...hat.com>
Subject: Re: [PATCH] mm: filemap: Avoid unnecessary barriers and waitqueue
lookups in unlock_page fastpath v4
On Thu, 15 May 2014 11:48:09 +0100 Mel Gorman <mgorman@...e.de> wrote:
> Changelog since v3
> o Correct handling of exclusive waits
>
> This patch introduces a new page flag for 64-bit capable machines,
> PG_waiters, to signal that there are processes waiting on PG_locked, and
> uses it to avoid the memory barriers and waitqueue hash lookup in the
> unlock_page fastpath.
>
> This adds a few branches to the fast path but avoids bouncing a dirty
> cache line between CPUs. 32-bit machines always take the slow path, but
> the primary motivation for this patch is large machines, so I do not
> think that is a concern.
>
> The test case used to evaluate this is a simple dd of a large file done
> multiple times with the file deleted on each iteration. The size of the
> file is 1/10th of physical memory to avoid dirty page balancing. In the
> async case it is possible that the workload completes without ever
> hitting the disk; the results will be variable but highlight the impact
> of mark_page_accessed for async IO. The sync results are expected to be
> more stable. The exception is tmpfs, where the normal case is for the
> "IO" never to hit the disk.
>
> The test machine was single socket and UMA to avoid any scheduling or
> NUMA artifacts. Throughput and wall times are presented for sync IO;
> only wall times are shown for async, as the granularity reported by dd
> and the variability make it unsuitable for comparison. As the async
> results were variable due to writeback timings, I'm only reporting the
> maximum figures. The sync results were stable enough to make the mean
> and stddev uninteresting.
>
> The performance results are reported based on a run with no profiling.
> Profile data is based on a separate run with oprofile running. The
> kernels being compared are "accessed-v3", which is the patch series up
> to this patch, whereas "lockpage-v3" includes this patch.
>
> async dd
> 3.15.0-rc3 3.15.0-rc3
> accessed-v3 lockpage-v3
> ext3 Max elapsed 11.5900 ( 0.00%) 11.0000 ( 5.09%)
> ext4 Max elapsed 13.3400 ( 0.00%) 13.4300 ( -0.67%)
> tmpfs Max elapsed 0.4900 ( 0.00%) 0.4800 ( 2.04%)
> btrfs Max elapsed 12.7800 ( 0.00%) 13.8200 ( -8.14%)
> xfs Max elapsed 2.0900 ( 0.00%) 2.1100 ( -0.96%)
So ext3 got 5% faster and btrfs got 8% slower?
>
> ...
>
The numbers look pretty marginal from here and the patch is, umm, not a
thing of beauty or simplicity.
I'd be inclined to go find something else to work on, frankly.
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -241,15 +241,22 @@ void delete_from_page_cache(struct page *page)
> }
> EXPORT_SYMBOL(delete_from_page_cache);
>
> -static int sleep_on_page(void *word)
> +static int sleep_on_page(struct page *page)
> {
> - io_schedule();
> + /*
> +	 * A racing unlock can miss that the waitqueue is active and clear the
> +	 * waiters again. Only sleep if PageWaiters is still set, with a
> +	 * timeout to recheck, as races can still occur.
> + */
> + if (PageWaiters(page))
> + io_schedule_timeout(HZ);
ew.
> return 0;
> }
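The interleaving that comment alludes to, as I read it (a sketch; the
exact sequence is a reconstruction, not taken from the patch):

	/*
	 * waiter                        unlocker
	 * ------                        --------
	 * SetPageWaiters(page)
	 *                               unlock_page(): sees PG_waiters,
	 *                               finds the waitqueue still empty,
	 *                               clears PG_waiters, skips the wakeup
	 * prepare_to_wait(...)
	 * page relocked by a third
	 * party; test_bit(PG_locked)
	 * is true, so we would sleep
	 * PageWaiters(page) false -> skip the sleep, loop and recheck
	 * PageWaiters(page) true  -> sleep, but the wakeup may never come,
	 *                            so io_schedule_timeout(HZ) bounds the
	 *                            stall and the caller's loop rechecks
	 */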
>
> ...
>
> +/* Returns true if the page is locked */
Comment is inaccurate.
> +static inline bool prepare_wait_bit(struct page *page, wait_queue_head_t *wqh,
> + wait_queue_t *wq, int state, int bit_nr, bool exclusive)
> +{
> +	/* Set PG_waiters so a racing unlock_page will check the waitqueue */
> + if (!PageWaiters(page))
> + SetPageWaiters(page);
> +
> + if (exclusive)
> + prepare_to_wait_exclusive(wqh, wq, state);
> + else
> + prepare_to_wait(wqh, wq, state);
> + return test_bit(bit_nr, &page->flags);
> }
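The helper actually returns test_bit(bit_nr, &page->flags), i.e. whether
the bit being waited on is still set; that only reads as "locked" when
bit_nr is PG_locked, and the same helper presumably serves other bits
(e.g. PG_writeback via wait_on_page_bit). A more accurate comment might
be something like:

	/*
	 * Returns true if bit_nr is still set in page->flags, i.e. the
	 * caller should sleep and retry. Only for bit_nr == PG_locked
	 * does this mean "the page is locked".
	 */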
>
> ...
>
> int wait_on_page_bit_killable(struct page *page, int bit_nr)
> {
> + wait_queue_head_t *wqh;
> DEFINE_WAIT_BIT(wait, &page->flags, bit_nr);
> + int ret = 0;
>
> if (!test_bit(bit_nr, &page->flags))
> return 0;
> + wqh = page_waitqueue(page);
> +
> + do {
> + if (prepare_wait_bit(page, wqh, &wait.wait, TASK_KILLABLE, bit_nr, false))
> + ret = sleep_on_page_killable(page);
> + } while (!ret && test_bit(bit_nr, &page->flags));
> + finish_wait(wqh, &wait.wait);
>
> - return __wait_on_bit(page_waitqueue(page), &wait,
> - sleep_on_page_killable, TASK_KILLABLE);
> + return ret;
> }
Please find a way to test all this nicely when there are signals pending?
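For reference on the signal case: in kernels of this vintage the killable
sleep helper returned -EINTR once a fatal signal was pending, roughly as
below (modulo this patch changing the argument to a struct page *), so the
"while (!ret && test_bit(...))" loop above exits with -EINTR instead of
re-sleeping:

	static int sleep_on_page_killable(void *word)
	{
		io_schedule();
		return fatal_signal_pending(current) ? -EINTR : 0;
	}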
> /**
>
> ...
>