Message-ID: <20100212100519.GA29085@laptop>
Date: Fri, 12 Feb 2010 21:05:19 +1100
From: Nick Piggin <npiggin@...e.de>
To: Christian Ehrhardt <ehrhardt@...ux.vnet.ibm.com>
Cc: Mel Gorman <mel@....ul.ie>,
Andrew Morton <akpm@...ux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
epasch@...ibm.com, SCHILLIG@...ibm.com,
Martin Schwidefsky <schwidefsky@...ibm.com>,
Heiko Carstens <heiko.carstens@...ibm.com>,
christof.schmitt@...ibm.com, thoss@...ibm.com, hare@...e.de,
gregkh@...ell.com
Subject: Re: Performance regression in scsi sequential throughput (iozone)
due to "e084b - page-allocator: preserve PFN ordering when __GFP_COLD
is set"
On Thu, Feb 11, 2010 at 05:11:24PM +0100, Christian Ehrhardt wrote:
> > 2. Test with the patch below rmqueue_bulk-fewer-parameters to see if the
> > number of parameters being passed is making some unexpected large
> > difference
>
> BINGO - this definitely hit something.
> This experimental patch does two things - on one hand it closes the race we had:
>
>                                                                 4 THREAD READ   8 THREAD READ   16 THREAD READ   %ofcalls
> perf_count_congestion_wait                                                 13              27               52
> perf_count_call_congestion_wait_from_alloc_pages_high_priority              0               0                0
> perf_count_call_congestion_wait_from_alloc_pages_slowpath                  13              27               52      99.52%
> perf_count_pages_direct_reclaim                                         30867           56811           131552
> perf_count_failed_pages_direct_reclaim                                     14              24               52
> perf_count_failed_pages_direct_reclaim_but_progress                        14              21               52       0.04% !!!
>
> On the other hand we see that the number of direct_reclaim calls increased by ~x4.
>
> I assume (I might be totally wrong) that the ~x4 increase in direct_reclaim calls could be caused by the fact that we previously used higher orders, which worked on four times the number of pages at once.
But the order parameter was always passed as constant 0 by the caller?
> This leaves me with two ideas what the real issue could be:
> 1. either really the 6th parameter, as this is the first one that needs to go on the stack and that way might open a race and rob gcc of a big pile of optimization chances
It must be something to do with code generation AFAICS. I'm surprised
the function isn't inlined, which would give exactly the same result
regardless of the patch.
Unlikely to be a correctness issue with code generation, but I'm
really surprised that a small difference in performance could have
such a big (and apparently repeatable) effect on behaviour like this.
What's the assembly look like?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/