Message-ID: <4761D0E9.4010701@rtr.ca>
Date: Thu, 13 Dec 2007 19:40:09 -0500
From: Mark Lord <liml@....ca>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: James Bottomley <James.Bottomley@...senPartnership.com>,
jens.axboe@...cle.com, lkml@....ca, matthew@....cx,
linux-ide@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-scsi@...r.kernel.org, linux-mm@...ck.org, mel@....ul.ie
Subject: [PATCH] fix page_alloc for larger I/O segments
Mark Lord wrote:
> Mark Lord wrote:
>> Mark Lord wrote:
>>> Mark Lord wrote:
>>>> Andrew Morton wrote:
>>>>> On Thu, 13 Dec 2007 17:15:06 -0500
>>>>> James Bottomley <James.Bottomley@...senPartnership.com> wrote:
>>>>>
>>>>>> On Thu, 2007-12-13 at 14:02 -0800, Andrew Morton wrote:
>>>>>>> On Thu, 13 Dec 2007 21:09:59 +0100
>>>>>>> Jens Axboe <jens.axboe@...cle.com> wrote:
>>>>>>>
>>>>>>>> OK, it's a vm issue,
>>>>>>> cc linux-mm and probable culprit.
>>>>>>>
>>>>>>>> I have tens of thousands of "backward" pages after a
>>>>>>>> boot - IOW, bvec->bv_page is the page before bvprv->bv_page, not the
>>>>>>>> reverse. So it looks like that bug got reintroduced.
>>>>>>> Bill Irwin fixed this a couple of years back: changed the page
>>>>>>> allocator so
>>>>>>> that it mostly hands out pages in ascending physical-address order.
>>>>>>>
>>>>>>> I guess we broke that, quite possibly in Mel's page allocator
>>>>>>> rework.
>>>>>>>
>>>>>>> It would help if you could provide us with a simple recipe for
>>>>>>> demonstrating this problem, please.
>>>>>> The simple way seems to be to malloc a large area, touch every
>>>>>> page and
>>>>>> then look at the physical pages assigned ... they now mostly seem
>>>>>> to be
>>>>>> descending in physical address.
>>>>>>
>>>>>
>>>>> OIC. -mm's /proc/pid/pagemap can be used to get the pfn's...
>>>> ..
>>>>
>>>> I'm actually running the treadmill right now (have been for many
>>>> hours, actually)
>>>> to bisect it to a specific commit.
>>>>
>>>> Thought I was almost done, and then noticed that git-bisect doesn't
>>>> keep
>>>> the Makefile VERSION lines the same, so I was actually running the
>>>> wrong
>>>> kernel after the first few times.. duh.
>>>>
>>>> Wrote a script to fix it now.
>>> ..
>>>
>>> Well, that was a waste of three hours.
>> ..
>>
>> Ahh.. it seems to be sensitive to one/both of these:
>>
>> CONFIG_HIGHMEM64G=y with 4GB RAM: not so bad, frequently does 20KB -
>> 48KB segments.
>> CONFIG_HIGHMEM4G=y with 2GB RAM: very severe, rarely does more than
>> 8KB segments.
>> CONFIG_HIGHMEM4G=y with 3GB RAM: very severe, rarely does more than
>> 8KB segments.
>>
>> So if you want to reproduce this on a large memory machine, use
>> "mem=2GB" for starters.
> ..
>
> Here's the commit that causes the regression:
>
> 535131e6925b4a95f321148ad7293f496e0e58d7 Choose pages from the per-cpu
> list based on migration type
>
And here is a patch that seems to fix it for me here:
* * * *
Fix page allocator to give better chance of larger contiguous segments (again).
Signed-off-by: Mark Lord <mlord@...ox.com>
---
--- old/mm/page_alloc.c.orig	2007-12-13 19:25:15.000000000 -0500
+++ linux-2.6/mm/page_alloc.c	2007-12-13 19:35:50.000000000 -0500
@@ -954,7 +954,7 @@
 				goto failed;
 		}
 		/* Find a page of the appropriate migrate type */
-		list_for_each_entry(page, &pcp->list, lru) {
+		list_for_each_entry_reverse(page, &pcp->list, lru) {
 			if (page_private(page) == migratetype) {
 				list_del(&page->lru);
 				pcp->count--;
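
To illustrate why the walk direction can matter, here is a minimal userspace
sketch. It is a toy model, not kernel code. It assumes (the thread does not
say this) that the per-cpu list is refilled by pushing pages onto the head in
ascending pfn order. The names toy_page, toy_list_* and BATCH are made up for
the example, and the migratetype check is left out.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH 8	/* hypothetical refill batch size */

/* Stand-in for struct page plus its lru list_head. */
struct toy_page {
	unsigned long pfn;
	struct toy_page *prev, *next;
};

/* Circular list with a sentinel head, in the style of struct list_head. */
static void toy_list_init(struct toy_page *head)
{
	head->prev = head->next = head;
}

/* Insert right after the head, i.e. at the front of the list. */
static void toy_list_add(struct toy_page *page, struct toy_page *head)
{
	page->next = head->next;
	page->prev = head;
	head->next->prev = page;
	head->next = page;
}

static void toy_list_del(struct toy_page *page)
{
	page->prev->next = page->next;
	page->next->prev = page->prev;
}

/*
 * Refill the list with pfns base..base+BATCH-1 (ASSUMED: pushed at the
 * head in ascending order), then drain it by always taking the first
 * entry seen from the chosen end. Returns the longest run of
 * allocations in which each pfn is exactly the previous pfn + 1.
 */
size_t longest_ascending_run(unsigned long base, int reverse)
{
	struct toy_page head, pages[BATCH];
	unsigned long prev_pfn = 0;
	size_t i, run = 0, best = 0;

	toy_list_init(&head);
	for (i = 0; i < BATCH; i++) {
		pages[i].pfn = base + i;
		toy_list_add(&pages[i], &head);
	}

	for (i = 0; i < BATCH; i++) {
		struct toy_page *page = reverse ? head.prev : head.next;

		assert(page != &head);	/* list must not be empty yet */
		toy_list_del(page);
		run = (i > 0 && page->pfn == prev_pfn + 1) ? run + 1 : 1;
		if (run > best)
			best = run;
		prev_pfn = page->pfn;
	}
	return best;
}
```

Under that assumption, longest_ascending_run(base, 0) models the forward walk
and hands pages out in descending pfn order, while longest_ascending_run(base, 1)
models the reverse walk and hands them out in ascending order, which is what
lets adjacent pages merge into larger I/O segments.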