Message-ID: <4AE9C7BF.3060509@sgi.com>
Date: Thu, 29 Oct 2009 09:50:07 -0700
From: Mike Travis <travis@....com>
To: Andrea Arcangeli <aarcange@...hat.com>
CC: Ingo Molnar <mingo@...e.hu>, Andi Kleen <andi@...stfloor.org>,
linux-mm@...ck.org, Marcelo Tosatti <mtosatti@...hat.com>,
Adam Litke <agl@...ibm.com>, Avi Kivity <avi@...hat.com>,
Izik Eidus <ieidus@...hat.com>,
Hugh Dickins <hugh.dickins@...cali.co.uk>,
Nick Piggin <npiggin@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, Karl Feind <kaf@....com>,
Jack Steiner <steiner@....com>
Subject: Re: RFC: Transparent Hugepage support
Hi Andrea,
I will find some time soon to test out your patch on a
(relatively) huge machine and let you know the results.
The memory size on this machine:
480,700,399,616 bytes of system memory tested OK
This translates to ~229k available 2MB pages.
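As a rough check of the page-count arithmetic, here is a minimal Python
sketch. It assumes a 2MB hugepage means 2 MiB (2^21 bytes) and a 1GB page
means 2^30 bytes, as on x86-64. The helper name max_pages is made up for
illustration, and the result is only an upper bound, since not all of the
memory will be available for hugepages.

```python
# Upper bound on how many hugepages fit in the reported memory size.
TOTAL_BYTES = 480_700_399_616   # figure reported by the memory test above

PAGE_2M = 2 << 20   # 2 MiB, assumed x86-64 2MB hugepage size
PAGE_1G = 1 << 30   # 1 GiB, assumed x86-64 1GB page size


def max_pages(total_bytes: int, page_bytes: int) -> int:
    """Hypothetical helper: whole pages of page_bytes that fit in total_bytes."""
    if page_bytes <= 0:
        raise ValueError("page size must be positive")
    return total_bytes // page_bytes


if __name__ == "__main__":
    print(max_pages(TOTAL_BYTES, PAGE_2M))     # binary 2 MiB pages
    print(max_pages(TOTAL_BYTES, PAGE_1G))     # binary 1 GiB pages
    print(max_pages(TOTAL_BYTES, 2_000_000))   # decimal 2,000,000-byte units
```

With binary units this comes to about 229k 2MB pages and 447 1GB pages;
dividing by a decimal 2,000,000 instead gives about 240k.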
Thanks,
Mike
Andrea Arcangeli wrote:
> Hello Ingo, Andi, everyone,
>
> On Thu, Oct 29, 2009 at 10:43:44AM +0100, Ingo Molnar wrote:
>> * Andi Kleen <andi@...stfloor.org> wrote:
>>
>>>> 1GB pages can't be handled by this code, and clearly it's not
>>>> practical to hope 1G pages to materialize in the buddy (even if we
>>> That seems shortsighted. You do this because 2MB pages give you x%
>>> performance advantage, but then it's likely that 1GB pages will give
>>> another y% improvement and why should people stop at the smaller
>>> improvement?
>>>
>>> Ignoring the gigantic pages now would just mean that this would need
>>> to be revised later again or that users still need to use hacks like
>>> libhugetlbfs.
>> I've read the patch and have read through this discussion and you are
>> missing the big point that it's best to do such things gradually - one
>> step at a time.
>>
>> Just like we went from 2 level pagetables to 3 level pagetables, then to
>> 4 level pagetables - and we might go to 5 level pagetables in the
>> future. We didn't go from 2 level pagetables to 5 level page tables in
>> one go, despite predictions clearly pointing out the exponentially
>> increasing need for RAM.
>
> I totally agree with your assessment.
>
>> So your obsession with 1GB pages is misguided. If indeed transparent
>> largepages give us real benefits we can extend it to do transparent
>> gbpages as well - should we ever want to. There's nothing 'shortsighted'
>> about being gradual - the change is already ambitious enough as-is, and
>> brings very clear benefits to a difficult, decade-old problem no other
>> person was able to address.
>>
>> In fact introducing transparent 2MBpages makes 1GB pages support
>> _easier_ to merge: as at that point we'll already have a (finally..)
>> successful hugetlb facility happily used by an increasing range of
>> applications.
>
> Agreed.
>
>> Hugetlbfs's big problem was always that it wasn't transparent and hence
>> wasn't gradual for applications. It was an opt-in and constituted an
>> interface/ABI change - that is always a big barrier to app adoption.
>>
>> So i give Andrea's patch a very big thumbs up - i hope it gets reviewed
>> in fine detail and added to -mm ASAP. Our lack of decent, automatic
>> hugepage support is sticking out like a sore thumb and is hurting us in
>> high-performance setups. If largepage support within Linux has a chance,
>> this might be the way to do it.
>
> Thanks a lot for your review!
>
>> A small comment regarding the patch itself: i think it could be
>> simplified further by eliminating CONFIG_TRANSPARENT_HUGEPAGE and by
>> making it a natural feature of hugepage support. If the code is correct
>> i cannot see any scenario under which i would want a hugepage enabled
>> kernel i'm booting to not have transparent hugepage support as well.
>
> The two reasons why I added a config option are:
>
> 1) because it was easy enough, gcc is smart enough to eliminate the
> external calls so I didn't need to add ifdefs with the exception of
> returning 0 from pmd_trans_huge and pmd_trans_frozen. I only had to
> make the exports of huge_memory.c visible unconditionally so it doesn't
> warn, after that I don't need to build and link huge_memory.o.
>
> 2) to avoid breaking build of archs not implementing pmd_trans_huge
> and that may never be able to take advantage of it
>
> But we could move CONFIG_TRANSPARENT_HUGEPAGE to an arch define forced
> to Y on x86-64 and N on power.
>