Message-ID: <20200804023548.GA186735@KEI>
Date: Tue, 4 Aug 2020 11:35:48 +0900
From: Cho KyongHo <pullip.cho@...sung.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: David Hildenbrand <david@...hat.com>, akpm@...ux-foundation.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
hyesoo.yu@...sung.com, janghyuck.kim@...sung.com
Subject: Re: [PATCH] mm: sort freelist by rank number
On Mon, Aug 03, 2020 at 05:45:55PM +0200, Vlastimil Babka wrote:
> On 8/3/20 9:57 AM, David Hildenbrand wrote:
> > On 03.08.20 08:10, pullip.cho@...sung.com wrote:
> >> From: Cho KyongHo <pullip.cho@...sung.com>
> >>
> >> LPDDR5 introduces rank switch delay. If three successive DRAM accesses
> >> happen and the first and the second ones access one rank and the last
> >> access happens on the other rank, the latency of the last access will
> >> be longer than the second one.
> >> To address this penalty, we can sort the freelist so that a specific
> >> rank is allocated prior to another rank. We expect the page allocator
> >> can allocate the pages from the same rank successively with this
> >> change. It will hopefully improve the proportion of the consecutive
> >> memory accesses to the same rank.
> >
> > This certainly needs performance numbers to justify ... and I am sorry,
> > "hopefully improves" is not a valid justification :)
> >
> > I can imagine that this works well initially, when there hasn't been a
> > lot of memory fragmentation going on. But quickly after your system is
> > under stress, I doubt this will be very useful. Prove me wrong. ;)
>
> Agreed. The implementation of __preferred_rank() seems to be very simple and
> optimistic.
DRAM rank is selected by the CS bits from the DRAM controller. In most systems
the CS bits are allocated to specific bit fields in the bus address. For example,
if the CS bit is allocated to bit[16] of the bus (physical) address in a two-rank
system, every 64KiB block with bit[16] = 1 is in rank 1 and the others are
in rank 0.
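
To make the bit[16] example concrete, here is a minimal sketch of the
address-to-rank mapping. It is not the code from the patch: the names
RANK_CS_BIT, PAGE_SHIFT_4K, addr_to_rank() and pfn_to_rank() are
hypothetical, and it assumes a two-rank system with a single CS bit and
4KiB pages.

```c
#include <stdint.h>

/* Hypothetical: position of the chip-select (CS) bit in the physical address. */
#define RANK_CS_BIT	16u
/* Assumption: 4KiB pages, so a page frame number is the address >> 12. */
#define PAGE_SHIFT_4K	12u

/* Rank (0 or 1) that a physical address falls in on a two-rank system. */
static inline unsigned int addr_to_rank(uint64_t phys_addr)
{
	return (unsigned int)((phys_addr >> RANK_CS_BIT) & 1u);
}

/* Same question asked for a page frame number instead of an address. */
static inline unsigned int pfn_to_rank(uint64_t pfn)
{
	return addr_to_rank(pfn << PAGE_SHIFT_4K);
}
```

With these assumptions the rank alternates every 64KiB, i.e. every 16
pages: addresses 0x00000-0x0ffff are in rank 0 and 0x10000-0x1ffff are
in rank 1.
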
This patch does not benefit systems other than mobile devices
with LPDDR5. That is why the default behavior of this patch is a no-op.
> I think these systems could perhaps better behave as NUMA with (interleaved)
> nodes for each rank, then you immediately have all the mempolicies support etc
> to achieve what you need? Of course there's some cost as well, but not the costs
> of adding hacks to page allocator core?
Thank you for the proposal. NUMA will be helpful for allocating pages from
a specific rank programmatically. I should consider NUMA if rank
affinity is also required.
However, the page allocation overhead of this policy (page migration,
reclamation, etc.) will give users worse responsiveness. The intent
of this patch is to reduce rank switch delay optimistically without
hurting page allocation speed.
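
As an illustration of how a freelist can be kept loosely ordered by rank
at constant cost per free, here is a toy user-space sketch. It is not the
patch itself: struct free_page, struct free_list, PREFERRED_RANK and the
head/tail insertion rule are all assumptions made for this example, as
are the CS bit position (bit[16]) and the 4KiB page size.

```c
#include <stddef.h>
#include <stdint.h>

#define CS_BIT		16u	/* assumed CS bit position */
#define PAGE_SHIFT_4K	12u	/* assumed 4KiB pages */
#define PREFERRED_RANK	0u	/* hypothetical: rank to hand out first */

struct free_page {
	uint64_t pfn;
	struct free_page *prev, *next;
};

struct free_list {
	struct free_page *head, *tail;
};

static unsigned int page_rank(const struct free_page *p)
{
	return (unsigned int)(((p->pfn << PAGE_SHIFT_4K) >> CS_BIT) & 1u);
}

/* O(1) free: preferred-rank pages go to the head, all others to the tail. */
static void free_list_add(struct free_list *l, struct free_page *p)
{
	p->prev = p->next = NULL;
	if (!l->head) {
		l->head = l->tail = p;
	} else if (page_rank(p) == PREFERRED_RANK) {
		p->next = l->head;
		l->head->prev = p;
		l->head = p;
	} else {
		p->prev = l->tail;
		l->tail->next = p;
		l->tail = p;
	}
}

/* O(1) allocation: always take from the head of the list. */
static struct free_page *free_list_pop(struct free_list *l)
{
	struct free_page *p = l->head;

	if (!p)
		return NULL;
	l->head = p->next;
	if (l->head)
		l->head->prev = NULL;
	else
		l->tail = NULL;
	p->prev = p->next = NULL;
	return p;
}
```

Neither path walks the list, so the ordering costs one extra comparison
per free. Pages of the preferred rank are handed out first as long as any
remain on the list.
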