Message-ID: <20080813164121.GA5985@alberich.amd.com>
Date: Wed, 13 Aug 2008 18:41:21 +0200
From: Andreas Herrmann <andreas.herrmann3@....com>
To: Johannes Weiner <hannes@...urebad.de>
CC: Ingo Molnar <mingo@...e.hu>, Nick Piggin <npiggin@...e.de>,
linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] alloc_bootmem_core: fix misaligned allocation of 1G page
On Tue, Aug 12, 2008 at 06:58:55PM +0200, Johannes Weiner wrote:
> Andreas Herrmann <andreas.herrmann3@....com> writes:
> > The current code in alloc_bootmem_core is based on changes introduced
> > with commit 5f2809e69c7128f86316048221cf45146f69a4a0 (bootmem: clean
> > up alloc_bootmem_core). But I didn't check whether this commit
> > introduced the problem.
>
> It did, there were workarounds for the same problem in the earlier code,
> I missed it.
>
> The misalignment stems from the fact that the alignment requirement is
> wider than the offset-pfn and the starting pfn of the node is not
> aligned itself, correct?
Yes.
> I think, the cleaner fix would be to work with an aligned base pfn to
> begin with, like the following, untested. What do you think?
This won't (completely) work.
Every time you compute the new alignment for sidx, the starting point
(node_min_pfn) must be factored in.
Otherwise the function can't allocate the first possible page. For
example, assuming that
node_min_pfn = 0x130000
align = 0x40000000 (1GByte)
allocating a 1G page on this node will result in
sidx=0x40000
min_pfn=0x140000
Both are properly aligned. But the resulting super-page will be at
address 0x180000000 whereas the first possible 1G page would be at
address 0x140000000.
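The arithmetic in this example can be checked with a small stand-alone
sketch. It is not kernel code: align_up(), first_possible_addr() and
addr_from_base_and_index() are hypothetical helpers, 4 KiB pages are
assumed, and node_min_pfn is taken to be 0x130000 (hex), which is the
value consistent with min_pfn=0x140000 above.

```c
#include <assert.h>

#define PAGE_SHIFT 12ULL	/* assumption: 4 KiB pages */

/* Round x up to the next multiple of a; a must be a power of two. */
static unsigned long long align_up(unsigned long long x, unsigned long long a)
{
	assert(a && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Physical address of the first aligned block at or above the node
 * start. The alignment is applied to the absolute pfn, so the node's
 * starting pfn is factored in.
 */
static unsigned long long first_possible_addr(unsigned long long node_min_pfn,
					      unsigned long long align_bytes)
{
	unsigned long long step = align_bytes >> PAGE_SHIFT; /* in pages */

	return align_up(node_min_pfn, step) << PAGE_SHIFT;
}

/*
 * Address obtained when an aligned base pfn and a separately aligned
 * index are simply added, as in the example values above.
 */
static unsigned long long addr_from_base_and_index(unsigned long long min_pfn,
						   unsigned long long sidx)
{
	return (min_pfn + sidx) << PAGE_SHIFT;
}
```

With the values from the example, first_possible_addr(0x130000,
0x40000000) yields 0x140000000, while addr_from_base_and_index(0x140000,
0x40000) yields 0x180000000, i.e. the first usable 1G page is skipped.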
> diff --git a/mm/bootmem.c b/mm/bootmem.c
> index 4af15d0..bee4dfe 100644
> --- a/mm/bootmem.c
> +++ b/mm/bootmem.c
...
> @@ -492,8 +493,7 @@ find_block:
> PFN_UP(end_off), BOOTMEM_EXCLUSIVE))
> BUG();
>
> - region = phys_to_virt(PFN_PHYS(bdata->node_min_pfn) +
> - start_off);
> + region = phys_to_virt(PFN_PHYS(min_pfn) + start_off);
> memset(region, 0, size);
> return region;
Oops ...
the returned region doesn't match the reserved one as it still gets
reserved with
if (__reserve(bdata, PFN_DOWN(start_off) + merge,
PFN_UP(end_off), BOOTMEM_EXCLUSIVE))
where __reserve() will use bdata->node_min_pfn and not the properly
aligned min_pfn value. Either you have to pass the new min_pfn
value to __reserve() or you have to adapt start_off with another
offset = min_pfn - bdata->node_min_pfn ...
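A rough stand-alone sketch of this mismatch, and of the second remedy
(adapting start_off), might look as follows. It is only an illustration
under assumptions: the helper names are hypothetical, 4 KiB pages are
assumed, the merge term is ignored, and the reservation is modelled as
"node_min_pfn plus page index", which is how I read __reserve().

```c
#include <assert.h>

#define PAGE_SHIFT 12ULL	/* assumption: 4 KiB pages */

/* First pfn that gets reserved: index is relative to node_min_pfn. */
static unsigned long long reserved_start_pfn(unsigned long long node_min_pfn,
					     unsigned long long start_off)
{
	return node_min_pfn + (start_off >> PAGE_SHIFT);
}

/* First pfn of the returned region: offset is relative to min_pfn. */
static unsigned long long returned_start_pfn(unsigned long long min_pfn,
					     unsigned long long start_off)
{
	return min_pfn + (start_off >> PAGE_SHIFT);
}

/*
 * Adapt start_off by offset = min_pfn - node_min_pfn (in bytes) so that
 * a reservation based on node_min_pfn covers the returned region.
 */
static unsigned long long adapted_start_off(unsigned long long start_off,
					    unsigned long long node_min_pfn,
					    unsigned long long min_pfn)
{
	assert(min_pfn >= node_min_pfn);
	return start_off + ((min_pfn - node_min_pfn) << PAGE_SHIFT);
}
```

Taking node_min_pfn=0x130000, min_pfn=0x140000 and start_off=0x40000000,
the reservation would start at pfn 0x170000 but the returned region at
pfn 0x180000. Passing the adapted start_off (0x50000000) to the
reservation makes both start at pfn 0x180000.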
I thought about other solutions like introducing a "base_offset" --
the value needed to align node_min_pfn. But this value must be used
in many places to correctly compute/align sidx etc., and it doesn't
make the code any more readable.
Hence I still prefer the patch posted yesterday. I just want to clean
it up somewhat. See attached patch.
Regards,
Andreas
--
alloc_bootmem_core: minor cleanup, use min instead of bdata->node_min_pfn
Signed-off-by: Andreas Herrmann <andreas.herrmann3@....com>
---
mm/bootmem.c | 20 ++++++++------------
1 files changed, 8 insertions(+), 12 deletions(-)
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 9d54244..11ece4b 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -459,9 +459,8 @@ static void * __init alloc_bootmem_core(struct bootmem_data *bdata,
unsigned long eidx, i, start_off, end_off;
find_block:
sidx = find_next_zero_bit(bdata->node_bootmem_map,
- midx - bdata->node_min_pfn,
- sidx - bdata->node_min_pfn);
- sidx += bdata->node_min_pfn;
+ midx - min, sidx - min);
+ sidx += min;
sidx = ALIGN(sidx, step);
eidx = sidx + PFN_UP(size);
@@ -469,8 +468,7 @@ find_block:
break;
for (i = sidx; i < eidx; i++)
- if (test_bit(i - bdata->node_min_pfn,
- bdata->node_bootmem_map)) {
+ if (test_bit(i - min, bdata->node_bootmem_map)) {
sidx = ALIGN(i, step);
if (sidx == i)
sidx += step;
@@ -478,17 +476,16 @@ find_block:
}
if (bdata->last_end_off &&
- (PFN_DOWN(bdata->last_end_off) + 1) ==
- (sidx - bdata->node_min_pfn))
+ (PFN_DOWN(bdata->last_end_off) + 1) == (sidx - min))
start_off = ALIGN(bdata->last_end_off, align);
else
- start_off = PFN_PHYS(sidx - bdata->node_min_pfn);
+ start_off = PFN_PHYS(sidx - min);
- merge = PFN_DOWN(start_off) < (sidx - bdata->node_min_pfn);
+ merge = PFN_DOWN(start_off) < (sidx - min);
end_off = start_off + size;
bdata->last_end_off = end_off;
- bdata->hint_idx = PFN_UP(end_off + bdata->node_min_pfn);
+ bdata->hint_idx = PFN_UP(end_off + min);
/*
* Reserve the area now:
@@ -497,8 +494,7 @@ find_block:
PFN_UP(end_off), BOOTMEM_EXCLUSIVE))
BUG();
- region = phys_to_virt(PFN_PHYS(bdata->node_min_pfn) +
- start_off);
+ region = phys_to_virt(PFN_PHYS(min) + start_off);
memset(region, 0, size);
return region;
}
--
1.5.6.4