Message-ID: <alpine.DEB.2.00.1103161442210.3382@kaball-desktop>
Date: Wed, 16 Mar 2011 14:43:42 +0000
From: Stefano Stabellini <stefano.stabellini@...citrix.com>
To: Stefano Stabellini <Stefano.Stabellini@...citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
"H. Peter Anvin" <hpa@...or.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Jeremy Fitzhardinge <jeremy@...p.org>,
Yinghai Lu <yinghai@...nel.org>,
"xen-devel@...ts.xensource.com" <xen-devel@...ts.xensource.com>
Subject: Re: [GIT PULL tip/x86/mm] xen/x86 fixes
actually attach the logs :)
On Wed, 16 Mar 2011, Stefano Stabellini wrote:
> On Fri, 11 Mar 2011, Konrad Rzeszutek Wilk wrote:
> > On Fri, Mar 11, 2011 at 01:17:23PM +0000, Stefano Stabellini wrote:
> > > Hello,
> > > recently we had a couple of long discussions with Yinghai about boot
> > > crashes on xen, related to pagetable initialization.
> > > As a result we came up with three patches, two of them fix the first [1]
> > > boot crash and provide a nice cleanup on native:
> >
> > I don't know why this is happening now, but it could very well be
> > related to the build config. Smaller builds don't seem to encounter this, while
> > this is a distro type build. If I use:
> >
> > > Stefano Stabellini (1):
> > > xen: set max_pfn_mapped to the last pfn mapped
> >
> > it hangs during bootup. The box stops responding (no keyboard interaction)
> > and I can see this in the bootup.
>
> Konrad sent me a few other logs offline: log1 is the log of the hang and
> log2 is a successful boot (reverting the problematic patch).
> It looks like the SP5100 TCO WatchDog Timer Driver is using ioremap on
> an address (0xb8fe00) that belongs to the memory range used for the
> pagetable (0x9fc000-0xf43fff).
> In the successful case max_pfn_mapped is higher so the pagetable is
> located at a higher address (0x16dfb000-0x17342fff) so the problem
> doesn't occur.
>
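The overlap described above can be checked with a small sketch. This is only an illustration of the arithmetic, not kernel code: struct phys_range and range_contains() are hypothetical names made up for this example, and the ranges are treated as inclusive, as they are written in the logs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: an inclusive physical address range. */
struct phys_range {
    uint64_t start;
    uint64_t end;
};

/* Returns true if addr lies inside r (both bounds inclusive). */
static bool range_contains(struct phys_range r, uint64_t addr)
{
    assert(r.start <= r.end);
    return addr >= r.start && addr <= r.end;
}
```

With the values quoted above, range_contains() is true for 0xb8fe00 against 0x9fc000-0xf43fff (the hanging boot) and false against 0x16dfb000-0x17342fff (the successful boot).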
> I still have a few unanswered questions on this issue: if we assume that
> the ioremap address is the same in the two cases (0xb8fe00), how is it
> possible that in the first case it is ram (page_is_ram returns true)
> while in the second case it is not (otherwise we would still get a
> warning from ioremap)? page_is_ram shouldn't be affected by the position
> of the kernel pagetable, and the e820 is still the same.
> In any case if 0xb8fe00 is really an MMIO address memblock_find_in_range
> shouldn't have returned the range (0x9fc000-0xf43fff) in
> find_early_table_space.
> I think that lowering the value of max_pfn_mapped is likely to cause
> bugs like this one, where a low memory range is not properly marked as
> reserved and gets mistakenly used for the pagetable.
>
> Considering that meanwhile Linux 2.6.38 was released with this bug, I
> think it is better if we change approach and fix the regression in a more
> straightforward way, like for example:
>
> - 2M align _end;
> - do not clean initial mapping between _brk_end to _end;
> - resurrect the patch "respect memblock reserved regions when
> destroying mappings", trying to minimize the number of memblock reserved
> checks.
>
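For the first item in the list above, "2M align" means rounding an address up to the next 2 MiB boundary. A minimal sketch of that arithmetic follows, assuming a power-of-two alignment; TWO_MB and align_up() are hypothetical names for this example and are not taken from the patches under discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant for this sketch: 2 MiB. */
#define TWO_MB (UINT64_C(2) << 20)

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

An address that is already 2 MiB aligned is returned unchanged; any other address moves up to the next boundary.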
> Opinions?
>
>
>
> Regarding the other commit "x86-64, mm: Put early page table high" that
> causes a reliable crash on Xen: I noticed that Ingo sent a pull request
> to Linus with this commit included.
> At this point, can I send the patch to fix the Xen issue to Linus
> directly, with no need to rebase the patch on tip?
>
View attachment "log1" of type "text/plain" (91156 bytes)
View attachment "log2" of type "text/plain" (83124 bytes)