Message-ID: <CA+CK2bAYUFBBGo-LHBK4UWRK1tpx3AZ4Z9NkDxiDK0UYEDozaQ@mail.gmail.com>
Date: Wed, 31 Jul 2019 12:40:51 -0400
From: Pavel Tatashin <pasha.tatashin@...een.com>
To: Mark Rutland <mark.rutland@....com>
Cc: James Morris <jmorris@...ei.org>, Sasha Levin <sashal@...nel.org>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
kexec mailing list <kexec@...ts.infradead.org>,
LKML <linux-kernel@...r.kernel.org>,
Jonathan Corbet <corbet@....net>,
Catalin Marinas <catalin.marinas@....com>, will@...nel.org,
Linux Doc Mailing List <linux-doc@...r.kernel.org>,
Linux ARM <linux-arm-kernel@...ts.infradead.org>,
Marc Zyngier <marc.zyngier@....com>,
James Morse <james.morse@....com>,
Vladimir Murzin <vladimir.murzin@....com>,
Matthias Brugger <matthias.bgg@...il.com>,
Bhupesh Sharma <bhsharma@...hat.com>
Subject: Re: [RFC v2 0/8] arm64: MMU enabled kexec relocation
On Wed, Jul 31, 2019 at 12:33 PM Mark Rutland <mark.rutland@....com> wrote:
>
> Hi Pavel,
>
> Generally, the cover letter should state up-front what the goal is (or
> what problem you're trying to solve). It would be really helpful to have
> that so that we understand what you're trying to achieve, and why.
>
> Messing with the MMU is often fraught with danger (and very painful to
> debug, as you are now aware), and so far we've tried to minimize the
> number of places where we have to do so.
Hi Mark,
I understand. This is why I first took another route to solve this
problem: pre-reserving contiguous memory and avoiding relocation
entirely (the same as what happens during crash reboot). But that
solution was not accepted because it introduces a change to the common
code to solve an ARM-specific problem. So James Morse and others
suggested that I look at the root of the problem and enable the
MMU during relocation by doing what is already done during hibernate
restore.
>
> On Wed, Jul 31, 2019 at 11:38:49AM -0400, Pavel Tatashin wrote:
> > Changelog from previous RFC:
> > - Added trans_table support for both hibernate and kexec.
> > - Fixed performance issue, where enabling MMU did not yield the
> > actual performance improvement.
> >
> > Bug:
> > With the current state, this patch series works on kernels booted with EL1
> > mode, but for some reason, when elevated to EL2 mode reboot freezes in
> > both QEMU and on real hardware.
> >
> > The freeze happens in:
> >
> > arch/arm64/kernel/relocate_kernel.S
> > turn_on_mmu()
> >
> > Right after sctlr_el2 is written (MMU on EL2 is enabled)
> >
> > msr sctlr_el2, \tmp1
> >
> > I've been studying all the relevant control registers for EL2, but do not
> > see what might be causing this hang:
> >
> > MAIR_EL2 is set to be exactly the same as MAIR_EL1 0xbbff440c0400
> >
> > TCR_EL2 0x80843510
> > Enabled bits:
> > PS Physical Address Size. (0b100 44 bits, 16TB.)
> > SH0 Shareability 11 Inner Shareable
> > ORGN0 Normal memory, Outer Write-Back Read-Allocate Write-Allocate Cach.
> > IRGN0 Normal memory, Inner Write-Back Read-Allocate Write-Allocate Cach.
> > T0SZ 01 0000
> >
> > SCTLR_EL2 0x30e5183f
> > RES1 : Reserved, set to one
> > M : MMU enabled
> > A : Align check
> > C : Cacheability control
> > SA : SP Alignment check enable
> > IESB : Implicit Error Synchronization event
> > I : Instruction access Cacheability
> >
> > TTBR0_EL2 0x1b3069000 (address of trans_table)
> >
> > Any suggestion of what else might be missing that causes this freeze when
> > MMU is enabled in EL2?
> >
> > =====
>
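The register values quoted above can be cross-checked against the field
list with a small decoder. This is only a sketch: it assumes the non-VHE
TCR_EL2 and SCTLR_EL2 layouts, and it assumes the RES1 mask matches the
kernel's SCTLR_EL2_RES1 definition. The helper names (bits,
decode_tcr_el2, decode_sctlr_el2) are made up for illustration.

```python
# Sketch: decode the TCR_EL2 / SCTLR_EL2 values quoted in the report.
# Bit positions assume the non-VHE (HCR_EL2.E2H == 0) register formats.

TCR_EL2 = 0x80843510
SCTLR_EL2 = 0x30E5183F


def bits(value, hi, lo):
    """Return the bit field value[hi:lo], inclusive."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_tcr_el2(value):
    """Extract the translation control fields listed in the report."""
    return {
        "T0SZ": bits(value, 5, 0),     # VA size is 64 - T0SZ bits
        "IRGN0": bits(value, 9, 8),    # 0b01: inner WB RA WA cacheable
        "ORGN0": bits(value, 11, 10),  # 0b01: outer WB RA WA cacheable
        "SH0": bits(value, 13, 12),    # 0b11: inner shareable
        "TG0": bits(value, 15, 14),    # 0b00: 4KB granule
        "PS": bits(value, 18, 16),     # 0b100: 44-bit physical addresses
    }


# Control bits named in the report, by bit position.
SCTLR_FLAGS = {"M": 0, "A": 1, "C": 2, "SA": 3, "I": 12, "IESB": 21}

# Assumed RES1 positions, mirroring the kernel's SCTLR_EL2_RES1 mask.
SCTLR_EL2_RES1 = sum(1 << b for b in (4, 5, 11, 16, 18, 22, 23, 28, 29))


def decode_sctlr_el2(value):
    """Return (set of enabled flag names, bits not explained by flags or RES1)."""
    enabled = {name for name, bit in SCTLR_FLAGS.items() if (value >> bit) & 1}
    known = SCTLR_EL2_RES1 | sum(1 << b for b in SCTLR_FLAGS.values())
    return enabled, value & ~known


if __name__ == "__main__":
    print(decode_tcr_el2(TCR_EL2))
    flags, leftover = decode_sctlr_el2(SCTLR_EL2)
    print(sorted(flags), hex(leftover))
```

A zero leftover means every set bit in SCTLR_EL2 is either one of the
listed control bits or an assumed RES1 bit, so the value itself is
consistent with the description.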
> > Here is the current data from the real hardware:
> > (because of bug, I forced EL1 mode by setting el2_switch always to zero in
> > cpu_soft_restart()):
> >
> > For this experiment, the size of kernel plus initramfs is 25M. If initramfs
> > were larger, then the improvements would be even greater, as time spent in
> > relocation is proportional to the size of relocation.
> >
> > Previously:
> > kernel shutdown 0.022131328s
> > relocation 0.440510736s
> > kernel startup 0.294706768s
>
> In total this takes ~0.76s...
>
> >
> > Relocation was taking: 58.2% of reboot time
> >
> > Now:
> > kernel shutdown 0.032066576s
> > relocation 0.022158152s
> > kernel startup 0.296055880s
>
> ... and this takes ~0.35s
>
> So do we really need this complexity for a few blinks of an eye?
Yes, we have an extremely tight reboot budget; 0.35s is not an acceptable waste.
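For reference, the totals and the relocation share can be recomputed from
the per-phase timings quoted above. This is a rough sketch; the dictionary
and function names are just for illustration, and the numbers are the ones
from the measurements in this thread.

```python
# Sketch: recompute reboot totals and relocation share from the quoted timings.

previous = {
    "shutdown": 0.022131328,
    "relocation": 0.440510736,
    "startup": 0.294706768,
}

now = {
    "shutdown": 0.032066576,
    "relocation": 0.022158152,
    "startup": 0.296055880,
}


def total(phases):
    """Total reboot time in seconds."""
    return sum(phases.values())


def share(phases, phase):
    """Fraction of the total reboot time spent in one phase."""
    return phases[phase] / total(phases)


if __name__ == "__main__":
    for label, phases in (("previously", previous), ("now", now)):
        print(f"{label}: total {total(phases):.3f}s, "
              f"relocation {share(phases, 'relocation'):.1%}")
    print(f"saved per reboot: {total(previous) - total(now):.3f}s")
```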
>
> Thanks,
> Mark.