lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191015021848.GA18043@dhcp-128-65.nay.redhat.com>
Date:   Tue, 15 Oct 2019 10:18:48 +0800
From:   Dave Young <dyoung@...hat.com>
To:     Kairui Song <kasong@...hat.com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Thomas Lendacky <Thomas.Lendacky@....com>,
        Baoquan He <bhe@...hat.com>, Lianbo Jiang <lijiang@...hat.com>,
        the arch/x86 maintainers <x86@...nel.org>,
        "kexec@...ts.infradead.org" <kexec@...ts.infradead.org>
Subject: Re: [PATCH v3 2/2] x86/kdump: Reserve extra memory when SME or SEV
 is active

On 10/14/19 at 07:05pm, Dave Young wrote:
> On 10/12/19 at 05:24pm, Kairui Song wrote:
> > On 9/27/19 1:42 PM, Dave Young wrote:
> > > On 09/25/19 at 06:36pm, Kairui Song wrote:
> > > > On Wed, Sep 11, 2019 at 1:56 PM Ingo Molnar <mingo@...nel.org> wrote:
> > > > > * Kairui Song <kasong@...hat.com> wrote:
> > > > > 
> > > > > > Since commit c7753208a94c ("x86, swiotlb: Add memory encryption support"),
> > > > > > SWIOTLB will be enabled even if there is less than 4G of memory when SME
> > > > > > is active, to support DMA of devices that not support address with the
> > > > > > encrypt bit.
> > > > > > 
> > > > > > And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
> > > > > > active") make the kernel keep SWIOTLB enabled even if there is an IOMMU.
> > > > > > 
> > > > > > Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
> > > > > > encryption") will always force SWIOTLB to be enabled when SEV is active
> > > > > > in all cases.
> > > > > > 
> > > > > > Now, when either SME or SEV is active, SWIOTLB will be force enabled,
> > > > > > and this is also true for kdump kernel. As a result kdump kernel will
> > > > > > run out of already scarce pre-reserved memory easily.
> > > > > > 
> > > > > > So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure
> > > > > > kdump kernel have enough memory, except when "crashkernel=size[KMG],high"
> > > > > > is specified or any offset is used. As for the high reservation case, an
> > > > > > extra low memory region will always be reserved and that is enough for
> > > > > > SWIOTLB. Else if the offset format is used, user should be fully aware
> > > > > > of any possible kdump kernel memory requirement and have to organize the
> > > > > > memory usage carefully.
> > > > > > 
> > > > > > Signed-off-by: Kairui Song <kasong@...hat.com>
> > > > > > ---
> > > > > >   arch/x86/kernel/setup.c | 20 +++++++++++++++++---
> > > > > >   1 file changed, 17 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > > > > index 71f20bb18cb0..ee6a2f1e2226 100644
> > > > > > --- a/arch/x86/kernel/setup.c
> > > > > > +++ b/arch/x86/kernel/setup.c
> > > > > > @@ -530,7 +530,7 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > > > > >                                          unsigned long long *crash_size,
> > > > > >                                          bool high)
> > > > > >   {
> > > > > > -     unsigned long long base, size;
> > > > > > +     unsigned long long base, size, mem_enc_req = 0;
> > > > > > 
> > > > > >        base = *crash_base;
> > > > > >        size = *crash_size;
> > > > > > @@ -561,11 +561,25 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > > > > >        if (high)
> > > > > >                goto high_reserve;
> > > > > > 
> > > > > > +     /*
> > > > > > +      * When SME/SEV is active and not using high reserve,
> > > > > > +      * it will always required an extra SWIOTLB region.
> > > > > > +      */
> > > > > > +     if (mem_encrypt_active())
> > > > > > +             mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
> > > > > > +
> > > > > >        base = memblock_find_in_range(CRASH_ALIGN,
> > > > > > -                                   CRASH_ADDR_LOW_MAX, size,
> > > > > > +                                   CRASH_ADDR_LOW_MAX,
> > > > > > +                                   size + mem_enc_req,
> > > > > >                                      CRASH_ALIGN);
> > > > > 
> > > > 
> > > > Hi Ingo,
> > > > 
> > > > I re-read my previous reply, it's long and tedious, let me try to make
> > > > a more effective reply:
> > > > 
> > > > > What sizes are we talking about here?
> > > > 
> > > > The size here is how much memory will be reserved for kdump kernel, to
> > > > ensure kdump kernel and userspace can run without OOM.
> > > > 
> > > > > 
> > > > > - What is the possible size range of swiotlb_size_or_default()
> > > > 
> > > > swiotlb_size_or_default() returns the swiotlb size, it's specified by
> > > > user using swiotlb=<size>, or default size (64MB)
> > > > 
> > > > > 
> > > > > - What is the size of CRASH_ADDR_LOW_MAX (the old limit)?
> > > > 
> > > > It's 4G.
> > > > 
> > > > > 
> > > > > - Why do we replace one fixed limit with another fixed limit instead of
> > > > >    accurately sizing the area, with each required feature adding its own
> > > > >    requirement to the reservation size?
> > > > 
> > > > It's quite hard to "accurately sizing the area".
> > > > 
> > > > No way to tell the exact amount of memory kdump needs, we can only estimate.
> > > > Kdump kernel use different cmdline, drivers and components will have
> > > > special handling for kdump, and userspace is totally different.
> > > 
> > > Agreed about your above, but specific this the problem in this patch
> > > There should be other ways.
> > > 
> > > First thought about doing generic handling in swiotlb part, and do
> > > something like kdump_memory_reserve(size) Ingo suggested,  but according
> > > to you swiotlb init is late, so it can not increase the size, OTOH if
> > > reserve another region for kdump in swiotlb will cause other issues.
> > > 
> > > So let's think about other improvement, for example to see if you can
> > > call kdump_memory_reserve(size) in AMD SME init path, for example in
> > > mem_encrypt_init(), is it before crashkernel reservation?
> > > 
> > > If doable it will be at least cleaner than the code in this patch.
> > > 
> > > Thanks
> > > Dave
> > > 
> > 
> > How about something simple as following code? The logic and new function is as simple as
> > possible, just always reserve extra low memory when SME/SEV is active, ignore the high/low
> > reservation case. It will waste some memory with SME and high reservation though.
> > 
> > Was hesitating a lot about this series, one thing I'm thinking is that what is the point
> > of "crashkernel=" argument, if the crashkernel value could be adjusted according, the value
> > specified will seems more meanless or confusing...
> > 
> > And currently there isn't anything like crashkernel=auto or anything similiar to let kernel
> > calculate the value automatically, maybe the admin should be aware of the value or be informed
> > about the suitable crashkernel value after all?
> 
> Hmm, it is reasonable that a user defined value should be just as is
> without any change by kernel.  So it is a good reason to introduce
> a crashkernel=auto so that kernel can tune the crashkernel size
> accordingly on top of some base value which can be configurable by
> kernel configs (arch dependent).
> 

And for the time being, can just print a warning when crashkernel= param
is used, in mem_encrypt_init() code. alert people to increase the memory
size swiotlb_size_or_default().

In the future, if the crashkernel=auto is doable then kernel can adapt
to that in code.  Even if it is reasonable to let admin to provide a
exact value but sometimes it is hard to know these kernel requirement
details..

Thanks
Dave

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ