lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACw3F52dyiAyo1ijKfLUGLbh+kquwoUhGMwg4-RObSDvqxreJw@mail.gmail.com>
Date: Thu, 29 Aug 2024 12:21:39 -0700
From: Jiaqi Yan <jiaqiyan@...gle.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Peter Xu <peterx@...hat.com>, linux-kernel@...r.kernel.org, linux-mm@...ck.org, 
	Gavin Shan <gshan@...hat.com>, Catalin Marinas <catalin.marinas@....com>, x86@...nel.org, 
	Ingo Molnar <mingo@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	Paolo Bonzini <pbonzini@...hat.com>, Dave Hansen <dave.hansen@...ux.intel.com>, 
	Thomas Gleixner <tglx@...utronix.de>, Alistair Popple <apopple@...dia.com>, kvm@...r.kernel.org, 
	linux-arm-kernel@...ts.infradead.org, Sean Christopherson <seanjc@...gle.com>, 
	Oscar Salvador <osalvador@...e.de>, Borislav Petkov <bp@...en8.de>, Zi Yan <ziy@...dia.com>, 
	Axel Rasmussen <axelrasmussen@...gle.com>, David Hildenbrand <david@...hat.com>, 
	Yan Zhao <yan.y.zhao@...el.com>, Will Deacon <will@...nel.org>, 
	Kefeng Wang <wangkefeng.wang@...wei.com>, Alex Williamson <alex.williamson@...hat.com>, 
	ankita@...dia.com
Subject: Re: [PATCH v2 00/19] mm: Support huge pfnmaps

On Wed, Aug 28, 2024 at 4:50 PM Jason Gunthorpe <jgg@...dia.com> wrote:
>
> On Wed, Aug 28, 2024 at 09:10:34AM -0700, Jiaqi Yan wrote:
> > On Wed, Aug 28, 2024 at 7:24 AM Jason Gunthorpe <jgg@...dia.com> wrote:
> > >
> > > On Tue, Aug 27, 2024 at 05:42:21PM -0700, Jiaqi Yan wrote:
> > >
> > > > Instead of removing the whole pud, can driver or memory_failure do
> > > > something similar to non-struct-page-version of split_huge_page? So
> > > > driver doesn't need to re-fault good pages back?
> > >
> > > It would be far nicer if we didn't have to poke a hole in a 1G mapping
> > > just for memory failure reporting.
> >
> > If I follow this, which of the following sounds better? 1. remove pud
> > and rely on the driver to re-fault PFNs that it knows are not poisoned
> > (what Peter suggested), or 2. keep the pud and allow access to both
> > good and bad PFNs.
>
> In practice I think people will need 2, as breaking up a 1G mapping
> just because a few bits are bad will destroy the VM performance.
>

Totally agreed.

> For this the expectation would be for the VM to co-operate and not
> keep causing memory failures, or perhaps for the platform to spare in
> good memory somehow.

Yes, whether a VM gets into a memory-error-consumption loop
maliciously or accidentally, a reasonable VMM should have means to
detect and break it.

>
> > Or provide some knob (configured by ?) so that kernel + driver can
> > switch between the two?
>
> This is also sounding reasonable, especially if we need some
> alternative protocol to signal userspace about the failed memory
> besides fault and SIGBUS.

To clarify, what on my mind is a knob say named
"sysctl_enable_hard_offline", configured by userspace.

To apply to Ankit's memory_failure_pfn patch[*]:

static int memory_failure_pfn(unsigned long pfn, int flags)
{
  struct interval_tree_node *node;
  int res = MF_FAILED;
  LIST_HEAD(tokill);

  mutex_lock(&pfn_space_lock);
   for (node = interval_tree_iter_first(&pfn_space_itree, pfn, pfn); node;
         node = interval_tree_iter_next(node, pfn, pfn)) {
    struct pfn_address_space *pfn_space =
      container_of(node, struct pfn_address_space, node);

    if (pfn_space->ops)
      pfn_space->ops->failure(pfn_space, pfn);

    collect_procs_pgoff(NULL, pfn_space->mapping, pfn, &tokill);

    if (sysctl_enable_hard_offline)
      unmap_mapping_range(pfn_space->mapping, pfn << PAGE_SHIFT,
                                             PAGE_SIZE, 0);

    res = MF_RECOVERED;
  }
  mutex_unlock(&pfn_space_lock);

  if (res == MF_FAILED)
    return action_result(pfn, MF_MSG_PFN_MAP, res);

  flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
  kill_procs(&tokill, true, false, pfn, flags);

  return action_result(pfn, MF_MSG_PFN_MAP, MF_RECOVERED);
}

I think we still want to attempt to SIGBUS userspace, regardless of
doing unmap_mapping_range or not.

[*] https://lore.kernel.org/lkml/20231123003513.24292-2-ankita@nvidia.com/#t

>
> Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ