Message-Id: <20230627112220.229240-1-david@redhat.com>
Date:   Tue, 27 Jun 2023 13:22:15 +0200
From:   David Hildenbrand <david@...hat.com>
To:     linux-kernel@...r.kernel.org
Cc:     linux-mm@...ck.org, virtualization@...ts.linux-foundation.org,
        David Hildenbrand <david@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        John Hubbard <jhubbard@...dia.com>,
        Oscar Salvador <osalvador@...e.de>,
        Michal Hocko <mhocko@...e.com>,
        Jason Wang <jasowang@...hat.com>,
        Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Subject: [PATCH v1 0/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals

As raised by John Hubbard [1], offline_and_remove_memory() failing on
fatal signals can be sub-optimal for out-of-tree drivers: a dying user-space
process might be the last one holding a device node open.

As that device node gets closed, the driver might unplug the device
and trigger offline_and_remove_memory() to unplug previously
hotplugged device memory. This, however, will fail reliably when fatal
signals are pending on the dying process, leaving the device unusable until
the machine gets rebooted.

That can be optimized easily by ignoring fatal signals. In fact, checking
for fatal signals in the case of offline_and_remove_memory() doesn't
make too much sense; the check makes sense when offlining is triggered
directly via sysfs.  However, we actually do want a way to not end up
stuck in offline_and_remove_memory() forever.

What offline_and_remove_memory() users actually want is to fail after some
given timeout and not care about fatal signals.

So let's implement that, optimizing virtio-mem along the way.
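
To illustrate the idea only, here is a minimal user-space sketch of "retry
until a deadline, ignoring signal state". It is not the code from this
series: offline_with_timeout(), the attempt callback, the two stand-in
attempt functions and the choice of -ETIMEDOUT as the timeout error are all
hypothetical names and assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical: one offlining attempt; 0 on success, -EAGAIN to retry. */
typedef int (*offline_attempt_fn)(void *ctx);

/* Monotonic clock in milliseconds, so wall-clock jumps do not matter. */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Retry until the attempt succeeds, fails with a hard error, or the
 * deadline passes. Signal state is deliberately never consulted.
 * Returning -ETIMEDOUT on expiry is an assumption of this sketch.
 */
static int offline_with_timeout(offline_attempt_fn attempt, void *ctx,
				long long timeout_ms)
{
	const long long deadline = now_ms() + timeout_ms;

	assert(attempt != NULL && timeout_ms >= 0);

	for (;;) {
		int rc = attempt(ctx);

		if (rc != -EAGAIN)
			return rc;		/* success or hard error */
		if (now_ms() >= deadline)
			return -ETIMEDOUT;	/* gave up, caller may retry later */
	}
}

/* Stand-in attempt: fails a fixed number of times, then succeeds. */
struct flaky {
	int failures_left;
};

static int flaky_attempt(void *ctx)
{
	struct flaky *f = ctx;

	if (f->failures_left > 0) {
		f->failures_left--;
		return -EAGAIN;
	}
	return 0;
}

/* Stand-in attempt: never makes progress. */
static int always_busy(void *ctx)
{
	(void)ctx;
	return -EAGAIN;
}
```

With these stand-ins, offline_with_timeout(flaky_attempt, &f, 1000)
succeeds once the transient failures are used up, whereas
offline_with_timeout(always_busy, NULL, 20) gives up after roughly 20 ms
instead of looping forever.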

Cc: Andrew Morton <akpm@...ux-foundation.org>
Cc: "Michael S. Tsirkin" <mst@...hat.com>
Cc: John Hubbard <jhubbard@...dia.com>
Cc: Oscar Salvador <osalvador@...e.de>
Cc: Michal Hocko <mhocko@...e.com>
Cc: Jason Wang <jasowang@...hat.com>
Cc: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>

[1] https://lkml.kernel.org/r/20230620011719.155379-1-jhubbard@nvidia.com

David Hildenbrand (5):
  mm/memory_hotplug: check for fatal signals only in offline_pages()
  virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY
  mm/memory_hotplug: make offline_and_remove_memory() timeout instead of
    failing on fatal signals
  virtio-mem: set the timeout for offline_and_remove_memory() to 10
    seconds
  virtio-mem: check if the config changed before (fake) offlining memory

 drivers/virtio/virtio_mem.c    | 22 +++++++++++++--
 include/linux/memory_hotplug.h |  2 +-
 mm/memory_hotplug.c            | 50 ++++++++++++++++++++++++++++++++--
 3 files changed, 68 insertions(+), 6 deletions(-)


base-commit: 6995e2de6891c724bfeb2db33d7b87775f913ad1
-- 
2.40.1
