Message-ID: <ee492da8-74b4-4a97-8b24-73e07257f01d@redhat.com>
Date:   Fri, 17 Nov 2023 16:37:29 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Gerald Schaefer <gerald.schaefer@...ux.ibm.com>
Cc:     Sumanth Korikkar <sumanthk@...ux.ibm.com>,
        linux-mm <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Oscar Salvador <osalvador@...e.de>,
        Michal Hocko <mhocko@...e.com>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        Heiko Carstens <hca@...ux.ibm.com>,
        Vasily Gorbik <gor@...ux.ibm.com>,
        linux-s390 <linux-s390@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/8] implement "memmap on memory" feature on s390

On 17.11.23 14:00, Gerald Schaefer wrote:
> On Fri, 17 Nov 2023 00:08:31 +0100
> David Hildenbrand <david@...hat.com> wrote:
> 
>> On 14.11.23 19:02, Sumanth Korikkar wrote:
>>> Hi All,
>>>
>>> The patch series implements "memmap on memory" feature on s390 and
>>> provides the necessary fixes for it.
>>
>> Thinking about this, one thing that makes s390x different from all the
>> other architectures in this series is the altmap handling.
>>
>> I'm curious, why is that even required?
>>
>> A memmap that is not marked as online in the section should not be
>> touched by anybody (except memory onlining code :) ). And if we do, it's
>> usually a BUG because that memmap might contain garbage/be poisoned or
>> completely stale, so we might want to track that down and fix it in any
>> case.
>>
>> So what speaks against just leaving add_memory() populate the memmap
>> from the altmap? Then, also the page tables for the memmap are already
>> in place when onlining memory.
> 
> Good question, I am not 100% sure if we ran into bugs, or simply assumed
> that it is not OK to call __add_pages() when the memory for the altmap
> is not accessible.

I mean, we create the direct map even though nobody should access that 
memory, so maybe we can simply map the altmap even though nobody should 
access that memory.

As I said, the page tables for the altmap are then already allocated, 
and memory onlining likely doesn't need any allocations anymore 
(unless kasan or some other memory notifier has special demands).

Certainly simpler :)

> 
> Maybe there is also already a common code bug with that, s390 might be
> special but that is often also good for finding bugs in common code ...

If it's only the page_init_poison() as noted by Sumanth, we could 
disable that on s390x with an altmap some way or the other; should be 
possible.

I mean, you effectively have your own poisoning if the altmap is 
inaccessible and makes your CPU angry on access :)

Last but not least, support for an inaccessible altmap might eventually 
come in handy for virtio-mem, and make altmap support simpler. So added 
bonus points.

> 
>> Then, adding two new notifier calls, one at the start of
>> memory_block_online() called something like MEM_PREPARE_ONLINE and one
>> at the end of memory_block_offline() called something like
>> MEM_FINISH_OFFLINE, is still suboptimal, but that's where standby
>> memory could be activated/deactivated, without messing with the altmap.
>>
>> That way, the only s390x specific thing is that the memmap that should
>> not be touched by anybody is actually inaccessible, and you'd
>> activate/deactivate simply from the new notifier calls just the way we
>> used to do.
>>
>> It's still all worse than just adding/removing memory properly, using a
>> proper interface -- where you could alloc/free an actual memmap when the
>> altmap is not desired. But I know that people don't want to spend time
>> just doing it cleanly from scratch.
> 
> Yes, sometimes they need to be forced to do that :-)

I certainly won't force you if we can just keep the __add_pages() calls 
as is; having an altmap that is inaccessible but fully prepared sounds 
reasonable to me.

I can see how this gives an immediate benefit to existing s390x 
installations without being too hacky and without taking a long time to 
settle.

But I strongly suggest evaluating a new interface long-term.

> 
> So, we'll look into defining a "proper interface", and treat patches 1-3
> separately as bug fixes? Especially patch 3 might be interesting for arm,
> if they do not have ZONE_DEVICE, but still use the functions, they might
> end up with the no-op version, not really freeing any memory.

It might make sense to

1) Send the first 3 out separately
2) Look into a simple variant that leaves __add_pages() calls alone and
    only adds the new MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers --
    well, and deals with an inaccessible altmap, e.g., by handling
    page_init_poison() when the altmap might be inaccessible.
3) Look into a proper interface to add/remove memory instead of relying
    on online/offline.

2) is certainly an improvement and might be desired in some cases. 3) is 
more powerful (e.g., where you don't want an altmap because of 
fragmentation) and future proof.
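
To make the ordering in 2) concrete, here is a rough userspace model of
where the two proposed notifier calls would sit. It is a sketch only:
all model_* and MODEL_* names are made up for illustration and are not
kernel API, and the existing online/offline steps are collapsed into a
single event each. The only point is that the arch notifier makes the
memory (including the altmap) accessible before anything else in the
online path runs, and inaccessible again after everything else in the
offline path is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel API. */
enum model_action {
	MODEL_PREPARE_ONLINE,	/* proposed: very first step of onlining */
	MODEL_ONLINE,		/* stands in for the existing online steps */
	MODEL_OFFLINE,		/* stands in for the existing offline steps */
	MODEL_FINISH_OFFLINE	/* proposed: very last step of offlining */
};

struct model_block {
	bool accessible;	/* standby memory (incl. altmap) activated */
	bool online;		/* memmap marked online, pages usable */
	enum model_action trace[8];
	size_t ntrace;
};

/* Arch-side notifier: only activates/deactivates the memory. */
void model_arch_notifier(struct model_block *b, enum model_action a)
{
	if (a == MODEL_PREPARE_ONLINE)
		b->accessible = true;
	else if (a == MODEL_FINISH_OFFLINE)
		b->accessible = false;
}

/* Record the event, then hand it to the arch notifier. */
void model_notify(struct model_block *b, enum model_action a)
{
	assert(b->ntrace < sizeof(b->trace) / sizeof(b->trace[0]));
	b->trace[b->ntrace++] = a;
	model_arch_notifier(b, a);
}

void model_block_online(struct model_block *b)
{
	model_notify(b, MODEL_PREPARE_ONLINE);
	/* From here on the memmap inside the altmap may be touched. */
	assert(b->accessible);
	b->online = true;
	model_notify(b, MODEL_ONLINE);
}

void model_block_offline(struct model_block *b)
{
	b->online = false;
	model_notify(b, MODEL_OFFLINE);
	/* Nobody may touch the memmap anymore; deactivate last. */
	model_notify(b, MODEL_FINISH_OFFLINE);
	assert(!b->accessible);
}
```

Calling model_block_online() followed by model_block_offline() on a
zero-initialized struct model_block records the events in the order
PREPARE_ONLINE, ONLINE, OFFLINE, FINISH_OFFLINE.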

I suspect there will be installations where an altmap is undesired: it 
fragments your address space with unmovable (memmap) allocations. 
Currently, runtime allocations of gigantic pages are affected. Long-term 
other large allocations (if we ever see very large THP) will be affected.
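
A back-of-the-envelope sketch of the fragmentation effect, as a small
userspace C model. All numbers used below are assumptions for
illustration and not s390x values from this thread: 4 KiB base pages,
a 64-byte struct page, 128 MiB memory blocks and a 1 GiB gigantic page.
The model_* helpers are hypothetical. With the memmap placed at the
start of every block, no contiguous free range can be larger than the
block size minus the memmap size.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of memmap needed to describe one memory block. */
uint64_t model_memmap_bytes(uint64_t block_bytes, uint64_t page_bytes,
			    uint64_t struct_page_bytes)
{
	assert(page_bytes != 0 && block_bytes % page_bytes == 0);
	return (block_bytes / page_bytes) * struct_page_bytes;
}

/*
 * Largest contiguous range left in a block when its (unmovable) memmap
 * sits at the start of the block, rounded up to whole pages.
 */
uint64_t model_max_contig_bytes(uint64_t block_bytes, uint64_t page_bytes,
				uint64_t struct_page_bytes)
{
	uint64_t memmap = model_memmap_bytes(block_bytes, page_bytes,
					     struct_page_bytes);
	uint64_t reserved = (memmap + page_bytes - 1) / page_bytes * page_bytes;

	return block_bytes - reserved;
}

/* Nonzero if a contiguous allocation can fit when every block has an altmap. */
int model_fits_with_altmap(uint64_t block_bytes, uint64_t page_bytes,
			   uint64_t struct_page_bytes, uint64_t alloc_bytes)
{
	return alloc_bytes <= model_max_contig_bytes(block_bytes, page_bytes,
						     struct_page_bytes);
}
```

Under these assumptions each 128 MiB block carries 2 MiB of memmap, so
the largest contiguous range is 126 MiB and a 1 GiB allocation can never
be satisfied, no matter how much memory is free overall.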

For that reason, we want to either support variable-sized memory blocks 
long-term, or simulate that by "grouping" memory blocks that share the 
same altmap located on the first memory blocks in that group. The 
downside is that onlining one block forces onlining of the whole group.
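
The grouping semantics can be sketched with a tiny userspace model. The
group size of 8 and all model_* / MODEL_* names are assumptions made up
for this illustration, not an existing or proposed kernel interface.
The model only captures the coupling: a request to online any member
brings the whole group online, because all members depend on the shared
altmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_GROUP_BLOCKS 8	/* assumed group size, illustration only */

/* Hypothetical group of memory blocks sharing one altmap. */
struct model_group {
	bool online[MODEL_GROUP_BLOCKS];
};

/*
 * Request onlining of block 'idx'. Because the memmap for all members
 * lives in the shared altmap, the whole group is onlined together.
 * Returns the number of blocks whose state actually changed.
 */
size_t model_group_online(struct model_group *g, size_t idx)
{
	size_t changed = 0;

	assert(idx < MODEL_GROUP_BLOCKS);
	for (size_t i = 0; i < MODEL_GROUP_BLOCKS; i++) {
		if (!g->online[i]) {
			g->online[i] = true;
			changed++;
		}
	}
	return changed;
}
```

Onlining block 3 of a fully offline group changes the state of all 8
blocks, which is the kind of side effect a user may not expect.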

On s390x, which adds all memory ahead of time, it's hard to decide what 
the right granularity will be, and suddenly changed online/offline 
behavior might be quite "surprising" for users. The user can give 
better hints when adding/removing memory explicitly.

-- 
Cheers,

David / dhildenb
