linux-kernel - Re: [PATCH linux-next] mm/madvise: allow KSM hints for process

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YsKNJiGA/ruLRS27@dhcp22.suse.cz>
Date:   Mon, 4 Jul 2022 08:48:06 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     David Hildenbrand <david@...hat.com>
Cc:     cgel.zte@...il.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, vbabka@...e.cz, minchan@...nel.org,
        oleksandr@...hat.com, xu xin <xu.xin16@....com.cn>,
        Jann Horn <jannh@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH linux-next] mm/madvise: allow KSM hints for
 process_madvise

On Fri 01-07-22 21:12:56, David Hildenbrand wrote:
> On 01.07.22 15:19, Michal Hocko wrote:
> > On Fri 01-07-22 14:39:24, David Hildenbrand wrote:
> >>> I am not sure about exact details of the KSM implementation but if that
> >>> is not a desirable behavior then it should be handled on the KSM level.
> >>> The very sam thing can easily happen in a multithreaded (or in general
> >>> multi-process with shared mm) environment as well.
> >>
> >> I don't quite get what you mean.
> > 
> > I meant to say that if KSM needs to be aware of a special CoW semantic
> > then it should be handled on the KSM layer regardless whether the KSM
> > has been set by the process itself or any other process that has acccess
> > to the MM. process_madvise is just another way to access a remote MM
> > other than sharing the full MM.
> 
> Okay.
> 
> KSM has been a corner case feature that was restricted to well-defined
> and well-tested environments. Until recently, R/O pins of any KSM pages
> was essentially completely unreliably. And applications don't expect
> such surprises. The shared zeropage is most probably the last
> problematic piece.
> 
> Yes, we're getting there that it's a real feature that can see more
> (forced) wide-spread use. However, until the known issues in KSM have
> been fixed (e.g., below -- there is a whole list of papers regarding
> attacks on memory deduplication), it should be limited to well defined
> environments and applications only -- IMHO.

Very much agreed on all this! To be completely honest I am not really
sure that all those consequences are widely understood and optmizing
solely on memory savings is a very short sighted strategy IMO. But, it
seems that there is a demand for this feature and previous attempts for
APIs were much worse both from the semantic and maintainability POV. I
am not sure we can get anything more sane than madvise.

I also very much agree that current shortcomings have to be adressed
first before we open this can of worms to 3rd party actors. I was not
aware of those so thank for bringing them up. Maybe I was overly
optimistic here.

So I guess we have following questions to answer:
1) Do we really want to support KSM triggered by 3rd party? Does it
impose new challenges other than existing ones in multi "threaded"
environemnts?
2) If yes, is the process_madvise the most appropriate existing API? Or
do we need a new one?
3) Should this be a highly privileged operation or we want to allow
userspace to shoot its feet because consequences are subtle and not very
well understood?

> So what I want to express here is that if we're adding an interface that
> can be used to just enable KSM on the whole system easily, it might be a
> bit to soon for that. No matter what you document, people will ignore it.

Agreed.

> OTOH, if this is a real debug feature that will only be available in
> specific debug/test scenarios (kernel config? toggle? whatsoever?), then
> it's "better". If that is already the case, good.

No, I think this is aimed to real production deployments.

Thanks!
-- 
Michal Hocko
SUSE Labs