Date:   Tue, 14 May 2019 08:30:43 +0200
From:   Oleksandr Natalenko <oleksandr@...hat.com>
To:     Kirill Tkhai <ktkhai@...tuozzo.com>
Cc:     linux-kernel@...r.kernel.org, Vlastimil Babka <vbabka@...e.cz>,
        Michal Hocko <mhocko@...e.com>,
        Matthew Wilcox <willy@...radead.org>,
        Pavel Tatashin <pasha.tatashin@...een.com>,
        Timofey Titovets <nefelim4ag@...il.com>,
        Aaron Tomlin <atomlin@...hat.com>, linux-mm@...ck.org
Subject: Re: [PATCH RFC 0/4] mm/ksm: add option to automerge VMAs

Hi.

On Mon, May 13, 2019 at 03:37:56PM +0300, Kirill Tkhai wrote:
> > Yes, I get your point. But the intention is to avoid another hacky trick
> > (LD_PRELOAD), thus *something* should *preferably* be done on the
> > kernel level instead.
> 
> I don't think so. Does the userspace hack introduce any overhead? It does not
> look so. Why should we think about mergeable VMAs in the page fault handler?!
> This is the last thing we want to think about in the page fault handler.
> 
> Also, synchronization in page fault handlers is difficult, and it's
> easy to make a mistake. Indeed, there is a mistake in [3/4]: you call
> ksm_enter() with mmap_sem read locked, while the normal way is to call it
> with the write lock held (see madvise_need_mmap_write()).
> 
> So, let's not touch this path. A small optimization for the unlikely case will
> introduce problems in optimizing the likely case in the future.

Yup, you're right, I missed the fact that the write lock is needed there.
Reworking the locking there is not my intention, so let's find another
solution.

> > Also, just for the sake of another piece of stats here:
> > 
> > $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
> > 526
> 
> This all requires attentive analysis. The number looks pretty big to me.
> Which pages get merged there? They may just be identical
> zero pages.
> 
> E.g., your browser wants to work fast. It introduces smart schemes
> and preallocates many pages in the background (mmap + write 1 byte to a page),
> so later it saves some time (no page fault + alloc) when a page is
> really needed. But your change merges these pages and kills this
> optimization. That doesn't sound good, does it?
> 
> I think we should not assume we know and predict better than application
> writers what they want from the kernel. Let people decide for themselves
> depending on their workload. The only exception is some buggy
> or old applications which are impossible to change, so a force madvise
> workaround may help. But only if there really are such applications...
> 
> I'd research which pages are duplicated in these 526 MB. Maybe
> you'll find that no action is required, or that a report asking the
> userspace application to use madvise is needed.

OK, I agree, this is a good argument to move the decision to userspace.
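
For reference, the per-application opt-in that exists today is
madvise(2) with MADV_MERGEABLE on the ranges the application itself
considers safe to merge. Below is a minimal sketch of that, not code
from the patch set. The helper name alloc_mergeable() is made up for
illustration, and it assumes an anonymous private mapping is what the
application wants to share.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes of anonymous memory and ask KSM
 * to consider the range for merging.
 *
 * Returns the mapping, or NULL if mmap() itself fails. *merge_err is 0
 * when madvise() succeeded, otherwise the errno it reported (for
 * example EINVAL on a kernel built without CONFIG_KSM). The mapping
 * stays usable either way; it just will not be merged.
 */
void *alloc_mergeable(size_t len, int *merge_err)
{
	void *p;

	assert(merge_err != NULL);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	*merge_err = 0;
#ifdef MADV_MERGEABLE
	if (madvise(p, len, MADV_MERGEABLE) != 0)
		*merge_err = errno;
#else
	/* Headers too old to know about KSM at all. */
	*merge_err = ENOSYS;
#endif
	return p;
}
```

An application that relies on preallocated, pre-touched pages would
simply not call this for those ranges, which is exactly the kind of
decision the kernel cannot make on its behalf.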

> > 2) what kinds of opt-out should we maintain? Like, what if force_madvise
> > is called, but the task doesn't want some VMAs to be merged? This will
> > require a new flag anyway, it seems. And should there be another
> > write-only file to unmerge everything forcibly for a specific task?
> 
> For example,
> 
> Merge:
> #echo $task > /sys/kernel/mm/ksm/force_madvise

Immediate question: what should actually be done on such a write? I see 2
options:

1) mark all VMAs as mergeable + set some flag for mmap() to mark all
further allocations as mergeable as well;
2) just mark all the VMAs as mergeable; userspace can call this
periodically to mark new VMAs.

My guess is that 2) is less intrusive and leaves the decision
predominantly to userspace, so it would be the preferred option.
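
To make 2) concrete, here is a rough sketch of what the userspace side
could look like. Everything in it is an assumption: the
force_madvise file is only the interface proposed in this thread and
does not exist in any kernel, and the helper names, the bounded loop
and the interval are invented for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Proposed in this thread only; not an existing kernel interface. */
#define KSM_FORCE_MADVISE "/sys/kernel/mm/ksm/force_madvise"

/*
 * Write "<pid>" (merge != 0) or "-<pid>" (merge == 0) to `path`,
 * following the syntax suggested in the quoted mail.
 * Returns 0 on success, -1 on failure.
 */
int ksm_force_madvise(const char *path, pid_t pid, int merge)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%s%ld\n", merge ? "" : "-", (long)pid) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Option 2): re-mark the task `rounds` times, sleeping `interval`
 * seconds in between, so that VMAs created since the previous write
 * are picked up as well. Stops at the first failed write.
 */
int ksm_mark_periodically(const char *path, pid_t pid,
			  unsigned int rounds, unsigned int interval)
{
	unsigned int i;

	for (i = 0; i < rounds; i++) {
		if (ksm_force_madvise(path, pid, 1) != 0)
			return -1;
		if (i + 1 < rounds)
			sleep(interval);
	}
	return 0;
}
```

A small daemon could call ksm_mark_periodically(KSM_FORCE_MADVISE, pid,
...) for the tasks an administrator selects; nothing sticky is left in
the kernel, so stopping the daemon stops new VMAs from being marked.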

> Unmerge:
> #echo -$task > /sys/kernel/mm/ksm/force_madvise

Okay.

> If a task doesn't want some VMA merged, we should just skip it entirely.

This way we lose some flexibility, IMO, but I get your point.

Thanks.

-- 
  Best regards,
    Oleksandr Natalenko (post-factum)
    Senior Software Maintenance Engineer
