lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8a558ed8-0db0-4350-adb2-f2baa9d4c2bd@lucifer.local>
Date: Mon, 19 May 2025 20:18:49 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        Pedro Falcato <pfalcato@...e.de>, Xu Xin <xu.xin16@....com.cn>,
        Chengming Zhou <chengming.zhou@...ux.dev>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 3/4] mm: prevent KSM from completely breaking VMA merging

On Mon, May 19, 2025 at 08:14:17PM +0100, Lorenzo Stoakes wrote:
> On Mon, May 19, 2025 at 08:59:32PM +0200, David Hildenbrand wrote:
> > > >
> > > > I am not 100% sure why we bail out on special mappings: all we have to do is
> > > > reliably identify anon pages, and we should be able to do that.
> > >
> > > But they map e.g. kernel memory (at least for VM_PFNMAP, purely and by
> > > implication really VM_IO), it wouldn't make sense for KSM to be asked to
> > > try to merge these right?
> > >
> > > And of course no underlying struct page to pin, no reference counting
> > > either, so I think you'd end up in trouble potentially here wouldn't you?
> > > And how would the CoW work?
> >
> > KSM only operates on anonymous pages. It cannot de-duplicate anything else.
> > (therefore, only MAP_PRIVATE applies)
>
> Yes I had this realisation see my reply to your reply :)
>
> I mean is MAP_PRIVATE of a VM_PFNMAP really that common?...
>
> >
> > Anything else (no struct page, not a CoW anon folio in such a mapping) is
> > skipped.
> >
> > Take a look at scan_get_next_rmap_item() where we do
> >
> > folio = folio_walk_start(&fw, vma, ksm_scan.address, 0);
> > if (folio) {
> > 	if (!folio_is_zone_device(folio) &&
> > 	    folio_test_anon(folio)) {
> > 		folio_get(folio);
> > 		tmp_page = fw.page;
> > 	}
> > 	folio_walk_end(&fw, vma)
> > }
> >
> >
> > Before I changed that code, we were using GUP. And GUP just always refuses
> > VM_IO|VM_PFNMAP because it cannot handle it properly.
>
> OK so it boils down to doing KSM _on the already CoW'd_ MAP_PRIVATE mapping?
>
> But hang on, how do we discover this? vm_normal_page() will screw this up right?
> As VM_SPECIAL will be set...
>
> OK now I'm not sure I understand how MAP_PRIVATE-mapped VM_PFNMAP mappings work
> :)))

Oh wait hang on...

So here, MAP_PRIVATE, CoW'd would clear pte_special() presumably:

	if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
		if (likely(!pte_special(pte)))
			goto check_pfn;

...

And in non-PTE special flag case:

...
			if (!is_cow_mapping(vma->vm_flags))
				return NULL;
...

And of course, this will be a CoW mapping...

So actually we should be fine. Got it.

>
> >
> > > >
> > > > So, assuming we could remove the VM_PFNMAP | VM_IO | VM_DONTEXPAND |
> > > > VM_MIXEDMAP constraint from vma_ksm_compatible(), could we simplify?
> > >
> > > Well I question removing this constraint for above reasons.
> > >
> > > At any rate, even if we _could_ this feels like a bigger change that we
> > > should come later.
> >
> > "bigger" -- it might just be removing these 4 flags from the check ;)
> >
> > I'll dig a bit more.
>
> Right, but doing so would be out of scope here don't you think?
>
> I'd rather not delay fixing this bug on this basis ideally, esp. as easy to
> adjust later.
>
> I suggest we put this in as-is (or close to it anyway) and then if we remove the
> flags we can change this...
>
> As I said in other reply, .mmap() means the driver can do literally anything
> they want (which is _hateful_), so we'd really want to have some confidence they
> didn't do something so crazy, unless we were happy to just let that possibly
> explode.
>
> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ