Message-ID: <c5cd0ad5-9d9d-4df3-ab20-c5de2a380894@suse.cz>
Date: Mon, 21 Oct 2024 19:01:59 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Roman Gushchin <roman.gushchin@...ux.dev>,
 Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, stable@...r.kernel.org,
 Hugh Dickins <hughd@...gle.com>, Matthew Wilcox <willy@...radead.org>
Subject: Re: [PATCH] mm: page_alloc: move mlocked flag clearance into
 free_pages_prepare()

On 10/21/24 18:48, Roman Gushchin wrote:
> Syzbot reported [1] a bad page state problem caused by a page
> being freed using free_page() still having a mlocked flag at
> free_pages_prepare() stage:
> 
>   BUG: Bad page state in process syz.0.15  pfn:1137bb
>   page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff8881137bb870 pfn:0x1137bb
>   flags: 0x400000000080000(mlocked|node=0|zone=1)
>   raw: 0400000000080000 0000000000000000 dead000000000122 0000000000000000
>   raw: ffff8881137bb870 0000000000000000 00000000ffffffff 0000000000000000
>   page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>   page_owner tracks the page as allocated
>   page last allocated via order 0, migratetype Unmovable, gfp_mask
>   0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 3005, tgid
>   3004 (syz.0.15), ts 61546  608067, free_ts 61390082085
>    set_page_owner include/linux/page_owner.h:32 [inline]
>    post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
>    prep_new_page mm/page_alloc.c:1545 [inline]
>    get_page_from_freelist+0x3008/0x31f0 mm/page_alloc.c:3457
>    __alloc_pages_noprof+0x292/0x7b0 mm/page_alloc.c:4733
>    alloc_pages_mpol_noprof+0x3e8/0x630 mm/mempolicy.c:2265
>    kvm_coalesced_mmio_init+0x1f/0xf0 virt/kvm/coalesced_mmio.c:99
>    kvm_create_vm virt/kvm/kvm_main.c:1235 [inline]
>    kvm_dev_ioctl_create_vm virt/kvm/kvm_main.c:5500 [inline]
>    kvm_dev_ioctl+0x13bb/0x2320 virt/kvm/kvm_main.c:5542
>    vfs_ioctl fs/ioctl.c:51 [inline]
>    __do_sys_ioctl fs/ioctl.c:907 [inline]
>    __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
>    do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>    do_syscall_64+0x69/0x110 arch/x86/entry/common.c:83
>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
>   page last free pid 951 tgid 951 stack trace:
>    reset_page_owner include/linux/page_owner.h:25 [inline]
>    free_pages_prepare mm/page_alloc.c:1108 [inline]
>    free_unref_page+0xcb1/0xf00 mm/page_alloc.c:2638
>    vfree+0x181/0x2e0 mm/vmalloc.c:3361
>    delayed_vfree_work+0x56/0x80 mm/vmalloc.c:3282
>    process_one_work kernel/workqueue.c:3229 [inline]
>    process_scheduled_works+0xa5c/0x17a0 kernel/workqueue.c:3310
>    worker_thread+0xa2b/0xf70 kernel/workqueue.c:3391
>    kthread+0x2df/0x370 kernel/kthread.c:389
>    ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
>    ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> 
> The problem was originally introduced by
> commit b109b87050df ("mm/munlock: replace clear_page_mlock() by final
> clearance"): it was focused on handling pagecache
> and anonymous memory and wasn't suitable for lower level
> get_page()/free_page() APIs used for example by KVM, as with
> this reproducer.

Does that mean KVM is mlocking pages that are neither pagecache nor anonymous,
thus not LRU? How and why (and since when) is that done?

> Fix it by moving the mlocked flag clearance down to
> free_pages_prepare().
> 
> The bug itself is fairly old and harmless (aside from generating these
> warnings), so the stable backport is likely not justified.

But since there's a Cc: stable below, it will be backported :)

> Closes: https://syzkaller.appspot.com/x/report.txt?x=169a47d0580000
> Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance")
> Signed-off-by: Roman Gushchin <roman.gushchin@...ux.dev>
> Cc: <stable@...r.kernel.org>
> Cc: Hugh Dickins <hughd@...gle.com>
> Cc: Matthew Wilcox <willy@...radead.org>
> ---
>  mm/page_alloc.c |  9 +++++++++
>  mm/swap.c       | 14 --------------
>  2 files changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bc55d39eb372..24200651ad92 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1044,6 +1044,7 @@ __always_inline bool free_pages_prepare(struct page *page,
>  	bool skip_kasan_poison = should_skip_kasan_poison(page);
>  	bool init = want_init_on_free();
>  	bool compound = PageCompound(page);
> +	struct folio *folio = page_folio(page);
>  
>  	VM_BUG_ON_PAGE(PageTail(page), page);
>  
> @@ -1053,6 +1054,14 @@ __always_inline bool free_pages_prepare(struct page *page,
>  	if (memcg_kmem_online() && PageMemcgKmem(page))
>  		__memcg_kmem_uncharge_page(page, order);
>  
> +	if (unlikely(folio_test_mlocked(folio))) {
> +		long nr_pages = folio_nr_pages(folio);
> +
> +		__folio_clear_mlocked(folio);
> +		zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
> +		count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
> +	}

Why drop the useful comment?

> +
>  	if (unlikely(PageHWPoison(page)) && !order) {
>  		/* Do not let hwpoison pages hit pcplists/buddy */
>  		reset_page_owner(page, order);
> diff --git a/mm/swap.c b/mm/swap.c
> index 835bdf324b76..7cd0f4719423 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -78,20 +78,6 @@ static void __page_cache_release(struct folio *folio, struct lruvec **lruvecp,
>  		lruvec_del_folio(*lruvecp, folio);
>  		__folio_clear_lru_flags(folio);
>  	}
> -
> -	/*
> -	 * In rare cases, when truncation or holepunching raced with
> -	 * munlock after VM_LOCKED was cleared, Mlocked may still be
> -	 * found set here.  This does not indicate a problem, unless
> -	 * "unevictable_pgs_cleared" appears worryingly large.
> -	 */
> -	if (unlikely(folio_test_mlocked(folio))) {
> -		long nr_pages = folio_nr_pages(folio);
> -
> -		__folio_clear_mlocked(folio);
> -		zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
> -		count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
> -	}
>  }
>  
>  /*

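To make the difference between the two free paths concrete, here is a rough
userspace model of what the patch changes. It is only a sketch, not kernel
code: struct page_model, the PG_* bit values, the two counters and every
*_model() function are hypothetical stand-ins for struct page, the real page
flags, NR_MLOCK, UNEVICTABLE_PGCLEARED and the kernel functions named in the
diff. The "patched" argument selects whether the stale-flag clearance happens
in the __page_cache_release() stand-in (before the patch) or in the
free_pages_prepare() stand-in (after it). The comment that the patch removes
from mm/swap.c is kept on the helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real bit positions are config dependent. */
#define PG_MLOCKED (1UL << 0)
#define PG_LRU     (1UL << 1)
/* Flags that must not be set once a page reaches the free path. */
#define PAGE_FLAGS_CHECK_AT_FREE (PG_MLOCKED | PG_LRU)

/* Minimal stand-in for struct page / struct folio. */
struct page_model {
	unsigned long flags;
	long nr_pages;
};

static long nr_mlock;              /* models the NR_MLOCK zone counter */
static long unevictable_pgcleared; /* models UNEVICTABLE_PGCLEARED */

/*
 * In rare cases, when truncation or holepunching raced with
 * munlock after VM_LOCKED was cleared, Mlocked may still be
 * found set here.  This does not indicate a problem, unless
 * "unevictable_pgs_cleared" appears worryingly large.
 */
static void clear_stale_mlocked(struct page_model *p)
{
	if (p->flags & PG_MLOCKED) {
		p->flags &= ~PG_MLOCKED;
		nr_mlock -= p->nr_pages;
		unevictable_pgcleared += p->nr_pages;
	}
}

/* Returns true if the page may be freed, false for "Bad page state". */
static bool free_pages_prepare_model(struct page_model *p, bool patched)
{
	if (patched)
		clear_stale_mlocked(p);
	return !(p->flags & PAGE_FLAGS_CHECK_AT_FREE);
}

/* Folio release path: LRU teardown first, then the common free path. */
static bool folio_put_model(struct page_model *p, bool patched)
{
	p->flags &= ~PG_LRU;
	if (!patched)
		clear_stale_mlocked(p); /* old location of the clearance */
	return free_pages_prepare_model(p, patched);
}

/* Low-level free_page() style path: no LRU teardown step at all. */
static bool free_page_model(struct page_model *p, bool patched)
{
	return free_pages_prepare_model(p, patched);
}

/* Self-check that walks both paths, before and after the patch. */
static void model_selfcheck(void)
{
	struct page_model lru_page = { PG_MLOCKED | PG_LRU, 1 };
	struct page_model raw_old = { PG_MLOCKED, 1 };
	struct page_model raw_new = { PG_MLOCKED, 1 };

	nr_mlock = 3;
	unevictable_pgcleared = 0;

	assert(folio_put_model(&lru_page, false)); /* old code, LRU path: fine */
	assert(!free_page_model(&raw_old, false)); /* old code, raw path: bad state */
	assert(free_page_model(&raw_new, true));   /* patched, raw path: fine */
	assert(nr_mlock == 1 && unevictable_pgcleared == 2);
}
```

Calling model_selfcheck() from a main() exercises all three cases. The model
only shows where the clearance sits relative to the flags check; it says
nothing about how a non-LRU page ends up mlocked in the first place, which is
the open question above.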
