linux-kernel - Re: [PATCH v3] mm: fix race by making init_zero_pfn() early_initcall


Message-ID: <51e3affb-ea09-65a4-99e1-daba968e6dc8@wanyeetech.com>
Date:   Tue, 30 Mar 2021 12:59:27 +0800
From:   Zhou Yanjie <zhouyanjie@...yeetech.com>
To:     Ilya Lipnitskiy <ilya.lipnitskiy@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Cc:     Hugh Dickins <hughd@...gle.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>, stable@...r.kernel.org
Subject: Re: [PATCH v3] mm: fix race by making init_zero_pfn() early_initcall

Hi Ilya,

On 2021/3/30 12:42 PM, Ilya Lipnitskiy wrote:
> There are code paths that rely on zero_pfn to be fully initialized
> before core_initcall. For example, wq_sysfs_init() is a core_initcall
> function that eventually results in a call to kernel_execve, which
> causes a page fault with a subsequent mmput. If zero_pfn is not
> initialized by then it may not get cleaned up properly and result in an
> error:
>    BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1
>
> Here is an analysis of the race as seen on a MIPS device. On this
> particular MT7621 device (Ubiquiti ER-X), zero_pfn is PFN 0 until
> initialized, at which point it becomes PFN 5120:
>    1. wq_sysfs_init calls into kobject_uevent_env at core_initcall:
>         [<80340dc8>] kobject_uevent_env+0x7e4/0x7ec
>         [<8033f8b8>] kset_register+0x68/0x88
>         [<803cf824>] bus_register+0xdc/0x34c
>         [<803cfac8>] subsys_virtual_register+0x34/0x78
>         [<8086afb0>] wq_sysfs_init+0x1c/0x4c
>         [<80001648>] do_one_initcall+0x50/0x1a8
>         [<8086503c>] kernel_init_freeable+0x230/0x2c8
>         [<8066bca0>] kernel_init+0x10/0x100
>         [<80003038>] ret_from_kernel_thread+0x14/0x1c
>
>    2. kobject_uevent_env() calls call_usermodehelper_exec(), which runs
>       kernel_execve() asynchronously.
>
>    3. Memory allocations in kernel_execve cause a page fault, bumping the
>       MM_ANONPAGES RSS counter:
>         [<8015adb4>] add_mm_counter_fast+0xb4/0xc0
>         [<80160d58>] handle_mm_fault+0x6e4/0xea0
>         [<80158aa4>] __get_user_pages.part.78+0x190/0x37c
>         [<8015992c>] __get_user_pages_remote+0x128/0x360
>         [<801a6d9c>] get_arg_page+0x34/0xa0
>         [<801a7394>] copy_string_kernel+0x194/0x2a4
>         [<801a880c>] kernel_execve+0x11c/0x298
>         [<800420f4>] call_usermodehelper_exec_async+0x114/0x194
>
>    4. If zero_pfn has not been initialized yet, zap_pte_range does
>       not decrement the MM_ANONPAGES RSS counter and the BUG message is
>       triggered shortly afterwards when __mmdrop checks the RSS counters:
>         [<800285e8>] __mmdrop+0x98/0x1d0
>         [<801a6de8>] free_bprm+0x44/0x118
>         [<801a86a8>] kernel_execve+0x160/0x1d8
>         [<800420f4>] call_usermodehelper_exec_async+0x114/0x194
>         [<80003198>] ret_from_kernel_thread+0x14/0x1c
>
> To avoid races such as the one described above, run init_zero_pfn() at
> early_initcall level. Depending on the architecture, ZERO_PAGE is either
> constant or gets initialized even earlier, at paging_init, so there is
> no issue with initializing zero_pfn earlier.
>
> Discussion: https://lkml.kernel.org/r/CALCv0x2YqOXEAy2Q=hafjhHCtTHVodChv1qpM=niAXOpqEbt7w@mail.gmail.com
>
> Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@...il.com>
> Cc: Hugh Dickins <hughd@...gle.com>
> Cc: "Eric W. Biederman" <ebiederm@...ssion.com>
> Cc: stable@...r.kernel.org
> ---
>   mm/memory.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
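
For readers following steps 3 and 4 of the analysis, the counter imbalance can be sketched in user space. This is only a toy model, not kernel code: toy_mm, toy_fault, toy_zap and toy_check are hypothetical names, and the page_accounted flag merely stands in for whatever condition makes zap_pte_range skip the decrement when zero_pfn is uninitialized. The reason for the skip is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, reduced set of RSS counter types for this toy model. */
enum { TOY_FILEPAGES, TOY_ANONPAGES, TOY_NR_COUNTERS };

struct toy_mm {
    long rss[TOY_NR_COUNTERS];
};

/* Step 3: a fault maps one anonymous page and bumps the counter. */
static void toy_fault(struct toy_mm *mm)
{
    assert(mm != NULL);
    mm->rss[TOY_ANONPAGES]++;
}

/*
 * Step 4: teardown decrements the counter only when the page is
 * accounted for. page_accounted == false stands in for the case
 * described in the patch, where the decrement is skipped.
 */
static void toy_zap(struct toy_mm *mm, bool page_accounted)
{
    assert(mm != NULL);
    if (page_accounted)
        mm->rss[TOY_ANONPAGES]--;
}

/* Final check: every counter must be back at zero. Returns the
 * number of counters that are not. */
static int toy_check(const struct toy_mm *mm)
{
    int bad = 0;

    assert(mm != NULL);
    for (int i = 0; i < TOY_NR_COUNTERS; i++) {
        if (mm->rss[i] != 0) {
            printf("toy: bad rss-counter state mm:%p type:%d val:%ld\n",
                   (const void *)mm, i, mm->rss[i]);
            bad++;
        }
    }
    return bad;
}
```

Calling toy_fault() followed by toy_zap(mm, false) leaves the anonymous-page counter at 1, so toy_check() reports one bad counter. With toy_zap(mm, true) the counters balance and nothing is reported.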


Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@...yeetech.com> # on CU1000-Neo/X1000E and CU1830-Neo/X1830


> diff --git a/mm/memory.c b/mm/memory.c
> index 5c3b29d3af66..e66b11ac1659 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -166,7 +166,7 @@ static int __init init_zero_pfn(void)
>   	zero_pfn = page_to_pfn(ZERO_PAGE(0));
>   	return 0;
>   }
> -core_initcall(init_zero_pfn);
> +early_initcall(init_zero_pfn);
>   
>   void mm_trace_rss_stat(struct mm_struct *mm, int member, long count)
>   {
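
The effect of the one-line change can be sketched with a small user-space model of initcall levels. This is a rough illustration under stated assumptions, not kernel code: the two-level table, run_boot() and consumer() are hypothetical, consumer() stands in for a core_initcall user such as wq_sysfs_init, and it is assumed to be ordered before init_zero_pfn within the core level. The value 5120 is the PFN reported for the MT7621 device above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: levels run in ascending order, entries within a level
 * run in registration order. */
enum level { LEVEL_EARLY = 0, LEVEL_CORE = 1 };

struct initcall {
    enum level level;
    void (*fn)(void);
};

static unsigned long zero_pfn;         /* 0 until initialized */
static unsigned long seen_by_consumer; /* value the consumer observed */

static void init_zero_pfn(void)
{
    zero_pfn = 5120; /* PFN from the report; illustrative only */
}

/* Hypothetical core-level user that depends on zero_pfn. */
static void consumer(void)
{
    seen_by_consumer = zero_pfn;
}

/* Run a simulated boot with init_zero_pfn registered at the given
 * level and return the zero_pfn value the consumer saw. */
static unsigned long run_boot(enum level zero_pfn_level)
{
    struct initcall calls[] = {
        { LEVEL_CORE, consumer },          /* assumed first in its level */
        { zero_pfn_level, init_zero_pfn },
    };

    assert(zero_pfn_level == LEVEL_EARLY || zero_pfn_level == LEVEL_CORE);
    zero_pfn = 0;
    seen_by_consumer = 0;

    for (int lvl = LEVEL_EARLY; lvl <= LEVEL_CORE; lvl++)
        for (size_t i = 0; i < sizeof(calls) / sizeof(calls[0]); i++)
            if ((int)calls[i].level == lvl)
                calls[i].fn();

    return seen_by_consumer;
}
```

In this model run_boot(LEVEL_CORE) returns 0, because the consumer runs before zero_pfn is set, while run_boot(LEVEL_EARLY) returns 5120, because the early level completes before any core-level entry runs.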