lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Sun, 28 Feb 2021 19:28:13 -0800
From:   Ilya Lipnitskiy <ilya.lipnitskiy@...il.com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Kees Cook <keescook@...omium.org>,
        Christoph Hellwig <hch@....de>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: exec error: BUG: Bad rss-counter

Eric, All,

The following error appears when running Linux 5.10.18 on an embedded
MIPS mt7621 target:
[    0.301219] BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1

Being a very generic error, I started digging and added a stack dump
before the BUG:
Call Trace:
[<80008094>] show_stack+0x30/0x100
[<8033b238>] dump_stack+0xac/0xe8
[<800285e8>] __mmdrop+0x98/0x1d0
[<801a6de8>] free_bprm+0x44/0x118
[<801a86a8>] kernel_execve+0x160/0x1d8
[<800420f4>] call_usermodehelper_exec_async+0x114/0x194
[<80003198>] ret_from_kernel_thread+0x14/0x1c

So that's how I got to looking at fs/exec.c and noticed quite a few
changes last year. Turns out this message only occurs once very early
at boot during the very first call to kernel_execve. current->mm is
NULL at this stage, so acct_arg_size() is effectively a no-op.

More digging, and I traced the RSS counter increment to:
[<8015adb4>] add_mm_counter_fast+0xb4/0xc0
[<80160d58>] handle_mm_fault+0x6e4/0xea0
[<80158aa4>] __get_user_pages.part.78+0x190/0x37c
[<8015992c>] __get_user_pages_remote+0x128/0x360
[<801a6d9c>] get_arg_page+0x34/0xa0
[<801a7394>] copy_string_kernel+0x194/0x2a4
[<801a880c>] kernel_execve+0x11c/0x298
[<800420f4>] call_usermodehelper_exec_async+0x114/0x194
[<80003198>] ret_from_kernel_thread+0x14/0x1c

In fact, I also checked vma_pages(bprm->vma) and lo and behold it is set to 1.

How is fs/exec.c supposed to handle implied RSS increments that happen
due to page faults when discarding the bprm structure? In this case,
the bug-generating kernel_execve call never succeeded, it returned -2,
but I didn't trace exactly what failed.

Interestingly, this "BUG:" message is timing-dependent. If I wait a
bit before calling free_bprm after bprm_execve the message seems to go
away (there are 3 other cores running and calling into kernel_execve
at the same time, so there is that). The error also only ever happens
once (probably because no more page faults happen?).

I don't know enough to propose a proper fix here. Is it decrementing
the bprm->mm RSS counter to account for that page fault? Or is
current->mm being NULL a bigger problem?

Apologies in advance, but I have looked hard and do not see a clear
resolution for this even in the latest kernel code.

- Ilya

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ