linux-kernel - exec error: BUG: Bad rss-counter




Message-ID: <CALCv0x1NauG_13DmmzwYaRDaq3qjmvEdyi7=XzF04KR06Q=WHA@mail.gmail.com>
Date:   Sun, 28 Feb 2021 19:28:13 -0800
From:   Ilya Lipnitskiy <ilya.lipnitskiy@...il.com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Kees Cook <keescook@...omium.org>,
        Christoph Hellwig <hch@....de>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: exec error: BUG: Bad rss-counter

Eric, All,

The following error appears when running Linux 5.10.18 on an embedded
MIPS mt7621 target:
[    0.301219] BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1

Since this is a very generic error, I started digging and added a stack
dump before the BUG:
Call Trace:
[<80008094>] show_stack+0x30/0x100
[<8033b238>] dump_stack+0xac/0xe8
[<800285e8>] __mmdrop+0x98/0x1d0
[<801a6de8>] free_bprm+0x44/0x118
[<801a86a8>] kernel_execve+0x160/0x1d8
[<800420f4>] call_usermodehelper_exec_async+0x114/0x194
[<80003198>] ret_from_kernel_thread+0x14/0x1c

That led me to fs/exec.c, where I noticed quite a few changes from last
year. It turns out this message occurs only once, very early at boot,
during the very first call to kernel_execve. current->mm is NULL at
this stage, so acct_arg_size() is effectively a no-op.
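To make the no-op concrete, here is a rough userspace model of the guard
in acct_arg_size(). It is written in Python purely for illustration and
is not kernel code. The assumption is that the function returns early
when current->mm is NULL and otherwise charges the change in argument
pages to current->mm's MM_ANONPAGES counter. The classes and field names
are hypothetical stand-ins that only mirror fs/exec.c for readability:

```python
# Hypothetical userspace model of acct_arg_size(); NOT kernel code.
MM_ANONPAGES = "MM_ANONPAGES"


class MM:
    """Stand-in for struct mm_struct: only the anon RSS counter is modelled."""

    def __init__(self):
        self.rss = {MM_ANONPAGES: 0}


class Bprm:
    """Stand-in for struct linux_binprm."""

    def __init__(self, mm):
        self.mm = mm          # the new mm being built for the exec
        self.vma_pages = 0    # pages already charged via acct_arg_size


def acct_arg_size(current_mm, bprm, pages):
    """Charge the change in argument pages to current_mm, if there is one."""
    diff = pages - bprm.vma_pages
    if current_mm is None or diff == 0:
        # Early return: nothing is charged anywhere, and bprm.vma_pages
        # is left untouched (the case described above).
        return
    bprm.vma_pages = pages
    current_mm.rss[MM_ANONPAGES] += diff
```

With current_mm set to None, a call such as acct_arg_size(None, bprm, 1)
leaves both bprm.vma_pages and every counter at 0. In the model, this
bookkeeping therefore never sees the page that was faulted into bprm->mm.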

More digging, and I traced the RSS counter increment to:
[<8015adb4>] add_mm_counter_fast+0xb4/0xc0
[<80160d58>] handle_mm_fault+0x6e4/0xea0
[<80158aa4>] __get_user_pages.part.78+0x190/0x37c
[<8015992c>] __get_user_pages_remote+0x128/0x360
[<801a6d9c>] get_arg_page+0x34/0xa0
[<801a7394>] copy_string_kernel+0x194/0x2a4
[<801a880c>] kernel_execve+0x11c/0x298
[<800420f4>] call_usermodehelper_exec_async+0x114/0x194
[<80003198>] ret_from_kernel_thread+0x14/0x1c

In fact, I also checked vma_pages(bprm->vma), and lo and behold, it returns 1.

How is fs/exec.c supposed to handle implied RSS increments that happen
due to page faults when the bprm structure is discarded? In this case,
the bug-generating kernel_execve call never succeeded; it returned -2
(-ENOENT), but I didn't trace exactly what failed.
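The invariant behind the message can be sketched as a toy model. Every
RSS increment from a fault must be matched by a decrement during
teardown, and any counter still nonzero when the mm is finally dropped
gets reported. The Python below is hypothetical and is not kernel code.
The function names only loosely echo add_mm_counter_fast, the unmap
path, and the check run from __mmdrop. The uncharge=False switch is an
assumption used to simulate an increment with no matching decrement. It
does not claim to explain what actually went missing here:

```python
# Hypothetical userspace model of RSS counter balancing; NOT kernel code.
RSS_TYPES = ("MM_FILEPAGES", "MM_ANONPAGES", "MM_SWAPENTS", "MM_SHMEMPAGES")


class MM:
    def __init__(self):
        self.rss = {t: 0 for t in RSS_TYPES}
        self.mapped_anon = 0  # anon pages the model believes are mapped


def fault_in_anon_page(mm):
    """Model a fault that maps one anonymous page and bumps the counter."""
    mm.mapped_anon += 1
    mm.rss["MM_ANONPAGES"] += 1


def zap_all(mm, uncharge=True):
    """Model teardown; uncharge=False skips the matching decrement."""
    if uncharge:
        mm.rss["MM_ANONPAGES"] -= mm.mapped_anon
    mm.mapped_anon = 0


def check_mm(mm):
    """Return one BUG line per counter that is not back to zero."""
    return [f"BUG: Bad rss-counter state mm:(ptrval) type:{t} val:{v}"
            for t, v in mm.rss.items() if v != 0]
```

A balanced run (fault, then zap_all(mm)) yields an empty list. A fault
followed by zap_all(mm, uncharge=False) yields the single line
"BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1",
which matches the shape of the message in the boot log.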

Interestingly, this "BUG:" message is timing-dependent. If I wait a
bit after bprm_execve before calling free_bprm, the message seems to go
away (three other cores are running and calling into kernel_execve at
the same time, so that may be a factor). The error also only ever
happens once (probably because no more page faults happen?).

I don't know enough to propose a proper fix here. Is the fix to
decrement the bprm->mm RSS counter to account for that page fault? Or
is current->mm being NULL a bigger problem?

Apologies in advance, but I have looked hard and do not see a clear
resolution for this even in the latest kernel code.

- Ilya