Message-ID: <CAHk-=wjVWMnH2LfFNnXcf6=WuU1RyLa_cgTEOqnViHiqDrqQjg@mail.gmail.com>
Date: Tue, 2 Mar 2021 10:56:30 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Ilya Lipnitskiy <ilya.lipnitskiy@...il.com>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>,
Linux-MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Kees Cook <keescook@...omium.org>,
Christoph Hellwig <hch@....de>
Subject: Re: exec error: BUG: Bad rss-counter
On Mon, Mar 1, 2021 at 11:59 PM Ilya Lipnitskiy
<ilya.lipnitskiy@...il.com> wrote:
>
> Good to know. Some more digging and I can say that we hit this error
> when trying to unmap PFN 0 (is_zero_pfn(pfn) returns TRUE,
> vm_normal_page returns NULL, zap_pte_range does not decrement
> MM_ANONPAGES RSS counter). Is my understanding correct that PFN 0 is
> usable, but special? Or am I totally off the mark here?
PFN 0 should be usable - depending on architecture, of course - and
shouldn't even be special in any way.
is_zero_pfn(pfn) is *not* meant to test for pfn being 0 - it's meant
to test for the pfn pointing to the special zero-filled page. The two
_could_ be the same thing, of course, but generally are not (h8300
seems to say "we use pfn 0 as the zero page" if I read things right).
In fact, there can be many zero-filled pages - architectures with
virtually mapped caches that want cache coloring have multiple
contiguous zero-filled pages and then map in the right one based on
virtual address. I'm not sure why it would matter (the zero-page is
always mapped read-only, so any physical aliases should be a
non-issue), but whatever..
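
To make the distinction concrete, here is a minimal user-space sketch of a range-based zero-page test, loosely modeled on how the kernel can handle colored zero pages. All `model_`-prefixed names and the numeric values are hypothetical and chosen only for illustration; this is not the kernel's code.

```c
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12UL

/* Hypothetical values: the first zero page sits at pfn 0x1234, and
 * there are 8 colored zero pages (mask 0x7000 with 4 KiB pages). */
static unsigned long model_zero_pfn = 0x1234UL;
static unsigned long model_zero_page_mask = 0x7000UL;

/* True only if pfn falls inside the run of zero-filled pages that
 * starts at model_zero_pfn. It does NOT test for pfn == 0. */
static inline bool model_is_zero_pfn(unsigned long pfn)
{
	/* Unsigned subtraction wraps to a huge value when pfn is below
	 * model_zero_pfn, so the comparison rejects it as well. */
	unsigned long offset = pfn - model_zero_pfn;

	return offset <= (model_zero_page_mask >> MODEL_PAGE_SHIFT);
}
```

Under these assumed values, pfn 0 is not a zero pfn, while 0x1234 through 0x123b are. If zero_pfn itself were wrongly left at 0, pfn 0 would start matching, which is the kind of oddity to look for.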
> Here is the (optimized) stack trace when the counter does not get decremented:
> [<8015b078>] vm_normal_page+0x114/0x1a8
Yes, if "is_zero_pfn()" returns true, then it won't be considered a
normal page, and is not refcounted.
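
The accounting consequence described in the quoted report can be sketched as follows. This is a simplified user-space model, not the real zap_pte_range(); the structs and `model_` functions are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_page { int unused; };
struct model_mm { long anon_rss; };	/* stand-in for the MM_ANONPAGES counter */

/* Stand-in for vm_normal_page(): a pfn that is treated as the zero
 * page is not a "normal" page, so no struct page is returned. */
static inline struct model_page *
model_vm_normal_page(struct model_page *page, bool pfn_is_zero_page)
{
	return pfn_is_zero_page ? NULL : page;
}

/* Stand-in for one iteration of the unmap loop. */
static inline void model_zap_one(struct model_mm *mm,
				 struct model_page *page,
				 bool pfn_is_zero_page)
{
	if (model_vm_normal_page(page, pfn_is_zero_page) == NULL)
		return;		/* not a normal page: RSS is left untouched */

	mm->anon_rss--;		/* normal anonymous page: drop it from RSS */
}
```

If a real anonymous page was counted at fault time but is later misidentified as the zero page at unmap time, the counter is never decremented, and the leftover count is what gets reported as "Bad rss-counter".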
But that should only trigger for pfn == zero_pfn, and zero_pfn should
be initialized to
    zero_pfn = page_to_pfn(ZERO_PAGE(0));
so it _sounds_ like you possibly have something odd going on with ZERO_PAGE.
Yes, one architecture does actually make pfn 0 _be_ the zero page, but
you said MIPS, and that does do the page coloring games, and has
    #define ZERO_PAGE(vaddr) \
        (virt_to_page((void *)(empty_zero_page + \
            (((unsigned long)(vaddr)) & zero_page_mask))))
where zero_page_mask is the page coloring mask, and empty_zero_page is
allocated in setup_zero_pages() fairly early in mem_init() (again, it
allocates multiple pages depending on the page ordering - see that
horrible virtual cache thing with cpu_has_vce).
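
As a rough illustration of the coloring arithmetic, the sketch below computes a mask from an allocation order and picks a colored zero page from a virtual address. The `model_` names, the 4 KiB page size, and the mask formula are assumptions for this example, not a copy of the MIPS code.

```c
#define MODEL_PAGE_SIZE 4096UL

/* Assumed mask formula: covers (1 << order) contiguous pages, with
 * the within-page offset bits cleared. Order 0 gives a mask of 0. */
static inline unsigned long model_zero_page_mask(unsigned int order)
{
	return ((MODEL_PAGE_SIZE << order) - 1) & ~(MODEL_PAGE_SIZE - 1);
}

/* Index of the colored zero page that a given virtual address would
 * select, mirroring the "(vaddr) & zero_page_mask" step in the macro. */
static inline unsigned long model_zero_page_index(unsigned long vaddr,
						  unsigned long mask)
{
	return (vaddr & mask) / MODEL_PAGE_SIZE;
}
```

With order 3 the mask is 0x7000 and there are eight possible zero pages; with order 0 the mask is 0 and every address selects the same single page. In neither case does the choice depend on pfn 0.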
So PFN 0 shouldn't be an issue at all.
Of course, since you said this was an embedded MIPS platform, maybe
it's one of the broken ones with virtual caches and cpu_has_vce is
set. I'm not sure how much testing that has gotten lately. MOST of the
later MIPS architectures walked away from the pure virtual cache
setups.
Linus