linux-kernel - Re: [RFC 3/8] mm: Avoid using set_page_count() in set_page


Message-ID: <CA+CK2bDb4vZYFEYf7WuanbCYFh+Kb=U3VHqRwj-YTFhzsp6ZuQ@mail.gmail.com>
Date:   Mon, 1 Nov 2021 10:30:41 -0400
From:   Pasha Tatashin <pasha.tatashin@...een.com>
To:     John Hubbard <jhubbard@...dia.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, linux-mm <linux-mm@...ck.org>,
        linux-m68k@...ts.linux-m68k.org,
        Anshuman Khandual <anshuman.khandual@....com>,
        Matthew Wilcox <willy@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        william.kucharski@...cle.com,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Geert Uytterhoeven <geert@...ux-m68k.org>,
        schmitzmic@...il.com, Steven Rostedt <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...hat.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Roman Gushchin <guro@...com>,
        Muchun Song <songmuchun@...edance.com>, weixugc@...gle.com,
        Greg Thelen <gthelen@...gle.com>
Subject: Re: [RFC 3/8] mm: Avoid using set_page_count() in set_page_recounted()

On Wed, Oct 27, 2021 at 9:35 PM John Hubbard <jhubbard@...dia.com> wrote:
>
> On 10/27/21 18:20, John Hubbard wrote:
> >>> But it's still not good to have this function name doing something completely
> >>> different than its name indicates.
> >>
> >> I see, I can rename it to: 'set_page_recounted/get_page_recounted' ?
> >>
> >
> > What? No, that's not where I was going at all. The function is already
> > named set_page_refcounted(), and one of the problems I see is that your
> > changes turn it into something that most certainly does not
> > set_page_refcounted(). Instead, this patch *increments* the refcount.
> > That is not the same thing.
> >
> > And then it uses a .config-sensitive assertion to "prevent" problems.
> > And by that I mean, the wording throughout this series seems to equate
> > VM_BUG_ON_PAGE() assertions with real assertions. They are only active,
> > however, in CONFIG_DEBUG_VM configurations, and provide no protection at
> > all for normal (most distros) users. That's something that the wording,
> > comments, and even design should be tweaked to account for.
>
> ...and to clarify a bit more, maybe this also helps:
>
> These patches are attempting to improve debugging, and that is fine, as

They are attempting to catch potential race conditions where
_refcount is changed between the time we verified what it was and the
time we set it to something else.

They also attempt to prevent overflow and underflow bugs, not all of
which are checked for today, but which can be checked with this patch
set, at least on kernels where DEBUG_VM is enabled.
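
A minimal userspace sketch of the window described above, assuming C11
atomics in place of the kernel's page_ref helpers. The names fake_page,
set_refcounted_by_store() and set_refcounted_by_inc() are hypothetical
and are not taken from the patch set. A plain store discards whatever
value was there, while an increment returns the value it replaced, so
an unexpected concurrent change can at least be noticed:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct page; only the refcount is modeled. */
struct fake_page {
    atomic_int refcount;
};

/* Store style: overwrites the counter without looking at the old value. */
void set_refcounted_by_store(struct fake_page *p)
{
    atomic_store(&p->refcount, 1);
}

/*
 * Increment style: atomic_fetch_add() returns the value held just before
 * the increment, so the caller learns whether the counter was really 0.
 * Returns 0 if the old value was 0, -1 otherwise. The counter is
 * incremented in both cases; nothing is repaired here.
 */
int set_refcounted_by_inc(struct fake_page *p)
{
    int old = atomic_fetch_add(&p->refcount, 1);

    return old == 0 ? 0 : -1;
}
```

With a counter that is unexpectedly 2, the store variant silently
leaves it at 1, while the increment variant leaves it at 3 and reports
the mismatch to the caller.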

> far as debugging goes. However, a point that seems to be slightly
> misunderstood is: incrementing a bad refcount value is not actually any
> better than overwriting it, from a recovery point of view. Maybe (?)
> it's better from a debugging point of view.

It is better for debugging as well: if one is tracing the page
_refcount history, knowing that the _refcount can only be
incremented/decremented/frozen/unfrozen provides a contiguous history
of the refcount that can be tracked. When we set the refcount in some
places, as we do today, the contiguous history is lost, because we do
not know the actual _refcount value at the time of the set operation.
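
The "contiguous history" point can be illustrated with a small
single-threaded model. This is only a sketch: traced_ref, traced_add(),
traced_set() and VALUE_UNKNOWN are hypothetical names, and the log
itself is not thread-safe. Each log entry records the value before and
after an operation. An add observes the old value, so consecutive
entries chain together; a plain set does not, so the chain breaks:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TRACE_MAX     16
#define VALUE_UNKNOWN (-9999)  /* hypothetical marker: old value not observed */

struct trace_entry {
    int old_val;
    int new_val;
};

struct traced_ref {
    atomic_int count;
    struct trace_entry log[TRACE_MAX];
    size_t n;
};

static void trace_record(struct traced_ref *r, int old_val, int new_val)
{
    if (r->n < TRACE_MAX)
        r->log[r->n++] = (struct trace_entry){ old_val, new_val };
}

/* Increment or decrement: the old value is returned by the atomic op. */
void traced_add(struct traced_ref *r, int delta)
{
    int old_val = atomic_fetch_add(&r->count, delta);

    trace_record(r, old_val, old_val + delta);
}

/* Plain set: the previous value is overwritten without being read. */
void traced_set(struct traced_ref *r, int value)
{
    atomic_store(&r->count, value);
    trace_record(r, VALUE_UNKNOWN, value);
}

/* True if every entry starts from the value the previous entry ended at. */
bool trace_is_contiguous(const struct traced_ref *r)
{
    for (size_t i = 1; i < r->n; i++)
        if (r->log[i].old_val != r->log[i - 1].new_val)
            return false;
    return true;
}
```

A sequence of traced_add() calls keeps trace_is_contiguous() true; a
single traced_set() in the middle makes it false, because the entry it
logs has no known starting value.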

>
> That's because the problem occurred before this code, and its debug-only
> assertions, ran. Once here, the code cannot actually recover: there is
> no automatic way to recover from a refcount that is 1, -1, 2, or 706,
> when it was supposed to be zero. Incrementing it is, again, not really
> necessarily better than setting: setting it might actually make the
> broken system appear to run--and in some cases, even avoid symptoms.
> Whereas incrementing doesn't cover anything up. The only thing you can
> really do is just panic() or BUG().

This is what my patch series attempts to do. I chose to use VM_BUG_ON()
instead of BUG() because this is VM code, and to avoid potential
performance regressions for those who choose performance over possible
security implications.
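
The trade-off between an always-on check and a debug-only one can be
modeled in userspace as follows. This is a sketch, not the kernel's
definitions: MODEL_BUG_ON(), MODEL_VM_BUG_ON() and MODEL_DEBUG_VM are
hypothetical stand-ins for BUG_ON(), VM_BUG_ON() and CONFIG_DEBUG_VM.
Without the debug option the condition is only type-checked inside
sizeof and is never evaluated at run time, which is why it costs
nothing and also protects nothing:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for BUG_ON(): the check is always compiled in. */
#define MODEL_BUG_ON(cond)                              \
    do {                                                \
        if (cond) {                                     \
            fprintf(stderr, "BUG: %s\n", #cond);        \
            abort();                                    \
        }                                               \
    } while (0)

#ifdef MODEL_DEBUG_VM
/* Debug build: behaves like the always-on check. */
#define MODEL_VM_BUG_ON(cond) MODEL_BUG_ON(cond)
#else
/* Non-debug build: cond is type-checked but never evaluated. */
#define MODEL_VM_BUG_ON(cond) ((void)sizeof((long)(cond)))
#endif

static int evaluations;

/* Counts how often the condition is actually evaluated at run time. */
static int refcount_is_bad(int refcount)
{
    evaluations++;
    return refcount != 0;
}

int evaluations_with_bug_on(void)
{
    evaluations = 0;
    MODEL_BUG_ON(refcount_is_bad(0));
    return evaluations;
}

int evaluations_with_vm_bug_on(void)
{
    evaluations = 0;
    MODEL_VM_BUG_ON(refcount_is_bad(0));
    return evaluations;
}
```

Built without -DMODEL_DEBUG_VM, evaluations_with_vm_bug_on() reports
that the condition was never run, while evaluations_with_bug_on()
always runs it once.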

>
> Don't get me wrong, I don't want bugs covered up. But the claim that
> incrementing is somehow better deserves some actual thinking about it.

I think it does. I described my points above; if you still disagree,
please let me know.

Thank you for providing your thoughts on this RFC. I will send out a
new version, and we can continue the discussion in the new thread.

Pasha