linux-kernel - Re: [PATCH 10/17] prmem: documentation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALCETrW1FcT88u0QY9k_PMOeZj0G8H6KNb5Bnreo-12NvmmCEQ@mail.gmail.com>
Date:   Tue, 30 Oct 2018 21:41:13 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Igor Stoppa <igor.stoppa@...il.com>,
        Tycho Andersen <tycho@...ho.ws>,
        Kees Cook <keescook@...omium.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Mimi Zohar <zohar@...ux.vnet.ibm.com>,
        Dave Chinner <david@...morbit.com>,
        James Morris <jmorris@...ei.org>,
        Michal Hocko <mhocko@...nel.org>,
        Kernel Hardening <kernel-hardening@...ts.openwall.com>,
        linux-integrity <linux-integrity@...r.kernel.org>,
        LSM List <linux-security-module@...r.kernel.org>,
        Igor Stoppa <igor.stoppa@...wei.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Jonathan Corbet <corbet@....net>,
        Laura Abbott <labbott@...hat.com>,
        Randy Dunlap <rdunlap@...radead.org>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 10/17] prmem: documentation

On Tue, Oct 30, 2018 at 2:36 PM Matthew Wilcox <willy@...radead.org> wrote:
>
> On Tue, Oct 30, 2018 at 10:43:14PM +0200, Igor Stoppa wrote:
> > On 30/10/2018 21:20, Matthew Wilcox wrote:
> > > > > So the API might look something like this:
> > > > >
> > > > >         void *p = rare_alloc(...);      /* writable pointer */
> > > > >         p->a = x;
> > > > >         q = rare_protect(p);            /* read-only pointer */
> >
> > With pools and memory allocated from vmap_areas, I was able to say
> >
> > protect(pool)
> >
> > and that would do a swipe on all the pages currently in use.
> > In the SELinux policyDB, for example, one doesn't really want to
> > individually protect each allocation.
> >
> > The loading phase happens usually at boot, when the system can be assumed to
> > be sane (one might even preload a bare-bone set of rules from initramfs and
> > then replace it later on, with the full blown set).
> >
> > There is no need to process each of these tens of thousands allocations and
> > initialization as write-rare.
> >
> > Would it be possible to do the same here?
>
> What Andy is proposing effectively puts all rare allocations into
> one pool.  Although I suppose it could be generalised to multiple pools
> ... one mm_struct per pool.  Andy, what do you think to doing that?

Hmm.  Let's see.

To clarify some of this thread, I think that the fact that rare_write
uses an mm_struct and alias mappings under the hood should be
completely invisible to users of the API.  No one should ever be
handed a writable pointer to rare_write memory (except perhaps during
bootup or when initializing a large complex data structure that will
be rare_write but isn't yet, e.g. the policy db).

For example, there could easily be architectures where having a
writable alias is problematic.  On such architectures, an entirely
different mechanism might work better.  And, if a tool like KNOX ever
becomes a *part* of the Linux kernel (hint hint!)

If you have multiple pools and one mm_struct per pool, you'll need a
way to find the mm_struct from a given allocation.  Regardless of how
the mm_structs are set up, changing rare_write memory to normal memory
or vice versa will require a global TLB flush (all ASIDs and global
pages) on all CPUs, so having extra mm_structs doesn't seem to buy
much.

(It's just possible that changing rare_write back to normal might be
able to avoid the flush if the spurious faults can be handled
reliably.)