linux-kernel - Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail


Message-ID: <20181224093804.GA16933@osadl.at>
Date:   Mon, 24 Dec 2018 10:38:04 +0100
From:   Nicholas Mc Guire <der.herr@...r.at>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     David Rientjes <rientjes@...gle.com>,
        Nicholas Mc Guire <hofrat@...dl.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Chintan Pandya <cpandya@...eaurora.org>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>,
        Arun KS <arunks@...eaurora.org>, Joe Perches <joe@...ches.com>,
        "Luis R. Rodriguez" <mcgrof@...nel.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail

On Mon, Dec 24, 2018 at 09:10:56AM +0100, Michal Hocko wrote:
> On Sat 22-12-18 09:04:21, Nicholas Mc Guire wrote:
> > On Fri, Dec 21, 2018 at 01:58:39PM -0800, David Rientjes wrote:
> > > On Thu, 20 Dec 2018, Nicholas Mc Guire wrote:
> > > 
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 871e41c..1c118d7 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
> > > >  
> > > >  	/* Import existing vmlist entries. */
> > > >  	for (tmp = vmlist; tmp; tmp = tmp->next) {
> > > > -		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
> > > > +		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
> > > >  		va->flags = VM_VM_AREA;
> > > >  		va->va_start = (unsigned long)tmp->addr;
> > > >  		va->va_end = va->va_start + tmp->size;
> > > 
> > > Hi Nicholas,
> > > 
> > > You're right that this looks wrong because there's no guarantee that va is 
> > > actually non-NULL.  __GFP_NOFAIL won't help in init, unfortunately, since 
> > > we're not giving the page allocator a chance to reclaim so this would 
> > > likely just end up looping forever instead of crashing with a NULL pointer 
> > > dereference, which would actually be the better result.
> > >
> > tried tracing the __GFP_NOFAIL path and had concluded that it would
> > end in out_of_memory() -> panic("System is deadlocked on memory\n");
> > which also should point cleanly to the cause - but I'm actually not
> > that sure if that trace was correct in all cases.
> 
> No, we do not trigger the memory reclaim path nor the oom killer when
> using GFP_NOWAIT. In fact the current implementation even ignores
> __GFP_NOFAIL AFAICS (so I was wrong about the endless loop but I suspect
> that we used to loop for __GFP_NOFAIL at some point in the past). The
> patch simply doesn't have any effect. But the primary objection is that
> the behavior might change in future and you certainly do not want to get
> stuck in the boot process without knowing what is going on. Crashing
> will tell you that quite obviously. Although I have a hard time imagining
> how that could happen in a reasonably configured system.

I think most of the defensive structures are covering rare to almost
impossible cases - but those are precisely the hard ones to understand if
they do happen.

> 
> > > You could do
> > > 
> > > 	BUG_ON(!va);
> > > 
> > > to make it obvious why we crashed, however.  It makes it obvious that the 
> > > crash is intentional rather than some error in the kernel code.
> > 
> > makes sense - that at least makes it immediately clear from the code
> > that there is no way out from here.
> 
> How does it differ from blowing up right there when dereferencing flags?
> It would be clear from the oops.

The question is how soon it blows up. If it were immediate, then there is
probably no real difference. If there is some delay, say because the region
affected by the NULL pointer is not immediately in use, it may be very hard
to differentiate between an allocation failure and memory corruption, so
having a directly associated trace should be significantly simpler to
understand. And you might actually not want a system to try booting if there
are problems at this level.
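
To illustrate the point about a directly associated trace, here is a minimal
userspace sketch, not kernel code. The struct, the allocator and the function
names are hypothetical stand-ins, and the check returns an error instead of
halting so that the behaviour can be exercised; in vmalloc_init() the
analogous line would be the BUG_ON(!va) suggested above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for struct vmap_area (userspace only). */
struct vmap_area_sketch {
	unsigned long flags;
	unsigned long va_start;
	unsigned long va_end;
};

/* Hypothetical allocator; 'fail' forces a NULL return to simulate failure. */
static void *alloc_may_fail(size_t size, int fail)
{
	return fail ? NULL : calloc(1, size);
}

/*
 * Import one entry. Returns 0 on success and -1 if the allocation failed.
 * The check sits directly after the allocation, so the report names the
 * cause at the point where it happened.
 */
static int import_entry(unsigned long addr, unsigned long size, int fail,
			struct vmap_area_sketch **out)
{
	struct vmap_area_sketch *va = alloc_may_fail(sizeof(*va), fail);

	*out = NULL;
	if (!va) {
		fprintf(stderr, "import_entry: allocation failed\n");
		return -1;
	}
	va->flags = 0x1;	/* placeholder value, not the real VM_VM_AREA */
	va->va_start = addr;
	va->va_end = va->va_start + size;
	*out = va;
	return 0;
}
```

With the check placed at the allocation site, the failure is reported where
it originates rather than wherever the pointer happens to be used first.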

thx!
hofrat