linux-kernel - Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail

Message-ID: <20181224115818.GA3063@osadl.at>
Date:   Mon, 24 Dec 2018 12:58:18 +0100
From:   Nicholas Mc Guire <der.herr@...r.at>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     David Rientjes <rientjes@...gle.com>,
        Nicholas Mc Guire <hofrat@...dl.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Chintan Pandya <cpandya@...eaurora.org>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>,
        Arun KS <arunks@...eaurora.org>, Joe Perches <joe@...ches.com>,
        "Luis R. Rodriguez" <mcgrof@...nel.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail

On Mon, Dec 24, 2018 at 10:38:04AM +0100, Nicholas Mc Guire wrote:
> On Mon, Dec 24, 2018 at 09:10:56AM +0100, Michal Hocko wrote:
> > On Sat 22-12-18 09:04:21, Nicholas Mc Guire wrote:
> > > On Fri, Dec 21, 2018 at 01:58:39PM -0800, David Rientjes wrote:
> > > > On Thu, 20 Dec 2018, Nicholas Mc Guire wrote:
> > > > 
> > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > index 871e41c..1c118d7 100644
> > > > > --- a/mm/vmalloc.c
> > > > > +++ b/mm/vmalloc.c
> > > > > @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
> > > > >  
> > > > >  	/* Import existing vmlist entries. */
> > > > >  	for (tmp = vmlist; tmp; tmp = tmp->next) {
> > > > > -		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
> > > > > +		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
> > > > >  		va->flags = VM_VM_AREA;
> > > > >  		va->va_start = (unsigned long)tmp->addr;
> > > > >  		va->va_end = va->va_start + tmp->size;
> > > > 
> > > > Hi Nicholas,
> > > > 
> > > > You're right that this looks wrong because there's no guarantee that va is 
> > > > actually non-NULL.  __GFP_NOFAIL won't help in init, unfortunately, since 
> > > > we're not giving the page allocator a chance to reclaim so this would 
> > > > likely just end up looping forever instead of crashing with a NULL pointer 
> > > > dereference, which would actually be the better result.
> > > >
> > > tried tracing the __GFP_NOFAIL path and had concluded that it would
> > > end in out_of_memory() -> panic("System is deadlocked on memory\n");
> > > > which also should point cleanly to the cause - but I'm actually not
> > > that sure if that trace was correct in all cases.
> > 
> > No, we do not trigger the memory reclaim path nor the oom killer when
> > using GFP_NOWAIT. In fact the current implementation even ignores
> > __GFP_NOFAIL AFAICS (so I was wrong about the endless loop but I suspect
> > > that we used to loop for __GFP_NOFAIL at some point in the past). The
> > patch simply doesn't have any effect. But the primary objection is that
> > the behavior might change in future and you certainly do not want to get
> > stuck in the boot process without knowing what is going on. Crashing
> > > will tell you that quite obviously. Although I have a hard time imagining
> > > how that could happen in a reasonably configured system.
> 
> I think most of the defensive structures are covering rare to almost
> impossible cases - but those are precisely the hard ones to understand if
> they do happen.
> 
> > 
> > > > You could do
> > > > 
> > > > 	BUG_ON(!va);
> > > > 
> > > > to make it obvious why we crashed, however.  It makes it obvious that the 
> > > > crash is intentional rather than some error in the kernel code.
> > > 
> > > > makes sense - that at least makes it immediately clear from the code
> > > > that there is no way out from here.
> > 
> > How does it differ from blowing up right there when dereferencing flags?
> > It would be clear from the oops.
> 
> The question is how soon it blows up. If it were immediate, then there is
> probably no real difference. If there is some delay, say due to the region
> affected by the NULL pointer not being immediately in use, it may be very
> hard to differentiate between an allocation failure and memory corruption,
> so having a directly associated trace should be significantly simpler to
> understand - and you might actually not want a system to try booting if there
> are problems at this level.
>
sorry - you are right - it would blow up immediately - so there is no way this
could be delayed in this case. So then it's just a matter of the code making
clear that the NULL case was considered - by a comment or by BUG_ON().
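
For illustration only, a rough userspace sketch of the BUG_ON() variant
suggested above. This is not the kernel code: the BUG_ON() macro, the two
struct definitions, the VM_VM_AREA value and the import_vm_area() helper
are all stand-ins made up for this example, and calloc() takes the place
of kzalloc(..., GFP_NOWAIT). It only shows where the check would sit
relative to the first dereference.

```c
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's BUG_ON(): report the condition, then abort. */
#define BUG_ON(cond)                                                  \
	do {                                                          \
		if (cond) {                                           \
			fprintf(stderr, "BUG at %s:%d: %s\n",         \
				__FILE__, __LINE__, #cond);           \
			abort();                                      \
		}                                                     \
	} while (0)

/* Illustrative flag value, assumed for this sketch only. */
#define VM_VM_AREA 0x04UL

/* Minimal stand-ins for the kernel structures; only the fields used here. */
struct vm_struct {
	struct vm_struct *next;
	void *addr;
	unsigned long size;
};

struct vmap_area {
	unsigned long va_start;
	unsigned long va_end;
	unsigned long flags;
	struct vm_struct *vm;
};

/*
 * Hypothetical helper mirroring the loop body in vmalloc_init():
 * allocate a zeroed vmap_area and fill it from an existing vm_struct.
 */
struct vmap_area *import_vm_area(struct vm_struct *tmp)
{
	struct vmap_area *va = calloc(1, sizeof(*va));

	/* No recovery is possible this early, so fail loudly and on purpose. */
	BUG_ON(!va);

	va->flags = VM_VM_AREA;
	va->va_start = (unsigned long)tmp->addr;
	va->va_end = va->va_start + tmp->size;
	va->vm = tmp;
	return va;
}
```

Without the BUG_ON() line the very next statement (va->flags = ...) would
fault on a NULL va anyway, so the check changes only how clearly the intent
is stated, not when the failure is seen.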

thx!
hofrat