Date:	Thu, 28 Oct 2010 17:45:14 +0100
From:	Mel Gorman <mel@....ul.ie>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Christoph Lameter <cl@...ux.com>,
	Lee Schermerhorn <lee.schermerhorn@...com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Tejun Heo <tj@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Brian Gerst <brgerst@...il.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org, mingo@...e.hu
Subject: Re: [PATCH] numa: fix slab_node(MPOL_BIND)

On Thu, Oct 28, 2010 at 08:59:42AM -0700, Linus Torvalds wrote:
> Hmm. More people added to the discussion..
> 
> This code seems to go back all the way to commit 19770b32609b: "mm:
> filter based on a nodemask as well as a gfp_mask". Which was back in
> April 2008, and got merged into 2.6.26.
> 

I am about to run out the door, so I have not read the thread, but
first_zones_zonelist() can indeed return NULL. It happens when the
zonelist is empty (unlikely) or, more likely, when a nodemask restricts
the allowable nodes so that no valid zones remain.
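
To illustrate the nodemask case, here is a minimal user-space sketch. It
is not the kernel implementation: the toy_* types and toy_first_zone()
are hypothetical stand-ins (the real first_zones_zonelist() walks struct
zoneref entries and takes a nodemask_t), and the nodemask is assumed to
fit in one unsigned long.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; NOT the real definitions. */
struct toy_zone {
	int node;      /* NUMA node the zone belongs to */
	int zone_idx;  /* zone type: 0 = DMA, 1 = Normal, 2 = HighMem */
};

struct toy_zonelist {
	const struct toy_zone *zones;
	size_t nr_zones;
};

/*
 * Return the first zone whose type index is <= highest_zoneidx and whose
 * node is set in nodemask (bit n stands for node n), or NULL when no
 * zone qualifies.
 */
static const struct toy_zone *
toy_first_zone(const struct toy_zonelist *zl, int highest_zoneidx,
	       unsigned long nodemask)
{
	size_t i;

	for (i = 0; i < zl->nr_zones; i++) {
		const struct toy_zone *z = &zl->zones[i];

		if (z->zone_idx > highest_zoneidx)
			continue;	/* zone type not allowed */
		if (!(nodemask & (1UL << z->node)))
			continue;	/* node filtered out by the mask */
		return z;
	}
	return NULL;	/* empty zonelist, or everything was filtered */
}
```

If the only zone on node 1 is HighMem, a lookup limited to Normal zones
and to node 1 walks the whole list and comes back with NULL, which is
the situation in the --membind=1 report quoted further down.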

> And I'd be happy to commit it (in fact, I was going to), but when
> looking for other uses of first_zones_zonelist(), I found
> local_memory_node() which does the exact same thing: ignore the return
> value, and unconditionally dereference the resulting 'zone' variable.
> 

That does look unsafe.

> And so does - although less obviously - mm/vmscan.c for the
> wait_iff_congested() thing.
> 

It should be implicitly safe, although that is non-obvious. wait_iff_congested()
in mm/vmscan.c is called from do_try_to_free_pages(), which is in the direct
reclaim path. To get there, the caller must already have passed this check in
page_alloc.c:

        first_zones_zonelist(zonelist, high_zoneidx, nodemask, &preferred_zone);
        if (!preferred_zone) {
                put_mems_allowed();
                return NULL;
        }

Did I miss anything?

The memory controller can also end up there, but for it to get into
trouble, it would have to be trying to shrink a cgroup with an invalid
zonelist. Is that possible?

> So are those buggy too, since first_zones_zonelist() can apparently return NULL?
> 

Yes, it can.

> Please advise...
> 

Callers need to check for NULL or be sure they are not dealing with an
empty zonelist.
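
As a rough sketch of the first option, mirroring the fallback in the
quoted patch: the names below are hypothetical user-space stand-ins, and
toy_local_node() only plays the role of numa_node_id() by returning a
fixed node.

```c
#include <stddef.h>

/* Simplified stand-in for struct zone; NOT the real definition. */
struct toy_zone {
	int node;	/* NUMA node the zone belongs to */
};

/* Hypothetical stand-in for numa_node_id(): pretend we run on node 0. */
static int toy_local_node(void)
{
	return 0;
}

/*
 * Choose a node from the result of a zone lookup. The lookup may have
 * found nothing, so fall back to the local node instead of
 * dereferencing a NULL zone pointer.
 */
static int toy_slab_node(const struct toy_zone *zone)
{
	return zone ? zone->node : toy_local_node();
}
```

Whether the local node is the right fallback is a policy decision for
each caller; the point is only that the NULL case is handled somewhere.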

>                   Linus
> 
> On Wed, Oct 27, 2010 at 10:33 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> > On Wednesday, 27 October 2010 at 18:07 +0200, Eric Dumazet wrote:
> >
> >> So I tried the following experiment:
> >>
> >> # swapoff
> >> # numactl --membind=0 swapon -a
> >> # grep swap /proc/vmallocinfo
> >> 0xf9bf3000-0xf9cf4000 1052672 sys_swapon+0x4aa/0xb24 pages=256 vmalloc N0=256
> >> # swapoff -a
> >> # numactl --membind=1 swapon -a
> >>
> >> <<FREEZE>>
> >>
> >
> > It is in fact a crash, not a freeze, in slab_node().
> >
> > The problem is that we dereference a NULL zone pointer
> > (node 1 has HighMem only).
> >
> > The following patch seems to solve the problem for me:
> >
> > # swapoff -a
> > # numactl --membind=1 swapon -a
> > # grep swap /proc/vmallocinfo
> > 0xf9da5000-0xf9ea6000 1052672 sys_swapon+0x3f9/0xa34 pages=256 vmalloc N1=256
> >
> >
> > Thanks
> >
> >
> > [PATCH] numa: fix slab_node(MPOL_BIND)
> >
> > When a node contains only HighMem memory, slab_node(MPOL_BIND)
> > dereferences a NULL pointer.
> >
> > Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
> > ---
> >  mm/mempolicy.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 81a1276..4a57f13 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -1597,7 +1597,7 @@ unsigned slab_node(struct mempolicy *policy)
> >                (void)first_zones_zonelist(zonelist, highest_zoneidx,
> >                                                        &policy->v.nodes,
> >                                                        &zone);
> > -               return zone->node;
> > +               return zone ? zone->node : numa_node_id();
> >        }
> >
> >        default:
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@...r.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
> 

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab