lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Yk7l4CEpQFoPnYB/@dhcp22.suse.cz>
Date:   Thu, 7 Apr 2022 15:23:44 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     David Hildenbrand <david@...hat.com>
Cc:     Juergen Gross <jgross@...e.com>, xen-devel@...ts.xenproject.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        stable@...r.kernel.org,
        Marek Marczykowski-Górecki 
        <marmarek@...isiblethingslab.com>, Mel Gorman <mgorman@...e.de>
Subject: Re: [PATCH] mm, page_alloc: fix build_zonerefs_node()

On Thu 07-04-22 14:12:38, David Hildenbrand wrote:
> On 07.04.22 14:04, Michal Hocko wrote:
> > On Thu 07-04-22 13:58:44, David Hildenbrand wrote:
> > [...]
> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>> index 3589febc6d31..130a2feceddc 100644
> >>> --- a/mm/page_alloc.c
> >>> +++ b/mm/page_alloc.c
> >>> @@ -6112,10 +6112,8 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
> >>>  	do {
> >>>  		zone_type--;
> >>>  		zone = pgdat->node_zones + zone_type;
> >>> -		if (managed_zone(zone)) {
> >>> -			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
> >>> -			check_highest_zone(zone_type);
> >>> -		}
> >>> +		zoneref_set_zone(zone, &zonerefs[nr_zones++]);
> >>> +		check_highest_zone(zone_type);
> >>>  	} while (zone_type);
> >>>  
> >>>  	return nr_zones;
> >>
> >> I don't think having !populated zones in the zonelist is a particularly
> >> good idea. Populated vs !populated changes only during page
> >> onlininge/offlining.
> >>
> >> If I'm not wrong, with your patch we'd even include ZONE_DEVICE here ...
> > 
> > What kind of problem that would cause? The allocator wouldn't see any
> > pages at all so it would fallback to the next one. Maybe kswapd would
> > need some tweak to have a bail out condition but as mentioned in the
> > thread already. !populated or !managed for that matter are not all that
> > much different from completely depleted zones. The fact that we are
> > making that distinction has led to some bugs and I suspect it makes the
> > code more complex without a very good reason.
> 
> I assume performance problems. Assume you have an ordinary system with
> multiple NUMA nodes and no MOVABLE memory. Most nodes will only have
> ZONE_NORMAL. Yet, you'd include ZONE_DMA* and ZONE_MOVABLE that will
> always remain empty to be traversed on each and every allocation
> fallback. Of course, we could measure, but IMHO at least *that* part of
> memory onlining/offlining is not the complicated part :D

You've got a good point here. I guess there will be usecases that really
benefit from every single CPU cycle spared in the allocator hot path
which really depends on the zonelists traversing.

I have very briefly had a look at our remaining usage of managed_zone()
and there are not that many left. We have 2 in page_alloc.c via
has_managed_dma(). I guess we could drop that one and use __GFP_NOWARN
in dma_atomic_pool_init but this is nothing really earth shattering.
The remaining occurances are in vmscan.c:
	- should_continue_reclaim, pgdat_balanced - required because
	  they iterate all zones withing the zoneindex and need to
	  decide whether they are balanced or not. We can make empty
	  zones special case and make them always balanced
	- kswapd_shrink_node - required because we would be increasing
	  reclaim target for empty zones
	- update_reclaim_active - required because we do not want to
	  alter zone state if it is not a subject of the reclaim which
	  empty zones are not by definition.
	- balance_pgdat - first check is likely a microoptimization,
	  reclaim_idx is needed to have a populated zone there
	- wakeup_kswapd - I dunno
	- shrink_node, allow_direct_reclaim, lruvec_lru_size - microptimizations
	- pgdat_watermark_boosted - microptimizations I suspect as empty
	  zone shouldn't ever get watermark_boost
	- pgdat_balanced - functionally dd

So we can get rid of quite some but we will still need some of them.
-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ