linux-kernel - Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0eadea9c-5af0-d7e6-071e-898b04294dd3@redhat.com>
Date:   Tue, 8 Jun 2021 17:00:09 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Oscar Salvador <osalvador@...e.de>
Cc:     Michal Hocko <mhocko@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Pavel Tatashin <pasha.tatashin@...een.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get
 stable zone's values

On 07.06.21 12:23, Oscar Salvador wrote:
> On Mon, Jun 07, 2021 at 10:49:01AM +0200, David Hildenbrand wrote:
>> I'd like to point out that I think the seqlock is not in place to
>> synchronize with actual growing/shrinking but to get consistent zone ranges
>> -- like using atomics, but we have two inter-dependent values here.
> 
> I guess so, at least that's what it should do.
> But the way it is placed right now is misleading.
> 
> If we really want to get consistent zone ranges, we should start using
> zone's seqlock where it matters and that is pretty much all those
> places that use zone_spans_pfn().

Right, or even only zone_end_pfn() to get a consistent value.

> Otherwise there is no way you can be sure the pfn you're checking is
> within the limits. Moreover, as Michal pointed out early, if we really
> want to go down that road the locking should be made in the caller
> evolving the operation, otheriwse things might change once the lock
> is dropped and you're working with a wrong assumption.
> 
> I can see arguments for both riping it out and doing it right (but none for
> the way it is right now).
> For riping it out, one could say that those races might not be fatal,
> as usually the pfn you're working with (the one you want to check falls
> within a certain range) you know is valid, so the worst can happen is
> you get false positives/negatives and that might or might not be detected
> further down. How bad are false positive/negatives I guess it depends on the
> situation, but we already do that right now.
> The zone_spans_pfn() from page_outside_zone_boundaries() is the only one using
> locking right now, so well, if we survided this long without locks in other places
> using zone_spans_pfn() makes one wonder if it is that bad.
> 
> On the other hand, one could argue that for correctness sake, we should be holding
> zone's seqlock whenever checking for zone_spans_pfn() to avoid any inconsistency.
> 
> 

IMHO, as we know the race exists and we have a tool to handle it in 
place, we should maybe fix the obvious cases if possible.

Code that uses zone->zone_start_pfn directly is unlikely to be broken on 
most architectures. We will usually read/write via single instruction 
and won't get inconsistencies, for example, when shrinking or growing 
the zone. We most probably don't want to use an atomic for that right now.

Code that uses zone->spanned_pages to detect the zone end, however, is 
more likely to be broken. I don't think we have any relevant around 
anymore. Everything was converted to zone_end_pfn().

I feel like we should just make zone_end_pfn() take the seqlock in read. 
Then, we at least get a consistent value, for example, while growing a zone.

Just imagine the following case when we grow a section to the front when 
onlining memory:

	zone->zone_start_pfn -= new_pages;
	zone->spanned_pages += new_pages;

Note that compilers/CPUs might reshuffle as they like. If someone (e.g., 
zone_spans_pfn()) races with that code, it might get new 
zone->zone_start_pfn but old zone->spanned_pages. zone_end_pfn() will 
report a "too small zone" and trigger false negatives in zone_spans_pfn().

-- 
Thanks,

David / dhildenb