linux-kernel - Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <a1fe74b9-5531-5cbb-f24a-437ac7534904@redhat.com>
Date:   Wed, 9 Jun 2021 11:42:14 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Oscar Salvador <osalvador@...e.de>
Cc:     Michal Hocko <mhocko@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Pavel Tatashin <pasha.tatashin@...een.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get
 stable zone's values

On 08.06.21 17:00, David Hildenbrand wrote:
> On 07.06.21 12:23, Oscar Salvador wrote:
>> On Mon, Jun 07, 2021 at 10:49:01AM +0200, David Hildenbrand wrote:
>>> I'd like to point out that I think the seqlock is not in place to
>>> synchronize with actual growing/shrinking but to get consistent zone ranges
>>> -- like using atomics, but we have two inter-dependent values here.
>>
>> I guess so, at least that's what it should do.
>> But the way it is placed right now is misleading.
>>
>> If we really want to get consistent zone ranges, we should start using
>> zone's seqlock where it matters and that is pretty much all those
>> places that use zone_spans_pfn().
> 
> Right, or even only zone_end_pfn() to get a consistent value.
> 
>> Otherwise there is no way you can be sure the pfn you're checking is
>> within the limits. Moreover, as Michal pointed out early, if we really
>> want to go down that road the locking should be made in the caller
>> evolving the operation, otheriwse things might change once the lock
>> is dropped and you're working with a wrong assumption.
>>
>> I can see arguments for both riping it out and doing it right (but none for
>> the way it is right now).
>> For riping it out, one could say that those races might not be fatal,
>> as usually the pfn you're working with (the one you want to check falls
>> within a certain range) you know is valid, so the worst can happen is
>> you get false positives/negatives and that might or might not be detected
>> further down. How bad are false positive/negatives I guess it depends on the
>> situation, but we already do that right now.
>> The zone_spans_pfn() from page_outside_zone_boundaries() is the only one using
>> locking right now, so well, if we survided this long without locks in other places
>> using zone_spans_pfn() makes one wonder if it is that bad.
>>
>> On the other hand, one could argue that for correctness sake, we should be holding
>> zone's seqlock whenever checking for zone_spans_pfn() to avoid any inconsistency.
>>
>>
> 
> IMHO, as we know the race exists and we have a tool to handle it in
> place, we should maybe fix the obvious cases if possible.
> 
> Code that uses zone->zone_start_pfn directly is unlikely to be broken on
> most architectures. We will usually read/write via single instruction
> and won't get inconsistencies, for example, when shrinking or growing
> the zone. We most probably don't want to use an atomic for that right now.
> 
> Code that uses zone->spanned_pages to detect the zone end, however, is
> more likely to be broken. I don't think we have any relevant around
> anymore. Everything was converted to zone_end_pfn().
> 
> I feel like we should just make zone_end_pfn() take the seqlock in read.
> Then, we at least get a consistent value, for example, while growing a zone.
> 
> Just imagine the following case when we grow a section to the front when
> onlining memory:
> 
> 	zone->zone_start_pfn -= new_pages;
> 	zone->spanned_pages += new_pages;
> 
> Note that compilers/CPUs might reshuffle as they like. If someone (e.g.,
> zone_spans_pfn()) races with that code, it might get new
> zone->zone_start_pfn but old zone->spanned_pages. zone_end_pfn() will
> report a "too small zone" and trigger false negatives in zone_spans_pfn().
> 

Thinking again, we could of course also simply convert to 
zone->zone_start_+ pfn zone->zone_end_pfn. Places that need 
spanned_pages() would have the same issue, but I think they are rather a 
concern case.

-- 
Thanks,

David / dhildenb