lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 3 Jun 2021 10:38:24 +0200
From:   Oscar Salvador <osalvador@...e.de>
To:     David Hildenbrand <david@...hat.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Michal Hocko <mhocko@...nel.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Pavel Tatashin <pasha.tatashin@...een.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get
 stable zone's values

On Wed, Jun 02, 2021 at 09:45:58PM +0200, Oscar Salvador wrote:
> It was too nice and easy to be true I guess.
> Yeah, I missed the point that blocking might be an issue here, and hotplug
> operations can take really long, so not an option.
> I must have switched my brain off back there, because now it is just too
> obvious.
> 
> Then I guwmess that seqlock must stay and the only thing than can go is the
> pgdat resize lock from the hotplug code.

So, I have been looking into this again.
Of course, the approach taken here is outrageously wrong, but there are
some other things that are a bit confusing.

As pointed out, bad_range() (the function that ends up calling
page_outside_zone_boundaries) is called from different functions via VM_BUG_ON_*.
page_outside_zone_boundaries() takes care of taking the seqlock to avoid
reading stale values that might happen if we race with memhotplug
operations.
page_outside_zone_boundaries() calls zone_spans_pfn() to check that.
 
Now on the funny thing.

We do have several places happily calling zone_spans_pfn() without
holding zone's seqlock, e.g:

set_pageblock_migratetype
 set_pfnblock_flags_mask
  zone_spans_pfn

move_freepages_block
 zone_spans_pfn

alloc_contig_pages
 zone_spans_last_pfn
  zone_spans_pfn

Those places hold zone->lock, while move_pfn_range_to_zone() and
remove_pfn_range_from_zone() hold zone->seqlock, so AFAICS, those places
could read a stale value and proceed thinking the range is within the
zone while it is not.

So I guess my question is, should we force those places to take the
seqlock reader as we do in page_outside_zone_boundaries(), (or maybe
just move the seqlock handling to zone_spans_pfn())?

Because I does not make much sense to take it in a VM_DEBUG context and
not in "real life".

Thoughts?

-- 
Oscar Salvador
SUSE L3

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ