linux-kernel - Re: [PATCH 2/2] Make is_mem_section_removable more conformable with offlining code

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20100903190520.8751aab6.kamezawa.hiroyu@jp.fujitsu.com>
Date:	Fri, 3 Sep 2010 19:05:20 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Michal Hocko <mhocko@...e.cz>
Cc:	Hiroyuki Kamezawa <kamezawa.hiroyuki@...il.com>,
	Wu Fengguang <fengguang.wu@...el.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"Kleen, Andi" <andi.kleen@...el.com>,
	Haicheng Li <haicheng.li@...ux.intel.com>,
	Christoph Lameter <cl@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Mel Gorman <mel@...ux.vnet.ibm.com>
Subject: Re: [PATCH 2/2] Make is_mem_section_removable more conformable with
 offlining code

On Fri, 3 Sep 2010 11:50:49 +0200
Michal Hocko <mhocko@...e.cz> wrote:

> On Fri 03-09-10 18:13:27, KAMEZAWA Hiroyuki wrote:
> > On Fri, 3 Sep 2010 10:25:58 +0200
> > Michal Hocko <mhocko@...e.cz> wrote:
> > 
> > > On Fri 03-09-10 12:14:52, KAMEZAWA Hiroyuki wrote:
> > > [...]
> [...]
> > > Cannot ZONE_MOVABLE contain different MIGRATE_types?
> > > 
> > never.
> 
> Then I am terribly missing something. Zone contains free lists for
> different MIGRATE_TYPES, doesn't it? Pages allocated from those free
> lists keep the migration type of the list, right?
> 
> ZONE_MOVABLE just says whether it makes sense to move pages in that zone
> at all, right?
> 

 ZONE_MOVABLE means "it only contains MOVABLE pages."
 So, we can ignore migrate-type.





> > 
> > > > +
> > > > +	pfn = page_to_pfn(page);
> > > > +	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
> > > > +		unsigned long check = pfn + iter;
> > > > +
> > > > +		if (!pfn_valid_within(check)) {
> > > > +			iter++;
> > > > +			continue;
> > > > +		}
> > > > +		page = pfn_to_page(check);
> > > > +		if (!page_count(page)) {
> > > > +			if (PageBuddy(page))
> > > 
> > > Why do you check page_count as well? PageBuddy has alwyas count==0,
> > > right?
> > > 
> > 
> > But PageBuddy() flag is considered to be valid only when page_count()==0.
> > This is for safe handling.
> 
> OK. I don't see that documented anywhere but it makes sense. Anyway
> there are some places which don't do this test (e.g.
> isolate_freepages_block, suitable_migration_target, etc.).
> 
> > 
> > 
> > > > +				iter += (1 << page_order(page)) - 1;
> > > > +			continue;
> > > > +		}
> > > > +		if (!PageLRU(page))
> > > > +			found++;
> > > > +		/*
> > > > +		 * If the page is not RAM, page_count()should be 0.
> > > > +		 * we don't need more check. This is an _used_ not-movable page.
> > > > +		 *
> > > > +		 * The problematic thing here is PG_reserved pages. But if
> > > > +		 * a PG_reserved page is _used_ (at boot), page_count > 1.
> > > > +		 * But...is there PG_reserved && page_count(page)==0 page ?
> > > 
> > > Can we have PG_reserved && PG_lru? 
> > 
> > I think never.
> > 
> > > I also quite don't understand the comment. 
> > 
> > There an issue that "remove an memory section which includes memory hole".
> > Then,
> > 
> >    a page used by bootmem .... PG_reserved.
> >    a page of memory hole  .... PG_reserved.
> > 
> > We need to call page_is_ram() or some for handling this mess.
> 
> OK, I see.
> 
> > 
> > 
> > > At this place we are sure that the page is valid and neither
> > > free nor LRU.
> > > 
> [...]
> > > > +bool is_pageblock_removable(struct page *page)
> > > > +{
> > > > +	struct zone *zone = page_zone(page);
> > > > +	unsigned long flags;
> > > > +	int num;
> > > > +
> > > > +	spin_lock_irqsave(&zone->lock, flags);
> > > > +	num = __count_unmovable_pages(zone, page);
> > > > +	spin_unlock_irqrestore(&zone->lock, flags);
> > > 
> > > Isn't this a problem? The function is triggered from userspace by sysfs
> > > (0444 file) and holds the lock for pageblock_nr_pages. So someone can
> > > simply read the file and block the zone->lock preventing/delaying
> > > allocations for the rest of the system.
> > > 
> > But we need to take this. Maybe no panic you'll see even if no-lock.
> 
> Yes, I think that this can only lead to a false possitive in sysfs
> interface. Isolating code holds the lock.
> 

ok, let's go step by step.

I'm ok that your new patch to be merged. I'll post some clean up and small
bugfix (not related to your patch), later.
(I'll be very busy in this weekend, sorry.)


Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/