linux-kernel - Re: [PATCH 0/2] memcgroup: work better with tmpfs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <476AB0DA.80804@linux.vnet.ibm.com>
Date:	Thu, 20 Dec 2007 23:43:46 +0530
From:	Balbir Singh <balbir@...ux.vnet.ibm.com>
To:	Hugh Dickins <hugh@...itas.com>
CC:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] memcgroup: work better with tmpfs

Hugh Dickins wrote:
> On Wed, 19 Dec 2007, Balbir Singh wrote:
>> Hugh Dickins wrote:
>>
>> We always call mem_cgroup_isolate_pages() from shrink_(in)active_pages
>> under spin_lock_irq of the zone's lru lock. That's the reason that we
>> don't explicitly use it in the routine.
> 
> Indeed, thanks.
> 
>>> 2. There's mem_cgroup_charge and mem_cgroup_cache_charge (wouldn't the
>>> former be better called mem_cgroup_charge_mapped? why does the latter
>> Yes, it would be. After we've refactored the code, the new name makes sense.
>>
>>> test MEM_CGROUP_TYPE_ALL instead of MEM_CGROUP_TYPE_CACHED? I still don't
>>> understand your enums there).
>> We do that to ensure that we charge page cache pages only when the
>> accounting type is set to MEM_CGROUP_TYPE_ALL. If the type is anything
>> else, we ignore cached pages, we did not have MEM_CGROUP_TYPE_CACHED
>> initially when the patches went in.
> 
> I think you've given yourself far too many degrees of freedom here.
> 
> Please explain to me how the system is supposed to behave when I
> echo -n 2 >/cg/0/memory.control_type
> 
> From the name in the enum (MEM_CGROUP_TYPE_CACHED, doesn't even have
> an "= 2", yet this is a part of the API?), I think it's supposed to
> account pages into and out of the page cache, but not as they're
> mapped into and out of userspace.  But from the code, no references
> whatever to MEM_CGROUP_TYPE_CACHED or MEM_CGROUP_TYPE_MAPPED,
> it looks as if it'll behave just like MEM_CGROUP_TYPE_MAPPED
> (which we hope = 1).
> 
> As you say, you didn't have MEM_CGROUP_TYPE_CACHED originally:
> it looks like it got added to the straightforward case of MAPPED,
> in an apparently flexible but not fully thought out way.
> 

Yes, I think your argument makes sense. I had initially thought of
making the enums a mask

MEM_CGROUP_TYPE_MAPPED = 0x1,
MEM_CGROUP_TYPE_CACHED = 0x2,
MEM_CGROUP_TYPE_ALL = 0x3

and check for control_type & MEM_CGROUP_TYPE_CACHED


>>  But there's only mem_cgroup_uncharge.
>>> So when, for example, an add_to_page_cache fails, the uncharge may not
>>> balance the charge?
>>>
>> We use mem_cgroup_uncharge() everywhere. The reason being, we might
>> switch control type, we uncharge pages that have a page_cgroup
>> associated with them, hence once we;ve charged, uncharge does not
>> distinguish between charge types.
> 
> Ah, so this is the meaning of that 
> 	/*
> 	 * This can handle cases when a page is not charged at all and we
> 	 * are switching between handling the control_type.
> 	 */
> 	if (!pc)
> 		return;
> 
> 	if (atomic_dec_and_test(&pc->ref_cnt)) {
> at the beginning of mem_cgroup_uncharge.
> 
> Sorry, that doesn't fly.  You've no locking between acquiring pc from
> the page->page_cgroup, testing pc here, and decrementing its ref_cnt.
> When that ref_cnt goes down to zero, clear_page_cgroup may NULLify
> page->page_cgroup and kfree the pc.
> 

Yes, we need to lock the page group.


> So if you cannot really keep track of the ref_cnt (because of
> changing control_type), that atomic_dec_and_test is in danger
> of corrupting someone else's memory - isn't it?
>

Yes, my bad :(

> I can just about imagine that some admins will want control_type
> MAPPED and others CACHED and others MAPPED+CACHED.  But is there
> actually a need for one cgroup to be controlling MAPPED while
> another on the same machine is controlling MAPPED+CACHED?  And
> does that make sense - there'll be weirdnesses, won't there?
> And is there actually a need to change a cgroup's control_type
> on the fly while it's alive?
> 
> Of course it's nice to be flexible and allow for such possibilities;
> but not if that just opens windows for corruption.  My own view is
> that at present you should just account mapped and cached for all,
> and strip out these extra degrees of freedom: add them back in some
> future when the locking is worked out.
> 


I am going to rip out the control_type interface for now. The locking
needs fixing irrespective of control_type. I am sending out a patch to
rip out control_type.

> Hugh

Thanks for your detailed comments,

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/