linux-kernel - Re: [Patch 8/8] kexec: allow to shrink reserved memory

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <m1eirg5j9i.fsf@fess.ebiederm.org>
Date:	Wed, 12 Aug 2009 23:18:33 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Amerigo Wang <amwang@...hat.com>
Cc:	linux-kernel@...r.kernel.org, tony.luck@...el.com,
	linux-ia64@...r.kernel.org, linux-mm@...ck.org,
	Neil Horman <nhorman@...hat.com>,
	Andi Kleen <andi@...stfloor.org>, akpm@...ux-foundation.org,
	bernhard.walle@....de, Fenghua Yu <fenghua.yu@...el.com>,
	Ingo Molnar <mingo@...e.hu>,
	Anton Vorontsov <avorontsov@...mvista.com>
Subject: Re: [Patch 8/8] kexec: allow to shrink reserved memory

Amerigo Wang <amwang@...hat.com> writes:

> Eric W. Biederman wrote:
>> Amerigo Wang <amwang@...hat.com> writes:
>>
>>   
>>> This patch implements shrinking the reserved memory for crash kernel,
>>> if it is more than enough.
>>>
>>> For example, if you have already reserved 128M, now you just want 100M,
>>> you can do:
>>>
>>> # echo $((100*1024*1024)) > /sys/kernel/kexec_crash_size
>>>     
>>
>> Getting closer (comments inline)
>>
>> Semantically this patch is non-contriversial and pretty
>> simple, but still needs a fair amount of review.  Can
>> you put this patch at the front of your patch set.
>>
>>   
>
> Sure, I will do it when I resend them next time.
>
> I add mm people into Cc.
>>> Index: linux-2.6/kernel/kexec.c
>>> ===================================================================
>>> --- linux-2.6.orig/kernel/kexec.c
>>> +++ linux-2.6/kernel/kexec.c
>>> @@ -1083,6 +1083,76 @@ void crash_kexec(struct pt_regs *regs)
>>>  	}
>>>  }
>>>  +int kexec_crash_kernel_loaded(void)
>>> +{
>>> +	int ret;
>>> +	if (!mutex_trylock(&kexec_mutex))
>>> +		return 1;
>>>     
>>
>> We don't need trylock on this code path   
>
> OK.
>
>>   
>>> +	ret = kexec_crash_image != NULL;
>>> +	mutex_unlock(&kexec_mutex);
>>> +	return ret;
>>> +}
>>> +
>>> +size_t get_crash_memory_size(void)
>>> +{
>>> +	size_t size;
>>> +	if (!mutex_trylock(&kexec_mutex))
>>> +		return 1;
>>>     
>>
>> We don't need trylock on this code path 
>>
>>   
>
> Hmm, crashk_res is a global struct, so other process can also
> change it... but currently no process does that, right?
>

We still need the lock.  Just doing trylock doesn't instead
of just sleeping doesn't seem to make any sense on these
code paths.

>>> +	size = crashk_res.end - crashk_res.start + 1;
>>> +	mutex_unlock(&kexec_mutex);
>>> +	return size;
>>> +}
>>> +
>>> +int shrink_crash_memory(unsigned long new_size)
>>> +{
>>> +	struct page **pages;
>>> +	int ret = 0;
>>> +	int  npages, i;
>>> +	unsigned long addr;
>>> +	unsigned long start, end;
>>> +	void *vaddr;
>>> +
>>> +	if (!mutex_trylock(&kexec_mutex))
>>> +		return -EBUSY;
>>>     
>>
>> We don't need trylock on this code path 
>>
>> We are missing the check to see if the crash_kernel is loaded
>> under this lock instance. So I please move the kexec_crash_image != NULL
>> test inline here and kill the kexec_crash_kernel_loaded function.
>>   
>
> Ok, no problem.
>
>>   
>>> +	start = crashk_res.start;
>>> +	end = crashk_res.end;
>>> +
>>> +	if (new_size >= end - start + 1) {
>>> +		ret = -EINVAL;
>>> +		if (new_size == end - start + 1)
>>> +			ret = 0;
>>> +		goto unlock;
>>> +	}
>>> +
>>> +	start = roundup(start, PAGE_SIZE);
>>> +	end = roundup(start + new_size, PAGE_SIZE) - 1;
>>> +	npages = (end + 1 - start ) / PAGE_SIZE;
>>> +
>>> +	pages = kmalloc(sizeof(struct page *) * npages, GFP_KERNEL);
>>> +	if (!pages) {
>>> +		ret = -ENOMEM;
>>> +		goto unlock;
>>> +	}
>>> +	for (i = 0; i < npages; i++) {
>>> +		addr = end + 1 + i * PAGE_SIZE;
>>> +		pages[i] = virt_to_page(addr);
>>> +	}
>>> +
>>> +	vaddr = vm_map_ram(pages, npages, 0, PAGE_KERNEL);
>>>     
>>
>> This is the wrong kernel call to use.  I expect this needs to look
>> like a memory hotplug event.  This does not put the pages into the
>> free page pool.
>>   
>
> Well, I also wanted to use an memory-hotplug API, but that will make the code
> depend on memory-hotplug, which certainly is not what we want...
>
> I checked the mm code, actually what I need is an API which is similar to
> add_active_range(), but add_active_range() can't be used here since it is marked
> as "__init".
>
> Do we have that kind of API in mm? I can't find one.

Perhaps we will need to remove __init from add_active_range.  I know the logic
but I'm not up to speed on the mm pieces at the moment.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/