Message-ID: <m1eirg5j9i.fsf@fess.ebiederm.org>
Date: Wed, 12 Aug 2009 23:18:33 -0700
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Amerigo Wang <amwang@...hat.com>
Cc: linux-kernel@...r.kernel.org, tony.luck@...el.com,
linux-ia64@...r.kernel.org, linux-mm@...ck.org,
Neil Horman <nhorman@...hat.com>,
Andi Kleen <andi@...stfloor.org>, akpm@...ux-foundation.org,
bernhard.walle@....de, Fenghua Yu <fenghua.yu@...el.com>,
Ingo Molnar <mingo@...e.hu>,
Anton Vorontsov <avorontsov@...mvista.com>
Subject: Re: [Patch 8/8] kexec: allow to shrink reserved memory
Amerigo Wang <amwang@...hat.com> writes:
> Eric W. Biederman wrote:
>> Amerigo Wang <amwang@...hat.com> writes:
>>
>>
>>> This patch implements shrinking the reserved memory for crash kernel,
>>> if it is more than enough.
>>>
>>> For example, if you have already reserved 128M, now you just want 100M,
>>> you can do:
>>>
>>> # echo $((100*1024*1024)) > /sys/kernel/kexec_crash_size
>>>
>>
>> Getting closer (comments inline)
>>
>> Semantically this patch is non-controversial and pretty
>> simple, but still needs a fair amount of review. Can
>> you put this patch at the front of your patch set?
>>
>>
>
> Sure, I will do it when I resend them next time.
>
> I have added the mm people to Cc.
>>> Index: linux-2.6/kernel/kexec.c
>>> ===================================================================
>>> --- linux-2.6.orig/kernel/kexec.c
>>> +++ linux-2.6/kernel/kexec.c
>>> @@ -1083,6 +1083,76 @@ void crash_kexec(struct pt_regs *regs)
>>> }
>>> }
>>> +int kexec_crash_kernel_loaded(void)
>>> +{
>>> + int ret;
>>> + if (!mutex_trylock(&kexec_mutex))
>>> + return 1;
>>>
>>
>> We don't need trylock on this code path
>
> OK.
>
>>
>>> + ret = kexec_crash_image != NULL;
>>> + mutex_unlock(&kexec_mutex);
>>> + return ret;
>>> +}
>>> +
>>> +size_t get_crash_memory_size(void)
>>> +{
>>> + size_t size;
>>> + if (!mutex_trylock(&kexec_mutex))
>>> + return 1;
>>>
>>
>> We don't need trylock on this code path
>>
>>
>
> Hmm, crashk_res is a global struct, so other processes can also
> change it... but currently no process does that, right?
>
We still need the lock. Doing a trylock instead of just sleeping
on the mutex doesn't seem to make any sense on these code paths.
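
To illustrate the difference, here is a minimal userspace sketch, not kernel code. It assumes a pthread mutex standing in for kexec_mutex and a made-up struct standing in for crashk_res; every name ending in _model is hypothetical. With a blocking lock the caller sleeps until the lock is free, so the return value is always a real size and never a placeholder like the "return 1" in the trylock version.

```c
#include <pthread.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel objects named in the patch. */
struct resource_model {
	unsigned long start;
	unsigned long end;
};

static pthread_mutex_t kexec_mutex_model = PTHREAD_MUTEX_INITIALIZER;

/* Example values only: a 128 MiB region starting at 16 MiB. */
static struct resource_model crashk_res_model = { 0x1000000UL, 0x8ffffffUL };

/* Blocking variant: wait for the lock instead of bailing out. */
size_t get_crash_memory_size_model(void)
{
	size_t size;

	pthread_mutex_lock(&kexec_mutex_model);
	size = crashk_res_model.end - crashk_res_model.start + 1;
	pthread_mutex_unlock(&kexec_mutex_model);
	return size;
}
```
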
>>> + size = crashk_res.end - crashk_res.start + 1;
>>> + mutex_unlock(&kexec_mutex);
>>> + return size;
>>> +}
>>> +
>>> +int shrink_crash_memory(unsigned long new_size)
>>> +{
>>> + struct page **pages;
>>> + int ret = 0;
>>> + int npages, i;
>>> + unsigned long addr;
>>> + unsigned long start, end;
>>> + void *vaddr;
>>> +
>>> + if (!mutex_trylock(&kexec_mutex))
>>> + return -EBUSY;
>>>
>>
>> We don't need trylock on this code path
>>
>> We are missing the check to see if the crash kernel is loaded
>> under this lock instance. So please move the kexec_crash_image != NULL
>> test inline here and kill the kexec_crash_kernel_loaded function.
>>
>
> Ok, no problem.
>
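
A rough userspace sketch of the shape being asked for, under stated assumptions: a pthread mutex replaces kexec_mutex, all names ending in _sketch are hypothetical, and -EBUSY for the "crash kernel already loaded" case is chosen only for this example. The point is that the loaded check and the resize happen under one lock acquisition, so nothing can load a crash kernel between the two. Page rounding and actually freeing the memory are left out.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t kexec_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static void *crash_image_sketch;	/* NULL means no crash kernel is loaded */
static unsigned long res_start_sketch = 0x1000000UL;	/* example values */
static unsigned long res_end_sketch = 0x8ffffffUL;

int shrink_crash_memory_sketch(unsigned long new_size)
{
	unsigned long old_size;
	int ret = 0;

	/* Blocking lock: sleep until available rather than trylock. */
	pthread_mutex_lock(&kexec_lock_sketch);

	/* Loaded check done inline, under the same lock as the resize. */
	if (crash_image_sketch != NULL) {
		ret = -EBUSY;	/* error code is an assumption of this sketch */
		goto unlock;
	}

	old_size = res_end_sketch - res_start_sketch + 1;
	if (new_size >= old_size) {
		/* Same size is a no-op; growing is rejected. */
		ret = (new_size == old_size) ? 0 : -EINVAL;
		goto unlock;
	}

	res_end_sketch = res_start_sketch + new_size - 1;

unlock:
	pthread_mutex_unlock(&kexec_lock_sketch);
	return ret;
}
```
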
>>
>>> + start = crashk_res.start;
>>> + end = crashk_res.end;
>>> +
>>> + if (new_size >= end - start + 1) {
>>> + ret = -EINVAL;
>>> + if (new_size == end - start + 1)
>>> + ret = 0;
>>> + goto unlock;
>>> + }
>>> +
>>> + start = roundup(start, PAGE_SIZE);
>>> + end = roundup(start + new_size, PAGE_SIZE) - 1;
>>> + npages = (end + 1 - start ) / PAGE_SIZE;
>>> +
>>> + pages = kmalloc(sizeof(struct page *) * npages, GFP_KERNEL);
>>> + if (!pages) {
>>> + ret = -ENOMEM;
>>> + goto unlock;
>>> + }
>>> + for (i = 0; i < npages; i++) {
>>> + addr = end + 1 + i * PAGE_SIZE;
>>> + pages[i] = virt_to_page(addr);
>>> + }
>>> +
>>> + vaddr = vm_map_ram(pages, npages, 0, PAGE_KERNEL);
>>>
>>
>> This is the wrong kernel call to use. I expect this needs to look
>> like a memory hotplug event. This does not put the pages into the
>> free page pool.
>>
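
For reference, a small userspace sketch of the range arithmetic only, i.e. which tail of the reservation would have to end up in the free page pool after a shrink. It follows the rounding used in the patch, but the 4096-byte page size, the helper names, and the struct are assumptions made for illustration. It says nothing about which mm call should release the pages.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size */

/* Round x up to the next multiple of align. */
static unsigned long roundup_sketch(unsigned long x, unsigned long align)
{
	return ((x + align - 1) / align) * align;
}

struct tail_range {
	unsigned long first;	/* first address no longer reserved */
	unsigned long npages;	/* whole pages from first up to the old end */
};

/* start/old_end are inclusive bounds of the current reservation. */
struct tail_range released_tail(unsigned long start, unsigned long old_end,
				unsigned long new_size)
{
	struct tail_range t;
	unsigned long new_start = roundup_sketch(start, SKETCH_PAGE_SIZE);
	unsigned long new_end =
		roundup_sketch(new_start + new_size, SKETCH_PAGE_SIZE) - 1;

	t.first = new_end + 1;
	/* Guard against rounding pushing the new end past the old one. */
	if (t.first > old_end)
		t.npages = 0;
	else
		t.npages = (old_end + 1 - t.first) / SKETCH_PAGE_SIZE;
	return t;
}
```

With a 128M reservation starting at 16M and a new size of 100M, this yields a 28M tail, which is 7168 pages of 4096 bytes.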
>
> Well, I also wanted to use a memory-hotplug API, but that will make the code
> depend on memory-hotplug, which certainly is not what we want...
>
> I checked the mm code, actually what I need is an API which is similar to
> add_active_range(), but add_active_range() can't be used here since it is marked
> as "__init".
>
> Do we have that kind of API in mm? I can't find one.
Perhaps we will need to remove __init from add_active_range. I know the logic
but I'm not up to speed on the mm pieces at the moment.
Eric