[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <63738a00-a961-2b2b-b60d-1d517eab7216@huawei.com>
Date: Wed, 10 Apr 2024 09:44:59 +0800
From: Zheng Yejian <zhengyejian1@...wei.com>
To: "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
CC: <naveen.n.rao@...ux.ibm.com>, <anil.s.keshavamurthy@...el.com>,
	<davem@...emloft.net>, <linux-kernel@...r.kernel.org>,
	<linux-trace-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] kprobes: Avoid possible warn in __arm_kprobe_ftrace()
On 2024/4/9 21:49, Masami Hiramatsu (Google) wrote:
> On Tue, 9 Apr 2024 14:20:45 +0800
> Zheng Yejian <zhengyejian1@...wei.com> wrote:
> 
>> On 2024/4/8 20:41, Masami Hiramatsu (Google) wrote:
>>> Hi Zheng,
>>>
>>> On Mon, 8 Apr 2024 16:34:03 +0800
>>> Zheng Yejian <zhengyejian1@...wei.com> wrote:
>>>
>>>> There is once warn in __arm_kprobe_ftrace() on:
>>>>
>>>>    ret = ftrace_set_filter_ip(ops, (unsigned long)p->addr, 0, 0);
>>>>    if (WARN_ONCE(..., "Failed to arm kprobe-ftrace at %pS (error %d)\n", ...)
>>>>      return ret;
>>>>
>>>> This warning is generated because 'p->addr' is detected to be not a valid
>>>> ftrace location in ftrace_set_filter_ip(). The ftrace address check is done
>>>> by check_ftrace_location() at the beginning of check_kprobe_address_safe().
>>>> At that point, ftrace_location(addr) == addr should return true if the
>>>> module is loaded. Then the module is searched twice:
>>>>     1. in is_module_text_address(), we find that 'p->addr' is in a module;
>>>>     2. in __module_text_address(), we find the module;
>>>>
>>>> If the module has just been unloaded before the second search, then
>>>> '*probed_mod' is NULL and we would not go to get the module refcount,
>>>> then the return value of check_kprobe_address_safe() would be 0, but
>>>> actually we need to return -EINVAL.
>>>
>>> OK, so you found a race window in check_kprobe_address_safe().
>>>
>>> It does something like below.
>>>
>>> check_kprobe_address_safe() {
>>> 	...
>>>
>>> 	/* Timing [A] */
>>>
>>> 	if (!(core_kernel_text(p->addr) ||
>>> 		is_module_text_address(p->addr)) ||
>>> 		...(other reserved address check)) {
>>> 		return -EINVAL;
>>> 	}
>>>
>>> 	/* Timing [B] */
>>>
>>> 	*probed_mod = __module_text_address(p->addr):
>>> 	if (*probe_mod) {
>>> 		if (!try_module_get(*probed_mod)) {
>>> 			return -ENOENT;
>>> 		}
>>> 		...	
>>> 	}
>>> }
>>>
>>> So, if p->addr is in a module which is alive at the timing [A], but
>>> unloaded at timing [B], 'p->addr' is passed the
>>> 'is_module_text_address(p->addr)' check, but *probed_mod becomes NULL.
>>> Thus the corresponding module is not referenced and kprobe_arm(p) will
>>> access a wrong address (use after free).
>>> This happens either kprobe on ftrace is enabled or not.
>>
>> Yes, This is the problem. And for this case, check_kprobe_address_safe()
>> still return 0, and then going on to arm kprobe may cause problems. So
>> we should make check_kprobe_address_safe() return -EINVAL when refcount
>> of the module is not got.
> 
> Yes,
> 
>>
>>>
>>> To fix this problem, we should move the mutex_lock(kprobe_mutex) before
>>> check_kprobe_address_safe() because kprobe_module_callback() also lock it
>>> so it can stop module unloading.
>>>
>>> Can you ensure this will fix your problem?
>>
>> It seems not, the warning in __arm_kprobe_ftrace() still occurs. I
>> contrived following simple test:
>>
>>       #!/bin/bash
>>       sysctl -w kernel.panic_on_warn=1
>>       while [ True ]; do
>>           insmod mod.ko    # contain function 'foo'
>>           rmmod mod.ko
>>       done &
>>       while [ True ]; do
>>           insmod kprobe.ko  # register kprobe on function 'foo'
>>           rmmod kprobe.ko
>>       done &
>>
>> I think holding kprobe_mutex cannot make sure we get the refcount of the
>> module.
> 
> Aah, yes, it cannot, because the kallsyms in a module will be removed
> after module->state becomes MODULE_STATE_UNFORMED. Before UNFORMED,
> the state is MODULE_STATE_GOING and the kprobe_module_callback() is
> called at that point. Thus, the following scenario happens.
> 
>               CPU1 					CPU2
> 
> mod->state = MODULE_STATE_GOING
> kprobe_module_callback() {
> 	mutex_lock(&kprobe_mutex)
> 		loop on kprobe_table
> 		to disable kprobe in the module.
> 	mutex_unlock(&kprobe_mutex)
> }
> 						register_kprobe(p) {
> 							mutex_lock(&kprobe_mutex)
> 							check_kprobe_address_safe(p->addr) {
> 								[A'']
> 								is_module_text_address() return true
> 								until mod->state == UNFORMED.
> mod->state = MODULE_STATE_UNFORMED
> 								[B'']
> 								__module_text_address() returns NULL.
> 							}
> 							p is on the kprobe_table.
> 							mutex_unlock(&kprobe_mutex)
> 
> So, as your fix, if we save the module at [A''] and use it at [B''],
> the mod is NOT able to get because mod->state != MODULE_STATE_LIVE.
> 
> 
>>
>>> I think your patch is just optimizing but not fixing the fundamental
>>> problem, which is we don't have an atomic search symbol and get module
>>
>> Sorry, this patch is a little confusing, but it is not just optimizing :)
>>
>> As shown below, after my patch, if p->addr is in a module which is alive
>> at the timing [A'] but unloaded at timing [B'], then *probed_mod must
>> not be NULL. Then after timing [B'], it will go to try_module_get() and
>> expected to fail and return -ENOENT. So this is the different.
>>
>>       check_kprobe_address_safe() {
>>           ...
>>           *probed_mod = NULL;
>>           if (!core_kernel_text((unsigned long) p->addr)) {
>>
>>               /* Timing [A'] */
>>
>>               *probed_mod = __module_text_address((unsigned long) p->addr);
>>               if (!(*probed_mod)) {
>>                   return -EINVAL;
>>               }
>>           }
>>           ...
>>
>>           /* Timing [B'] */
>>
>>           if (*probed_mod) {
>>               if (!try_module_get(*probed_mod)) {
>>                   return -ENOENT;
>>               }
>>               ...
>>           }
> 
> OK, I got it. Hmm, but this is a bit long story to explain, the
> root cause is the delay of module unloading process. So more
> precisely, we can explain it as below.
> 
> ----
> When unloading a module, its state is changing MODULE_STATE_LIVE ->
>   MODULE_STATE_GOING -> MODULE_STATE_UNFORMED. Each change will take
> a time. `is_module_text_address()` and `__module_text_address()`
> works with MODULE_STATE_LIVE and MODULE_STATE_GOING.
> If we use `is_module_text_address()` and `__module_text_address()`
> separately, there is a chance that the first one is succeeded but the
> next one is failed because module->state becomes MODULE_STATE_UNFORMED
> between those operations.
> 
> In `check_kprobe_address_safe()`, if the second `__module_text_address()`
> is failed, that is ignored because it expected a kernel_text address.
> But it may have failed simply because module->state has been changed
> to MODULE_STATE_UNFORMED. In this case, arm_kprobe() will try to modify
> non-exist module text address (use-after-free).
> 
> To fix this problem, we should not use separated `is_module_text_address()`
> and `__module_text_address()`, but use only `__module_text_address()` once
> and do `try_module_get(module)` which is only available with
> MODULE_STATE_LIVE.
> ----
> 
> Would it be good for you too? The code itself looks good to me now :-)
Yes, of course :)
> 
> Thank you!
> 
>>
>>> API. In that case, we should stop a whole module unloading system until
>>> registering a new kprobe on a module. (After registering the kprobe,
>>> the callback can mark it gone and disarm_kprobe does not work anymore.)
>>>
>>> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
>>> index 9d9095e81792..94eaefd1bc51 100644
>>> --- a/kernel/kprobes.c
>>> +++ b/kernel/kprobes.c
>>> @@ -1633,11 +1633,11 @@ int register_kprobe(struct kprobe *p)
>>>    	p->nmissed = 0;
>>>    	INIT_LIST_HEAD(&p->list);
>>>    
>>> +	mutex_lock(&kprobe_mutex);
>>> +
>>>    	ret = check_kprobe_address_safe(p, &probed_mod);
>>>    	if (ret)
>>> -		return ret;
>>> -
>>> -	mutex_lock(&kprobe_mutex);
>>> +		goto out;
>>>    
>>>    	if (on_func_entry)
>>>    		p->flags |= KPROBE_FLAG_ON_FUNC_ENTRY;
>>>
>>> ----
>>>
>>> Thank you,
>>>
>>>>
>>>> To fix it, originally we can simply check 'p->addr' is out of text again,
>>>> like below. But that would check twice respectively in kernel text and
>>>> module text, so finally I reduce them to be once.
>>>>
>>>>     if (!(core_kernel_text((unsigned long) p->addr) ||
>>>>         is_module_text_address((unsigned long) p->addr)) || ...) {
>>>> 	ret = -EINVAL;
>>>> 	goto out;
>>>>     }
>>>>     ...
>>>>     *probed_mod = __module_text_address((unsigned long) p->addr);
>>>>     if (*probed_mod) {
>>>> 	...
>>>>     } else if (!core_kernel_text((unsigned long) p->addr)) { // check again!
>>>> 	ret = -EINVAL;
>>>> 	goto out;
>>>>     }
>>>>
>>>> Signed-off-by: Zheng Yejian <zhengyejian1@...wei.com>
>>>> ---
>>>>    kernel/kprobes.c | 18 ++++++++++++------
>>>>    1 file changed, 12 insertions(+), 6 deletions(-)
>>>>
>>>> v2:
>>>>    - Update commit messages and comments as suggested by Masami.
>>>>      Link: https://lore.kernel.org/all/20240408115038.b0c85767bf1f249eccc32fff@kernel.org/
>>>>
>>>> v1:
>>>>    - Link: https://lore.kernel.org/all/20240407035904.2556645-1-zhengyejian1@huawei.com/
>>>>
>>>> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
>>>> index 9d9095e81792..65adc815fc6e 100644
>>>> --- a/kernel/kprobes.c
>>>> +++ b/kernel/kprobes.c
>>>> @@ -1567,10 +1567,17 @@ static int check_kprobe_address_safe(struct kprobe *p,
>>>>    	jump_label_lock();
>>>>    	preempt_disable();
>>>>    
>>>> -	/* Ensure it is not in reserved area nor out of text */
>>>> -	if (!(core_kernel_text((unsigned long) p->addr) ||
>>>> -	    is_module_text_address((unsigned long) p->addr)) ||
>>>> -	    in_gate_area_no_mm((unsigned long) p->addr) ||
>>>> +	/* Ensure the address is in a text area, and find a module if exists. */
>>>> +	*probed_mod = NULL;
>>>> +	if (!core_kernel_text((unsigned long) p->addr)) {
>>>> +		*probed_mod = __module_text_address((unsigned long) p->addr);
>>>> +		if (!(*probed_mod)) {
>>>> +			ret = -EINVAL;
>>>> +			goto out;
>>>> +		}
>>>> +	}
>>>> +	/* Ensure it is not in reserved area. */
>>>> +	if (in_gate_area_no_mm((unsigned long) p->addr) ||
>>>>    	    within_kprobe_blacklist((unsigned long) p->addr) ||
>>>>    	    jump_label_text_reserved(p->addr, p->addr) ||
>>>>    	    static_call_text_reserved(p->addr, p->addr) ||
>>>> @@ -1580,8 +1587,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
>>>>    		goto out;
>>>>    	}
>>>>    
>>>> -	/* Check if 'p' is probing a module. */
>>>> -	*probed_mod = __module_text_address((unsigned long) p->addr);
>>>> +	/* Get module refcount and reject __init functions for loaded modules. */
>>>>    	if (*probed_mod) {
>>>>    		/*
>>>>    		 * We must hold a refcount of the probed module while updating
>>>> -- 
>>>> 2.25.1
>>>>
>>>
>> --
>> Thanks
>> Zheng Yejian
>>>
>>
>>
> 
> 
Powered by blists - more mailing lists
 
