linux-kernel - Re: [PATCH 2/2] livepatch: fix patched module loading race

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150304131752.GD15177@pathway.suse.cz>
Date:	Wed, 4 Mar 2015 14:17:52 +0100
From:	Petr Mladek <pmladek@...e.cz>
To:	Josh Poimboeuf <jpoimboe@...hat.com>
Cc:	Seth Jennings <sjenning@...hat.com>, Jiri Kosina <jkosina@...e.cz>,
	Vojtech Pavlik <vojtech@...e.cz>,
	live-patching@...r.kernel.org, linux-kernel@...r.kernel.org,
	Rusty Russell <rusty@...tcorp.com.au>,
	Miroslav Benes <mbenes@...e.cz>,
	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	mingo@...nel.org, mathieu.desnoyers@...icios.com, oleg@...hat.com,
	paulmck@...ux.vnet.ibm.com, andi@...stfloor.org,
	rostedt@...dmis.org, tglx@...utronix.de
Subject: Re: [PATCH 2/2] livepatch: fix patched module loading race

On Tue 2015-03-03 17:02:22, Josh Poimboeuf wrote:
> It's possible for klp_register_patch() to see a module before the COMING
> notifier is called, or after the GOING notifier is called.
> 
> That can cause all kinds of ugly races.  As Pter Mladek reported:
> 
>   "The problem is that we do not keep the klp_mutex lock all the time when
>   the module is being added or removed.
> 
>   First, the module is visible even before ftrace is ready. If we enable a patch
>   in this time frame, adding ftrace ops will fail and the patch will get rejected
>   just because bad timing.

Ah, this is not true after all. I did not properly check when
MODULE_STATE_COMING was set. I though that it was before ftrace was
initialized but it was not true.


>   Second, if we are "lucky" and enable the patch for the coming module when the
>   ftrace is ready but before the module notifier has been called. The notifier
>   will try to enable the patch as well. It will detect that it is already patched,
>   return error, and the module will get rejected just because bad
>   timing. The more serious problem is that it will not call the notifier for
>   going module, so that the mess will stay there and we wont be able to load
>   the module later.

Ah, the race is there but the effect is not that serious in the
end. It seems that errors from module notifiers are ignored. In fact,
we do not propagate the error from klp_module_notify_coming(). It means
that WARN() from klp_enable_object() will be printed but the module
will be loaded and patched.

I am sorry, I was confused by kGraft where kgr_module_init() was
called directly from module_load(). The errors were propagated. It
means that kGraft rejects module when the patch cannot be applied.

Note that the current solution is perfectly fine for the simple
consistency model.


>   Third, similar problems are there for going module. If a patch is enabled after
>   the notifier finishes but before the module is removed from the list of modules,
>   the new patch will be applied to the module. The module might disappear at
>   anytime when the patch enabling is in progress, so there might be an access out
>   of memory. Or the whole patch might be applied and some mess will be left,
>   so it will not be possible to load/patch the module again."

This is true.


> Fix these races by letting the first one who sees the module do the
> needed work.
> 
> Reported-by: Petr Mladek <pmladek@...e.cz>
> Signed-off-by: Josh Poimboeuf <jpoimboe@...hat.com>
> ---
>  kernel/livepatch/core.c | 53 +++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 49 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> index a74e4e8..19a758c 100644
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -89,16 +89,42 @@ static bool klp_is_object_loaded(struct klp_object *obj)
>  /* sets obj->mod if object is not vmlinux and module is found */
>  static void klp_find_object_module(struct klp_object *obj)
>  {
> +	struct module *mod;
> +
>  	if (!klp_is_module(obj))
>  		return;
>  
>  	mutex_lock(&module_mutex);
> +
>  	/*
>  	 * We don't need to take a reference on the module here because we have
>  	 * the klp_mutex, which is also taken by the module notifier.  This
>  	 * prevents any module from unloading until we release the klp_mutex.
>  	 */
> -	obj->mod = find_module(obj->name);
> +	mod = find_module(obj->name);
> +
> +	/*
> +	 * Be careful here to prevent races with the notifier:
> +	 *
> +	 * - MODULE_STATE_COMING: This may be before or after the notifier gets
> +	 *   called.  If after, the notifier didn't see the object anyway
> +	 *   because it hadn't been added to the patch list yet.  Either way,
> +	 *   ftrace is already initialized, so it's safe to just go ahead and
> +	 *   initialize the object here.

Well, we need to be careful. This will handle only the newly
registered patch. If there are other patches against the module
on the stack, it might produce wrong order at func_stack, see below.


> +	 *
> +	 * - MODULE_STATE_GOING: Similar to MODULE_STATE_COMING, this may be
> +	 *   before or after the notifier gets called.  If after, the notifer
> +	 *   didn't see the object anyway.  Either way it's safe to just
> +	 *   pretend that it's already gone and leave obj->mod at NULL.

This is true for the current simple consistency model.

Note that there will be a small race window if we allow dependency
between patched functions and introduce more a complex consistency model
with lazy patching. The problem is the following sequence:


CPU0					CPU1

delete_module()  #SYSCALL

   try_stop_module()
     mod->state = MODULE_STATE_GOING;

   mutex_unlock(&module_mutex);

					klp_register_patch()
					klp_enable_patch()

					#save place to switch universe

					b()     # from module that is going
					  a()   # from core (patched)


   mod->exit();


Note that the function b() can be called until we call mod->exit().

If we do not apply patch against b() because it is in
MODULE_STATE_GOING. It will call patched a() with modified semantic
and things might get wrong.


Well, the above described race is rather theoretical. It will happen
only when a patched module is being removed and a module with a patch
is added at the same time. Also it means that some other CPU will
manage to register, enable the patch, switch the universe, and
call function from the patched module during the small race window.

I am not sure if we need to handle such a type of race if the solution
adds too big complexity.


> +	 *   MODULE_STATE_LIVE: The common case.  The module already finished
> +	 *   loading.  Just like with MODULE_STATE_COMING, we know the notifier
> +	 *   didn't see the object, so we init it here.
> +	 */
> +	if (mod && (mod->state == MODULE_STATE_COMING ||
> +		    mod->state == MODULE_STATE_LIVE))
> +		obj->mod = mod;
> +
>  	mutex_unlock(&module_mutex);
>  }
>  
> @@ -695,8 +721,6 @@ static void klp_free_object_loaded(struct klp_object *obj)
>  {
>  	struct klp_func *func;
>  
> -	obj->mod = NULL;
> -
>
>  	for (func = obj->funcs; func->old_name; func++)
>  		func->old_addr = 0;
>  }
> @@ -765,6 +789,7 @@ static int klp_init_object(struct klp_patch *patch, struct klp_object *obj)
>  		return -EINVAL;
>  
>  	obj->state = KLP_DISABLED;
> +	obj->mod = NULL;
>  
>  	klp_find_object_module(obj);
>  
> @@ -965,10 +990,30 @@ static int klp_module_notify(struct notifier_block *nb, unsigned long action,
>  				continue;
>  
>  			if (action == MODULE_STATE_COMING) {
> +
> +				/*
> +				 * Check for a small window where the register
> +				 * path already initialized the object.
> +				 */
s/path/patch/



> +				if (obj->mod)
> +					continue;

This might break stacking. The recently registered patch might become
the last on the stack and thus unused.

>  				obj->mod = mod;
>  				klp_module_notify_coming(patch, obj);
> -			} else /* MODULE_STATE_GOING */
> +			} else {
> +				/* MODULE_STATE_GOING */
> +
> +				/*
> +				 * Check for a small window where the register
> +				 * path already saw the GOING state and thus
> +				 * didn't set obj->mod.

Same typo here.

Also I would write that it did not initialize the object. It is not
only about setting obj->mod.

Best Regards,
Petr

> +				 */
> +				if (!obj->mod)
> +					continue;


>  				klp_module_notify_going(patch, obj);
> +				obj->mod = NULL;
> +			}
>  
>  			break;
>  		}
> -- 
> 2.1.0
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/