linux-kernel - uprobes && pre-filtering

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20121105190446.GA6188@redhat.com>
Date:	Mon, 5 Nov 2012 20:04:46 +0100
From:	Oleg Nesterov <oleg@...hat.com>
To:	Ananth N Mavinakayanahalli <ananth@...ibm.com>,
	Anton Arapov <anton@...hat.com>,
	David Smith <dsmith@...hat.com>,
	"Frank Ch. Eigler" <fche@...hat.com>, Ingo Molnar <mingo@...e.hu>,
	Josh Stone <jistone@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
	"Suzuki K. Poulose" <suzuki@...ibm.com>
Cc:	linux-kernel@...r.kernel.org
Subject: uprobes && pre-filtering

Hello.

There is a known (and by design) problem with uprobes. They act
systemwide, there is no pre-filtering. Just some random thoughts
to provoke the discussion.

- I think that the current uprobe_consumer->filter(task) should die.

  It buys nothing. It is called right before ->handler() and this is
  pointless, ->handler() can call it itself.

  And more importantly, I do not think this hook (with the same
  semantics) can/should be used by both register and handler_chain().

- We should change register_ and and mmap to filter out the tasks
  which we do not want to trace. To simplify, lets forget about
  multiple consumers first.

  Everything is simple, install_breakpoint() callers should simply
  call consumer->filter(args) and do nothing if it returns false.

  The main problem is, what should be passed as "args". I think it is
  pointless to use task_struct as an argument. And not because there
  is no simple way to find all tasks which use this particular mm in
  register_for_each_vma(), even if it was possible I think this makes
  no sense.

  If 2 tasks/threads share the same mm they will share int3 as well,
  so I think we need

  	enum filter_mode {
  		UPROBE_FILTER_REGISTER,
  		UPROBE_FILTER_MMAP,
  		/* more */
  	};

  	consumer->filter(enum filter_mode mode, struct mm_struct *mm);

  Sure, this does not allow to, say, probe the tasks with the given uid.
  But once again, currently we do not have for_each_mm_user(task, mm)
  and even if we implement it

  	a) ->filter(mm) can use it itself)

  	b) I do not think register_for_each_vma() should call it.

  	   Suppose that a consumer wants to track the task with the
  	   given pid PID. In this case ->filter() can simply check
  	   find_task_by_vpid(PID)->mm = mm. This is fast and simple.

  	   Or you want to probe all instances of /bin/ls. In this case
  	   filter() can check mm->exe_file->f_path == saved_path, this
  	   is very cheap.

  	   But if we add for_each_mm_user() into register_for_each_vma()
  	   it will be called even if the filtering is simple.

  So. If filter(UPROBE_FILTER_REGISTER) needs to check task_uid(task) == UID
  it has to do for_each_process() until we have (if ever) for_each_mm_user().

  Or it should always return true and remove the unnecessary breakpoints
  later, or use another API (see below).

  Also. Who needs the "nontrivial" filtering? I do not see any potential
  in-kernel user. And the tools like systemtap can take another approach
  (perhaps).

- Perhaps we should extend the API. We can add

	uprobe_apply(consumer, task, bool add_remove);

  which adds/removes breakpoints to task->mm.

  This way consumer can probe every task it wants to trace after
  uprobe_register().

  Its ->filter(UPROBE_FILTER_REGISTER) should simply return false. Or,
  better, we can split uprobe_register() into 2 functions,
  __uprobe_register() and uprobe_apply_all() which actually does
  register_for_each_vma().

  ***** QUESTION *****: perhaps this is all systemtap needs? ignoring
  UPROBE_FILTER_MMAP.

- Multiple consumers. uprobe_register/uprobe_unregister should be
  modified so that register_for_each_vma() is called every time when
  the new consumer comes or consumer goes away.

  uprobe_apply(add_remove => false) should obviously consult other
  consumers too.

- Perhaps we should teach handle_swbp() to remove the unwanted breakpoints.

  If every ->handler() returns, say, UPROBE_GO_AWAY handle_swbp() should
  remove this breakpoint. Of course, a consumer must be sure that if it
  returns UPROBE_GO_AWAY this task can't share ->mm with another task it
  wants to trace.

  Or consumer->handler() can do uprobe_apply(add_remove => false) itself,
  but this needs more discussion.

  The point is that if the filtering at UPROBE_FILTER_REGISTER time is
  not possible, ->filter(UPROBE_FILTER_REGISTER) can return true. A
  "wrong" int3 doesn't hurt until the task actually hits the breakpoint,
  and I think that a single bp-hit is fine performance-wise.

- fork(). The child inherits all breakpoints from parent, and uprobes
  can't control this. What can we do?

	* We can add another uprobe hook which does something like

	  	for_each_uprobe_in_each_vma(child_mm) {
	  		if (filter(UPROBE_FILTER_FORK))
	  			install_breakoint();
	  		else
	  			remove_breakpoint();
	  	}


	  But is is not clear where can we add this hook. dup_mmap()
	  looks appealing, but at this time the child is still under
	  construction, consumer->filter() can't look at task_struct.

	  And of course, it is not nice to slow down fork().

	* If we only care about the unwanted breakpoints, perhaps it
	  would be better to rely on UPROBE_GO_AWAY above?

	* Finally, do we care at all? Again, who can ever need to
	  re-install breakpoints after fork?

	  systemtap (iiuc) doesn't need this. And even if it does
	  or will need, I guess it can hook fork itself and use
	  uprobe_apply() ?

Please comment.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/