Date:	Tue, 05 Nov 2013 10:53:20 +0900
From:	Namhyung Kim <namhyung@...nel.org>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Steven Rostedt <rostedt@...dmis.org>,
	Namhyung Kim <namhyung.kim@....com>,
	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	Hyeoncheol Lee <cheol.lee@....com>,
	Hemant Kumar <hkshaw@...ux.vnet.ibm.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
	"zhangwei\(Jovi\)" <jovi.zhangwei@...wei.com>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>
Subject: Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)

On Mon, 4 Nov 2013 16:01:12 +0100, Oleg Nesterov wrote:
> On 11/04, Namhyung Kim wrote:
>>
>> On Sat, 2 Nov 2013 16:54:58 +0100, Oleg Nesterov wrote:
>> >
>> > This does not look right to me.
>> >
>> > - get_user_vaddr() is costly, it does vma_interval_tree_foreach() under
>> >   ->i_mmap_mutex.
>>
>> Hmm.. yes, I think this is not needed.  I guess it should lookup a
>> proper vma in current->mm with mmap_sem read-locked.
>>
>> >
>> > - this only allows to read the data from the same binary.
>>
>> Right.  This is also an unnecessary restriction.  We should be able to
>> access data in other binary.
>
> Yes... but this needs another discussion. In general, we simply can not
> do this with the suggested syntax.

Agreed.

>
> Say you want to probe this "foo" binary and dump "stdin" from libc.so.
> You can't do this. You simply can't know where libc.so will be mmaped.
>
> But: if we attach the event to the already running process, or if we
> disable the randomization, then we can probably do this, see below.
>
> Or the syntax should be "name=probe @file/addr" or something like this.

Okay.  Let's call this kind of thing "cross-fetch" (or a better name can
be suggested).  This is a more complex situation and needs more
discussion, as you said.  So let's skip it for now. :)

>
>> > - in particular, you can't read the data from bss
>>
>> I can't understand why..  The bss region should also be in a same vma of
>> normal data, no?
>
> No, no. bss is mmaped anonymously, at least in general. See set_brk() in
> load_elf().

Ah, thanks for the pointer.  I also need to say that I'm not familiar
with the code base.

Looking at the code, it seems to add an anon mapping iff the bss region
spans two or more pages - that's why I missed it in my simple
test. :/
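
To make that condition concrete, here is a small user-space sketch of the
arithmetic as I read it in set_brk(): both the end of the file-backed
data and the end of the bss are rounded up to a page boundary, and an
anonymous mapping is only needed when the second one is larger.  The
names (sketch_*) and the 4 KiB page size are my own assumptions for
illustration, not kernel identifiers.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Round addr up to the next page boundary (same idea as ELF_PAGEALIGN). */
unsigned long sketch_page_align(unsigned long addr)
{
	return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * file_end: end of the file-backed data (p_vaddr + p_filesz)
 * mem_end:  end of the segment including bss (p_vaddr + p_memsz)
 *
 * Returns the length of the anonymous mapping the loader would need,
 * or 0 when the bss fits in the tail of the last file-backed page.
 */
unsigned long sketch_anon_bss_len(unsigned long file_end, unsigned long mem_end)
{
	unsigned long start = sketch_page_align(file_end);
	unsigned long end = sketch_page_align(mem_end);

	return end > start ? end - start : 0;
}

/* Tiny self-check with made-up addresses. */
void sketch_bss_self_check(void)
{
	/* 8 bytes of bss in the same page as the data: no anon mapping. */
	assert(sketch_anon_bss_len(0x601040UL, 0x601048UL) == 0);
	/* bss crossing into the next page: one anonymous page. */
	assert(sketch_anon_bss_len(0x601040UL, 0x602001UL) == 0x1000UL);
}
```

With a single small global in bss (as in my test program) the result is 0,
so everything lives in the file-backed vma and the problem stays hidden.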

>
>> I thought the gcc somehow aligns data to next page boundary.
>
> And perhaps it even should, my system is old. But this doesn't really
> matter, the process itself can create another mapping.

Right.

>
>> But if
>> it's not the case, we need to recognize which is the proper one..
>>
>> Simply preferring a writable vma to a read-only vma is what's came to my
>> head now.  Do you have an idea?
>
> So far I think that trace_uprobes.c should not play games with vma. At all.

Yes, playing with vma is fragile.  But otherwise how can we get the
address from the file+offset in random processes?
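
For reference, the translation I have in mind is the usual one: once the
right vma is known, the address is vm_start plus the distance of the file
offset from the start of that mapping.  Below is a user-space sketch with
a stand-in struct (struct sketch_vma and the function name are
hypothetical, only the field names mirror vm_area_struct); the hard part,
choosing the vma, is exactly what is in question here.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Stand-in for the few vm_area_struct fields the translation needs. */
struct sketch_vma {
	unsigned long vm_start;	/* first mapped address */
	unsigned long vm_end;	/* one past the last mapped address */
	unsigned long vm_pgoff;	/* file offset of vm_start, in pages */
};

/*
 * Translate a file offset into an address inside this mapping.
 * Returns 0 when the mapping does not cover the offset.
 */
unsigned long sketch_offset_to_vaddr(const struct sketch_vma *vma,
				     unsigned long offset)
{
	unsigned long map_off = vma->vm_pgoff << SKETCH_PAGE_SHIFT;
	unsigned long map_len = vma->vm_end - vma->vm_start;

	if (offset < map_off || offset - map_off >= map_len)
		return 0;

	return vma->vm_start + (offset - map_off);
}

/* Tiny self-check with made-up addresses. */
void sketch_vaddr_self_check(void)
{
	/* Two pages of the file, starting at file offset 0x200000. */
	struct sketch_vma data = { 0x40200000UL, 0x40202000UL, 0x200UL };

	assert(sketch_offset_to_vaddr(&data, 0x200c9cUL) == 0x40200c9cUL);
	assert(sketch_offset_to_vaddr(&data, 0x1000UL) == 0);
}
```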

>
>> > -------------------------------------------------------------------------------
>> > Can't we simply implement get_user_vaddr() as
>> >
>> > 	static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
>> > 	{
>> > 		void __user *vaddr = (void __force __user *)addr;
>> >
>> > 		/* A NULL tu means that we already got the vaddr */
>> > 		if (tu)
>> > 			vaddr += (current->mm->start_data & PAGE_MASK);
>> >
>> > 		return vaddr;
>> > 	}
>> >
>> > ?
>> >
>> > I did this change, and now the test-case above works. And it also works
>> > with "cc -pie -fPIC",
>> >
>> > 	# nm foo | grep -w global
>> > 	0000000000200c9c D global
>> >
>> > 	# perf probe -x ./foo -a "func var=@...9c:u32"
>> > 	# perf record -e probe_foo:func ./foo
>> > 	...
>> > 	# perf script | tail -1
>> > 		foo   576 [001]   475.519940: probe_foo:func: (7ffe95ca3814) var=4321
>> >
>> > What do you think?
>>
>> This can only work with the probes fetching data from the executable,
>> right?  But as I said it should support any other binaries too.
>
> See above, we can't in general read other binaries.

Okay, I need to clarify my words.  I'm not talking about "cross-fetch"
here; what I meant is adding a probe in some dso and fetching data
from that same dso.

The primary use case I have in mind is supporting SDTs in the perf probe
tool.  Currently many libraries, including glibc, add tracepoints (SDTs)
within themselves so that they can be traced/profiled easily.

You can see Hemant's work on this here:

  https://lkml.org/lkml/2013/10/18/274

>
> But: if we know where it is mmapped we can do this, just we need
> to calculate the right addr passed to trace_uprobes.
>
> Or: we should support both absolute and relative addresses, this is what
> I was going to discuss later.

But I guess this "specifying address directly" is hard to apply to
multiple processes - like system-wide tracing in perf.

>
>> static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
>> {
>> 	unsigned long pgoff = addr >> PAGE_SHIFT;
>> 	struct vm_area_struct *vma, *orig_vma = NULL;
>> 	unsigned long vaddr = 0;
>>
>> 	if (tu == NULL) {
>> 		/* A NULL tu means that we already got the vaddr */
>> 		return (void __force __user *) addr;
>> 	}
>>
>> 	down_read(&current->mm->mmap_sem);
>>
>> 	vma = current->mm->mmap;
>
> Cough, it can be null if another thread does munmap(0, TASK_SIZE) ;)
>
> But this doesn't matter.

:)

>
>> 	do {
>> 		if (!vma->vm_file || vma->vm_file->f_inode != tu->inode) {
>> 			/*
>> 			 * We found read-only mapping for this inode.
>> 			 * (provided that all mappings for this inode
>> 			 * have consecutive addresses)
>> 			 */
>> 			if (orig_vma)
>> 				break;
>> 			continue;
>> 		}
>>
>> 		if (vma->vm_pgoff > pgoff ||
>> 		    (vma->vm_pgoff + vma_pages(vma) <= pgoff))
>> 			continue;
>>
>> 		orig_vma = vma;
>>
>> 		/*
>> 		 * We prefer writable mapping over read-only since
>> 		 * data is usually in read/write memory region.  But
>> 		 * in case of read-only data, it only can be found in
>> 		 * read-only mapping so we save orig_vma and check
>> 		 * whether it also has writable mapping.
>> 		 */
>> 		if (vma->vm_flags & VM_WRITE)
>> 			break;
>> 	} while ((vma = vma->vm_next) != NULL);
>>
>> 	if (orig_vma)
>> 		vaddr = offset_to_vaddr(orig_vma, addr);
>>
>> 	up_read(&current->mm->mmap_sem);
>>
>> 	return (void __force __user *) vaddr;
>> }
>
> For what? Why it is better than my suggestion?

Just to support fetching (not cross-fetching!) from binaries (dsos)
other than the executable.
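
To show the idea (and its limits) outside the kernel, here is a
simplified user-space model of the lookup quoted above.  It is not the
same loop: it scans an array instead of the vma list, uses an integer
file_id as a stand-in for the inode, and keeps the first matching mapping
unless a writable one also covers the offset.  All sketch_* names and the
sample addresses are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_VM_WRITE   0x2UL	/* stand-in for VM_WRITE */

struct sketch_vma {
	unsigned long vm_start, vm_end;
	unsigned long vm_pgoff;	/* file offset of vm_start, in pages */
	unsigned long vm_flags;
	int file_id;		/* stand-in for the inode; 0 = anonymous */
};

/* Made-up layout of one dso: text, data, and a separate anonymous bss. */
const struct sketch_vma sketch_maps[] = {
	{ 0x40000000UL, 0x40002000UL, 0, 0,               7 },	/* r-x text */
	{ 0x40201000UL, 0x40203000UL, 1, SKETCH_VM_WRITE, 7 },	/* rw- data */
	{ 0x40203000UL, 0x40204000UL, 0, SKETCH_VM_WRITE, 0 },	/* anon bss */
};

/* Does this mapping cover the given file page? */
static int sketch_covers(const struct sketch_vma *v, unsigned long pgoff)
{
	unsigned long pages = (v->vm_end - v->vm_start) >> SKETCH_PAGE_SHIFT;

	return v->vm_pgoff <= pgoff && pgoff < v->vm_pgoff + pages;
}

/*
 * Find an address for (file_id, offset), preferring a writable mapping.
 * Returns 0 when no file-backed mapping of that file covers the offset.
 */
unsigned long sketch_file_vaddr(const struct sketch_vma *maps, size_t n,
				int file_id, unsigned long offset)
{
	unsigned long pgoff = offset >> SKETCH_PAGE_SHIFT;
	const struct sketch_vma *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct sketch_vma *v = &maps[i];

		if (v->file_id != file_id || !sketch_covers(v, pgoff))
			continue;
		if (!best || (v->vm_flags & SKETCH_VM_WRITE))
			best = v;
		if (best->vm_flags & SKETCH_VM_WRITE)
			break;	/* a writable mapping wins, stop looking */
	}

	if (!best)
		return 0;
	return best->vm_start + offset - (best->vm_pgoff << SKETCH_PAGE_SHIFT);
}

/* Tiny self-check against the sample layout. */
void sketch_lookup_self_check(void)
{
	/* Page 1 is mapped twice; the writable data mapping is chosen. */
	assert(sketch_file_vaddr(sketch_maps, 3, 7, 0x1c9cUL) == 0x40201c9cUL);
	/* An offset past the file-backed part (bss) is not found. */
	assert(sketch_file_vaddr(sketch_maps, 3, 7, 0x3010UL) == 0);
}
```

The second check in the self-test is the weak spot: an address in the
anonymous bss mapping cannot be reached by matching on the file at all.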

>
> How it can read bss? How it can read the data from other binaries?

Yes, it'd fail if bss resides in a separate vma. :-/

>
> How we can trust the result? This code relies on some guesses and
> none of them are "strict".
>
> If nothing else, elf can have the arbitrary number of mmaped sections,
> this can't work in general?

These two are still problems to be solved.

Thanks,
Namhyung