linux-kernel - Flushing whole page instead of work for ptrace

Message-ID: <4CEFA8AE.2090804@petalogix.com>
Date:	Fri, 26 Nov 2010 13:31:42 +0100
From:	Michal Simek <michal.simek@...alogix.com>
To:	Oleg Nesterov <oleg@...hat.com>,
	Roland McGrath <roland@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>
CC:	LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
	John Williams <john.williams@...alogix.com>,
	"Edgar E. Iglesias" <edgar.iglesias@...il.com>
Subject: Flushing whole page instead of work for ptrace

Hi,

I have found a problem when debugging a multithreaded application on
Microblaze. Let me describe what I discovered.

GDB has an internal timeout which is set to the value 3. This should
mean that if GDB sends a packet and doesn't receive an answer to it,
then after 3 internal timeouts GDB announces "Ignoring packet error,
continuing..." and then fails (communication is done over TCP).

With older versions we could debug multithreaded applications, which is
why I bisected all the new patches I have added to the kernel, and I
identified that the problem is caused by my patch.

microblaze: Implement flush_dcache_page macro
sha1(79e87830faf22ca636b1a1d8f4deb430ea6e1c8b)

I had to implement the flush_dcache_page macro for new systems with a
write-back (WB) cache; it is important for several components (for
example a jffs2 rootfs) to get them working on WB.
BTW: For systems with write-through (WT) caches I don't need to implement
this macro because flushing is done automatically.

Then I replaced the macro on WT with a udelay loop to find out if the
problem is time dependent. I tested it on two hw designs (with the same
HZ and cache size) with two different network IPs/drivers (one with DMA
and the second without) and I found that the system with the DMA network
driver can spend more time on dcache flushing before the GDB timeout
happens, because TCP communication is faster. This means that the
problem also depends on CPU speed and cache configuration: cache size
and cache line length.

Then I traced the kernel side, focusing on why this macro causes the
problem.

GDB sends the symbol-lookup command (qSymbol) and I see a lot of kernel
ptrace PEEKTEXT requests. I parsed them and here is the calling sequence.

(kernel/ptrace.c) sys_ptrace -> 
(arch/microblaze/kernel/ptrace.c)arch_ptrace -> 
(kernel/ptrace.c)ptrace_request -> generic_ptrace_peek/poke data/text -> 
(mm/memory.c) access_process_vm -> get_user_pages -> __get_user_pages -> 
flush_dcache_page
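
To illustrate why one symbol lookup turns into so many trips through the
sequence above: each PTRACE_PEEKTEXT request returns a single word, so a
debugger stub has to loop. Below is a minimal user-space sketch of such a
loop. The names peek_requests_needed and peek_buffer are hypothetical and
are not taken from GDB or gdbserver; only the ptrace() call itself is the
real interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical helper: number of PTRACE_PEEKTEXT requests needed to
 * read len bytes when each request returns one machine word. */
size_t peek_requests_needed(size_t len)
{
    return (len + sizeof(long) - 1) / sizeof(long);
}

/* Hypothetical helper: read len bytes at addr from a stopped, traced
 * process, one word per request. Returns 0 on success, -1 on error
 * with errno set by ptrace().
 * Note: the last request always fetches a full word, so it may read a
 * few bytes past addr + len in the tracee. */
int peek_buffer(pid_t pid, unsigned long addr, void *dst, size_t len)
{
    unsigned char *out = dst;
    size_t done = 0;

    assert(dst != NULL || len == 0);

    while (done < len) {
        size_t chunk = len - done;
        long word;

        if (chunk > sizeof(word))
            chunk = sizeof(word);

        /* -1 is a valid word value, so errors are detected via errno. */
        errno = 0;
        word = ptrace(PTRACE_PEEKTEXT, pid, (void *)(addr + done), NULL);
        if (word == -1 && errno != 0)
            return -1;

        memcpy(out + done, &word, chunk);
        done += chunk;
    }
    return 0;
}
```

With 4-byte words, as on 32-bit Microblaze, reading a 64-byte symbol table
fragment this way takes 16 separate requests, and each of them walks the
whole kernel path listed above.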

Function access_process_vm calls __get_user_pages, which doesn't work
with the buffer length (which for PEEK/POKE TEXT/DATA is just 32 bits on
32-bit Microblaze) but only with the start address and whole pages. The
flush_dcache_page macro is also called there, and it now takes more time
than in the past, when it was empty. The macro flushes the whole page,
but in this case, when it is called from ptrace, it is only necessary to
flush one address.
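
To put a rough number on the difference, here is a small model that counts
how many cache lines have to be touched when flushing a whole page versus
only the accessed range. It is plain arithmetic, not the Microblaze cache
code; the function names are hypothetical, and the 4096-byte page and
16-byte cache line used in the note below are assumed values for
illustration only.

```c
#include <assert.h>

/* Hypothetical model: number of cache lines covering the byte range
 * [start, start + len). line_len must be a power of two. */
unsigned long lines_in_range(unsigned long start, unsigned long len,
                             unsigned long line_len)
{
    unsigned long first, last;

    assert(line_len != 0 && (line_len & (line_len - 1)) == 0);

    if (len == 0)
        return 0;

    first = start & ~(line_len - 1);              /* line holding first byte */
    last = (start + len - 1) & ~(line_len - 1);   /* line holding last byte */

    return (last - first) / line_len + 1;
}

/* Hypothetical model: number of cache lines touched by a full-page
 * flush, which is what a page-based flush has to walk. */
unsigned long lines_in_page(unsigned long page_size, unsigned long line_len)
{
    return lines_in_range(0, page_size, line_len);
}
```

Under those assumed sizes, lines_in_page(4096, 16) gives 256 line
operations per request, while lines_in_range(addr, 4, 16) gives 1 for a
word inside a single line and 2 if the word straddles a line boundary.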

What is the best way to ensure that only the accessed address is
flushed, instead of the whole page, for ptrace requests?
I think there shouldn't be any reason to flush the whole page for ptrace.

Please correct me if I am wrong somewhere.

Thanks,
Michal

-- 
Michal Simek, Ing. (M.Eng)
PetaLogix - Linux Solutions for a Reconfigurable World
w: www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663