linux-kernel - PROBLEM: infinite loop do_sparc64_fault with fault

Message-ID: <5a046f53.195cb.14db30cc3a3.Coremail.wei_qi_k@163.com>
Date:	Tue, 2 Jun 2015 14:54:27 +0800 (CST)
From:	weiqi <wei_qi_k@....com>
To:	linux-kernel@...r.kernel.org
Subject: PROBLEM: infinite loop do_sparc64_fault with fault_code 2

Hello everyone,

Recently I have been working on a sparc64 machine (32 cores, SMP) running linux-2.6.32, with a 64-bit kernel and 32-bit userspace.

When I run the LTP test case kill10 with the command "./kill10 -c100 -g 1 -n 1",
it occasionally gets stuck in an infinite page-fault loop, and one of the
kill10 processes uses 100% CPU. (It is easy to reproduce: just run the
command again and again.)
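Since the hang only shows up occasionally, rerunning the command by hand can be automated. The following is a minimal sketch, not part of LTP: the helper names `run_with_timeout` and `first_hung_run` are hypothetical, and it assumes that a top-level command still running after the timeout means one of the processes is spinning. It watches only the direct child, so processes spawned by the test are not tracked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Run argv[0] with its arguments. Returns 0 if the child exits within
 * timeout_sec (including the case where exec itself failed), 1 if it is
 * still running after the timeout, -1 on fork/wait error. */
int run_with_timeout(char *const argv[], int timeout_sec, int kill_on_timeout)
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }

    const struct timespec tick = { 0, 100 * 1000 * 1000 };  /* 100 ms */
    for (int waited = 0; waited < timeout_sec * 10; waited++) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return 0;                   /* child finished in time */
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }

    if (kill_on_timeout) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);          /* reap the killed child */
    }
    return 1;
}

/* Repeat the command up to `runs` times. Returns the 1-based index of the
 * first run that did not finish in time, 0 if all runs finished, -1 on
 * error. A hung child is left alive so it can be inspected. */
int first_hung_run(char *const argv[], int runs, int timeout_sec)
{
    for (int i = 1; i <= runs; i++) {
        int r = run_with_timeout(argv, timeout_sec, 0);
        if (r != 0)
            return r == 1 ? i : -1;
    }
    return 0;
}
```

With argv set to { "./kill10", "-c100", "-g", "1", "-n", "1", NULL }, a nonzero positive result from first_hung_run() identifies the run at which to start looking at the spinning process.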

After some debugging, I found:

1) The fault address is always the same and always lies in kill10's user stack, for example "0xffb0b470".

2) The fault happens in put_user() while kill10 is handling a signal.
   Code path: arch/sparc/kernel/signal32.c: setup_frame32() --> put_user().

3) The first fault is a COW fault handled by do_wp_page(), which finds
   PageAnon(old_page) and then reuses old_page.

4) After that it goes into an infinite fault loop with fault_code 2 (D-TLB
   miss). Each of these faults is handled by handle_pte_fault() and exits at
   flush_tlb_page(), which carries this comment:
                /*
                 * This is needed only for protection faults but the arch code
                 * is not yet telling us if this is a protection fault or not.
                 * This still avoids useless tlb flushes for .text page faults
                 * with threads.
                 */
                if (flags & FAULT_FLAG_WRITE)
                        flush_tlb_page(vma, address);
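To make the branch around that comment easier to follow, here is a small user-space model of the decision at the end of handle_pte_fault(). It is a sketch only: the enum, the function names, and MODEL_FAULT_FLAG_WRITE are hypothetical stand-ins, and pte_changed models the return value of ptep_set_access_flags(). It does not touch real page tables and it does not explain why the fault repeats.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's FAULT_FLAG_WRITE; illustrative only. */
#define MODEL_FAULT_FLAG_WRITE 0x01u

/* What the tail of the fault handler does to the TLB/MMU cache. */
enum tlb_action {
    ACT_NONE,
    ACT_UPDATE_MMU_CACHE,
    ACT_FLUSH_TLB_PAGE,
};

/* pte_changed: true if setting the access flags modified the PTE. */
enum tlb_action pte_fault_tail(unsigned int flags, bool pte_changed)
{
    if (pte_changed)
        return ACT_UPDATE_MMU_CACHE;   /* new PTE is pushed to the MMU cache */
    if (flags & MODEL_FAULT_FLAG_WRITE)
        return ACT_FLUSH_TLB_PAGE;     /* PTE already up to date: flush only */
    return ACT_NONE;                   /* read fault, nothing to do */
}

/* Usage: the three cases the branch distinguishes. */
void pte_fault_tail_examples(void)
{
    assert(pte_fault_tail(MODEL_FAULT_FLAG_WRITE, true) == ACT_UPDATE_MMU_CACHE);
    assert(pte_fault_tail(MODEL_FAULT_FLAG_WRITE, false) == ACT_FLUSH_TLB_PAGE);
    assert(pte_fault_tail(0u, false) == ACT_NONE);
}
```

In the loop described in 4), every repeated write fault would fall into the second case: the PTE was already made writable by the first fault, so nothing changes and only flush_tlb_page() runs.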

I've also tested with linux-3.10, with almost the same result.

I know sparc handles TLB misses in software. do_wp_page() calls
flush_tlb_page() and update_mmu_cache(), but this seems to have no effect:
the D-TLB miss just repeats indefinitely at the same address.