linux-kernel - Re: [PATCH] Revert "MIPS: Remove race window in page fault handling"

Date:	Mon, 8 Dec 2014 10:18:37 +0100
From:	Lars Persson <lars.persson@...s.com>
To:	Leonid Yegoshin <Leonid.Yegoshin@...tec.com>
CC:	Ralf Baechle <ralf@...ux-mips.org>,
	"linux-mips@...ux-mips.org" <linux-mips@...ux-mips.org>,
	"james.hogan@...tec.com" <james.hogan@...tec.com>,
	"keescook@...omium.org" <keescook@...omium.org>,
	"paul.burton@...tec.com" <paul.burton@...tec.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"manuel.lauss@...il.com" <manuel.lauss@...il.com>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"blogic@...nwrt.org" <blogic@...nwrt.org>,
	"markos.chandras@...tec.com" <markos.chandras@...tec.com>
Subject: Re: [PATCH] Revert "MIPS: Remove race window in page fault handling"

Hi

I have reconstructed the stack trace on the 3.14 kernel where we caught
the problem.

The call path was __do_fault() -> set_pte_at(). In later kernels this
corresponds to do_set_pte().

You are right that there are calls to flush_icache_page(). I did not dig
into the code enough to check if page_mapped() will return true at that
point. If it does, then your proposed patch for flush_icache_page is a
better fix.

- Lars

On Fri, 2014-12-05 at 22:41 +0100, Leonid Yegoshin wrote:
> Lars,
> 
> On 12/05/2014 01:32 AM, Lars Persson wrote:
> > Hi
> >
> > Our setup includes both a non-DMA block device and a compressing
> > file-system (UBIFS). A flush_dcache_page() is issued by UBIFS so your
> > patch fixes another problem that we do not hit.
> >
> > The stack trace is not available now. Do we need it for any further
> > analysis? I think the mechanism of the race window is understood and it
> > depends on the __flush_dcache_page() deciding that the flush should be
> > postponed.
> 
> Unfortunately, the original case still needs to be investigated.
> I looked into all cases of update_mmu_cache() besides HUGE page support 
> and NUMA, and I see:
> 
> 1.  insert_pfn()
>      It is used to put a special page (read: VDSO) into the memory map. No 
> cache flush is needed here because the page is prepared and flushed during 
> system boot.
> 
> 2.  do_wp_page(), first occurrence
>      It has flush_cache_page() before it sets PTE in 
> ptep_set_access_flags(). This flush is unconditional and affects all caches.
> 
> 3.  do_wp_page(), second case
>      It is done after preparing a cleared new page or after COW. COW has 
> an appropriate cache flush of the destination in copy_user_highpage(). The 
> immediate use of a cleared new page as instructions (you had SIGILL, 
> right?)... hm-m, something is wrong in the application in this case.
> 
> 4.  do_swap_page()
>      Well, it may be a case where flush_icache_page() is not used (see 
> below) and the page is taken from non-DMA swap. But I also recommend looking 
> into
> 
>          http://patchwork.linux-mips.org/patch/7615/
> 
> there is a bug in swap entry number presentation and it may affect your 
> system.
> 
> 5.  do_anonymous_page()
>      Similar to case (3): a cleared new page. Using it as an 
> instruction page may point to some app problem.
> 
> 6.  do_set_pte()
>      It also has flush_icache_page() which may have impact if not 
> implemented, see below.
> 
> 7.  handle_pte_fault()
>      Page is not touched and cache flush is NO-OP.
> 
> 8.  remove_migration_pte()
>      Well, it is a place for suspicion. But it should not run in 
> parallel with any running thread: dirtying a page while another thread is 
> running is a recipe for disaster.
> 
> So, you see, if I understand it correctly, there is no place for 
> failure... besides application misbehaviour or a potential kernel bug in 
> migration. Of course, I may be missing something, and that is the reason why 
> a stack trace is desirable.
> 
> 
> > I think the mechanism of the race window is understood and it
> > depends on the __flush_dcache_page() deciding that the flush should be
> > postponed.
> 
> As I remember, you said you use the HIGHMEM patch, right? It changes 
> __flush_dcache_page() a little, and the flush of any mapped page is no longer 
> postponed. So, it has an immediate effect for application pages.
> 
> - Leonid.
> 
> >
> >
> > - Lars
> >
> > On Fri, 2014-12-05 at 03:16 +0100, Leonid Yegoshin wrote:
> >> (repeat mesg, first one went to wrong place)
> >>
> >> Lars,
> >>
> >> Do you have a stack trace or similar from when you found the second VPE
> >> between set_pte_at and update_mmu_cache?
> >> It would be interesting to see how it happens. Generally, to get a consistent
> >> SIGILL in applications due to misbehaviour of the memory subsystem, a bug
> >> in the FS is not enough.
> >>
> >> Hold on - do you use non-DMA file system?
> >> If so, I advise you to try this simple patch:
> >>
> >>       Author: Leonid Yegoshin <yegoshin@...s.com>
> >>       Date:   Tue Apr 2 14:20:37 2013 -0700
> >>
> >>       MIPS: (opt) Fix of reading I-pages from non-DMA FS devices for ID
> >>       cache separation
> >>
> >>       This optional fix provides a D-cache flush for instruction code
> >>       pages on page faults. In case of non-DMA block device a driver
> >>       doesn't know that it reads I-page and doesn't flush D-cache
> >>       generally on systems without cache aliasing. And that takes toll
> >>       during page fault of instruction pages.
> >>
> >>       It is not a perfect fix, it should be considered as a temporary fix.
> >>       The permanent fix would track page origin in page cache and flushes
> >>       D-cache during reception of page from driver only but not at each
> >>       page fault. It is not done yet.
> >>
> >>       Change-Id: I43f5943d6ce0509729179615f6b81e77803a34ac
> >>       Author: Leonid Yegoshin <yegoshin@...s.com>
> >>       Signed-off-by: Leonid Yegoshin <yegoshin@...s.com>(imported from
> >> commit 6ebd22eb7a3d9873582ebe990a77094f971652ee)(imported from commit
> >> 0caf3b4a1eebb64572e81e4df6fdb3abf12c70
> >>
> >> arch/mips/include/asm/cacheflush.h:
> >>
> >>      @@ -61,6 +61,9 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
> >>       static inline void flush_icache_page(struct vm_area_struct *vma,
> >>              struct page *page)
> >>       {
> >>      +       if (cpu_has_dc_aliases ||
> >>      +           ((vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc))
> >>      +               __flush_dcache_page(page);
> >>       }
> >>
> >>       extern void (*flush_icache_range)(unsigned long start, unsigned long end);
> >>
> >>
> >> It fixed crash problems with non-DMA FS for a couple of our customers.
> >> Without it, non-DMA root FS crashes are catastrophic on aliasing
> >> systems; it is still a problem for the I-cache too, but much rarer.
> >>
> >> Unfortunately, it is also a performance hit, though a smaller one than
> >> running a page cache flush at each PTE setup.
> >>
> >> On 12/03/2014 06:03 AM, Lars Persson wrote:
> >>> It is the flush_dcache_page() that was called from the file-system
> >>> reading the page contents into memory.
> >>>
> >>> - Lars
> >>>
> >>>
> >
> >
> 
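
For readers following the thread, the condition in the proposed
flush_icache_page() patch quoted above can be restated as a small userspace
sketch. This is only a model under stated assumptions, not kernel code: the
function name needs_dcache_flush(), its boolean parameters (standing in for
cpu_has_dc_aliases and cpu_has_ic_fills_f_dc) and the MODEL_VM_EXEC constant
are hypothetical names introduced for illustration.

```c
#include <stdbool.h>

/* Illustrative stand-in for the kernel's VM_EXEC flag (assumed value). */
#define MODEL_VM_EXEC 0x4UL

/*
 * Models the decision in the quoted patch: flush the D-cache when the CPU
 * has D-cache aliases, or when the mapping is executable and the I-cache
 * is not filled from the D-cache.
 */
static bool needs_dcache_flush(bool has_dc_aliases, bool ic_fills_from_dc,
                               unsigned long vm_flags)
{
    return has_dc_aliases ||
           ((vm_flags & MODEL_VM_EXEC) && !ic_fills_from_dc);
}
```

Under this model a non-aliasing CPU whose I-cache does not fill from the
D-cache flushes only for executable mappings, which seems to match the
"performance hit" trade-off described in the quoted message.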


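The race window discussed in this thread (a second VPE running between
set_pte_at() and update_mmu_cache() while a D-cache flush is still postponed)
can be pictured with the following simplified sketch. It is a model under
assumptions, not the MIPS implementation: struct page_model and all model_*
functions are hypothetical, and real kernels track the postponed flush
differently.

```c
#include <stdbool.h>

/* Hypothetical model of one page-cache page; not a kernel structure. */
struct page_model {
    bool mapped;        /* a user PTE points at the page */
    bool dcache_dirty;  /* a D-cache flush has been postponed */
};

/* Models flush_dcache_page() from the file system: postpone if unmapped. */
static void model_flush_dcache_page(struct page_model *p)
{
    p->dcache_dirty = !p->mapped;
}

/* Models set_pte_at(): the page becomes visible to every VPE. */
static void model_set_pte_at(struct page_model *p)
{
    p->mapped = true;
}

/* Models update_mmu_cache(): performs the postponed flush. */
static void model_update_mmu_cache(struct page_model *p)
{
    p->dcache_dirty = false;
}

/* True while another VPE could fetch stale instructions from the page. */
static bool model_stale_fetch_possible(const struct page_model *p)
{
    return p->mapped && p->dcache_dirty;
}
```

In this model the window is open only after model_set_pte_at() and before
model_update_mmu_cache(), and only if the earlier flush was postponed.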
