Message-ID: <1457067768.15454.181.camel@hpe.com>
Date:	Thu, 03 Mar 2016 22:02:48 -0700
From:	Toshi Kani <toshi.kani@....com>
To:	Paul Gortmaker <paul.gortmaker@...driver.com>,
	Borislav Petkov <bp@...e.de>,
	Richard Purdie <richard.purdie@...uxfoundation.org>,
	Toshi Kani <toshi.kani@...com>
Cc:	Bruce Ashfield <bruce.ashfield@...driver.com>,
	openembedded-core <openembedded-core@...ts.openembedded.org>,
	"Hart, Darren" <darren.hart@...el.com>,
	"saul.wold" <saul.wold@...el.com>, linux-kernel@...r.kernel.org
Subject: Re: runtime regression with "x86/mm/pat: Emulate PAT when it is
 disabled"

On Thu, 2016-03-03 at 15:59 -0500, Paul Gortmaker wrote:
> So, the yocto folks moved from 4.1 to 4.4 and one of their automated
> qemu x86-32 boot tests started failing.  None of the yocto details seem
> to matter since I offered to help and I've reproduced it using 100%
> mainline kernels and a generic distro toolchain as well.
> 
> The test case is slightly complicated, in that it relies on uvesafb
> being modular, and so one has to juggle modules within an ext4 image
> that qemu boots from.  We tried making uvesafb builtin, but that made
> the issue magically vanish.  Given PAT, this isn't too surprising.
> 
> Richard did the preliminary investigation and analysis, and from that I
> did a bisect, and found the commit in $SUBJECT to be the root cause, as
> per the discussion here:
> 
> http://lists.openembedded.org/pipermail/openembedded-core/2016-March/118397.html
> 
> I'd mentioned the above to bpetkov on IRC and after confirming it was
> still an issue on 4.5-rc6, he'd asked if I had a portable reproducer.  
> 
> Not sure how complicated that would be, I set out to make one from my
> build.   With a little LD_PRELOAD type magic and ensuring all the qemu
> components are in ./  I have one that runs on an otherwise qemu-free
> x86-64 box. 
> 
> The stand alone reproducer is here; launched in 00-runme:
> 
> http://openlinux.wrs.com/pat-splat/reproducer.tar.bz2  
> 
> It is nothing fancy, just a generic yocto build of "sato" (gfx enabled
> rootfs).  When it "works" it boots to a UI touchscreen interface.  When
> it fails, you get a black screen with a blinking cursor (as seen in
> "vncviewer localhost:0").

Thanks for tracking this down and packaging the reproducer.  I simply
untar'd it and ran 00-runme, but was not able to connect to localhost:0.
I am not familiar with qemu, so I have not yet looked into why...

Anyway, regarding the error message:
  "x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus for [mem 0xfd000000-0xfdffffff], got write-combining"

Did it come from the following path during fork()?
 copy_process
  copy_mm
   dup_mm
    dup_mmap
     copy_page_range
      track_pfn_copy
       reserve_pfn_range

If so, track_pfn_copy() obtained the pgprot from a PTE and called
reserve_pfn_range() with it.  So, the error message indicates that a
previous ioremap_wc() (i.e. pcm WC) resulted in creating a UC- map (i.e.
pgprot UC-).  pcm is a logical cache type and pgprot is a HW cache type.
They can differ when the CPU does not support a given logical type.  This
WC to UC- conversion happens when the CPU does not support PAT.

Richard's change, which compares pgprot values in reserve_pfn_range(), is
a good one, but I do not understand how we got into this mess.  We do not
have this check when PAT is disabled, and WC is supported when PAT is
enabled.

Commit 9cd25aac1 changed the initial values of the pcm<->pgprot conversion
tables.  The tables should be initialized with the same values after
pat_init() is called.  Is there any possibility that ioremap_wc() was
called before pat_init()?

Also, can you send me the whole dmesg output?  I'd like to check how PAT
is initialized.

Thanks!
-Toshi
