Message-Id: <20140804.163542.2163737198733907800.davem@davemloft.net>
Date:	Mon, 04 Aug 2014 16:35:42 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	npiggin@...nel.dk
Cc:	cat.schulze@...ce-dsl.net, sparclinux@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: overzealous TLB flushing by lazy VMAP flushing

From: David Miller <davem@...emloft.net>
Date: Mon, 04 Aug 2014 16:23:14 -0700 (PDT)

Sorry, I screwed up the lkml CC:, fixing that here.

> Hey Nick,
> 
> The lazy VMAP flushing in mm/vmalloc.c seems to make various
> assumptions about vmalloc area layout.
> 
> In particular it assumes that if there are pending VMAP flushes
> in multiple regions managed by vmap/vunmap, it's safe to queue
> up a range flush from the lowest such address to the highest
> such address.
> 
> This assumption breaks on sparc64, as diagnosed by
> Christopher (CC:'d).
> 
> On sparc64 we have the following regions:
> 
> modules		0x010000000 --> 0x0f0000000
> openfirmware	0x0f0000000 --> 0x100000000
> vmalloc		0x100000000 --> 0x10000000000
> 
> So if a module is unloaded as well as some vfree()'s occur, the next
> lazy VMAP flush will flush a range that covers all of openfirmware.
> 
> This will flush the firmware's locked TLB entries, which in turn causes
> all sorts of problems.
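
To make the failure mode concrete, here is a minimal user-space sketch of a single min/max accumulator, which is the behaviour the message attributes to the lazy VMAP flusher. It is not the code in mm/vmalloc.c: the names `flush_range`, `flush_range_init`, `flush_range_add` and `flush_range_overlaps` are hypothetical, and only the region bounds are taken from the sparc64 layout quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Openfirmware bounds copied from the sparc64 layout in the message. */
#define OFW_START 0x0f0000000ULL
#define OFW_END   0x100000000ULL

/* Hypothetical pending-flush accumulator, half-open: [start, end). */
struct flush_range {
    uint64_t start;
    uint64_t end;
};

/* Start with an empty range (start > end means "nothing pending"). */
static void flush_range_init(struct flush_range *acc)
{
    acc->start = UINT64_MAX;
    acc->end = 0;
}

/* Widen the pending range so that it also covers [start, end). */
static void flush_range_add(struct flush_range *acc,
                            uint64_t start, uint64_t end)
{
    assert(start < end);
    if (start < acc->start)
        acc->start = start;
    if (end > acc->end)
        acc->end = end;
}

/* Return nonzero if the pending range intersects [start, end). */
static int flush_range_overlaps(const struct flush_range *acc,
                                uint64_t start, uint64_t end)
{
    return acc->start < end && start < acc->end;
}
```

Adding one span from the module area (say 0x010200000 to 0x010210000) and one from the vmalloc area (say 0x100004000 to 0x100008000) leaves a pending range that starts in the module area and ends in the vmalloc area, so `flush_range_overlaps(&acc, OFW_START, OFW_END)` is true even though no openfirmware mapping was ever touched. The example addresses are made up for illustration.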
> 
> It is not possible to adjust where these ranges are in order to make
> the vmalloc and module ranges be right next to each other.  The
> firmware area is fixed, first of all.  Second of all the module area
> has to be in the low 4GB because of the code model we compile the
> kernel with (all symbols are 32-bit), and we want to use as little of
> the sub-4GB area as possible because it has to fit the main kernel
> image, modules, and the firmware region.
> 
> We could add all sorts of range logic to the flush_tlb_range()
> implementation on sparc64, but I really think that the kernel should
> not trigger a TLB flush across a range for which it never managed any
> mappings.
> 
> I also think that the lazy VMAP flusher should be mindful of this for
> another reason.  Specifically, issuing such an enormous flush range is
> going to be expensive, more expensive than whatever we were gaining by
> batching these flushes.
> 
> Unlike for userspace mappings, for kernel mappings we can't have a
> cutoff for page-by-page flushes and just do a context based TLB flush.
> We always have to do page-by-page flushes.  So these huge ranges
> really do hurt.
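
One way to picture the cost argument, and a possible direction for avoiding it, is to track pending flushes separately for each region the kernel actually manages. The sketch below is a user-space illustration only, not a proposed patch: `struct region`, `region_init`, `regions_note_unmap` and `regions_dirty_pages` are hypothetical names, and the 8 KB page size (`PAGE_SHIFT_ASSUMED`) is an assumption about the sparc64 configuration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED 13   /* assumption: 8 KB pages */

/* A managed region plus the span inside it that still needs flushing. */
struct region {
    uint64_t start, end;              /* region bounds, [start, end) */
    uint64_t dirty_start, dirty_end;  /* pending flush, [start, end) */
};

static void region_init(struct region *r, uint64_t start, uint64_t end)
{
    assert(start < end);
    r->start = start;
    r->end = end;
    r->dirty_start = UINT64_MAX;      /* empty: nothing pending */
    r->dirty_end = 0;
}

/*
 * Record an unmapped span in the region that fully contains it.
 * Returns 0 on success, -1 if no managed region contains the span,
 * so a flush can never be queued for an unmanaged area.
 */
static int regions_note_unmap(struct region *regs, int n,
                              uint64_t start, uint64_t end)
{
    for (int i = 0; i < n; i++) {
        struct region *r = &regs[i];

        if (start >= r->start && end <= r->end && start < end) {
            if (start < r->dirty_start)
                r->dirty_start = start;
            if (end > r->dirty_end)
                r->dirty_end = end;
            return 0;
        }
    }
    return -1;
}

/* Total pages that per-region, page-by-page flushes would touch. */
static uint64_t regions_dirty_pages(const struct region *regs, int n)
{
    uint64_t pages = 0;

    for (int i = 0; i < n; i++)
        if (regs[i].dirty_start < regs[i].dirty_end)
            pages += (regs[i].dirty_end - regs[i].dirty_start)
                     >> PAGE_SHIFT_ASSUMED;
    return pages;
}
```

With the modules and vmalloc regions from the layout above, a 64 KB module unmap and a 16 KB vfree() are flushed as two separate spans, and a span inside the openfirmware area is rejected outright. A single merged low-to-high range over the same two unmaps would instead cover every page in between, including the whole firmware region.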