linux-kernel - Re: [PATCH 2/2] dax: fix bdev NULL pointer dereferences

Message-ID: <20160131180738.GB2948@linux.intel.com>
Date:	Mon, 1 Feb 2016 05:07:38 +1100
From:	Matthew Wilcox <willy@...ux.intel.com>
To:	Dan Williams <dan.j.williams@...el.com>
Cc:	Ross Zwisler <zwisler@...il.com>,
	linux-nvdimm <linux-nvdimm@...1.01.org>,
	Dave Chinner <david@...morbit.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Christoph Hellwig <hch@...radead.org>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Jan Kara <jack@...e.com>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 2/2] dax: fix bdev NULL pointer dereferences

On Sun, Jan 31, 2016 at 08:38:20AM -0800, Dan Williams wrote:
> On Sun, Jan 31, 2016 at 2:55 AM, Matthew Wilcox <willy@...ux.intel.com> wrote:
> > On Sat, Jan 30, 2016 at 11:12:12PM -0700, Ross Zwisler wrote:
> >> Is there a reason to store PFNs instead of kaddrs in the radix tree?
> >
> > Once ARM, MIPS and SPARC get supported, they're going to need temporary
> > kernel addresses assigned to PFNs rather than permanent ones.  Also,
> > it'll be easier for teardown to delete PFNs associated with a particular
> > device than kaddrs associated with a particular device.  And it lets
> > us support more persistent memory on a 32-bit machine (also on a 64-bit
> > machine, but that's mostly theoretical)
> >
> > +/*
> > + * DAX uses the 'exceptional' entries to store PFNs in the radix tree.
> > + * Bit 0 is clear (the radix tree uses this for its own purposes).  Bit
> > + * 1 is set (to indicate an exceptional entry).  Bits 2 & 3 are PFN_DEV
> > + * and PFN_MAP.  The top two bits denote the size of the entry (PTE, PMD,
> > + * PUD, one reserved).  That leaves us 26 bits on 32-bit systems and 58
> > + * bits on 64-bit systems, able to address 256GB and 1024EB respectively.
> > + */
> >
> > It's also pretty cheap to look up the kaddr from the pfn, at least on
> > 64-bit architectures without cache aliasing problems:
> >
> > +static void *dax_map_pfn(pfn_t pfn, unsigned long index)
> > +{
> > +       preempt_disable();
> > +       pagefault_disable();
> > +       return pfn_to_kaddr(pfn_t_to_pfn(pfn));
> 
> pfn_to_kaddr() assumes persistent memory is direct mapped which is not
> always the case.

Yes.  This is just the default implementation of dax_map_pfn() which works
for most situations.  We can introduce more complex implementations of
dax_map_pfn() as necessary.  You make another excellent point for why
we should store PFNs in the radix tree instead of kaddrs :-)
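
To make the entry layout from the quoted comment concrete, here is a minimal userspace sketch of packing a PFN into a 64-bit exceptional entry and unpacking it again.  It is not the patch's code: all names (dax_entry_encode() and friends, the DAX_* macros) are hypothetical, and it assumes that bit 2 is PFN_DEV, bit 3 is PFN_MAP, and that the size field counts PTE, PMD, PUD in that order.  The capacity helper assumes a page shift supplied by the caller; with 4KB pages it reproduces the 256GB and 1024EB figures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the layout follows the quoted comment for a 64-bit word. */
#define DAX_EXCEPTIONAL  (UINT64_C(1) << 1)   /* bit 1: exceptional entry */
#define DAX_FLAG_DEV     (UINT64_C(1) << 2)   /* bit 2: PFN_DEV (assumed order) */
#define DAX_FLAG_MAP     (UINT64_C(1) << 3)   /* bit 3: PFN_MAP (assumed order) */
#define DAX_FLAG_MASK    (DAX_FLAG_DEV | DAX_FLAG_MAP)
#define DAX_PFN_SHIFT    4                    /* PFN starts above the 4 low bits */
#define DAX_SIZE_SHIFT   62                   /* top two bits hold the entry size */
#define DAX_PFN_BITS     (DAX_SIZE_SHIFT - DAX_PFN_SHIFT)   /* 58 on 64-bit */
#define DAX_PFN_MASK     ((UINT64_C(1) << DAX_PFN_BITS) - 1)

/* Size encoding is an assumption; the value 3 is the reserved slot. */
enum dax_size { DAX_PTE = 0, DAX_PMD = 1, DAX_PUD = 2 };

/* Pack a PFN, its flags and its size into one entry.  Bit 0 stays clear. */
static inline uint64_t dax_entry_encode(uint64_t pfn, uint64_t flags,
                                        enum dax_size size)
{
        assert(pfn <= DAX_PFN_MASK);            /* PFN must fit in 58 bits */
        assert((flags & ~DAX_FLAG_MASK) == 0);  /* only DEV/MAP are allowed */
        return ((uint64_t)size << DAX_SIZE_SHIFT) |
               (pfn << DAX_PFN_SHIFT) | flags | DAX_EXCEPTIONAL;
}

static inline uint64_t dax_entry_pfn(uint64_t entry)
{
        return (entry >> DAX_PFN_SHIFT) & DAX_PFN_MASK;
}

static inline uint64_t dax_entry_flags(uint64_t entry)
{
        return entry & DAX_FLAG_MASK;
}

static inline enum dax_size dax_entry_size(uint64_t entry)
{
        return (enum dax_size)(entry >> DAX_SIZE_SHIFT);
}

/*
 * log2 of the addressable bytes for a given word width: six bits of the
 * word are taken (four low, two high), the rest index pages.
 */
static inline unsigned dax_addr_bits(unsigned word_bits, unsigned page_shift)
{
        return word_bits - 6 + page_shift;
}
```

With a page shift of 12, dax_addr_bits() gives 38 bits (256GB) for a 32-bit word and 70 bits (1024EB) for a 64-bit word, matching the comment.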

One option that I've been looking at (primarily for x86-32) is
having an rbtree of PFN ranges that drivers add to when they register
persistent memory.  That would let us use the io_mapping_create_wc() /
io_mapping_map_atomic_wc() API.  But having great support for persistent
memory with 32-bit x86 kernels is very very low on my priority list.
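
As a rough illustration of the PFN-range idea, the userspace sketch below lets a driver register a range of PFNs and lets a lookup find which range owns a given PFN.  It is only a sketch under stated assumptions: it uses a small sorted array with binary search in place of an rbtree, and every name in it (struct pfn_range, pfn_range_register(), pfn_range_lookup(), dev_id) is hypothetical.  It does not touch the io_mapping API at all.

```c
#include <stddef.h>
#include <stdint.h>

/* One registered range of PFNs, half-open: [start, end). */
struct pfn_range {
        uint64_t start;
        uint64_t end;
        int dev_id;             /* hypothetical handle for the owning device */
};

#define MAX_RANGES 16
static struct pfn_range ranges[MAX_RANGES];     /* kept sorted by start */
static size_t nr_ranges;

/*
 * Register a range, keeping the table sorted.  Returns 0 on success and
 * -1 if the range is empty, wraps around, overlaps an existing range, or
 * the table is full.
 */
static int pfn_range_register(uint64_t start, uint64_t nr_pages, int dev_id)
{
        uint64_t end = start + nr_pages;
        size_t pos = 0, i;

        if (nr_pages == 0 || end < start || nr_ranges == MAX_RANGES)
                return -1;

        /* Find the first existing range that starts at or after 'start'. */
        while (pos < nr_ranges && ranges[pos].start < start)
                pos++;

        /* Reject overlap with the neighbour on either side. */
        if (pos > 0 && ranges[pos - 1].end > start)
                return -1;
        if (pos < nr_ranges && ranges[pos].start < end)
                return -1;

        /* Shift later ranges up by one slot and insert. */
        for (i = nr_ranges; i > pos; i--)
                ranges[i] = ranges[i - 1];
        ranges[pos].start = start;
        ranges[pos].end = end;
        ranges[pos].dev_id = dev_id;
        nr_ranges++;
        return 0;
}

/* Binary search for the range containing 'pfn'; NULL if none does. */
static const struct pfn_range *pfn_range_lookup(uint64_t pfn)
{
        size_t lo = 0, hi = nr_ranges;

        while (lo < hi) {
                size_t mid = lo + (hi - lo) / 2;

                if (pfn < ranges[mid].start)
                        hi = mid;
                else if (pfn >= ranges[mid].end)
                        lo = mid + 1;
                else
                        return &ranges[mid];
        }
        return NULL;
}
```

A non-default dax_map_pfn() could, in principle, use such a lookup to find the mapping that covers a PFN before creating a temporary kernel address for it; how that would tie into io_mapping_map_atomic_wc() is left open here.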