linux-kernel - Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <E1GxQF2-0000i6-00@dorka.pomaz.szeredi.hu>
Date:	Thu, 21 Dec 2006 16:53:56 +0100
From:	Miklos Szeredi <miklos@...redi.hu>
To:	rmk+lkml@....linux.org.uk
CC:	linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org
Subject: Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again

I'll first answer the last paragraph.

> I suggest that in order for fuse to work reliably on ARM, it is modified
> to behave more like a reasonable device driver, and use the functions
> defined in asm/uaccess.h when it wants to access the current processes
> VM space.

Fuse needs to use get_user_pages() to work around certain deadlock
scenarios (see Documentation/filesystems/fuse.txt), that are
problematic with copy_*_user().

Other possibilities for dealing with these kinds of deadlocks are:

  1) detect deadlock before it happens

  2) copy data to a temporary buffer before copying to userspace

I think 1) is basically impossible.  2) is easy and would save a lot
of complexity, but it would certainly be much slower than the current
approach, especially for the mainstream architectures, where the
dcache aliasing problem doesn't exist.

> I've recently been asked to look at why fuse doesn't work on ARM.
> In doing so and thinking about what fuse is doing, I've come to
> the conclusion that the way fuse accesses the _current_ processes
> memory space is less than ideal.
> 
> The problem is as follows:
> 
> - current process calls writev()
> - fuse_dev_write is invoked, and this ends up calling
>    get_user_pages(current, current->mm, address, 1, 0, ...)
> - data is then read from the kernel mapping of this page (briefly) via:
>    void *mapaddr = kmap_atomic(page, ..);
>    memcpy(dest, mapaddr + offset, size);
>    kunmap_atomic(mapaddr);
> 
> With a VIVT cache, there is an aliasing issue.  Data may be in the
> cache lines corresponding with the userspace mapping, thereby making
> data read from the kernel mapping incoherent.
> 
> On the face of it, the solution seems simple - we could assume that the
> kernel mapping does not have any cache lines allocated with it, so we
> can just write back the userspace mapping to make that data visible.
> 
> However, the action of that memcpy() will allocate some cache lines
> in the kernel mapping of this page, which means when we next come to
> read it (maybe via the same code) we could end up reading stale data.
> 
> Therefore, we also need to flush - more specifically, discard - the
> kernel mapping cache lines.
> 
> 
> Okay, now let's look at the read case:
> 
> - current process calls readv()
> - fuse_dev_read is invoked, and eventually calls:
>    get_user_pages(current, current->mm, address, 1, 1, ...)
> - data is then written to the kernel mapping of this page (briefly) via:
>    void *mapaddr = kmap_atomic(page, ..);
>    memcpy(mapaddr + offset, source, size);
>    kunmap_atomic(mapaddr);
> 
> A similar problem exists here.  The userspace mapping may have some
> cache lines assocated with it, and as we found out above, so may the
> kernel mapping.  We can apply the same fix to it.
> 
> However, and this is the problem, we need cache maintainence _after_
> that memcpy() has completed - with a write allocate VIVT cache, the
> memcpy() itself will allocate cache lines in the kernel mapping of
> the page which will need to be written back for the user process to
> see that data.

Yes, note the flush_dcache_page() call in fuse_copy_finish().  That
could be replaced by the flush_kernel_dcache_page() (added by James
Bottomley together with flush_anon_page()) when all relevant
architectures have defined it.

> Moreover, the userspace mapping would need its cache lines evicted
> (maybe by get_user_pages()) for the written back kernel cache lines
> to be visible.  The more I think about this, the more I'm convinced
> that this part could only be done by get_user_pages().

Yes, I think that was the concept.

> Now, throw in SMP or preempt with a multi-threaded userspace program
> touching the page in question, and the problem just gets much much
> worse.  In such a scenario, we can not guarantee, no matter how much
> cache maintainence is applied to the kernel, that this API comes
> anywhere near to being safe.

This is only problematic if multiple threads are touching the same
page, no?  If the page(s) used for reading/writing requests are
exclusive to each thread, then there should be no problem.  This is a
reasonable requirement towards the userspace filesystem I think.

> To summarise, in order for get_user_pages() to vaguely work with this
> method of accessing the current processes address space on ARM, we'd
> need to make the following changes:
> 
> 1. get_user_pages() needs to unconditionally write back and invalidate
>    the user space mapping.
> 2. get_user_pages() needs to invalidate the kernel space mapping.
> 3. all users of get_user_pages(current, current->mm, ...) need auditing,
>    and additional cache maintainence added in their write paths to
>    write back and invalidate the kernel mapping.
> 4. all fuse userspace programs need to run single-threaded.
> 
> Note: flush_anon_page() was added to work around 1 and 2 on other
> architectures because fuse tripped over this issue there.
> 
> So, given all this additional complexity _and_ that it would only be
> safe on non-preempt UP, the question becomes: is using get_user_pages()
> to access the current processes memory space legal?  Given the above,
> I would say not.

Note again, fuse is not the only user of get_user_pages().  Other uses
are just as prone to these issues.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/