Message-ID: <20161206185440.GA4654@yury-N73SV>
Date: Wed, 7 Dec 2016 00:24:40 +0530
From: Yury Norov <ynorov@...iumnetworks.com>
To: <libc-alpha@...rceware.org>, <linux-arch@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
CC: Catalin Marinas <catalin.marinas@....com>, <szabolcs.nagy@....com>,
<heiko.carstens@...ibm.com>, <cmetcalf@...hip.com>,
<philipp.tomsich@...obroma-systems.com>, <joseph@...esourcery.com>,
<zhouchengming1@...wei.com>, <Prasun.Kapoor@...iumnetworks.com>,
<agraf@...e.de>, <geert@...ux-m68k.org>, <kilobyte@...band.pl>,
<manuel.montezelo@...il.com>, <arnd@...db.de>, <pinskia@...il.com>,
<linyongting@...wei.com>, <klimov.linux@...il.com>,
<broonie@...nel.org>, <bamvor.zhangjian@...wei.com>,
<linux-arm-kernel@...ts.infradead.org>,
<maxim.kuvyrkov@...aro.org>, <Nathan_Lynch@...tor.com>,
<schwidefsky@...ibm.com>, <davem@...emloft.net>,
<christoph.muellner@...obroma-systems.com>
Subject: [Question] New mmap64 syscall?
Hi all,
(Sorry if there is already a similar discussion that I missed. I didn't
find anything on LKML from the last half year.)
In the aarch64/ilp32 discussion Catalin wondered why we don't pass the
offset in mmap() as a 64-bit value (in 2 registers if needed). Looking at
the kernel code, I found that there's no generic interface for it. But
almost all architectures provide their own implementations, like this:
SYSCALL_DEFINE6(mips_mmap, unsigned long, addr, unsigned long, len,
	unsigned long, prot, unsigned long, flags, unsigned long,
	fd, off_t, offset)
{
	unsigned long result;

	result = -EINVAL;
	if (offset & ~PAGE_MASK)
		goto out;

	result = sys_mmap_pgoff(addr, len, prot, flags, fd, offset >> PAGE_SHIFT);

out:
	return result;
}
On the glibc side things are even worse. There's no mmap() implementation
that allows passing a 64-bit offset on a 32-bit architecture. mmap64(),
which is supposed to do this, is simply broken:
void *
__mmap64 (void *addr, size_t len, int prot, int flags, int fd, off64_t
	  offset)
{
  [...]
  void *result;
  result = (void *) INLINE_SYSCALL (mmap2, 6, addr,
				    len, prot, flags, fd,
				    (off_t) (offset >> page_shift));
  return result;
}
It explicitly declares offset as a 64-bit value, but casts it to 32-bit
before passing it to the kernel, which looks wrong to me. Even if the arch
has a 64-bit off_t, like aarch64/ilp32, the truncation still takes place
because the offset is passed in a single register, which is 32-bit.
I see 3 solutions for my problem:
1. Reuse the aarch64/lp64 mmap code for ilp32 in glibc, but wrap offset with
the SYSCALL_LL64() macro, which converts the offset to a register pair on
32-bit ports. This is a simple but local solution, and most probably it's
enough.
2. Add a new flag to mmap, like MAP_OFFSET_IN_PAIR. This will also work.
The problem here is that there are too many arches that implement
their own custom sys_mmap2(). And, of course, this kind of flag
looks ugly.
3. Introduce a new mmap64() syscall like this:
sys_mmap64(void *addr, size_t len, int prot, int flags, int fd, struct off_pair *off);
(The pointer is there because otherwise we'd have 7 args if we simply
passed off_hi and off_lo in registers.)
With the new 64-bit interface we can deprecate mmap2() and generalize all
the implementations in the kernel.
I think this is worth discussing because 64-bit is the default size for
off_t in all new 32-bit architectures, so a generic solution may make sense.
The last question is how important it is in practice to support offsets
bigger than 2^44 on 32-bit machines. It may matter for ARM64 servers,
which look like the main aarch64/ilp32 users. If it is not important, we
can leave things as they are and do nothing.
Yury
On Mon, Dec 05, 2016 at 05:12:43PM +0000, Catalin Marinas wrote:
> On Fri, Oct 21, 2016 at 11:33:10PM +0300, Yury Norov wrote:
> > off_t is passed in register pair just like in aarch32.
> > In this patch corresponding aarch32 handlers are shared to
> > ilp32 code.
> [...]
> > +/*
> > + * Note: off_4k (w5) is always in units of 4K. If we can't do the
> > + * requested offset because it is not page-aligned, we return -EINVAL.
> > + */
> > +ENTRY(compat_sys_mmap2_wrapper)
> > +#if PAGE_SHIFT > 12
> > +	tst	w5, #~PAGE_MASK >> 12
> > +	b.ne	1f
> > +	lsr	w5, w5, #PAGE_SHIFT - 12
> > +#endif
> > +	b	sys_mmap_pgoff
> > +1:	mov	x0, #-EINVAL
> > +	ret
> > +ENDPROC(compat_sys_mmap2_wrapper)
>
> For compat sys_mmap2, the pgoff argument is in multiples of 4K. This was
> traditionally used for architectures where off_t is 32-bit to allow
> mapping files to 2^44.
>
> Since off_t is 64-bit with AArch64/ILP32, should we just pass the off_t
> as a 64-bit value in two different registers (w5 and w6)?