Message-ID: <mhng-378c6e4e-9eba-4bfd-89d6-b4d2549ff3a1@palmerdabbelt-glaptop1>
Date: Tue, 21 Jul 2020 22:46:38 -0700 (PDT)
From: Palmer Dabbelt <palmer@...belt.com>
To: mpe@...erman.id.au
CC: benh@...nel.crashing.org, alex@...ti.fr, paulus@...ba.org,
Paul Walmsley <paul.walmsley@...ive.com>,
aou@...s.berkeley.edu, Anup Patel <Anup.Patel@....com>,
Atish Patra <Atish.Patra@....com>, zong.li@...ive.com,
linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
linux-riscv@...ts.infradead.org, linux-mm@...ck.org
Subject: Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone
On Tue, 21 Jul 2020 21:50:42 PDT (-0700), mpe@...erman.id.au wrote:
> Benjamin Herrenschmidt <benh@...nel.crashing.org> writes:
>> On Tue, 2020-07-21 at 16:48 -0700, Palmer Dabbelt wrote:
>>> > Why ? Branch distance limits ? You can't use trampolines ?
>>>
>>> Nothing fundamental, it's just that we don't have a large code model in the C
>>> compiler. As a result all the global symbols are resolved as 32-bit
>>> PC-relative accesses. We could fix this with a fast large code model, but then
>>> the kernel would need to relax global symbol references in modules and we don't
>>> even do that for the simple code models we have now. FWIW, some of the
>>> proposed large code models are essentially just split-PLT/GOT and therefore
>>> don't require relaxation, but at that point we're essentially PIC until we
>>> have more than 2GiB of kernel text -- and even then, we keep all the
>>> performance issues.
>>
>> My memory might be out of date but I *think* we do it on powerpc
>> without going to a large code model, but just having the in-kernel
>> linker insert trampolines.
>
> We build modules with the large code model, and always have AFAIK:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/Makefile?commit=4fa640dc52302b5e62b01b05c755b055549633ae#n129
>
> # -mcmodel=medium breaks modules because it uses 32bit offsets from
> # the TOC pointer to create pointers where possible. Pointers into the
> # percpu data area are created by this method.
> #
> # The kernel module loader relocates the percpu data section from the
> # original location (starting with 0xd...) to somewhere in the base
> # kernel percpu data space (starting with 0xc...). We need a full
> # 64bit relocation for this to work, hence -mcmodel=large.
> KBUILD_CFLAGS_MODULE += -mcmodel=large
Well, a fast large code model would solve a lot of problems :). Unfortunately
we just don't have enough people working on this stuff to do that. It's a
somewhat tricky thing to do on RISC-V as there aren't any quick sequences for
long addresses, but I don't think we're that much worse off than everyone else.
At some point I had a bunch of designs written up, but they probably went along
with my SiFive computer. I think we ended up deciding that the best bet would
be to distribute constant tables throughout the text such that they're
accessible via the 32-bit PC-relative loads at any point -- essentially the
multi-GOT stuff that MIPS used for big objects. Doing that well is a lot of
work and doing it poorly is just as slow as PIC, so we never got around to it.
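
For anyone following along, here is a rough sketch of the arithmetic behind a
"32-bit PC-relative" access: the offset is split into an upper 20-bit part
(for auipc) and a sign-extended low 12-bit part (for the following load or
addi). This is only the reach math, not the distributed-constant-table design
mentioned above, which was never written down here. The names split_pcrel,
join_pcrel and struct pcrel_split are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two immediates of an auipc + load pair. */
struct pcrel_split {
    int64_t hi20; /* value placed in the auipc immediate */
    int64_t lo12; /* sign-extended 12-bit immediate of the load/addi */
};

/* Split a PC-relative byte offset into hi20/lo12 parts. */
static struct pcrel_split split_pcrel(int32_t offset)
{
    struct pcrel_split s;
    int64_t low = (int64_t)((uint32_t)offset & 0xfffu);

    /* The low part is sign-extended by the hardware, so values with
     * bit 11 set are treated as negative and hi20 absorbs the carry. */
    if (low >= 0x800)
        low -= 0x1000;

    s.lo12 = low;
    s.hi20 = ((int64_t)offset - low) / 4096; /* exact division */
    assert(s.lo12 >= -2048 && s.lo12 <= 2047);
    return s;
}

/* Recombine the two parts the way auipc + a 12-bit immediate would. */
static int64_t join_pcrel(struct pcrel_split s)
{
    return s.hi20 * 4096 + s.lo12;
}
```

A constant table is only usable from a given instruction if the distance to
it survives this split, which is why the tables would have to be spread
through the text rather than collected in one place.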
> We also insert trampolines for branches, but IIUC that's a separate
> issue.
Searching for "PowerPC branch trampolines" points me here:
https://sourceware.org/binutils/docs-2.20/ld/PowerPC-ELF32.html . That sounds
like what we're doing already in the medium code models: we have short and
medium control transfer sequences, and linker relaxation optimizes them when
possible. Since we rely on linker relaxation pretty heavily we just don't
bother with the smaller code model: it'd be a 12-bit address space for data and
a 21-bit address space for text (with 13-bit maximum function size). Instead
of building out such a small code model we just spent time improving the linker.
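
To put numbers on those widths, here is a small sketch that computes the
reach of a signed displacement field of a given size. I'm assuming the usual
reading of the figures above: 12 bits is a load/store immediate, 13 bits is a
conditional branch, 21 bits is jal, and 32 bits is the auipc-based sequence
(whose exact range is slightly skewed by the low-part sign extension, which
this ignores). The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Most negative displacement encodable in a signed field of `bits` bits. */
static int64_t signed_min(unsigned bits)
{
    assert(bits >= 1 && bits <= 63);
    return -((int64_t)1 << (bits - 1));
}

/* Most positive displacement encodable in a signed field of `bits` bits. */
static int64_t signed_max(unsigned bits)
{
    assert(bits >= 1 && bits <= 63);
    return ((int64_t)1 << (bits - 1)) - 1;
}

/* Nonzero if `disp` can be encoded in a signed field of `bits` bits. */
static int fits_signed(int64_t disp, unsigned bits)
{
    return disp >= signed_min(bits) && disp <= signed_max(bits);
}
```

So a 12-bit field reaches about +/-2KiB, 13 bits about +/-4KiB, 21 bits about
+/-1MiB, and 32 bits about +/-2GiB, which is where the 2GiB limit on kernel
text comes from.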