linux-kernel - Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <mhng-378c6e4e-9eba-4bfd-89d6-b4d2549ff3a1@palmerdabbelt-glaptop1>
Date:   Tue, 21 Jul 2020 22:46:38 -0700 (PDT)
From:   Palmer Dabbelt <palmer@...belt.com>
To:     mpe@...erman.id.au
CC:     benh@...nel.crashing.org, alex@...ti.fr, paulus@...ba.org,
        Paul Walmsley <paul.walmsley@...ive.com>,
        aou@...s.berkeley.edu, Anup Patel <Anup.Patel@....com>,
        Atish Patra <Atish.Patra@....com>, zong.li@...ive.com,
        linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
        linux-riscv@...ts.infradead.org, linux-mm@...ck.org
Subject:     Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone

On Tue, 21 Jul 2020 21:50:42 PDT (-0700), mpe@...erman.id.au wrote:
> Benjamin Herrenschmidt <benh@...nel.crashing.org> writes:
>> On Tue, 2020-07-21 at 16:48 -0700, Palmer Dabbelt wrote:
>>> > Why ? Branch distance limits ? You can't use trampolines ?
>>>
>>> Nothing fundamental, it's just that we don't have a large code model in the C
>>> compiler.  As a result all the global symbols are resolved as 32-bit
>>> PC-relative accesses.  We could fix this with a fast large code model, but then
>>> the kernel would need to relax global symbol references in modules and we don't
>>> even do that for the simple code models we have now.  FWIW, some of the
>>> proposed large code models are essentially just split-PLT/GOT and therefor
>>> don't require relaxation, but at that point we're essentially PIC until we
>>> have more that 2GiB of kernel text -- and even then, we keep all the
>>> performance issues.
>>
>> My memory might be out of date but I *think* we do it on powerpc
>> without going to a large code model, but just having the in-kernel
>> linker insert trampolines.
>
> We build modules with the large code model, and always have AFAIK:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/Makefile?commit=4fa640dc52302b5e62b01b05c755b055549633ae#n129
>
>   # -mcmodel=medium breaks modules because it uses 32bit offsets from
>   # the TOC pointer to create pointers where possible. Pointers into the
>   # percpu data area are created by this method.
>   #
>   # The kernel module loader relocates the percpu data section from the
>   # original location (starting with 0xd...) to somewhere in the base
>   # kernel percpu data space (starting with 0xc...). We need a full
>   # 64bit relocation for this to work, hence -mcmodel=large.
>   KBUILD_CFLAGS_MODULE += -mcmodel=large

Well, a fast large code model would solve a lot of problems :).  Unfortunately
we just don't have enough people working on this stuff to do that.  It's a
somewhat tricky thing to do on RISC-V as there aren't any quick sequences for
long addresses, but I don't think we're that much worse off than everyone else.
At some point I had a bunch of designs written up, but they probably went along
with my SiFive computer.  I think we ended up decided that the best bet would
be to distribute constant tables throughout the text such that they're
accessible via the 32-bit PC-relative loads at any point -- essentially the
multi-GOT stuff that MIPS used for big objects.  Doing that well is a lot of
work and doing it poorly is just as slow as PIC, so we never got around to it.

> We also insert trampolines for branches, but IIUC that's a separate
> issue.

"PowerPC branch trampolines" points me here
https://sourceware.org/binutils/docs-2.20/ld/PowerPC-ELF32.html .  That sounds
like what we're doing already in the medium code models: we have short and
medium control transfer sequences, linker relaxation optimizes them when
possible.  Since we rely on linker relaxation pretty heavily we just don't
bother with the smaller code model: it'd be a 12-bit address space for data and
a 21-bit address space for text (with 13-bit maximum function size).  Instead
of building out such a small code model we just spent time improving the linker.