lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAK8P3a3AQ43t2q1MVN9PoqQXZiuXQo1uDcGnRAPCiT66_oDyLg@mail.gmail.com>
Date:   Wed, 6 Sep 2017 22:57:58 +0200
From:   Arnd Bergmann <arnd@...db.de>
To:     Romain Izard <romain.izard.pro@...il.com>
Cc:     Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Sven Schmidt <4sschmid@...ormatik.uni-hamburg.de>
Subject: Re: HAVE_EFFICIENT_UNALIGNED_ACCESS on ARM32 (was: Alignment issues
 in zImage with Linux 4.12, LZ4 and GCC5.3)

On Mon, Sep 4, 2017 at 6:19 PM, Romain Izard <romain.izard.pro@...il.com> wrote:
> 2017-07-24 13:07 GMT+02:00 Ard Biesheuvel <ard.biesheuvel@...aro.org>:
>> On 24 July 2017 at 11:57, Romain Izard <romain.izard.pro@...il.com> wrote:
>>>
>>> While upgrading the kernel from 4.9 to 4.12 for a custom board with a
>>> Cortex-A5 based CPU, I have encountered a compilation issue that leads to
>>> a data abort during the execution of the LZ4 decompression code in
>>> zImage.
>>>
>>> [...]
>>>
>>> The compilation options are a little different between both cases:
>>> The library is built with -O3, whereas the zImage decompressor is built
>>> with -O2, -DDISABLE_BRANCH_PROFILING, -fpic, -mno-single-pic-base,
>>> -fno-builtin. All other compilation options are shared in both cases.
>>>
>
> This is a red herring: the critical option here is '-fno-builtin'. If it is
> not set, the bug disappears. It also disappears if we replace it with
> '-fno-builtin-putc'. But it only changes the optimizations applied by
> the compiler itself, and cannot explain the issue.
>
> Before updating the LZ4 decompressor, the LZ4 header contained specific
> code for handling alignment issues, which has been changed.
>
>>> For Linux 4.9, the LZ4 decompressor code is completely different, which
>>> explains why the issue appeared when changing kernel versions.
>>>
>>
>> I see some void* to u32* casts in the new code, which makes me think
>> that it is perhaps not valid C, and has maybe not been tested on an
>> architecture that has stricter alignment requirements than x86?
>>
>
> I can reproduce it easily on v4.13 with GCC6.3:
> - Configure with allnoconfig
> - Enable CONFIG_MMU, CONFIG_KERNEL_LZ4
> - Check the generated assembly for arch/arm/boot/compressed/decompress.o:
> In the LZ4_decompress_fast function, the memory access after the third
> branch uses ldm and stm. This is invalid, as the addresses can be unaligned.
>
> With this configuration, HAVE_EFFICIENT_UNALIGNED_ACCESS is set, but this is
> wrong. On 32-bit ARM, the compiler is free to generate LDM or LDRD access
> that will always fail on unaligned addresses. In this case, we have two
> LDR/STR access to adjascent addresses that appear in inline code. The
> get_unaligned functions in "include/linux/unaligned/access_ok.h" cast the
> pointers directly as regular 32-bit access, and as those are by default
> aligned, the compiler will optimise and combine the access.
>
> If we use the functions from "include/linux/unaligned/le_struct.h", the
> get_unaligned() function correctly tells the compiler that the access is
> special, and that it should not merge memory access. But we do not fall back
> to byte-by-byte access, as the compiler itself knows how to use 32-bit
> access when -funaligned-access is set (by default for ARMv7).

Right, I've come across this in the past as well.

> The issue is probably hidden by the kernel fault handler in normal kernel
> code, but for this case it does nothing as we are working in the boot
> decompressor, that cannot use the fault handler. But it should have a
> performance inpact.
>
> As a result, this means that HAVE_EFFICIENT_UNALIGNED_ACCESS should not
> be set at least in the context of "include/asm-generic/unaligned.h". But
> as this option is also used in other places, where it is not related to
> the get_unaligned functions, it is not possible to remove it on ARM 32-bit
> without further study.

This is a patch I prototyped in the past https://pastebin.com/apPTPXys

I'm not entirely sure if this produces good object code with all compilers
on all architectures, but it should solve the problem you observed and
more.

       Arnd

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ