Message-ID: <YXknxGFjvaB46d/p@Pauls-MacBook-Pro>
Date: Wed, 27 Oct 2021 12:19:48 +0200
From: Paul Heidekrüger <paul.heidekrueger@...tum.de>
To: paulmck@...nel.org, will@...nel.org, peterz@...radead.org,
boqun.feng@...il.com, stern@...land.harvard.edu,
parri.andrea@...il.com, linux-kernel@...r.kernel.org,
llvm@...ts.linux.dev
Cc: elver@...gle.com, charalampos.mainas@...il.com,
pramod.bhatotia@...tum.de
Subject: Potentially Broken Address Dependency via test_bit() When Compiling
With Clang

Hi all,

For my bachelor thesis, I have been working on the infamous problem of
potentially broken dependency orderings in the Linux kernel. I'm being
advised by Marco Elver, Charalampos Mainas, and Pramod Bhatotia (Cc'd).

For context, see:
https://linuxplumbersconf.org/event/7/contributions/821/attachments/598/1075/LPC_2020_--_Dependency_ordering.pdf

Our approach consists of two LLVM compiler passes: the first annotates
dependencies in unoptimised intermediate representation (IR), and the
second verifies the annotated dependencies in optimised IR. At the
moment, the passes only recognise a subset of address dependencies -
everything is still WIP ;-)

We have been cross-compiling with a slightly modified version of
allyesconfig for arm64, and the passes have now found a case that we
would like to share with LKML for feedback: an address dependency that
appears to be broken by compiler optimisations in
fs/afs/addr_list.c::afs_iterate_addresses().

Address dependency in source code, lines 373 - 375 in fs/afs/addr_list.c:
> [...]
> index = READ_ONCE(ac->alist->preferred);
> if (test_bit(index, &set))
> goto selected;
> [...]

where test_bit() expands to the following in
include/asm-generic/bitops/non-atomic.h, lines 115 - 122:
> static __always_inline int
> arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> {
> return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> }
> #define test_bit arch_test_bit
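
For reference, BIT_WORD() is defined in include/linux/bits.h:

> #define BIT_WORD(nr)		((nr) / BITS_PER_LONG)

so the address that test_bit() loads from is computed from @nr.
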
The address dependency is preserved in unoptimised IR, since the virtual
register %33 (the load from @addr in test_bit()) transitively depends on
%28 (the READ_ONCE() of ->preferred):
> %28 = load volatile i8, i8* %preferred, align 2, !annotation !15
> store i8 %28, i8* %tmp21, align 1
> %29 = load i8, i8* %tmp21, align 1
> %conv23 = zext i8 %29 to i32
> store i32 %conv23, i32* %index, align 4
> %30 = load i32, i32* %index, align 4
> store i32 %30, i32* %nr.addr.i, align 4
> store i64* %set, i64** %addr.addr.i, align 8
> %31 = load i64*, i64** %addr.addr.i, align 8
> %32 = load i32, i32* %nr.addr.i, align 4
> %div.i = udiv i32 %32, 64
> %idxprom.i = zext i32 %div.i to i64
> %arrayidx.i = getelementptr i64, i64* %31, i64 %idxprom.i
> %33 = load volatile i64, i64* %arrayidx.i, align 8, !annotation !16

In the optimised IR, there is no dependency between the two volatile
loads anymore:
> %11 = load volatile i8, i8* %preferred, align 2, !annotation !19
> %conv25 = zext i8 %11 to i32
> %set.0. = load volatile i64, i64* %set, align 8
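
Our understanding of what happened (a hand-written sketch, not actual
compiler output): the optimised IR loads %set directly, which suggests
the compiler exploited the fact that @set is a single unsigned long, so
any value of BIT_WORD(nr) other than 0 would index out of bounds:

> unsigned long set, r;
> unsigned int index;
>
> index = READ_ONCE(ac->alist->preferred);
>
> /* Unoptimised: the address of the loaded word depends on @index. */
> r = (&set)[index / BITS_PER_LONG];
>
> /* Optimised: the compiler folds the index to 0 and loads @set
>  * directly - the loaded address no longer depends on @index. */
> r = set;

The remaining use of @nr in the ">> (nr & (BITS_PER_LONG-1))" shift only
affects the value after it has been loaded, so it cannot stand in for
the lost address dependency.
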
Now, since @nr traces back to the READ_ONCE() that initialises @index,
does this make the load from @addr in test_bit() address-dependent on
that READ_ONCE()? And should the load from @addr therefore be ordered
against the READ_ONCE()?
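
To spell out why we care (a generic, made-up litmus-style sketch, not
the AFS code): with the address dependency intact, the two loads on P1
below are ordered on arm64; if the compiler breaks the dependency, for
example by proving the index constant, they are not, and P1 may observe
i == 1 with r == 0:

> /* P0 */
> WRITE_ONCE(x[1], 1);
> smp_store_release(&idx, 1);
>
> /* P1 */
> i = READ_ONCE(idx);
> r = READ_ONCE(x[i]); /* address depends on the load of @idx */
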
Many thanks,
Paul