[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f9d17698-0c74-42b7-acc8-3ca3adf3ba4d@ghiti.fr>
Date: Thu, 3 Oct 2024 16:41:26 +0200
From: Alexandre Ghiti <alex@...ti.fr>
To: Jason Montleon <jmontleo@...hat.com>
Cc: Kees Cook <kees@...nel.org>, linux-hardening@...r.kernel.org,
Linux regressions mailing list <regressions@...ts.linux.dev>,
linux-riscv@...ts.infradead.org
Subject: Re: [REGRESSION][BISECTED] Cannot boot Lichee Pi 4A with
FORTIFY_SOURCE enabled
Hi Jason,
On 02/10/2024 17:14, Jason Montleon wrote:
> On Tue, Oct 1, 2024 at 10:53 AM Alexandre Ghiti <alex@...ti.fr> wrote:
>> Hi Jason,
>>
>> On 25/09/2024 16:32, Jason Montleon wrote:
>>> On Tue, Sep 24, 2024 at 1:36 PM Kees Cook <kees@...nel.org> wrote:
>>>>
>>>> On September 24, 2024 8:58:57 AM PDT, Jason Montleon <jmontleo@...hat.com> wrote:
>>>>> On Sun, Sep 22, 2024 at 6:38 PM Kees Cook <kees@...nel.org> wrote:
>>>>>> Can you try this patch? It should avoid using the "WARN" infrastructure
>>>>>> (if that is the source of blocking boot), but should still provide some
>>>>>> detail about what tripped it up (via the "regular" pr_*() logging). And
>>>>>> if it boots, can you look at the log to find what it reports for the
>>>>>> "Wanted to write ..." line(s)?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> -Kees
>>>>>>
>>>>>>
>>>>>> diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h
>>>>>> index 0d99bf11d260..469a3439959d 100644
>>>>>> --- a/include/linux/fortify-string.h
>>>>>> +++ b/include/linux/fortify-string.h
>>>>>> @@ -549,7 +549,8 @@ __FORTIFY_INLINE bool fortify_memcpy_chk(__kernel_size_t size,
>>>>>> const size_t q_size,
>>>>>> const size_t p_size_field,
>>>>>> const size_t q_size_field,
>>>>>> - const u8 func)
>>>>>> + const u8 func,
>>>>>> + const char *field)
>>>>>> {
>>>>>> if (__builtin_constant_p(size)) {
>>>>>> /*
>>>>>> @@ -610,8 +611,13 @@ __FORTIFY_INLINE bool fortify_memcpy_chk(__kernel_size_t size,
>>>>>> * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832
>>>>>> */
>>>>>> if (p_size_field != SIZE_MAX &&
>>>>>> - p_size != p_size_field && p_size_field < size)
>>>>>> - return true;
>>>>>> + p_size != p_size_field && p_size_field < size) {
>>>>>> + if (p_size_field == 0)
>>>>>> + pr_warn("Wanted to write %zu to a 0-sized destination: %s\n",
>>>>>> + size, field);
>>>>>> + else
>>>>>> + return true;
>>>>>> + }
>>>>>>
>>>>>> return false;
>>>>>> }
>>>>>> @@ -625,7 +631,8 @@ __FORTIFY_INLINE bool fortify_memcpy_chk(__kernel_size_t size,
>>>>>> const size_t __q_size_field = (q_size_field); \
>>>>>> fortify_warn_once(fortify_memcpy_chk(__fortify_size, __p_size, \
>>>>>> __q_size, __p_size_field, \
>>>>>> - __q_size_field, FORTIFY_FUNC_ ##op), \
>>>>>> + __q_size_field, FORTIFY_FUNC_ ##op,\
>>>>>> + #p " at " FILE_LINE), \
>>>>>> #op ": detected field-spanning write (size %zu) of single %s (size %zu)\n", \
>>>>>> __fortify_size, \
>>>>>> "field \"" #p "\" at " FILE_LINE, \
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Kees Cook
>>>>>>
>>>>> Hi Kees,
>>>>> Sorry for the delay in responding. Your patch also caused the system
>>>>> to hang booting, but I took it as inspiration to try some things.
>>>>> TL;DR I see these two print several times:
>>>>>
>>>>> Wanted to write 8 to a 0-sized destination: oldptr at
>>>>> arch/riscv/errata/thead/errata.c:185
>>>>> Wanted to write 28 to a 0-sized destination: oldptr at
>>>>> arch/riscv/errata/thead/errata.c:185
>>>>>
>>>>> Since I suspected CONFIG_ERRATA_THEAD_MAE which relies on
>>>>> CONFIG_RISCV_ALTERNATIVE_EARLY I suspected it is just too early for
>>>>> pr_*, etc. I tried trace_printk but that didn't produce any traces,
>>>>> and then I found a comment which makes me think we're too early for
>>>>> that too.
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/riscv/kernel/alternative.c?h=v6.11#n214
>>>>>
>>>>> I didn't want to give up so I wrote a simple module to figure out how
>>>>> to use sbi_ecall and then incorporated that idea into your patch,
>>>>> although even that was not straightforward:
>>>>> https://gist.github.com/jmontleon/f3536ad70dc92bed992139ad5b266d5c
>>>> Oh nice work! Probably something riscv-specific needs to be wired up to the printk routines to do what you have there until printk is fully available But that's a separate issue, I guess.
>>>>
>>>>> Full output: https://gist.github.com/jmontleon/823facff38e28b717f153e38adcfd7fe
>>>> Does the system boot with your patch?
>>> Yes. It boots fully. It also booted with trace_printk, it just didn't
>>> produce a trace, which again I think is expected at that stage. It
>>> seems as long as there are no 'normal' print attempts, pr_()*, etc. it
>>> works.
>>
>> Hmmm I don't see what in your patch would fix your issue.
>>
>> I sent a patch
>> (https://lore.kernel.org/linux-riscv/20240829165048.49756-1-alexghiti@rivosinc.com/)
>> that was merged in v6.11-rc7 which fixes something similar, could that
>> be what fixed your problem?
>>
> Hi Alex, Thank you for having a look.
>
> Although I narrowed it down to starting with 6.11-rc1, the first time
> I encountered this was actually with an rc7 Fedora build.
> http://riscv.rocks/koji/buildinfo?buildID=338433
>
> I also did a build from upstream 6.11.1 last night and still see the
> problem trying to boot with it this morning.
>
>> The problem with this early code is that the MMU is not enabled yet and
>> if your kernel is relocatable, the relocations are set "virtually" and
>> then without the MMU, a call to an external function results in a
>> trap...Which happens way too often, I have a few ideas to fix that, I
>> need to find time to tackle this though.
>>
>> I don't know FORTIFY_SOURCE, but I guess this could cause a call to an
>> external function and then cause the issue I describe above right? We
>> already remove all instrumentations from this early code so maybe we
>> should remove FORTIFY_SOURCE too?
So I was able to reproduce the issue on qemu by adding a few tweaks, and
indeed we trap in __warn_printk() on a virtual address but MMU is not
enabled yet.
The following diff though allows me to pass this failure but I can't get
much further in the boot since the tweaks I added won't allow it, can
you give the following a try?
diff --git a/arch/riscv/errata/Makefile b/arch/riscv/errata/Makefile
index 8a27394851233..4913f3b3f198c 100644
--- a/arch/riscv/errata/Makefile
+++ b/arch/riscv/errata/Makefile
@@ -2,6 +2,10 @@ ifdef CONFIG_RELOCATABLE
KBUILD_CFLAGS += -fno-pie
endif
+ifdef CONFIG_RISCV_ALTERNATIVE_EARLY
+KBUILD_CFLAGS += -D__NO_FORTIFY
+endif
+
obj-$(CONFIG_ERRATA_ANDES) += andes/
obj-$(CONFIG_ERRATA_SIFIVE) += sifive/
obj-$(CONFIG_ERRATA_THEAD) += thead/
Thanks,
Alex
>>
> What you are describing here does not seem unreasonable to me, but I
> simply do not know enough to say one way or the other.
> Is there something I can do to collect more information to determine
> if this is the cause?
>
> Thank you again for having a look,
> Jason
>
>> Thanks for digging into this,
>>
>> Alex
>>
>>
>>>> I would guess that this line at arch/riscv/errata/thead/errata.c:168:
>>>>
>>>> void *oldptr, *altptr;
>>>>
>>>> Should be:
>>>>
>>>> u8 *oldptr, *altptr;
>>> I tried this, but it didn't appear to make a difference, I will try
>>> some more today, though I admit to being out of my element.
>>>
>>>
>>>
>>>> But maybe __ALT_PTR needs adjustment too? Hmm. I would have expected void * to have a size of SIZE_MAX here, though, so something else might be confusing the check.
>>>>
>>>>
>>>> --
>>>> Kees Cook
>>>>
>>> _______________________________________________
>>> linux-riscv mailing list
>>> linux-riscv@...ts.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@...ts.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Powered by blists - more mailing lists