linux-kernel - Re: [PATCH] kfence: Replace local_clock() with ktime_get_boot_fast

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <VI1P193MB0752D8881930F88BACFB56A499B9A@VI1P193MB0752.EURP193.PROD.OUTLOOK.COM>
Date:   Thu, 23 Nov 2023 17:29:33 +0800
From:   Juntong Deng <juntong.deng@...look.com>
To:     Marco Elver <elver@...gle.com>
Cc:     glider@...gle.com, dvyukov@...gle.com, akpm@...ux-foundation.org,
        kasan-dev@...glegroups.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        linux-kernel-mentees@...ts.linuxfoundation.org
Subject: Re: [PATCH] kfence: Replace local_clock() with
 ktime_get_boot_fast_ns()

On 2023/11/23 6:19, Marco Elver wrote:
> On Wed, 22 Nov 2023 at 22:36, Juntong Deng <juntong.deng@...look.com> wrote:
>>
>> On 2023/11/23 4:35, Marco Elver wrote:
>>> On Wed, 22 Nov 2023 at 21:01, Juntong Deng <juntong.deng@...look.com> wrote:
>>>>
>>>> The time obtained by local_clock() is the local CPU time, which may
>>>> drift between CPUs and is not suitable for comparison across CPUs.
>>>>
>>>> It is possible for allocation and free to occur on different CPUs,
>>>> and using local_clock() to record timestamps may cause confusion.
>>>
>>> The same problem exists with printk logging.
>>>
>>>> ktime_get_boot_fast_ns() is based on clock sources and can be used
>>>> reliably and accurately for comparison across CPUs.
>>>
>>> You may be right here, however, the choice of local_clock() was
>>> deliberate: it's the same timestamp source that printk uses.
>>>
>>> Also, on systems where there is drift, the arch selects
>>> CONFIG_HAVE_UNSTABLE_SCHED_CLOCK (like on x86) and the drift is
>>> generally bounded.
>>>
>>>> Signed-off-by: Juntong Deng <juntong.deng@...look.com>
>>>> ---
>>>>    mm/kfence/core.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/kfence/core.c b/mm/kfence/core.c
>>>> index 3872528d0963..041c03394193 100644
>>>> --- a/mm/kfence/core.c
>>>> +++ b/mm/kfence/core.c
>>>> @@ -295,7 +295,7 @@ metadata_update_state(struct kfence_metadata *meta, enum kfence_object_state nex
>>>>           track->num_stack_entries = num_stack_entries;
>>>>           track->pid = task_pid_nr(current);
>>>>           track->cpu = raw_smp_processor_id();
>>>> -       track->ts_nsec = local_clock(); /* Same source as printk timestamps. */
>>>> +       track->ts_nsec = ktime_get_boot_fast_ns();
>>>
>>> You have ignored the comment placed here - now it's no longer the same
>>> source as printk timestamps. I think not being able to correlate
>>> information from KFENCE reports with timestamps in lines from printk
>>> is worse.
>>>
>>> For now, I have to Nack: Unless you can prove that
>>> ktime_get_boot_fast_ns() can still be correlated with timestamps from
>>> printk timestamps, I think this change only trades one problem for
>>> another.
>>>
>>> Thanks,
>>> -- Marco
>>
>> Honestly, the possibility of accurately matching a message in the printk
>> log by the timestamp in the kfence report is very low, since allocation
>> and free do not directly correspond to a certain event.
> 
> It's about being able to compare the timestamps. I don't want to match
> an exact event, but be able to figure out which event happened
> before/after an allocation or free, i.e. the logical ordering of
> events.
> 
> With CONFIG_PRINTK_CALLER we can see the CPU ID in printk lines and
> are therefore able to accurately compare printk lines with information
> given by KFENCE alloc/free info.
> 


That makes sense.


>> Since time drifts across CPUs, timestamps may be different even if
>> allocation and free can correspond to a certain event.
> 
> This is not a problem with CONFIG_PRINTK_CALLER.
> 
>> If we really need to find the relevant printk logs by the timestamps in
>> the kfence report, all we can do is to look for messages that are within
>> a certain time range.
>>
>> If we are looking for messages in a certain time range, there is not
>> much difference between local_clock() and ktime_get_boot_fast_ns().
>>
>> Also, this patch is in preparation for my next patch.
>>
>> My next patch is to show the PID, CPU number, and timestamp when the
>> error occurred, in this case time drift from different CPUs can
>> cause confusion.
> 
> It's not quite clear how there's a dependency between this patch and a
> later patch, but generally it's good practice to send related patches
> as a patch series. That way it's easier to see what the overall
> changes are and provide feedback as a whole - as is, it's difficult to
> provide feedback.
> 
> However, from what you say this information is already given.
> dump_stack_print_info() shows this - e.g this bit here is printed by
> where the error occurred:
> 
> | CPU: 0 PID: 484 Comm: kunit_try_catch Not tainted 5.13.0-rc3+ #7
> | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2
> 04/01/2014
> 
> And if the printk log has timestamps, then these lines are prefixed
> with the timestamp where the error occurred.
> 


Thanks, I found that information.

Since this information is at the bottom of the report, I had previously
ignored them.

I would suggest considering moving this information to the top of
the report, for example

BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa6/0x234

CPU: 0 PID: 484 Comm: kunit_try_catch Not tainted 5.13.0-rc3+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 
04/01/2014

Out-of-bounds read at 0xffff8c3f2e291fff (1B left of kfence-#72):
...

This more clearly correlates this information with the occurrence of
the error.

If the timestamp of the printk is used as the timestamp of when the
error occurred, then my patch is unnecessary.


>> For example, use-after-free caused by a subtle race condition, in which
>> the time between the free and the error occur will be very close.
>>
>> Time drift from different CPUs may cause it to appear in the report that
>> the error timestamp precedes the free timestamp.
> 
> That doesn't matter. I recommend that you go through a hypothetical
> debugging scenario:
> 1. We are not interested in the absolute timings of events, but the
> logical ordering between them.
> 
> 2. The logical ordering of events is inherent from how KFENCE
> operates: an error _always_ follows an allocation and/or free. From a
> debugging point of view, the timestamps do not have any value here.
> 
> 3. The timestamps _do_ add value when trying to figure out the logical
> ordering between allocation, free, or the erroneous access _with
> other_ events in the system. A stream of other events is always shown
> in the kernel log (printk). Other streams of events can be obtained
> via e.g. ftrace (which also uses local_clock(), see
> kernel/trace/trace_clock.c).
> 
> So, the timestamp that KFENCE should show is the one that most likely
> allows us to deduce the logical ordering with other events in the
> system.


Thanks for the detailed explanation.

I now have a better understanding of the purpose of timestamps in
KFENCE reports.