linux-kernel - Re: [PATCH v2 3/3] kernel/smp: add more data to CSD lock debugging

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ef5e569c-4dc3-4a69-e173-e372ccf9a994@suse.com>
Date:   Tue, 2 Mar 2021 06:57:15 +0100
From:   Jürgen Groß <jgross@...e.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
        paulmck@...nel.org, mhocko@...e.com,
        Jonathan Corbet <corbet@....net>
Subject: Re: [PATCH v2 3/3] kernel/smp: add more data to CSD lock debugging

On 01.03.21 18:07, Peter Zijlstra wrote:
> On Mon, Mar 01, 2021 at 11:13:36AM +0100, Juergen Gross wrote:
>> In order to help identifying problems with IPI handling and remote
>> function execution add some more data to IPI debugging code.
>>
>> There have been multiple reports of cpus looping long times (many
>> seconds) in smp_call_function_many() waiting for another cpu executing
>> a function like tlb flushing. Most of these reports have been for
>> cases where the kernel was running as a guest on top of KVM or Xen
>> (there are rumours of that happening under VMWare, too, and even on
>> bare metal).
>>
>> Finding the root cause hasn't been successful yet, even after more than
>> 2 years of chasing this bug by different developers.
>>
>> Commit 35feb60474bf4f7 ("kernel/smp: Provide CSD lock timeout
>> diagnostics") tried to address this by adding some debug code and by
>> issuing another IPI when a hang was detected. This helped mitigating
>> the problem (the repeated IPI unlocks the hang), but the root cause is
>> still unknown.
>>
>> Current available data suggests that either an IPI wasn't sent when it
>> should have been, or that the IPI didn't result in the target cpu
>> executing the queued function (due to the IPI not reaching the cpu,
>> the IPI handler not being called, or the handler not seeing the queued
>> request).
>>
>> Try to add more diagnostic data by introducing a global atomic counter
>> which is being incremented when doing critical operations (before and
>> after queueing a new request, when sending an IPI, and when dequeueing
>> a request). The counter value is stored in percpu variables which can
>> be printed out when a hang is detected.
>>
>> The data of the last event (consisting of sequence counter, source
>> cpu, target cpu, and event type) is stored in a global variable. When
>> a new event is to be traced, the data of the last event is stored in
>> the event related percpu location and the global data is updated with
>> the new event's data. This allows to track two events in one data
>> location: one by the value of the event data (the event before the
>> current one), and one by the location itself (the current event).
>>
>> A typical printout with a detected hang will look like this:
>>
>> csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 5000000003 ns for CPU#06 scf_handler_1+0x0/0x50(0xffffa2a881bb1410).
>> 	csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xffffa2a8813823c0) request.
>>          csd: cnt(00008cc): ffff->0000 dequeue (src cpu 0 == empty)
>>          csd: cnt(00008cd): ffff->0006 idle
>>          csd: cnt(0003668): 0001->0006 queue
>>          csd: cnt(0003669): 0001->0006 ipi
>>          csd: cnt(0003e0f): 0007->000a queue
>>          csd: cnt(0003e10): 0001->ffff ping
>>          csd: cnt(0003e71): 0003->0000 ping
>>          csd: cnt(0003e72): ffff->0006 gotipi
>>          csd: cnt(0003e73): ffff->0006 handle
>>          csd: cnt(0003e74): ffff->0006 dequeue (src cpu 0 == empty)
>>          csd: cnt(0003e7f): 0004->0006 ping
>>          csd: cnt(0003e80): 0001->ffff pinged
>>          csd: cnt(0003eb2): 0005->0001 noipi
>>          csd: cnt(0003eb3): 0001->0006 queue
>>          csd: cnt(0003eb4): 0001->0006 noipi
>>          csd: cnt now: 0003f00
>>
>> The idea is to print only relevant entries. Those are all events which
>> are associated with the hang (so sender side events for the source cpu
>> of the hanging request, and receiver side events for the target cpu),
>> and the related events just before those (for adding data needed to
>> identify a possible race). Printing all available data would be
>> possible, but this would add large amounts of data printed on larger
>> configurations.
>>
>> Signed-off-by: Juergen Gross <jgross@...e.com>
>> Tested-by: Paul E. McKenney <paulmck@...nel.org>
>> ---
>> V2:
>> - add automatic data deciphering and sorting of entries
>> - add new trace point for leaving flush_smp_call_function_queue()
>> - add information when finding an empty call_single_queue
> 
> They do not apply on top of these:
> 
>    https://lkml.kernel.org/r/20210220231712.2475218-2-namit@vmware.com
> 
> :-/
> 
>> @@ -290,6 +476,19 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
>>   
>>   void __smp_call_single_queue(int cpu, struct llist_node *node)
>>   {
>> +#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
>> +	if (static_branch_unlikely(&csdlock_debug_extended)) {
>> +		unsigned int type;
>> +
>> +		type = CSD_TYPE(container_of(node, call_single_data_t,
>> +					     node.llist));
>> +		if (type == CSD_TYPE_SYNC || type == CSD_TYPE_ASYNC) {
>> +			__smp_call_single_queue_debug(cpu, node);
>> +			return;
>> +		}
>> +	}
>> +#endif
> 
> This really ought to be in generic_exec_single(), because there we know
> the type matches.

You are right. This is much better.


Juergen

Download attachment "OpenPGP_0xB0DE9DD628BF132F.asc" of type "application/pgp-keys" (3092 bytes)

Download attachment "OpenPGP_signature" of type "application/pgp-signature" (496 bytes)