Message-ID: <8aa30f6a-d18d-1cce-57dc-08efb52d822e@huawei.com>
Date:   Mon, 17 Apr 2017 11:16:19 +0800
From:   Xie XiuQi <xiexiuqi@...wei.com>
To:     "Baicar, Tyler" <tbaicar@...eaurora.org>,
        <christoffer.dall@...aro.org>, <marc.zyngier@....com>,
        <catalin.marinas@....com>, <will.deacon@....com>,
        <james.morse@....com>, <fu.wei@...aro.org>, <rostedt@...dmis.org>,
        <hanjun.guo@...aro.org>, <shiju.jose@...wei.com>
CC:     <linux-arm-kernel@...ts.infradead.org>,
        <kvmarm@...ts.cs.columbia.edu>, <kvm@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <linux-acpi@...r.kernel.org>,
        <gengdongjiu@...wei.com>, <zhengqiang10@...wei.com>,
        <wuquanming@...wei.com>, <wangxiongfeng2@...wei.com>
Subject: Re: [PATCH v3 1/8] trace: ras: add ARM processor error information
 trace event

Hi Tyler,

On 2017/4/17 11:08, Xie XiuQi wrote:
> Hi Tyler,
> 
> Thanks for your comments and testing.
> 
> On 2017/4/15 4:36, Baicar, Tyler wrote:
>> On 3/30/2017 4:31 AM, Xie XiuQi wrote:
>>> Add a new trace event for ARM processor error information, so that
>>> the user will know what error occurred. With this information the
>>> user may take appropriate action.
>>>
>>> These trace events are consistent with the ARM processor error
>>> information table, which is defined in UEFI 2.6 spec section N.2.4.4.1.
>>>
>>> ---
>>> v2: add trace enabled condition, per Steven's suggestion.
>>>      fix a typo.
>>> ---
>>>
>>> Cc: Steven Rostedt <rostedt@...dmis.org>
>>> Cc: Tyler Baicar <tbaicar@...eaurora.org>
>>> Signed-off-by: Xie XiuQi <xiexiuqi@...wei.com>
>>> ---
>> ...
>>>   +#define ARM_PROC_ERR_TYPE    \
>>> +    EM ( CPER_ARM_INFO_TYPE_CACHE, "cache error" )    \
>>> +    EM ( CPER_ARM_INFO_TYPE_TLB,  "TLB error" )    \
>>> +    EM ( CPER_ARM_INFO_TYPE_BUS, "bus error" )    \
>>> +    EMe ( CPER_ARM_INFO_TYPE_UARCH, "micro-architectural error" )
>>> +
>>> +#define ARM_PROC_ERR_FLAGS    \
>>> +    EM ( CPER_ARM_INFO_FLAGS_FIRST, "First error captured" )    \
>>> +    EM ( CPER_ARM_INFO_FLAGS_LAST,  "Last error captured" )    \
>>> +    EM ( CPER_ARM_INFO_FLAGS_PROPAGATED, "Propagated" )    \
>>> +    EMe ( CPER_ARM_INFO_FLAGS_OVERFLOW, "Overflow" )
>>> +
>> Hello Xie XiuQi,
>>
>> This isn't compiling for me because of these definitions. Here you are using ARM_*, but below in the TP_printk you are using ARCH_*. The compiler complains the ARCH_* ones are undefined:
>>
>> ./include/trace/../../include/ras/ras_event.h:278:37: error: 'ARCH_PROC_ERR_TYPE' undeclared (first use in this function)
>>      __print_symbolic(__entry->type, ARCH_PROC_ERR_TYPE),
>> ./include/trace/../../include/ras/ras_event.h:280:38: error: 'ARCH_PROC_ERR_FLAGS' undeclared (first use in this function)
>>      __print_symbolic(__entry->flags, ARCH_PROC_ERR_FLAGS),
> 
> Sorry, it's a typo. It should be ARM_xxx.
> 
>>
>>> +/*
>>> + * First define the enums in ARM_PROC_ERR_TYPE and ARM_PROC_ERR_FLAGS to be
>>> + * exported to userspace via TRACE_DEFINE_ENUM().
>>> + */
>>> +#undef EM
>>> +#undef EMe
>>> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
>>> +#define EMe(a, b)    TRACE_DEFINE_ENUM(a);
>>> +
>>> +ARM_PROC_ERR_TYPE
>>> +ARM_PROC_ERR_FLAGS
>> Are the above two lines supposed to be here?
>>> +
>>> +/*
>>> + * Now redefine the EM() and EMe() macros to map the enums to the strings
>>> + * that will be printed in the output.
>>> + */
>>> +#undef EM
>>> +#undef EMe
>>> +#define EM(a, b)        { a, b },
>>> +#define EMe(a, b)    { a, b }
>>> +
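On the two bare ARM_PROC_ERR_TYPE / ARM_PROC_ERR_FLAGS lines questioned above: they look like the usual two-pass list-macro pattern, where the same list is expanded once per definition of EM()/EMe(). Below is a minimal user-space sketch of that pattern. The DEMO_* names and enum values are made up for illustration, and because TRACE_DEFINE_ENUM() exists only in the kernel, the first pass here just counts the entries instead.

```c
#include <stddef.h>

/* Stand-in values; the real CPER_ARM_INFO_TYPE_* constants are not used here. */
enum demo_err_type { DEMO_TYPE_CACHE, DEMO_TYPE_TLB, DEMO_TYPE_BUS, DEMO_TYPE_UARCH };

/* The list is written once. EMe() marks the last entry (no trailing comma). */
#define DEMO_ERR_TYPE \
	EM(DEMO_TYPE_CACHE, "cache error") \
	EM(DEMO_TYPE_TLB, "TLB error") \
	EM(DEMO_TYPE_BUS, "bus error") \
	EMe(DEMO_TYPE_UARCH, "micro-architectural error")

/*
 * Pass 1: the bare list name expands using the current EM()/EMe().
 * The patch uses TRACE_DEFINE_ENUM(a); here; this sketch counts entries.
 */
#undef EM
#undef EMe
#define EM(a, b)  + 1
#define EMe(a, b) + 1
enum { DEMO_ERR_TYPE_COUNT = 0 DEMO_ERR_TYPE };

/* Pass 2: redefine the macros so the same list becomes a value/string table. */
#undef EM
#undef EMe
#define EM(a, b)  { a, b },
#define EMe(a, b) { a, b }

struct demo_sym {
	int value;
	const char *name;
};

static const struct demo_sym demo_err_types[] = { DEMO_ERR_TYPE };

/* Exact-match lookup over the generated table; NULL if the value is unknown. */
static const char *demo_type_name(int value)
{
	for (size_t i = 0; i < sizeof(demo_err_types) / sizeof(demo_err_types[0]); i++)
		if (demo_err_types[i].value == value)
			return demo_err_types[i].name;
	return NULL;
}
```

Without the first expansion the list would only ever be used for the string table, so in this pattern the bare list lines are what makes the first pass do anything.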
>>> +TRACE_EVENT(arm_proc_err,
>> I think it would be better to keep this similar to the naming of the current RAS trace events (right now we have mc_event, arm_event, aer_event, etc.). I would suggest using "arm_err_info_event" since this is handling the error information structures of the arm errors.
>>> +
>>> +    TP_PROTO(const struct cper_arm_err_info *err),
>>> +
>>> +    TP_ARGS(err),
>>> +
>>> +    TP_STRUCT__entry(
>>> +        __field(u8, type)
>>> +        __field(u16, multiple_error)
>>> +        __field(u8, flags)
>>> +        __field(u64, error_info)
>>> +        __field(u64, virt_fault_addr)
>>> +        __field(u64, physical_fault_addr)
>> Validation bits should also be part of this structure, so that user space tools will know which of these fields are valid.
> 
> Could user space rely on the default values instead? TP_fast_assign already checks the validation bits and stores a default value for every field that is not valid.
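To make the trade-off concrete, here is a small user-space sketch under stated assumptions: the struct, the DEMO_VALID_VIRT_ADDR bit, and the helper names are hypothetical and only mirror the shape of the TP_fast_assign logic. It shows the one case where a default of 0 cannot be told apart from a reported value of 0, which is where carrying the validation bits would help.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VALID_VIRT_ADDR (1u << 0)	/* hypothetical validation bit */

struct demo_entry {
	uint16_t validation_bits;
	uint64_t virt_fault_addr;
};

/* Mirrors the patch's approach: store the value if valid, else a default of 0. */
static struct demo_entry demo_record(uint16_t valid, uint64_t addr)
{
	struct demo_entry e = { .validation_bits = valid, .virt_fault_addr = 0 };

	if (valid & DEMO_VALID_VIRT_ADDR)
		e.virt_fault_addr = addr;
	return e;
}

/* A consumer that only sees the field has to treat 0 as "not valid". */
static bool demo_addr_valid_by_default(const struct demo_entry *e)
{
	return e->virt_fault_addr != 0;
}

/* A consumer that also sees the validation bits does not have to guess. */
static bool demo_addr_valid_by_bits(const struct demo_entry *e)
{
	return (e->validation_bits & DEMO_VALID_VIRT_ADDR) != 0;
}
```

If firmware reports a valid address that happens to be 0, demo_addr_valid_by_default() says "invalid" while demo_addr_valid_by_bits() says "valid". Whether that case matters in practice for these fields is a separate question.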
> 
>>> +    ),
>>> +
>>> +    TP_fast_assign(
>>> +        __entry->type = err->type;
>>> +
>>> +        if (err->validation_bits & CPER_ARM_INFO_VALID_MULTI_ERR)
>>> +            __entry->multiple_error = err->multiple_error;
>>> +        else
>>> +            __entry->multiple_error = ~0;
>>> +
>>> +        if (err->validation_bits & CPER_ARM_INFO_VALID_FLAGS)
>>> +            __entry->flags = err->flags;
>>> +        else
>>> +            __entry->flags = ~0;
>>> +
>>> +        if (err->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
>>> +            __entry->error_info = err->error_info;
>>> +        else
>>> +            __entry->error_info = 0ULL;
>>> +
>>> +        if (err->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
>>> +            __entry->virt_fault_addr = err->virt_fault_addr;
>>> +        else
>>> +            __entry->virt_fault_addr = 0ULL;
>>> +
>>> +        if (err->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
>>> +            __entry->physical_fault_addr = err->physical_fault_addr;
>>> +        else
>>> +            __entry->physical_fault_addr = 0ULL;
>>> +    ),
>>> +
>>> +    TP_printk("ARM Processor Error: type %s; count: %u; flags: %s;"
>> I think the "ARM Processor Error:" part of this should just be removed. Here's the output with this removed and the trace event renamed to arm_err_info_event. I think this looks much cleaner and matches the style used with the arm_event.
>>
>>           <idle>-0     [020] .ns.   366.592434: arm_event: affinity level: 2; MPIDR: 0000000000000000; MIDR: 00000000510f8000; running state: 1; PSCI state: 0
>>           <idle>-0     [020] .ns.   366.592437: arm_err_info_event: type cache error; count: 0; flags: 0x3; error info: 0000000000c20058; virtual address: 0000000000000000; physical address: 0000000000000000
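A side note on "flags: 0x3" in the output above. One possible explanation, assuming the CPER_ARM_INFO_FLAGS_* constants are single bits (the values below are assumptions, not taken from the patch): __print_symbolic() does an exact match and falls back to printing the raw hex value, so a combination such as First|Last matches no entry. The kernel also has __print_flags() for bitmasks. The user-space sketch below imitates both behaviours with hypothetical demo_* helpers; it is not the kernel implementation.

```c
#include <stddef.h>
#include <stdio.h>

struct demo_sym {
	unsigned long mask;
	const char *name;
};

/* Assumed bit assignments, for illustration only. */
static const struct demo_sym demo_flags[] = {
	{ 1UL << 0, "First error captured" },
	{ 1UL << 1, "Last error captured" },
	{ 1UL << 2, "Propagated" },
	{ 1UL << 3, "Overflow" },
};
#define DEMO_NFLAGS (sizeof(demo_flags) / sizeof(demo_flags[0]))

/* Exact-match lookup: prints the name, or the raw hex value if nothing matches. */
static void demo_symbolic(char *buf, size_t len, unsigned long val)
{
	for (size_t i = 0; i < DEMO_NFLAGS; i++) {
		if (demo_flags[i].mask == val) {
			snprintf(buf, len, "%s", demo_flags[i].name);
			return;
		}
	}
	snprintf(buf, len, "0x%lx", val);
}

/* Bitmask decode: prints every set flag, then any leftover bits in hex. */
static void demo_decode_flags(char *buf, size_t len, unsigned long val,
			      const char *delim)
{
	size_t used = 0;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (size_t i = 0; i < DEMO_NFLAGS; i++) {
		if (!(val & demo_flags[i].mask))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? delim : "", demo_flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return;	/* output truncated; stop here */
		used += (size_t)n;
		val &= ~demo_flags[i].mask;
	}
	if (val)
		snprintf(buf + used, len - used, "%s0x%lx", used ? delim : "", val);
}
```

With these assumed values, demo_symbolic() turns 0x3 into "0x3", while demo_decode_flags() with "|" as delimiter gives "First error captured|Last error captured".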
> 

Since this section is the ARM Processor Error Section, how about using arm_proc_err_event?

> I agree. It looks much better.
> 
>>
>> Thanks,
>> Tyler
>>
> 

-- 
Thanks,
Xie XiuQi
