linux-kernel - Re: [PATCH] irqchip, gicv3-its, numa: Workaround for Cavium ThunderX erratum 23144

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <55DB11F3.1070604@arm.com>
Date:	Mon, 24 Aug 2015 13:45:39 +0100
From:	Marc Zyngier <marc.zyngier@....com>
To:	Ganapatrao Kulkarni <gpkulkarni@...il.com>
CC:	Robert Richter <robert.richter@...iumnetworks.com>,
	Will Deacon <Will.Deacon@....com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Jason Cooper <jason@...edaemon.net>,
	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>,
	Tirumalesh Chalamarla <tirumalesh.chalamarla@...iumnetworks.com>,
	Robert Richter <rric@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Ganapatrao Kulkarni <ganapatrao.kulkarni@...iumnetworks.com>
Subject: Re: [PATCH] irqchip, gicv3-its, numa: Workaround for Cavium ThunderX
 erratum 23144

On 24/08/15 13:30, Ganapatrao Kulkarni wrote:
> Hi Marc,
> 
> thanks for the review comments.
> 
> On Mon, Aug 24, 2015 at 3:47 PM, Marc Zyngier <marc.zyngier@....com> wrote:
>> Hi Robert,
>>
>> Just came back from the Seattle madness, so picking up patches in
>> reverse order... ;-)
>>
>> On 22/08/15 14:10, Robert Richter wrote:
>>> The patch below adds a workaround for gicv3 in a numa environment. It
>>> is on top of my recent gicv3 errata patch submission v4 and Ganapat's
>>> arm64 numa patches for devicetree v5.
>>>
>>> Please comment.
>>>
>>> Thanks,
>>>
>>> -Robert
>>>
>>>
>>>
>>> From c432820451a46b8d1e299b8bfbfcdcb3b75340e7 Mon Sep 17 00:00:00 2001
>>> From: Ganapatrao Kulkarni <gkulkarni@...iumnetworks.com>
>>> Date: Wed, 19 Aug 2015 23:40:05 +0530
>>> Subject: [PATCH] irqchip, gicv3-its, numa: Workaround for Cavium ThunderX erratum
>>>  23144
>>>
>>> This implements a workaround for gicv3-its erratum 23144 applicable
>>> for Cavium's ThunderX multinode systems.
>>>
>>> The erratum fixes the hang of ITS SYNC command by avoiding inter node
>>> io and collections/cpu mapping. This fix is only applicable for
>>> Cavium's ThunderX dual-socket platforms.
>>
>> Can you please elaborate on this? I can't see any reference to the SYNC
>> command there. Is that a case of an ITS not being able to route LPIs to
>> cores on another socket?
> we were seeing mapc command failing when we were mapping its of node0
> with collections of node1(vice-versa).

There is no such thing as "collection of node1". There are collections
mapped to redistributors.

> we found sync was timing out, which is issued post mapc(also for mapvi
> and movi).
> Yes this errata limits the routing of inter-node LPIs.

Please update the commit message to reflect the actual issue.

> 
>>
>> I really need more details to understand this patch (please use short
>> sentences, I'm still in a different time zone).
>>
>>>
>>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@...iumnetworks.com>
>>> [ rric: Reworked errata code, added helper functions, updated commit
>>>       message. ]
>>>
>>> Signed-off-by: Robert Richter <rrichter@...ium.com>
>>> ---
>>>  arch/arm64/Kconfig               | 14 +++++++++++
>>>  drivers/irqchip/irq-gic-common.c |  5 ++--
>>>  drivers/irqchip/irq-gic-v3-its.c | 54 ++++++++++++++++++++++++++++++++++------
>>>  3 files changed, 64 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index 3809187ed653..b92b7b70b29b 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -421,6 +421,20 @@ config ARM64_ERRATUM_845719
>>>
>>>         If unsure, say Y.
>>>
>>> +config CAVIUM_ERRATUM_22375
>>> +     bool "Cavium erratum 22375, 24313"
>>> +     depends on NUMA
>>> +     default y
>>> +     help
>>> +             Enable workaround for erratum 22375, 24313.
>>> +
>>> +config CAVIUM_ERRATUM_23144
>>> +     bool "Cavium erratum 23144"
>>> +     depends on NUMA
>>> +     default y
>>> +     help
>>> +             Enable workaround for erratum 23144.
>>> +
>>>  config CAVIUM_ERRATUM_23154
>>>       bool "Cavium erratum 23154: Access to ICC_IAR1_EL1 is not sync'ed"
>>>       depends on ARCH_THUNDER
>>> diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
>>> index ee789b07f2d1..1dfce64dbdac 100644
>>> --- a/drivers/irqchip/irq-gic-common.c
>>> +++ b/drivers/irqchip/irq-gic-common.c
>>> @@ -24,11 +24,12 @@
>>>  void gic_check_capabilities(u32 iidr, const struct gic_capabilities *cap,
>>>                       void *data)
>>>  {
>>> -     for (; cap->desc; cap++) {
>>> +     for (; cap->init; cap++) {
>>>               if (cap->iidr != (cap->mask & iidr))
>>>                       continue;
>>>               cap->init(data);
>>> -             pr_info("%s\n", cap->desc);
>>> +             if (cap->desc)
>>> +                     pr_info("%s\n", cap->desc);
>>
>> No. I really want to see what errata are applied when I look at a kernel
>> log.
> sorry, did not understood your comment, it is still printed using cap->desc.

Yes, but you are making desc optional, and I don't want it to be
optional. I want the kernel to scream that we're using an erratum
workaround so that we can understand what is happening when reading a
kernel log.

>>>       }
>>>  }
>>>
>>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>>> index 4bb135721e72..666be39f13a9 100644
>>> --- a/drivers/irqchip/irq-gic-v3-its.c
>>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>>> @@ -43,7 +43,8 @@
>>>  #include "irqchip.h"
>>>
>>>  #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING                (1ULL << 0)
>>> -#define ITS_FLAGS_CAVIUM_THUNDERX            (1ULL << 1)
>>> +#define ITS_WORKAROUND_CAVIUM_22375          (1ULL << 1)
>>> +#define ITS_WORKAROUND_CAVIUM_23144          (1ULL << 2)
>>>
>>>  #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING  (1 << 0)
>>>
>>> @@ -76,6 +77,7 @@ struct its_node {
>>>       struct list_head        its_device_list;
>>>       u64                     flags;
>>>       u32                     ite_size;
>>> +     int                     numa_node;
>>>  };
>>>
>>>  #define ITS_ITT_ALIGN                SZ_256
>>> @@ -609,11 +611,18 @@ static void its_eoi_irq(struct irq_data *d)
>>>  static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
>>>                           bool force)
>>>  {
>>> -     unsigned int cpu = cpumask_any_and(mask_val, cpu_online_mask);
>>> +     unsigned int cpu;
>>>       struct its_device *its_dev = irq_data_get_irq_chip_data(d);
>>>       struct its_collection *target_col;
>>>       u32 id = its_get_event_id(d);
>>>
>>> +     if (its_dev->its->flags & ITS_WORKAROUND_CAVIUM_23144) {
>>> +             cpu = cpumask_any_and(mask_val,
>>> +                             cpumask_of_node(its_dev->its->numa_node));
>>
>> I suppose cpumask_of_node returns only the *online* cores of a given
>> node, right?
> yes.
>>
>>> +     } else {
>>> +             cpu = cpumask_any_and(mask_val, cpu_online_mask);
>>> +     }
>>> +
>>>       if (cpu >= nr_cpu_ids)
>>>               return -EINVAL;
>>>
>>> @@ -845,7 +854,7 @@ static int its_alloc_tables(struct its_node *its)
>>>       u64 typer;
>>>       u32 ids;
>>>
>>> -     if (its->flags & ITS_FLAGS_CAVIUM_THUNDERX) {
>>> +     if (its->flags & ITS_WORKAROUND_CAVIUM_22375) {
>>>               /*
>>>                * erratum 22375: only alloc 8MB table size
>>>                * erratum 24313: ignore memory access type
>>> @@ -1093,6 +1102,11 @@ static void its_cpu_init_lpis(void)
>>>       dsb(sy);
>>>  }
>>>
>>> +static inline int numa_node_id_early(void)
>>> +{
>>> +     return MPIDR_AFFINITY_LEVEL(read_cpuid_mpidr(), 2);
>>> +}
>>> +
>>>  static void its_cpu_init_collection(void)
>>>  {
>>>       struct its_node *its;
>>> @@ -1104,6 +1118,11 @@ static void its_cpu_init_collection(void)
>>>       list_for_each_entry(its, &its_nodes, entry) {
>>>               u64 target;
>>>
>>> +             /* avoid cross node core and its mapping */
>>> +             if ((its->flags & ITS_WORKAROUND_CAVIUM_23144) &&
>>> +                     its->numa_node != numa_node_id_early())
>>> +                             continue;
>>> +
>>
>> Argh. This is horrible. You really need some topology bindings to
>> describe your system instead of hardcoding some random level of
>> affinity. The next time someone is going to come up with a similarly
>> broken system, they will have to reinvent that wheel.
>  thanks for the suggestion, we will use cpu_topology[cpuid].cluster_id
> instead of  function numa_node_id_early()

Make sure you can relate the ITS to it (have a way to find out which
node a given ITS belong to, without playing tricks with the memory map.

>>
>>>               /*
>>>                * We now have to bind each collection to its target
>>>                * redistributor.
>>> @@ -1372,9 +1391,15 @@ static void its_irq_domain_activate(struct irq_domain *domain,
>>>  {
>>>       struct its_device *its_dev = irq_data_get_irq_chip_data(d);
>>>       u32 event = its_get_event_id(d);
>>> +     unsigned int cpu;
>>> +
>>> +     if (its_dev->its->flags & ITS_WORKAROUND_CAVIUM_23144)
>>> +             cpu = cpumask_first(cpumask_of_node(its_dev->its->numa_node));
>>> +     else
>>> +             cpu = cpumask_first(cpu_online_mask);
>>
>> Looks like this can be factored with the code you've added in set_affinity.
> we cant issue mapvi with cross-node mapping.

And? You have almost the same code twice. Surely you can devise a single
helper that holds this code.

>>>
>>>       /* Bind the LPI to the first possible CPU */
>>> -     its_dev->event_map.col_map[event] = cpumask_first(cpu_online_mask);
>>> +     its_dev->event_map.col_map[event] = cpu;
>>>
>>>       /* Map the GIC IRQ and event to the device */
>>>       its_send_mapvi(its_dev, d->hwirq, event);
>>> @@ -1457,16 +1482,31 @@ static int its_force_quiescent(void __iomem *base)
>>>       }
>>>  }
>>>
>>> +static inline int its_get_node_thunderx(struct its_node *its)
>>> +{
>>> +     return (its->phys_base >> 44) & 0x3;
>>
>> Why 3? Is that because you have provision for 4 sockets or what?
> yes.
>>
>>> +}
>>> +
>>>  static void its_enable_cavium_thunderx(void *data)
>>>  {
>>> -     struct its_node *its = data;
>>> +     struct its_node __maybe_unused *its = data;
>>>
>>> -     its->flags |= ITS_FLAGS_CAVIUM_THUNDERX;
>>> +#ifdef CONFIG_CAVIUM_ERRATUM_22375
>>> +     its->flags |= ITS_WORKAROUND_CAVIUM_22375;
>>> +     pr_info("ITS: Enabling workaround for 22375, 24313\n");
>>> +#endif
>>> +
>>> +#ifdef CONFIG_CAVIUM_ERRATUM_23144
>>> +     if (num_possible_nodes() > 1) {
>>> +             its->numa_node = its_get_node_thunderx(its);
>>
>> I'd rather see numa_node being always initialized to something useful.
>> If you're adding numa support, why can't this be initialized via
>> standard topology bindings?
> IIUC, topology defines only cpu topology.

Well, welcome to a much more complex system where both your CPUs and
your IOs have some degree of affinity. This needs to be described
properly, and not hacked on the side.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/