[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110214114515.GA9867@elte.hu>
Date: Mon, 14 Feb 2011 12:45:15 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Cyrill Gorcunov <gorcunov@...il.com>
Cc: Suresh Siddha <suresh.b.siddha@...el.com>,
Yinghai Lu <yhlu.kernel@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>,
lkml <linux-kernel@...r.kernel.org>
Subject: Re: [RFC 1/2 -tip/master] x86, x2apic: minimize IPI register writes
using cluster groups
* Cyrill Gorcunov <gorcunov@...il.com> wrote:
> In the case of x2apic cluster mode we can group IPI register writes based on the
> cluster group instead of individual per-cpu destiantion messages. This reduces the
> apic register writes and reduces the amount of IPI messages (in the best case we
> can reduce it by a factor of 16).
>
> With this change, microbenchmark measuring the cost of flush_tlb_others(), with
> the flush tlb IPI being sent from a cpu in the socket-1 to all the logical cpus in
> socket-2 (on a Westmere-EX system that has 20 logical cpus in a socket) is 3x
> times better now (compared to the former 'send one-by-one' algorithm).
Pretty nice!
I have a few structural and nitpicking comments:
> +++ tip-linux-2.6/arch/x86/kernel/apic/probe_64.c
> @@ -71,7 +71,9 @@ void __init default_setup_apic_routing(v
> #endif
>
> if (apic == &apic_flat && num_possible_cpus() > 8)
> - apic = &apic_physflat;
> + apic = &apic_physflat;
> + else if (apic == &apic_x2apic_cluster)
> + x2apic_init_cpu_notifier();
This should be encapsulated cleaner - this higher layer does not care what the
x2apic code tries to initialize.
So i think the cleanest method would be to clean up default_setup_apic_routing() and
get rid of the CONFIG_X86_X2APIC (and UV and vsmp) mess from there.
We want some sort of proper driver init sequence between the various APIC drivers,
which sequence the default_setup_apic_routing() code just iterates and as a result
it gets a pointer to a 'struct apic'. This would be like all other driver init
sequences work - no ugly 'glue' like probe_64.c would be needed.
> +static void __x2apic_send_IPI_mask(const struct cpumask *mask, int vector,
> + int exclude_self)
Please if possible don't break function prototypes in the middle of the argument
list. There are more natural places to break the line:
static void
__x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int exclude_self)
But it can be in a single line as well. (up to 90-100 cols is fine.)
> +{
> + unsigned long cpu;
> unsigned long flags;
> + struct cpumask *cpus_in_cluster_ptr, *ipi_mask_ptr;
> + u32 dest, this_cpu;
Ok, this is a pet peeve of mine: look how inconsistent this variable definition
block has become. This would be cleaner:
struct cpumask *cpus_in_cluster_ptr, *ipi_mask_ptr;
unsigned long cpu, flags;
u32 dest, this_cpu;
> + /*
> + * we are to modify mask, so we need an own copy
> + * and be sure it's manipulated with irq off
> + */
Most comments in this file capitalize the first word in a comment block.
(this holds for a number of similar cases as well in later portions of your patch)
> + for_each_cpu(cpu, ipi_mask_ptr) {
> + unsigned long i;
> + dest = 0;
Please separate variable definitions from the first statement in a more visible manner.
> + cpus_in_cluster_ptr = per_cpu(cpus_in_cluster, cpu);
> +
> + /* only cpus in cluster involved */
> + for_each_cpu_and(i, ipi_mask_ptr, cpus_in_cluster_ptr)
> + if (!exclude_self || i != this_cpu)
> + dest |= per_cpu(x86_cpu_to_logical_apicid, i);
Please use curly braces in blocks that are longer than a single line.
> +#define x2apic_propagate_cpu_cluster_status_online(cpu) \
> + x2apic_propagate_cpu_cluster_status(cpu, 1)
> +
> +#define x2apic_propagate_cpu_cluster_status_offline(cpu) \
> + x2apic_propagate_cpu_cluster_status(cpu, 0)
Is there any particular reason why we use preprocessor technology here, instead of
the fine C language that .c files are generally written in?
Enumerating 0/1 symbolically would be cleaner that introducing two smaller helper
functions in any case.
> +static int __cpuinit
> +cluster_setup(struct notifier_block *nfb, unsigned long action, void *hcpu)
> +{
> + unsigned int cpu = (unsigned long)hcpu;
> + int err = 0;
> +
> + switch (action) {
> + case CPU_UP_PREPARE:
> + zalloc_cpumask_var(&per_cpu(cpus_in_cluster, cpu), GFP_KERNEL);
> + zalloc_cpumask_var(&per_cpu(ipi_mask, cpu), GFP_KERNEL);
> + if (!per_cpu(cpus_in_cluster, cpu) || !per_cpu(ipi_mask, cpu)) {
> + free_cpumask_var(per_cpu(cpus_in_cluster, cpu));
> + free_cpumask_var(per_cpu(ipi_mask, cpu));
> + err = -ENOMEM;
> + }
zalloc_cpumask_var() has a return code that could be used for a slightly more
standard deinit/teardown sequence.
> +void x2apic_init_cpu_notifier(void)
> +{
> + int cpu = smp_processor_id();
>
> + zalloc_cpumask_var(&per_cpu(cpus_in_cluster, cpu), GFP_KERNEL);
> + zalloc_cpumask_var(&per_cpu(ipi_mask, cpu), GFP_KERNEL);
> + BUG_ON(!per_cpu(cpus_in_cluster, cpu) || !per_cpu(ipi_mask, cpu));
Such a BUG_ON() is not particularly user friendly - and this could trigger during
CPU hotplug events, i.e. while the system is fully booted up, right?
Thanks,
Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists