[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1240552261.6842.857.camel@yhuang-dev.sh.intel.com>
Date: Fri, 24 Apr 2009 13:51:01 +0800
From: Huang Ying <ying.huang@...el.com>
To: Andi Kleen <ak@...ux.intel.com>
Cc: Ingo Molnar <mingo@...e.hu>, "H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Re-implement MCE log ring buffer as per-CPU ring buffer
On Wed, 2009-04-22 at 19:11 +0800, Andi Kleen wrote:
> Huang Ying wrote:
>
> Basic concept is good and it fixes some long standing problems.
>
> But some details need to be improved I think:
>
> > +/*
> > + * Initialize MCE per-CPU log buffer
> > + */
> > +static void mce_log_init(void)
> > +{
> > + int cpu;
> > + struct mce_log_cpu *mcelog_cpu;
> > +
> > + if (mcelog.log_cpus)
> > + return;
> > + mcelog.log_cpus = kmalloc(num_possible_cpus() * sizeof(mcelog_cpu),
> > + GFP_KERNEL);
> > + for_each_possible_cpu(cpu) {
> > + mcelog_cpu = &per_cpu(mcelog_cpus, cpu);
> > + mcelog.log_cpus[cpu] = mcelog_cpu;
> > + }
>
> I don't understand why the separate allocation. Can't you just put
> the whole log buffer directly into the per cpu data?
>
> This would also make the initialization cleaner because you would need to only
> initialize state for the currently initialized CPU.
The original intention to use mcelog.log_cpus is to make it easier to
find per-cpu buffers from memory image produced by kdump. But is seems
that something as follow is just sufficient.
mcelog.log_cpus = alloc_percpu(struct mce_log_cpu);
Is this what you suggested?
>
> > + while (!mcelog_cpu->entry[i].finished) {
> > + rdtscll(now);
> > + if (now - start > WRITER_TIMEOUT_CYC) {
> > + memset(mcelog_cpu->entry + i, 0,
> > sizeof(struct mce));
> > + head = mcelog_cpu->head;
> > goto timeout;
>
> This timeout should be reported somehow, perhaps with a printk.
> Also it's problematic to hardcode cycles here, i think it would
> be better to use a similar scheme as the Monarch timeout
> with a ndelay() and a spinunit. That is guaranteed to stay
> the same even as CPU frequencies change.
Yes. I will change this.
> > +static ssize_t mce_read(struct file *filp, char __user *inubuf, size_t usize,
> > + loff_t *off)
> > +{
> > + char __user *ubuf = inubuf;
> > + struct mce_log_cpu *mcelog_cpu;
> > + int cpu, new_mce, err = 0;
> > + static DEFINE_MUTEX(mce_read_mutex);
> > + size_t usize_limit;
> > +
> > + /* Too large user buffer size may cause system not response */
>
> Did you understand why that happens? It's worrying.
The loop in mce_read() does not release the CPU in non-preempt kernel.
If the user buffer size is too large, it may cause the CPU to be
soft-lock-upped or not response. One way to solve the issue is to
constrain the size of user buffer, another way may be insert a
schedule() in the loop.
> > +static ssize_t show_log_flags(struct sys_device *s,
> > + struct sysdev_attribute *attr,
> > + char *buf)
> > +{
> > + int cpu = s->id;
> > + struct mce_log_cpu *mcelog_cpu = &per_cpu(mcelog_cpus, cpu);
> > + unsigned flags;
> > + do {
> > + flags = mcelog_cpu->flags;
> > + } while (cmpxchg((unsigned *)&mcelog_cpu->flags, flags, 0) != flags);
> > + return sprintf(buf, "0x%x\n", flags);
> > +}
> > +
> > static SYSDEV_ATTR(trigger, 0644, show_trigger, set_trigger);
> > static SYSDEV_INT_ATTR(tolerant, 0644, tolerant);
> > ACCESSOR(check_interval,check_interval,mce_restart())
> > +static SYSDEV_ATTR(log_flags, 0644, show_log_flags, NULL);
>
> Do we really need that interface? It seems rather obscure.
> mcelog should get these flags anyways. Better to drop it.
For overflow flag, it seems not very useful. I will drop this.
Best Regards,
Huang Ying
Download attachment "signature.asc" of type "application/pgp-signature" (198 bytes)
Powered by blists - more mailing lists