linux-kernel - Re: Status of tip/x86/apic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <548D6D13.8070706@linux.intel.com>
Date:	Sun, 14 Dec 2014 18:57:23 +0800
From:	Jiang Liu <jiang.liu@...ux.intel.com>
To:	Thomas Gleixner <tglx@...utronix.de>,
	LKML <linux-kernel@...r.kernel.org>
CC:	x86@...nel.org, Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Bjorn Helgaas <bhelgaas@...gle.com>,
	Tony Luck <tony.luck@...el.com>,
	Borislav Petkov <bp@...en8.de>, Joerg Roedel <joro@...tes.org>,
	Marc Zyngier <marc.zyngier@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Yinghai Lu <yinghai@...nel.org>,
	Alex Williamson <alex.williamson@...hat.com>
Subject: Re: Status of tip/x86/apic

On 2014/12/13 4:35, Thomas Gleixner wrote:
> Folks,
> 
> after mulling this in my head for quite some time, I'm going to
> postpone the whole thing for 3.20.
> 
> That said, I need to say, that I'm really happy with the outcome of
> this massive overhaul. I really want to thank all involved people,
> especially Jiang, for their great work and help so far!!!
> 
> The hierarchical irq domains really improve the code by distangling
> the various subsystems and the arm[64] use cases just prove that it
> was the right decision.
> 
> We're almost there with x86 but my gut feeling tells me that pushing
> it now is too risky. I rather prefer quiet holidays for all of us than
> the nagging fear that the post holiday inbox will be full of obscure
> bug reports and we then start a chase and bandaid race which will kill
> the well earned recreation in an instant.
Hi Thomas,
	It's more safe to let it mature for another merge window
in tip tree:)

> 
> This will block other things in that area for a while, but it's the
> only sane decision at the moment, unless Linus insists on pulling the
> lot and promises to deal with the fallout. :)
> 
> The reasons why I decided to do so are:
> 
>  - The bugs we found in the last week. That tells me that there is
>    some more stuff lurking.
> 
>  - The already existing mess in a some areas which got unearthed by
>    this work in the last week. That definitely needs a thorough
>    cleanup and not some more bandaids.
> 
>  - Lack of proper debugging features. Sending out per issue debug
>    patches simply does not scale.
> 
>  - It's not bisectable and unfortunately there are too many fixes to
>    various places to make manual bisection feasible.
> 
> For 3.20 I want to proceed in the following way:
> 
>  - Apply all bug fixes to x86/apic
> 
>  - Address the issues with the resource management (and elsewhere)
>    proper on top
> 
>  - Add a proper debugging mechanism (the existing irqdomain debugfs
>    interface is completely useless).
> 
>    For the hierarchical domains we really want two things:
> 
>    1) A debugfs interface which lets us introspect the hierarchy.
> 
>       I was working on that before I got dragged into bug chasing and
>       merge window frenzy.
> 
>       For proper introspection down to the hardware level this
>       requires either domain/irq_chip specific callbacks or some
>       unified way to track the current state. The latter is painful as
>       it requires to store information redundantly.
> 
>       So having domain/chip callbacks to retrieve the state is the
>       right solution. Most chip/domain implementations cache their
>       [hardware] state already, so providing an accessor to convert
>       that into a common data format is the best way. If the callback
>       is not implemented then the information is not available or
>       maybe not relevant.
> 
>       I'm not going to have a per domain/chip seqfile print function
>       as this is just a complete waste. Pretty printing obscure
>       hardware information does not help much for the general user. We
>       rather have the raw data and proper post processing tools which
>       can provide that pretty print information than bloating the
>       kernel binary with randomized and possibly useless seq_print
>       functions.
> 
>       Another reason why I want just raw binary data is that I want to
>       use exactly the same mechanism for tracing. See below.
> 
>       After looking at the various new domain/chip implementations its
>       sufficient to have 16 bytes of storage space for this, but
>       that's a minor detail.
> 
>       To provide a proper translation into pretty printed values we
>       can do the following:
> 
>        Create a new section for storing such data and have a data
>        structure there which describes the content of the buffer. That
>        section goes into a seperate file and not linked into the
>        kernel binary. Simple enough for tools to pick up and for bug
>        reporters to use/provide. If the stupid file is not available
>        we still can recreate it from source and translate the hex
>        dump. And in the most cases the pure hexdump will be sufficient
>        for the people who need actually to look at this.
> 
>    2) Proper trace point support so we can actually track allocation
>       and the hardware access at the various domain levels because
>       some of these issues cannot be decoded by looking at a state
>       snapshot in debugfs. With some of them we even can't access
>       debugfs at all.
> 
>       Though one issue with that is, that for the early boot process
>       there is no way to store that information as the tracer gets
>       enabled way after init_IRQ(). But there is no reason why the
>       tracer could not be enabled before that. All it needs is a
>       working memory allocator. Steven?
> 
>       Now there is another class of problems which might be hard to
>       debug. When the machine just boots into a hang, so we dont get a
>       ftrace output neither from an oops nor from a console. It would
>       be nice if we could have a command line option which prints
>       enabled trace points via (early_)printk. That would avoid
>       sending out ad hoc printk debug patches which will basically
>       provide the same information as the trace_points. That would be
>       useful for other hard to debug boot hangs as well. Steven?
>       
>       I think the above can be solved, so we need to agree on a proper
>       set of tracepoints. I came up with the following list:
> 
>       - trace_irqdomain_create(domain->id, domain->name, ...)
>       - trace_irqdomain_destroy(domain->id)
>       
>       - trace_irqdomain_alloc(irq_data)
> 
>       	struct irq_data contains all relevant information for
>       	assigning the tracepoint data.
> 	
> 		__entry->virq		= irq_data->virq;
> 		__entry->domainid	= irq_data->domain;
> 		__entry->hwirq		= irq_data->hwirq;
> 		TP_STORE_DATA(__entry->data, irq_data);
> 
> 	Where TP_STORE_DATA checks for the above callback and uses it
> 	if available, otherwise we just clear the data field.
>       
>         So this reuses the callback which we want for debugfs
>         anyway. The print format is just hexdump. See my above
>         rationale for that.
> 
>       - trace_irqdomain_free(virq, domain->id)
> 	  
>       - trace_irqdomain_hw_access(irqdata)
> 
> 	Same "data" and pretty printing argument as for
> 	trace_irqdomain_alloc()
> 
> 	The obvious place to put such a trace point is
> 	e.g. irq_chip_write_msi_msg() where the callback records the
> 	currently written msi msg.
> 
> Once we have sorted that, I'll push x86/apic into a seperate git
> repository so the history is preserved.
> 
> After that I'll redo x86/apic from scratch with proper ordering and
> all fixes folded to the right places so the whole thing becomes
> bisectable.
> 
> Thoughts?
This really sounds a good idea to debug interrupt.

So I will work on following items for 3.20:
1) Continue to convert PCI MSI code into generic MSI code
as much as possible.
2) Simplify interrupt remapping initialization on x86, the first
version has been posted at: https://lkml.org/lkml/2014/12/10/20.
3) Solve new bugs if any:)
Thanks!
Gerry

> 
> Thanks,
> 
> 	Thomas
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/