linux-kernel - Re: linux-next requirements

Message-Id: <201002281322.05213.rjw@sisk.pl>
Date:	Sun, 28 Feb 2010 13:22:05 +0100
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Stephen Rothwell <sfr@...b.auug.org.au>, mingo@...hat.com,
	hpa@...or.com, linux-kernel@...r.kernel.org, roland@...hat.com,
	suresh.b.siddha@...el.com, tglx@...utronix.de, hjl.tools@...il.com,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linus <torvalds@...ux-foundation.org>
Subject: Re: linux-next requirements

On Sunday 28 February 2010, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@...k.pl> wrote:
> 
> > On Saturday 27 February 2010, Ingo Molnar wrote:
> > > 
> > > * Rafael J. Wysocki <rjw@...k.pl> wrote:
> > > 
> > > > > > > Let's see.  Over the last 60 days, I have reported 37 build errors.  Of 
> > > > > > these, 16 were reported against x86, 14 against ppc, 7 against other 
> > > > > > archs.
> > > > > 
> > > > > So only 43% of them were even relevant on the platform that 95+% of the 
> > > > > Linux testers use? Seems to support the points i made.
> > > > 
> > > > Well, I hope you don't mean that because the majority of bug reporters (vs 
> > > > testers, the number of whom is unknown to me at least) use x86, we are free 
> > > > to break the other architectures. ;-)
> > > 
> > > It means exactly that: just like we 'can' break compilation with gcc296, 
> > > ancient versions of binutils, odd bootloaders, can break the boot via odd 
> > > > hardware, etc. When someone uses those architectures then the 'easy' 
> > > bugfixes will actually flow in very quickly and without much fuss
> > 
> > Then I don't understand what the problem with getting them in at the 
> > linux-next stage is.  They are necessary anyway, so we'll need to add them 
> > sooner or later and IMO the sooner the better.
> 
> The problem is the dynamics and resulting (non-)cleanliness of code. We have 
> architectures that have been conceptually broken for 5 years or more, but 
> still those problems get blamed on the last change that 'causes' the breakage: 
> the core kernel and the developers who try to make a difference.
> 
> I think your perspective and your opinion are correct, while my perspective is 
> real and correct as well - there's no contradiction really. Let me try to 
> explain how i see it:
> 
> You are working in a relatively well-designed piece of code which interfaces 
> to the kernel in sane ways - kernel/power/* et al. You might break the 
> cross-builds sometimes, but it's not very common, and in those cases it's 
> usually your own fault and you are grateful for linux-next to have caught that 
> stupidity. (i hope this is a fair summary!)

Fair enough.

> I am not criticising that aspect of linux-next _at all_ - it's useful and 
> beneficial - and i'd like to thank Stephen for all his hard work. Other 
> aspects of linux-next are useful as well, such as the patch conflict mediation 
> role.

Great.

> But as it happens so often, people tend to talk more about the things that are 
> not so rosy, not about the things that work well.
> 
> The area i am worried about is new core kernel facilities and their 
> development, and the extension of existing facilities. _Those_ facilities are 
> affected by 'many architectures' in a different way from how you experience 
> it: often we can do very correct changes to them, which still 'break' on some 
> architecture due to _that architecture's conceptual fault_.
> 
> Let me give you an example that happened just yesterday. My cross-testing 
> found that a change in the tracing infrastructure code broke m32r and parisc.
> 
> The breakage:
> 
>  /home/mingo/tip/kernel/trace/trace_clock.c:86: error: implicit declaration of function 'raw_local_irq_save'
>  /home/mingo/tip/kernel/trace/trace_clock.c:112: error: implicit declaration of function 'raw_local_irq_restore'
>  make[3]: *** [kernel/trace/trace_clock.o] Error 1
>  make[3]: *** Waiting for unfinished jobs....
> 
> It was 'caused by':
> 
>  18b4a4d: oprofile: remove tracing build dependency
> 
> In linux-next this would be pinned to commit 18b4a4d, which would have to be 
> reverted/fixed.
> 
> Where does the _real_ blame lie? Clearly in the M32R and HP/PARISC code: why 
> do they, four years after it was introduced as a core kernel facility 
> in 2006, _still_ not support raw_local_irq_save()?

OK, I see your point.

> ( A similar situation occurred in this very thread as well - before the subject 
>   of the thread - so it's a real and present problem. We didn't even get _any_ 
>   reaction about that particular breakage from the affected architecture ... )
> 
> These situations are magnified by how certain linux-next bugs are reported: 
> the 'blame' is put on the new commit that exposes the laggy nature of certain 
> architectures. Often the developers even believe this false notion and feel 
> guilty for 'having broken' an architecture - often an architecture that has 
> not contributed a single core kernel facility _in its whole existence_.
> 
> The usual end result is that the path of least resistance is taken: the commit 
> is reverted or worked around, while the 'laggy' architecture can continue 
> business as usual and cause more similar bugs and hiccups in the future ...
> 
> I.e. there is extra overhead put on clearly 'good' efforts, while 'bad' 
> behavior (parasitic hanging-on, passivity, indifference) is rewarded. 
> Rewarding bad behavior is very clearly harmful to Linux in many regards, and i 
> speak up when i see it.
> 
> So i wish linux-next balanced these things more fairly towards those areas of 
> code that are actually useful: if it ignored build breakages that are due to 
> architectures being lazy - in fact if it required architectures to _help out_ 
> with the development of the kernel.
> 
> The majority of build-bugs i see trigger in cross-builds (90% of which i catch 
> before they get into linux-next) are of this nature, that's why i raised it in 
> such a pointed way. Your (and many other people's) experience will differ - so 
> you might see this as an unjustified criticism.

Thanks a lot for the clarification.

Best,
Rafael