linux-kernel - Re: [patch 19/33] genirq/msi: Provide msi_desc::msi

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87cz93s464.ffs@tglx>
Date:   Thu, 01 Dec 2022 13:24:03 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Jason Gunthorpe <jgg@...dia.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
        linux-pci@...r.kernel.org, Bjorn Helgaas <bhelgaas@...gle.com>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Marc Zyngier <maz@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Dave Jiang <dave.jiang@...el.com>,
        Alex Williamson <alex.williamson@...hat.com>,
        Kevin Tian <kevin.tian@...el.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Logan Gunthorpe <logang@...tatee.com>,
        Ashok Raj <ashok.raj@...el.com>, Jon Mason <jdmason@...zu.us>,
        Allen Hubbe <allenbh@...il.com>,
        "Ahmed S. Darwish" <darwi@...utronix.de>,
        Reinette Chatre <reinette.chatre@...el.com>
Subject: Re: [patch 19/33] genirq/msi: Provide msi_desc::msi_data

Jason!

On Wed, Nov 23 2022 at 19:38, Thomas Gleixner wrote:
> On Wed, Nov 23 2022 at 12:58, Jason Gunthorpe wrote:
>> I find your perspective on driver authors as the enemy quite
>> interesting :)
>
> I'm not seeing them as enemies. Just my expectations are rather low by
> now :)

This made me think about it for a while. Let me follow up on that.

When I set out to add real-time capabilities to the kernel about 20 years
ago, I did a thorough analysis of the kernel design and code base.

It turned out that aside of well encapsulated infrastructure, e.g. mm,
vfs, scheduler, network core, quite some of the rest was consisting of
blatant layering violations held together with duct tape, super glue and
haywire-circuit.

It was immediately clear to me, that this needs a lot of consolidation
and cleanup work to get me even close to the point where RT becomes
feasible as an integral part of the kernel. But not only this became
clear, I also realized that a continuation of this model will end up in
a maintenance nightmare sooner than later.

Me and the other people interested in RT estimated back then that it'll
take 5-10 years to get this done.

Boy, we were young and naive back then and completely underestimating
the efforts required. Obviously we were also underestimating the
concurrent influx of new stuff.

Just to give you an example. Our early experiments with substituting
spinlocks was just the start of the horrors. Instead of working on the
actual substitution mechanisms and the required other modifications, we
spent a vast amount of our time chasing dead locks all over the place.
My main test machine had not a single device driver which was correct
and working out of the box. What's worse is that we had to debate with
some of the driver people about the correctness of our locking analysis
and fight for stuff getting fixed.

This ended in writing and integrating lockdep, which has thankfully
taken this burden of our plate.

When I started to look into interrupt handling to add support for
threaded interrupts, which are a fundamental prerequisite for RT, the
next nightmare started to unfold.

The "generic" core code was a skeleton and everything real was
implemented in architecture specific code in completely incompatible
ways. It was not even possible to change common data structures without
breaking the world.  What was even worse, drivers fiddled in the
interrupt descriptors just to scratch an itch.

What I learned pretty fast is that most driver writers try to work
around short-comings in common infrastructure instead of tackling the
problem at the root or talking to the developers/maintainers of that
infrastructure.

The consequence of that is: if you want to change core infrastructure
you end up mopping up the driver tree in order not to break things all
over the place. There are clearly better ways to spend your time.

So I started to encapsulate things more strictly - admittedly to make my
own life easier. But at the same time I always tried hard to make these
encapsulations easy to use, to provide common infrastructure in order to
replace boilerplate code and to help with resource management, which is
one of the common problems in driver code. I'm also quite confident that
I carefully listened to the needs of driver developers and I think the
whole discussion about IMS last year is a good example for that. I
surely have opinions, but who doesn't?

So no, I'm not seeing driver writers as enemies. I'm just accepting the
reality that quite some of the drivers are written in "get it out the
door" mode. I'm well aware that there are other folks who stay around for
a long time and do proper engineering and maintenance, but that's sadly
the minority.

Being responsible for core infrastructure is an interesting challenge
especially with the zoo of legacy to keep alive and the knowledge that
you can break the world with a trivial and obviously "correct"
change. Been there, done that. :)

Thanks,

        Thomas