linux-kernel - Re: 2.6.23-rc1-mm1 - seems OK on Dell Latitude D820, except for tpm

Message-Id: <20070727110701.be3a8459.akpm@linux-foundation.org>
Date:	Fri, 27 Jul 2007 11:07:01 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Valdis.Kletnieks@...edu
Cc:	Kylene Hall <kjhall@...ibm.com>, tpm@...horst.net,
	linux-kernel@...r.kernel.org,
	Fernando Luis Vázquez Cao 
	<fernando@....ntt.co.jp>
Subject: Re: 2.6.23-rc1-mm1 - seems OK on Dell Latitude D820, except for
 tpm_tis

On Fri, 27 Jul 2007 09:28:09 -0400
Valdis.Kletnieks@...edu wrote:

> On Fri, 27 Jul 2007 00:00:32 EDT, Valdis.Kletnieks@...edu said:
> 
> > Apparently, things go pear-shaped in tpm_tis_send(), when they get to the
> > 'if (chip->vendor.irq)' - under 22-rc6-mm1, we never got into this code,
> > because earlier initialization complained it couldn't get IRQ8.  Now, we
> > get IRQ3, and apparently get into this if statement, and then spend 120
> > seconds while wait_for_stat() times out.  So the root cause does look like
> > it's this IRQ8/IRQ3 issue.
> > 
> > I'll try to find time to do a bisect on -rc1-mm1 tomorrow to track down
> > what exactly did this.
> 
> And we have a winner.  In my bisect 'hunt' file, I ended at:
> 
> fs-use-kmem_cache_zalloc-instead.patch GOOD
> # remove-kconfig-setting-config_debug_shirq.patch: Ingo worried
> remove-kconfig-setting-config_debug_shirq.patch BAD

Thanks for working that out.

> Looks like Ingo was right. :)  As a cross-check, I tested a 'GOOD' kernel,
> but rebuilt with CONFIG_DEBUG_SHIRQ=y, and that *also* died. 
> 
> Looks like the problematic code is in tpm_tis.c tpm_tis_init() near here:
> 
>                 for (i = 3; i < 16 && chip->vendor.irq == 0; i++) {
>                         iowrite8(i, chip->vendor.iobase +
>                                     TPM_INT_VECTOR(chip->vendor.locality));
>                         if (request_irq
>                             (i, tis_int_probe, IRQF_SHARED,
>                              chip->vendor.miscdev.name, chip) != 0) {
>                                 dev_info(chip->dev,
>                                          "Unable to request irq: %d for probe\n",
>                                          i);
>                                 continue;
>                         }
> 
> This seems to be misbehaving differently for the two different DEBUG_SHIRQ
> cases.
> 
> With DEBUG_SHIRQ=n, it starts at IRQ3, gets to at least 8 (where it complains
> it can't request it for probing), and possibly all the way to 15, without ever
> actually selecting and assigning an IRQ.  To refresh memories, in that range
> /proc/interrupts only lists:
> 
>   8:          0          0   IO-APIC-edge      rtc
>   9:          3          0   IO-APIC-fasteoi   acpi
>  12:         94          0   IO-APIC-edge      i8042
>  14:     148166          0   IO-APIC-edge      libata
>  15:         94          0   IO-APIC-edge      libata
> 
> So there are certainly IRQs available.  No idea why it doesn't choose one. But
> since it never chose one, it never gets into the "wait for the IRQ" code protected
> by 'if (chip->vendor.irq)' at the end of tpm_tis_send.
> 
> With DEBUG_SHIRQ=y, it starts at IRQ3, and assigns it (which seems a good thing).
> Unfortunately, this then hits the timeouts in tpm_tis_send.
> 
> Anybody got an idea what *should* be happening here?
> 
> Just for the record, I see this in /sys:
> 
> % cat /sys/bus/pnp/devices/00:0e/id
> BCM0102
> PNP0c31
> 
> The driver is apparently being selected on the basis of PNP0c31.  One of
> the other IDs it looks for is BCM0101 - but this is a BCM0102.  Is this a
> misidentification, or does the driver need to handle a 0102 differently, or
> is something else odd going on?
> 
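
The symptom described above can be modelled in user space. The sketch below
is not the tpm_tis driver: `chip_model`, `send_model()` and
`wait_for_stat_model()` are hypothetical stand-ins, and the tick count of 120
merely stands in for the 120-second timeout. It only illustrates why an
assigned IRQ that never fires turns into a timeout, while an unassigned IRQ
skips the wait entirely.

```c
/* Hypothetical stand-in for the chip fields that matter here. */
struct chip_model {
    int irq;            /* 0 means "no IRQ assigned" */
    int irq_delivered;  /* nonzero if the device really raises that IRQ */
};

/* Models waiting for a status change: 0 on success, -1 on timeout. */
static int wait_for_stat_model(const struct chip_model *chip, int timeout_ticks)
{
    for (int tick = 0; tick < timeout_ticks; tick++) {
        if (chip->irq_delivered)
            return 0;
    }
    return -1;
}

/* Models the tail of the send path: the wait is entered only
 * when an IRQ has been assigned to the chip. */
static int send_model(const struct chip_model *chip)
{
    if (chip->irq) {
        if (wait_for_stat_model(chip, 120) < 0)
            return -1;  /* assigned IRQ never fired */
    }
    return 0;
}
```

Calling `send_model()` on a chip with `irq == 0` returns immediately, which
matches the DEBUG_SHIRQ=n report; a chip with `irq == 3` and no delivered
interrupt runs the wait to its limit, which matches the DEBUG_SHIRQ=y report.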

Fernando, those patches are just too scary for me because of stuff like
this.

Perhaps we should look at implementing this new behaviour on a per-driver
basis?  Pass some new flag into request_irq(), perhaps?
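
One way to read that suggestion, sketched in user space rather than as a
kernel patch. `request_irq_model()`, `IRQF_SHARED_MODEL` and
`IRQF_SPURIOUS_TEST_MODEL` are hypothetical names invented for this
illustration, and nothing in this thread says such a flag exists. The model
assumes the debug behaviour is to invoke a shared handler once at
registration time, and it makes that call only when the driver opts in.

```c
typedef int (*irq_handler_model_t)(int irq, void *dev_id);

#define IRQF_SHARED_MODEL        0x1u
#define IRQF_SPURIOUS_TEST_MODEL 0x2u  /* hypothetical per-driver opt-in */

/* Models registration: the early "spurious" call happens only for
 * shared handlers whose driver explicitly asked for it. */
static int request_irq_model(int irq, irq_handler_model_t handler,
                             unsigned int flags, void *dev_id)
{
    if (!handler)
        return -1;

    /* Real registration work would happen here. */

    if ((flags & IRQF_SHARED_MODEL) && (flags & IRQF_SPURIOUS_TEST_MODEL))
        handler(irq, dev_id);

    return 0;
}

/* A deliberately naive probe handler (an assumption, not the real
 * tis_int_probe): it records whichever IRQ it is called for. */
static int naive_probe_handler(int irq, void *dev_id)
{
    int *assigned_irq = dev_id;

    *assigned_irq = irq;
    return 1;
}
```

In this model a driver whose probe handler cannot tolerate the early call
simply leaves the opt-in flag off and sees no change in behaviour, while
drivers that have been audited can set it.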
