Date:   Thu, 9 Jan 2020 08:47:51 +0530
From:   Muni Sekhar <munisekharrms@...il.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: pcie: xilinx: kernel hang - ISR readl()

On Thu, Jan 9, 2020 at 1:45 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
>
> On Tue, Jan 07, 2020 at 09:45:13PM +0530, Muni Sekhar wrote:
> > Hi,
> >
> > I have a module with a Xilinx FPGA. It implements UART(s), SPI(s), and
> > parallel I/O, and interfaces them to the host CPU via the PCI Express bus.
> > I see that my system freezes without capturing a crash dump for
> > certain tests. I debugged this issue and tracked it down to the
> > interrupt handler code mentioned below.
> >
> >
> > The ISR first reads the Interrupt Status register using ‘readl()’ as
> > given below.
> >     status = readl(ctrl->reg + INT_STATUS);
> >
> >
> > It then clears the pending interrupts using ‘writel()’ as given below.
> >         writel(status, ctrl->reg + INT_STATUS);
> >
> >
> > I've noticed a kernel hang if the INT_STATUS register is read again
> > after clearing the pending interrupts.
> >
> > Can someone clarify why the kernel hangs without a crash dump if I
> > read the INT_STATUS register using readl() after clearing the
> > pending bits?
> >
> > Can readl() block?
>
> readl() should not block in software.  Obviously at the hardware CPU
> instruction level, the read instruction has to wait for the result of
> the read.  Since that data is provided by the device, i.e., your FPGA,
> it's possible there's a problem there.

Thank you very much for your reply.
Where can I find details about the protocol for reading
memory-mapped I/O? Can you point me to any useful links?
I tried to locate the exact point in the kernel code where the CPU waits
for the read instruction, as given below.
readl() -> __raw_readl() -> return *(const volatile u32 __force *)add
Do I need to check the assembly instructions here?

>
> Can you tell whether the FPGA has received the Memory Read for
> INT_STATUS and sent the completion?

Is there a way to determine this through software debugging (either by
enabling dynamic debug or by adding new debug prints)? Can you please
point me to the tools or hardware needed to find this?


>
> On the architectures I'm familiar with, if a device doesn't respond,
> something would eventually time out so the CPU doesn't wait forever.

What is the timeout here? I mean, how long does the CPU wait for the
completion? Since this code runs in interrupt context, would a long
timeout cause the system to freeze?

lspci output:
$ lspci
00:00.0 Host bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx Series SoC Transaction Register (rev 11)
00:02.0 VGA compatible controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series Graphics & Display (rev 11)
00:13.0 SATA controller: Intel Corporation Atom Processor E3800 Series SATA AHCI Controller (rev 11)
00:14.0 USB controller: Intel Corporation Atom Processor Z36xxx/Z37xxx, Celeron N2000 Series USB xHCI (rev 11)
00:1a.0 Encryption controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series Trusted Execution Engine (rev 11)
00:1b.0 Audio device: Intel Corporation Atom Processor Z36xxx/Z37xxx Series High Definition Audio Controller (rev 11)
00:1c.0 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI Express Root Port 1 (rev 11)
00:1c.2 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI Express Root Port 3 (rev 11)
00:1c.3 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI Express Root Port 4 (rev 11)
00:1d.0 USB controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series USB EHCI (rev 11)
00:1f.0 ISA bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx Series Power Control Unit (rev 11)
00:1f.3 SMBus: Intel Corporation Atom Processor E3800 Series SMBus Controller (rev 11)
01:00.0 RAM memory: PLDA Device 5555
03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)

>
> > Snippet of the ISR code is given below:
> >
> > https://pastebin.com/WdnZJZF5
> >
> > static irqreturn_t pcie_isr(int irq, void *dev_id)
> > {
> >         struct test_device *ctrl = dev_id;
> >         u32 status;
> > …
> >
> >         status = readl(ctrl->reg + INT_STATUS);
> >
> >         /* Check to see if it was our interrupt */
> >         if (!(status & 0x000C))
> >                 return IRQ_NONE;
> >
> >         /* Clear the interrupt */
> >         writel(status, ctrl->reg + INT_STATUS);
> >
> >         if (status & 0x0004) {
> >                 /* Tx interrupt pending */
> >                  ....
> >         }
> >
> >         if (status & 0x0008) {
> >                 /* Rx interrupt pending */
> >
> >                 /* The system freezes if I read the INT_STATUS
> >                  * register again, as given below */
> >                 status = readl(ctrl->reg + INT_STATUS);
> >                 ....
> >         }
> > ..
> >         return IRQ_HANDLED;
> > }
> >
> >
> >
> > --
> > Thanks,
> > Sekhar



-- 
Thanks,
Sekhar
