linux-kernel - Re: [PATCH v2] drm/panfrost:report the full raw fault information instead

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210622094000.00004f7e@163.com>
Date:   Tue, 22 Jun 2021 09:40:00 +0800
From:   Chunyou Tang <tangchunyou@....com>
To:     Steven Price <steven.price@....com>
Cc:     tomeu.vizoso@...labora.com, airlied@...ux.ie,
        linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
        alyssa.rosenzweig@...labora.com,
        ChunyouTang <tangchunyou@...becorp.cn>
Subject: Re: [PATCH v2] drm/panfrost:report the full raw fault information
 instead

Hi Steve,
	I will send a new patch with suitable subject/commit message.
But I send a V3 or a new patch?

	I met a bug about the GPU,I have no idea about how to fix it,
If you can give me some suggestion,it is perfect.

You can see such kernel log:

Jun 20 10:20:13 icube kernel: [  774.566760] mvp_gpu 0000:05:00.0: GPU
Fault 0x00000088 (SHAREABILITY_FAULT) at 0x000000000310fd00 Jun 20
10:20:13 icube kernel: [  774.566764] mvp_gpu 0000:05:00.0: There were
multiple GPU faults - some have not been reported Jun 20 10:20:13 icube
kernel: [  774.667542] mvp_gpu 0000:05:00.0: AS_ACTIVE bit stuck Jun 20
10:20:13 icube kernel: [  774.767900] mvp_gpu 0000:05:00.0: AS_ACTIVE
bit stuck Jun 20 10:20:13 icube kernel: [  774.868546] mvp_gpu
0000:05:00.0: AS_ACTIVE bit stuck Jun 20 10:20:13 icube kernel:
[  774.968910] mvp_gpu 0000:05:00.0: AS_ACTIVE bit stuck Jun 20
10:20:13 icube kernel: [  775.069251] mvp_gpu 0000:05:00.0: AS_ACTIVE
bit stuck Jun 20 10:20:22 icube kernel: [  783.693971] mvp_gpu
0000:05:00.0: gpu sched timeout, js=1, config=0x7300, status=0x8,
head=0x362c900, tail=0x362c100, sched_job=000000003252fb84

In
https://lore.kernel.org/dri-devel/20200510165538.19720-1-peron.clem@gmail.com/
there had a same bug like mine,and I found you at the mail list,I don't
know how it fixed?

I need your help!

thinks very much!

Chunyou

于 Mon, 21 Jun 2021 11:45:20 +0100
Steven Price <steven.price@....com> 写道:

> On 19/06/2021 04:18, Chunyou Tang wrote:
> > Hi Steve,
> > 	1,Now I know how to write the subject
> > 	2,the low 8 bits is the exception type in spec.
> > 
> > and you can see prnfrost_exception_name()
> > 
> > switch (exception_code) {
> >                 /* Non-Fault Status code */
> > case 0x00: return "NOT_STARTED/IDLE/OK";
> > case 0x01: return "DONE";
> > case 0x02: return "INTERRUPTED";
> > case 0x03: return "STOPPED";
> > case 0x04: return "TERMINATED";
> > case 0x08: return "ACTIVE";
> > ........
> > ........
> > case 0xD8: return "ACCESS_FLAG";
> > case 0xD9 ... 0xDF: return "ACCESS_FLAG";
> > case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT";
> > case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT";
> > }
> > return "UNKNOWN";
> > }
> > 
> > the exception_code in case is only 8 bits,so if fault_status
> > in panfrost_gpu_irq_handler() don't & 0xFF,it can't get correct
> > exception reason,it will be always UNKNOWN.
> 
> Yes, I'm happy with the change - I just need a patch that I can apply.
> At the moment this patch only changes the first '0x%08x' output rather
> than the call to panfrost_exception_name() as well. So we just need a
> patch which does:
> 
> - fault_status & 0xFF, panfrost_exception_name(pfdev, fault_status),
> + fault_status, panfrost_exception_name(pfdev, fault_status & 0xFF),
> 
> along with a suitable subject/commit message describing the change. If
> you can send me that I can apply it.
> 
> Thanks,
> 
> Steve
> 
> PS. Sorry for going round in circles here - I'm trying to help you get
> setup so you'll be able to contribute patches easily in future. An
> important part of that is ensuring you can send a properly formatted
> patch to the list.
> 
> PPS. I'm still not receiving your emails directly. I don't think it's
> a problem at my end because I'm receiving other emails, but if you can
> somehow fix the problem you're likely to receive a faster response.
> 
> > 于 Fri, 18 Jun 2021 13:43:24 +0100
> > Steven Price <steven.price@....com> 写道:
> > 
> >> On 17/06/2021 07:20, ChunyouTang wrote:
> >>> From: ChunyouTang <tangchunyou@...becorp.cn>
> >>>
> >>> of the low 8 bits.
> >>
> >> Please don't split the subject like this. The first line of the
> >> commit should be a (very short) summary of the patch. Then a blank
> >> line and then a longer description of what the purpose of the
> >> patch is and why it's needed.
> >>
> >> Also you previously had this as part of a series (the first part
> >> adding the "& 0xFF" in the panfrost_exception_name() call). I'm not
> >> sure we need two patches for the single line, but as it stands this
> >> patch doesn't apply.
> >>
> >> Also I'm still not receiving any emails from you directly (only via
> >> the list), so it's possible I might have missed something you sent.
> >>
> >> Steve
> >>
> >>>
> >>> Signed-off-by: ChunyouTang <tangchunyou@...becorp.cn>
> >>> ---
> >>>  drivers/gpu/drm/panfrost/panfrost_gpu.c | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c
> >>> b/drivers/gpu/drm/panfrost/panfrost_gpu.c index
> >>> 1fffb6a0b24f..d2d287bbf4e7 100644 ---
> >>> a/drivers/gpu/drm/panfrost/panfrost_gpu.c +++
> >>> b/drivers/gpu/drm/panfrost/panfrost_gpu.c @@ -33,7 +33,7 @@ static
> >>> irqreturn_t panfrost_gpu_irq_handler(int irq, void *data) address
> >>> |= gpu_read(pfdev, GPU_FAULT_ADDRESS_LO); 
> >>>  		dev_warn(pfdev->dev, "GPU Fault 0x%08x (%s) at
> >>> 0x%016llx\n",
> >>> -			 fault_status & 0xFF,
> >>> panfrost_exception_name(pfdev, fault_status & 0xFF),
> >>> +			 fault_status,
> >>> panfrost_exception_name(pfdev, fault_status & 0xFF), address);
> >>>  
> >>>  		if (state & GPU_IRQ_MULTIPLE_FAULT)
> >>>
> > 
> >