Message-ID: <CAHp75Vc6aPAqG7gDobjOaiSRqw8MNAG7pS3o-ze6ejXXF7AFEg@mail.gmail.com>
Date:	Tue, 16 Oct 2012 12:35:57 +0300
From:	Andy Shevchenko <andy.shevchenko@...il.com>
To:	viresh kumar <viresh.kumar@...aro.org>
Cc:	Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
	Vinod Koul <vinod.koul@...el.com>,
	linux-kernel@...r.kernel.org, spear-devel <spear-devel@...t.st.com>
Subject: Re: [PATCH] dmatest: terminate all ongoing transfers before
 submitting new one

On Tue, Oct 16, 2012 at 11:56 AM, viresh kumar <viresh.kumar@...aro.org> wrote:
> On Tue, Oct 16, 2012 at 2:15 PM, Andy Shevchenko
> <andriy.shevchenko@...ux.intel.com> wrote:
>> The following error messages come if we have software LLP emulation enabled and
>> enough threads running.
>>
>> modprobe dmatest iterations=40
>> [  168.048601] dmatest: Started 1 threads using dma0chan0
>> [  168.054546] dmatest: Started 1 threads using dma0chan1
>> [  168.060441] dmatest: Started 1 threads using dma0chan2
>> [  168.066333] dmatest: Started 1 threads using dma0chan3
>> [  168.072250] dmatest: Started 1 threads using dma0chan4
>> [  168.078144] dmatest: Started 1 threads using dma0chan5
>> [  168.084057] dmatest: Started 1 threads using dma0chan6
>> [  168.089948] dmatest: Started 1 threads using dma0chan7
>> [  170.032962] dma0chan1-copy0: terminating after 40 tests, 0 failures (status 0)
>> [  170.041274] dma0chan0-copy0: terminating after 40 tests, 0 failures (status 0)
>> [  170.597559] dma0chan2-copy0: terminating after 40 tests, 0 failures (status 0)
>> [  171.085059] dma0chan7-copy0: #0: test timed out
>> [  171.839710] dma0chan3-copy0: terminating after 40 tests, 0 failures (status 0)
>> [  172.146071] dma0chan4-copy0: terminating after 40 tests, 0 failures (status 0)
>> [  172.220802] dma0chan7-copy0: #1: got completion callback, but status is 'in progress'
>> [  172.242049] dma0chan7-copy0: #2: got completion callback, but status is 'in progress'
>> [  172.281063] dma0chan7-copy0: #3: got completion callback, but status is 'in progress'
>> [  172.400866] dma0chan7-copy0: #4: got completion callback, but status is 'in progress'
>> [  172.471799] dma0chan7-copy0: #5: got completion callback, but status is 'in progress'
>> [  172.613996] dma0chan7-copy0: #6: got completion callback, but status is 'in progress'
>> [  172.670286] dma0chan7-copy0: #7: got completion callback, but status is 'in progress'
>> [  172.750763] dma0chan7-copy0: #8: got completion callback, but status is 'in progress'
>> [  172.777452] dma0chan5-copy0: terminating after 40 tests, 0 failures (status 0)
>> [  172.788740] dma0chan7-copy0: #9: got completion callback, but status is 'in progress'
>> [  172.845156] dma0chan7-copy0: #10: got completion callback, but status is 'in progress'
>> [  172.906593] dma0chan7-copy0: #11: got completion callback, but status is 'in progress'
>> [  173.181515] dma0chan6-copy0: terminating after 40 tests, 0 failures (status 0)
>> [  173.512838] dma0chan7-copy0: terminating after 40 tests, 12 failures (status 0)
>>
>> The patch fixes dmatest module to stop any ongoing transfer before submitting
>> new one. Perhaps there is a better solution and driver logic needs to be fixed
>> as well.
>>
>> After patch we will have
>>
>> modprobe dmatest iterations=50
>> [   84.027375] dmatest: Started 1 threads using dma0chan0
>> [   84.033282] dmatest: Started 1 threads using dma0chan1
>> [   84.039182] dmatest: Started 1 threads using dma0chan2
>> [   84.045089] dmatest: Started 1 threads using dma0chan3
>> [   84.051003] dmatest: Started 1 threads using dma0chan4
>> [   84.056916] dmatest: Started 1 threads using dma0chan5
>> [   84.062828] dmatest: Started 1 threads using dma0chan6
>> [   84.068714] dmatest: Started 1 threads using dma0chan7
>> [   86.538284] dma0chan0-copy0: terminating after 50 tests, 0 failures (status 0)
>> [   86.842221] dma0chan1-copy0: terminating after 50 tests, 0 failures (status 0)
>> [   87.060460] dma0chan6-copy0: #0: test timed out
>> [   87.065614] dma0chan7-copy0: #0: test timed out
>> [   87.220321] dma0chan2-copy0: terminating after 50 tests, 0 failures (status 0)
>> [   88.595061] dma0chan3-copy0: terminating after 50 tests, 0 failures (status 0)
>> [   89.152170] dma0chan4-copy0: terminating after 50 tests, 0 failures (status 0)
>> [   89.955059] dma0chan5-copy0: terminating after 50 tests, 0 failures (status 0)
>> [   90.697073] dma0chan6-copy0: terminating after 50 tests, 1 failures (status 0)
>> [   90.893422] dma0chan7-copy0: terminating after 50 tests, 1 failures (status 0)
>
> You still have failures. :(
Sure, the point is that we no longer have the 'in progress' issues.

> Can you try with a large timeout value for the module.
I tried and the failures were gone.

> We must get to the root cause of these failures. There may be something more
> serious which is getting hidden due to this call to terminate().
My understanding is as follows. The software LLP emulation runs several
transactions per active descriptor. Because of the heavy load on the
CPU/DMA, some transactions do not complete within the given timeout.
dmatest then submits the next block to transfer without doing anything
about the previous one. Under some circumstances the new transfer is
queued, and immediately afterwards the callback function is called for
the _previous_ transfer. The check condition cannot tell which transfer
invoked the callback function.

The current patch proposes a rough solution. Another solution is to
mark each transfer with an id and check the done flag and the transfer
id together.
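
To illustrate the second idea, here is a minimal user-space sketch. It is
not dmatest code: struct xfer_done, xfer_arm(), xfer_callback() and
simulate_late_callback() are hypothetical names made up for this example,
and the locking or atomics that a real callback context would need are
left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread completion state; not the dmatest structure. */
struct xfer_done {
	unsigned int expected_id;	/* id of the transfer being waited for */
	bool done;			/* set only by the matching completion */
	unsigned int stale;		/* late callbacks from earlier transfers */
};

/* Arm the state just before submitting the transfer numbered 'id'. */
static void xfer_arm(struct xfer_done *s, unsigned int id)
{
	s->expected_id = id;
	s->done = false;
}

/* Completion callback; 'id' names the transfer that has finished. */
static void xfer_callback(struct xfer_done *s, unsigned int id)
{
	if (id != s->expected_id) {
		/* Callback belongs to an older transfer: count it, ignore it. */
		s->stale++;
		return;
	}
	s->done = true;
}

/*
 * Replay the sequence described above: transfer #0 times out, transfer #1
 * is queued, then the callback for #0 arrives late. Returns true if the
 * late callback was ignored and only the callback for #1 set 'done'.
 */
static bool simulate_late_callback(void)
{
	struct xfer_done s = { 0 };

	xfer_arm(&s, 0);	/* #0 submitted; assume it later times out */
	xfer_arm(&s, 1);	/* #1 queued without waiting for #0 */
	xfer_callback(&s, 0);	/* late completion of #0 */
	if (s.done)
		return false;	/* #1 would be mistaken for complete */
	xfer_callback(&s, 1);	/* real completion of #1 */
	return s.done && s.stale == 1;
}
```

With the id check in place, the late callback from transfer #0 only bumps
the stale counter, so the wait for transfer #1 is not ended early.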

> Unless there is an issue with software emulation of LLP, the only difference with
> s/w emulation is that the transfers become slow.
Yep.

> Also, the proposed solution might hide some other important errors. We may need
> to terminate transfers when we find that there is an error in the last transfers:
I think it could be better than the first solution, but what do you think
about marking each transfer with a corresponding id?


-- 
With Best Regards,
Andy Shevchenko
