linux-kernel - Re: 4.13 on thinkpad x220: oops when writing to SD card

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a1683a2f-1940-a787-645e-fbf63b50e8a8@rock-chips.com>
Date:   Wed, 13 Sep 2017 12:04:39 +0800
From:   Shawn Lin <shawn.lin@...k-chips.com>
To:     Linus Walleij <linus.walleij@...aro.org>
Cc:     shawn.lin@...k-chips.com, Ulf Hansson <ulf.hansson@...aro.org>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Pavel Machek <pavel@....cz>,
        "linux-mmc@...r.kernel.org" <linux-mmc@...r.kernel.org>,
        kernel list <linux-kernel@...r.kernel.org>,
        Seraphime Kirkovski <kirkseraph@...il.com>
Subject: Re: 4.13 on thinkpad x220: oops when writing to SD card


On 2017/9/12 17:42, Linus Walleij wrote:
> On Fri, Sep 8, 2017 at 4:51 AM, Shawn Lin <shawn.lin@...k-chips.com> wrote:
>> On 2017/9/8 4:02, Linus Walleij wrote:
>>> On Thu, Sep 7, 2017 at 9:18 AM, Ulf Hansson <ulf.hansson@...aro.org>
>>> wrote:
>>>
>>>> Even if this fixes the problem it seems like we are papering over the
>>>> real issue, which earlier fixes also did during the release cycle for
>>>> v4.13.
>>>
>>>
>>> I think this is the real solution to the issue.
>>>
>>>>> Another unrelated issue with mmc_init_request() is that
>>>>> mmc_exit_request()
>>>>> is not called if mmc_init_request() fails, which means
>>>>> mmc_init_request()
>>>>> must free anything it allocates when it fails.
>>>>
>>>>
>>>> Yes, the situations it's just too fragile. We need to fix the behavior
>>>> properly, although I haven't myself been able to investigate exactly
>>>> how yet.
>>>>
>>>> Adding, Linus, perhaps he has some ideas.
>>>
>>>
>>> Maybe we should simply bite the bullet and do what was suggested
>>> by another contributor when I refactored the bounce buffer handling:
>>> simply delete the bounce buffer code and let any remaining (few?)
>>> legacy devices suffer a bit (performancewise) at the gain of way
>>> simpler code?
>>
>> Are you in the same page with what Adrian pointed to?
>>
>> IIUC, the issue is:
>> init_rq_fn will be called each time when trying to create new reuqest
>> from the pre-allocated request_list memeory pool, and exit_rq_fn will is
>> in the corresponding routine to free request from request_list
>> (blk_free_request) when finished. But if alloc_request_size fails, it
>> won't call exit_rq_fn, so you need to prevent memory leak on your own
>> error path of init_rq_fn.
> 
> Yes and that is what the current patch fixes, is it not?

Nope. It fixed the *out-of-bound* memory usage as request queue will
hold 4 requests by default and only use these four requests when meeting
memory pressure. But the code didn't place the correct bouncesz so that
the 4 pre-allocated requests didn't have valid memory page when
allocated.  But in runtime, normally the request queue asks for new
request from request list by dynamically allocating it. So your
init_rq_fn will be called each time. However the mmc_init_request
didn't handle the error path properly。


I simply add SDHCI_QUIRK_BROKEN_ADMA for my SDHCI to force it use
SDMA so that the host->max_segs should be 1. Then add some a hack to
simulate some random failure for mmc_alloc_sg and then after some iozone
stress test, the memory is exhausted finally.

+static u64 skip = 0;
  static struct scatterlist *mmc_alloc_sg(int sg_len, gfp_t gfp)
  {
         struct scatterlist *sg;
+       u32 random = 0;
+
+       /* Simply skip the bootup stage */
+       if (skip++ >= 0x800)
+               random = get_random_int();
+
+       if (unlikely((random & 0xf) > 0xd)) {
+               printk("mmc_alloc_sg: make fake failure, random =
0x%x\n", random);
+               return NULL;
+       }


[  540.507195] iozone invoked oom-killer: 
gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), 
nodemask=(null),  order=0, oom_score_adj=0
[  540.508302] iozone cpuset=/ mems_allowed=0
[  540.508727] CPU: 2 PID: 3039 Comm: iozone Not tainted 
4.13.0-next-20170912-00003-g01cc0dd5-dirty #110
[  540.509537] Hardware name: Firefly-RK3399 Board (DT)
[  540.509977] Call trace:
[  540.510221] [<ffff20000808b3d0>] dump_backtrace+0x0/0x480
[  540.510707] [<ffff20000808b864>] show_stack+0x14/0x20
[  540.511164] [<ffff200008dfbabc>] dump_stack+0xa4/0xc8
[  540.511621] [<ffff2000081fc744>] dump_header+0xc4/0x2e8
[  540.512091] [<ffff2000081fb888>] oom_kill_process+0x3a8/0x728
[  540.512605] [<ffff2000081fc090>] out_of_memory+0x1a8/0x6d8
[  540.513099] [<ffff200008203ea8>] __alloc_pages_nodemask+0xef8/0xf80
[  540.513660] [<ffff200008275d6c>] alloc_pages_current+0x9c/0x128
[  540.514191] [<ffff200008281f60>] new_slab+0x488/0x5d8
[  540.514645] [<ffff200008284198>] ___slab_alloc+0x378/0x5d0
[  540.515138] [<ffff200008284414>] __slab_alloc.isra.23+0x24/0x38
[  540.515668] [<ffff200008284d44>] kmem_cache_alloc+0x1ec/0x218
[  540.516183] [<ffff20000830dc30>] __blockdev_direct_IO+0x220/0x4a00

> 
>> But you seem to talk about removing the bounce buffer and so finally
>> get rid of registering init_rq_fn/exit_rq_fn?
> 
> That is in direct response to Ulf's statement
> "the situations it's just too fragile" and what can be done about
> that is not make the code even more complex but make it
> simpler and easier to follow.
> 
> One way to achieve that in the long term, is to delete bounce
> buffer handling since it adds overhead and complexity.
> 
>> That is another thing,
>> and what we right now need to do is to fix the pontential memory leak.
>> It's quite a simple action, right? :)
> 
> It is another thing.
> 
> This patch fixes a memory leak, but Ulf is asking how we may
> avoid fragile behaviour.
> 
>>> I am a bit hesitant about that because Pierre Ossman said it was
>>> actually a big win on the SDHC hosts that made use of it at one
>>> point.
>>
>> You had removed packed cmd support to simplify the code, so I think
>> this is another trade-off need to ask: What you want? and keep
>> consistent with the direction you insisted on.
> 
> Packed command could be removed because it was not used by
> any in-tree code. See
> commit 03d640ae1f9b24b1d2a11f747143a1ecc0745019
> 

Ok, I was surprised that no any in-tree host enabled it, but if some
host did, the situation was similar with this one.

> Bounce buffers on the other hand, as this patch shows, is very much
> in use. So if they e.g. improve throughput on these systems
> (mainly laptops I think?) it should definately be kept around.
> 
> It might be a good idea to make a patch to remove the bounce
> buffers and ask people to do before/after throughput tests,
> because I do not have this hardware myself. If it doesn't provide
> any throughput improvements to any system in use, it should
> be deleted. But I don't know that yet.

yeap, sounds good but it's up to Ulf.

> 
> Yours,
> Linus Walleij
> 
> 
>