linux-kernel - bug in RLIMIT


Message-ID: <f9eabcda0704281500l58c3b791s97c499822b577140@mail.gmail.com>
Date:	Sat, 28 Apr 2007 19:00:14 -0300
From:	"Miguel Freitas" <mfreitas@...il.com>
To:	linux-kernel@...r.kernel.org
Cc:	"Roland McGrath" <roland@...hat.com>
Subject: bug in RLIMIT_SIGPENDING

summary: there seems to be a bug in RLIMIT_SIGPENDING accounting that
can cause it to go negative. as a result, the affected process may get
stuck forever trying to enter the 'clone' syscall.

long version:

- several people have experienced this problem of Xorg hanging forever
(100% cpu usage) trying to enter the 'clone' syscall to execute
xkbcomp.

- the syscall is aborted with ERESTARTNOINTR because there is a
SIGALRM signal pending. /proc/<pid>/status shows:

SigQ:   1/18446744073709551615
SigPnd: 0000000000000000
ShdPnd: 0000000000002000
SigBlk: 0000000000000000
SigIgn: 0000000000301000
SigCgt: 0000000061c06ecb

note the weird SigQ value: the second field is 18446744073709551615,
which is -1 read as an unsigned 64-bit number, in the place of the
RLIMIT_SIGPENDING limit.
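
to make the observation above easy to check, here is a minimal sketch that parses a SigQ line and tests whether the limit field is all ones. the helper names parse_sigq() and limit_is_minus_one() are made up for this illustration; they are not kernel or libc functions.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a "SigQ: queued/limit" line as it appears in /proc/<pid>/status.
 * Returns 1 when both fields were read, 0 otherwise.
 * (hypothetical helper, for illustration only) */
static int parse_sigq(const char *line,
                      unsigned long long *queued,
                      unsigned long long *limit)
{
    assert(line != NULL);
    return sscanf(line, "SigQ: %llu/%llu", queued, limit) == 2;
}

/* True when the limit is -1 reinterpreted as an unsigned value,
 * i.e. every bit is set. (hypothetical helper) */
static int limit_is_minus_one(unsigned long long limit)
{
    return limit == (unsigned long long)-1;
}
```

feeding it the line quoted above gives queued = 1 and a limit for which limit_is_minus_one() is true.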

- the signal handler is executed (as confirmed under gdb).

- the kernel then forces the syscall to be re-entered by means of the
following code in handle_signal():

    case -ERESTARTNOINTR:
        regs->rax = regs->orig_rax;
        regs->rip -= 2;
        break;

- this effectively puts user space in a kind of spinlock that never ends.
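
a toy user-space model of that restart loop, only to illustrate the control flow. everything prefixed toy_ is invented for this sketch, and the "pending on the first N attempts" rule is an assumption standing in for the periodic SIGALRM; it is not kernel code.

```c
#include <assert.h>

#define ERESTARTNOINTR 513   /* kernel-internal code, never returned to user space */
#define TOY_NR_CLONE   56    /* clone syscall number on x86-64 */

struct toy_regs { long rax, orig_rax; unsigned long rip; };

/* Toy syscall: aborts with -ERESTARTNOINTR whenever a signal is pending. */
static long toy_clone(int signal_pending)
{
    return signal_pending ? -ERESTARTNOINTR : 0;
}

/* Mirrors the quoted case: restore the syscall number and rewind the
 * instruction pointer over the 2-byte syscall instruction. */
static void toy_handle_signal(struct toy_regs *regs)
{
    if (regs->rax == -ERESTARTNOINTR) {
        regs->rax = regs->orig_rax;
        regs->rip -= 2;
    }
}

/* A signal is pending on the first pending_for attempts (assumption).
 * Returns the attempt that completed, or -1 if the syscall was still
 * being restarted after max_attempts tries. */
static int toy_run(int pending_for, int max_attempts)
{
    struct toy_regs regs = { TOY_NR_CLONE, TOY_NR_CLONE, 0x1000 };
    int attempt;

    assert(max_attempts > 0);
    for (attempt = 1; attempt <= max_attempts; attempt++) {
        regs.rip += 2;                          /* execute the syscall instruction */
        regs.rax = toy_clone(attempt <= pending_for);
        if (regs.rax != -ERESTARTNOINTR)
            return attempt;                     /* syscall finally went through */
        toy_handle_signal(&regs);               /* handler runs, then restart */
    }
    return -1;
}
```

in this model the syscall only completes once an attempt finds no signal pending; if one is pending on every attempt, the loop never makes progress.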

- the code that sets the signal handler is quoted here from Xorg gitweb:

#define SMART_SCHEDULE_SIGNAL           SIGALRM
(...)
    bzero ((char *) &act, sizeof(struct sigaction));

    /* Set up the timer signal function */
    act.sa_handler = SmartScheduleTimer;
    sigemptyset (&act.sa_mask);
    sigaddset (&act.sa_mask, SMART_SCHEDULE_SIGNAL);
    if (sigaction (SMART_SCHEDULE_SIGNAL, &act, 0) < 0)
    {
        perror ("sigaction for smart scheduler");
        return FALSE;
    }

- the code that sets the timer is quoted here from Xorg gitweb:

Bool
SmartScheduleStartTimer (void)
{
#ifdef SMART_SCHEDULE_POSSIBLE
    struct itimerval    timer;

    SmartScheduleTimerStopped = FALSE;
    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = SmartScheduleInterval * 1000;
    timer.it_value.tv_sec = 0;
    timer.it_value.tv_usec = SmartScheduleInterval * 1000;
    return setitimer (ITIMER_REAL, &timer, 0) >= 0;
#endif
    return FALSE;
}
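
for anyone who wants to play with the same signal setup outside of Xorg, the two excerpts above can be condensed into a standalone sketch like the following. the function names and the tick counter are my own, not Xorg's, and the interval is a parameter instead of SmartScheduleInterval.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t tick_count = 0;

/* SIGALRM handler: only counts deliveries (stand-in for SmartScheduleTimer). */
static void on_tick(int signo)
{
    (void)signo;
    tick_count++;
}

/* Install the handler and arm a periodic ITIMER_REAL timer.
 * Returns 0 on success, -1 on failure. */
static int start_periodic_alarm(long interval_ms)
{
    struct sigaction act;
    struct itimerval timer;

    assert(interval_ms > 0);          /* an interval of 0 would disarm the timer */
    memset(&act, 0, sizeof(act));
    act.sa_handler = on_tick;
    sigemptyset(&act.sa_mask);
    sigaddset(&act.sa_mask, SIGALRM);
    if (sigaction(SIGALRM, &act, NULL) < 0)
        return -1;

    timer.it_interval.tv_sec = interval_ms / 1000;
    timer.it_interval.tv_usec = (interval_ms % 1000) * 1000;
    timer.it_value = timer.it_interval;
    return setitimer(ITIMER_REAL, &timer, NULL);
}

/* Disarm the timer. Returns 0 on success, -1 on failure. */
static int stop_periodic_alarm(void)
{
    struct itimerval off;

    memset(&off, 0, sizeof(off));
    return setitimer(ITIMER_REAL, &off, NULL);
}

/* Sleep until the handler has run at least `wanted` times. */
static int wait_for_ticks(int wanted)
{
    while (tick_count < wanted)
        pause();                      /* returns after a signal handler has run */
    return tick_count;
}
```

with the timer armed, watching SigQ and ShdPnd in /proc/<pid>/status of the test process shows the same fields as in the Xorg case.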

- having this negative rlimit may cause problems for the
__sigqueue_alloc() kernel function. however, as far as i can see, this
would at most prevent new signals from being enqueued; it would not
stop existing ones from being dequeued or cleared.
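
a sketch of why a negative value matters for a limit test of this kind. i am assuming here that the check compares a signed pending counter against an unsigned limit; may_queue() is a hypothetical stand-in, not the actual __sigqueue_alloc() code.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for a "pending <= limit" test with a signed
 * counter and an unsigned limit (assumption, not the kernel's code). */
static int may_queue(int pending, unsigned long limit)
{
    /* a negative int converts to a huge unsigned value before comparing */
    return (unsigned long)pending <= limit;
}
```

under that assumption, may_queue(-1, 1024) is 0 (new signals refused), while may_queue(-1, ULONG_MAX) is 1, so a limit of -1 never refuses anything. either way only the enqueue side is affected.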

- the bugzilla entry with the complete investigation is here:

https://bugs.freedesktop.org/show_bug.cgi?id=10525

thanks,

Miguel