The 'pututline' utmp entry insertion function can corrupt the utmp file with duplicate entries when the system is busy. Root cause: If the utmp entry already exists and the timeout for the lock (re:LOCK_FILE) in internal_getur_r() expires internal_getur_r() returns -1 to pututline_file(). The pututline_file() function incorrectly interprets the -1 return value as "entry not found" rather than "lock timed out" AND then incorrectly appends a duplicate entry to the utmp file. This can happen whenever there are simultaneous pututline executions and the system is under a high enough load to cause the lock to time out. The program logic needs to be repaired to assure that a lock timeout will never cause a duplicate entry to be added. Perhaps a backoff algorithm could be used to retry the lock or the lock timeout could be increased as well to accommodate systems under greater stress. The problem was identified on an x86 machine. The macro (FILE_LOCK) expanded version of internal_getur_r follows: internal_getut_r (const struct utmp *id, struct utmp *buffer) { int result = -1; { struct flock fl; struct sigaction action, old_action; unsigned int old_timeout; old_timeout = alarm (0); action.__sigaction_handler.sa_handler = timeout_handler; (__builtin_memset (&action.sa_mask, '\0', sizeof (sigset_t)), 0); action.sa_flags = 0; __sigaction (14, &action, &old_action); alarm (1); memset (&fl, '\0', sizeof (struct flock)); fl.l_type = (0); fl.l_whence = 0; if (__fcntl_nocancel ((file_fd), 7, &fl) < 0) goto unalarm_return; if (id->ut_type == 1 || id->ut_type == 2 || id->ut_type == 4 || id->ut_type == 3) { while (1) { if (__read_nocancel (file_fd, buffer, sizeof (struct utmp)) != sizeof (struct utmp)) { (__libc_errno = (3)); file_offset = -1l; goto unlock_return; } file_offset += sizeof (struct utmp); if (id->ut_type == buffer->ut_type) break; } } else { while (1) { if (__read_nocancel (file_fd, buffer, sizeof (struct utmp)) != sizeof (struct utmp)) { (__libc_errno = (3)); file_offset = -1l; goto unlock_return; } file_offset += sizeof (struct utmp); if (__utmp_equal (buffer, id)) break; } } result = 0; unlock_return: fl.l_type = 2; __fcntl_nocancel ((file_fd), 7, &fl); unalarm_return:alarm (0); __sigaction (14, &old_action, ((void *) 0)); if (old_timeout != 0) alarm (old_timeout); } while (0); return result; }
Created attachment 2800 [details] pututline() currupts the utmp file on fcntl() lock time out In some scenarios pututline (3) may corrupt the utmp like with heavy system load environment and using pututline frequently. The failure analysis is in login/utmp_file.c internal_getut_r() function is not retaining the lock error because of fcntl() failure so, pututline() is failing to differentiate between utline not exists and utmp lock time out. As its returning -1 in both cases. In both cases it treats as utline not found it appends at the last. Copy /var/run/utmp to current directory where you are executing tests. who o/p Before Executing tests ============================== $ who ./utmp chill tty1 May 15 18:53 halesh pts/2 Jun 20 12:28 halesh pts/4 Jun 20 13:00 who o/p After executing tests ============================= $ who ./utmp chill tty1 May 15 18:53 halesh pts/2 Jun 20 12:28 halesh pts/4 Jun 20 13:00 chill tty1 May 15 18:53 <- ** THE FIRST LOGIN HAS BEEN APPENDED. **utmp got currupted becuase of adding firstlogin entry again at the last instead of replacing it. If you are not able to reproduce please increase the LOOP macro in testcase.
Created attachment 2802 [details] Patch for the pututline() for fcntl() lock time out After applaying the attached patch.. who o/p Before Executing tests ============================== $who ./utmp chill tty1 May 15 18:53 halesh pts/4 Jun 20 13:00 (43.88.101.161) who o/p After executing tests ============================= $who ./utmp chill tty1 May 15 18:53 halesh pts/4 Jun 20 13:00 I have tested regression tests for adding new entry and replacing entries using pututline() with patch and wroking fine.
(In reply to comment #2) > Created an attachment (id=2802) > Patch for the pututline() for fcntl() lock time out > > > After applaying the attached patch.. > > who o/p Before Executing tests > ============================== > $who ./utmp > chill tty1 May 15 18:53 > halesh pts/4 Jun 20 13:00 (43.88.101.161) > > > who o/p After executing tests > ============================= > $who ./utmp > chill tty1 May 15 18:53 > halesh pts/4 Jun 20 13:00 Last part of who(1) o/p of second entry got missed while copying in text box... who o/p After Executing tests ============================== $who ./utmp chill tty1 May 15 18:53 halesh pts/4 Jun 20 13:00 (43.88.101.161) > > I have tested regression tests for adding new entry and replacing entries using > pututline() with patch and wroking fine.
I checked in a slightly modified patch.
Thanks.