Question
Our TWiki was becoming unresponsive, and almost every other request would hang. Save operations were taking minutes to complete. After some investigation, we found that the cgi-bin scripts were timing out and being killed: the view and save processes were running at 100% CPU without ever completing.
I traced the processes with strace and found them spinning in an endless loop, repeatedly trying to unlink the cgisess_* files and getting EPERM every time.
We had over 46,000 session files in /tmp, all owned by root. The TWiki processes are configured to run as nobody, so we are puzzled as to why these files are owned by root. Also, we weren't running tick_twiki.pl, because {Sessions}{ExpireAfter} is set to the default of 21600 (seconds, i.e. six hours), under which TWiki expires old sessions itself at the end of each request.
At first it appeared that the session files were stale and that their IDs were being reused: the session files that save and view were trying to delete were over three months old. Something else may be in play, but once we removed all of the old files from /tmp, things started working again.
On reflection, I believe the problem was not ID reuse but simply that there were far too many files. My guess is that requests were spending minutes trying to remove them in Client::expireDeadSessions, which is called at the end of every call to Client::finish. However, I can't explain why some POSTs/GETs fail and time out while others don't.
Since then I have been monitoring the system and have noticed more root-owned cgisess_* files accumulating at an alarming pace (over 400 in just a few hours). If TWiki is configured to run its CGI scripts as nobody, why are its CGI session files being created as root, and how can we configure it so that this no longer happens?
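To watch this kind of accumulation, a small script can tally the cgisess_* files by owner and count how many look older than the expiry window. This is a minimal sketch, not part of TWiki: it assumes the sessions live directly in /tmp, takes 21600 seconds from the default {Sessions}{ExpireAfter}, and uses each file's modification time as a stand-in for the session's last access, which may differ from how TWiki itself decides expiry.

```python
import os
import pwd
import time
from collections import Counter

# Assumptions: sessions are written directly into /tmp, and 21600 s mirrors
# the default {Sessions}{ExpireAfter}. File mtime is only a proxy for age.
SESSION_DIR = "/tmp"
EXPIRE_AFTER = 21600


def owner_name(uid):
    """Map a numeric uid to a user name, falling back to the number itself."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def tally_sessions(session_dir=SESSION_DIR, expire_after=EXPIRE_AFTER, now=None):
    """Count cgisess_* files per owner, and how many of each look expired."""
    now = time.time() if now is None else now
    total = Counter()
    stale = Counter()
    with os.scandir(session_dir) as entries:
        for entry in entries:
            if not (entry.name.startswith("cgisess_")
                    and entry.is_file(follow_symlinks=False)):
                continue
            st = entry.stat(follow_symlinks=False)
            owner = owner_name(st.st_uid)
            total[owner] += 1
            if now - st.st_mtime > expire_after:
                stale[owner] += 1
    return total, stale


if __name__ == "__main__":
    total, stale = tally_sessions()
    for owner, count in total.most_common():
        print(f"{owner}: {count} session files, {stale[owner]} older than {EXPIRE_AFTER}s")
```

Running it a few hours apart gives a rough rate of growth per owner, which makes a root-owned backlog easy to spot.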
In the meantime, we have a cron job regularly deleting these /tmp files, so we are up and running again. But it seems that either something is wrong in the configuration or there is a bug in TWiki (no errors or warnings show up on the configure page).
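A cron cleanup of this kind might look like the following Python sketch. The actual cron job used here is not shown, so the directory, the cgisess_ prefix match, and the 21600-second cutoff (the default {Sessions}{ExpireAfter}) are all assumptions. Unlike the looping behavior traced with strace, it tries each file once and simply counts a PermissionError as skipped. It has to run as root, or as the files' owner, to delete root-owned files in the sticky /tmp directory.

```python
import os
import time

# Assumptions: session files live directly in /tmp, and anything older than
# the default {Sessions}{ExpireAfter} of 21600 s is safe to delete.
SESSION_DIR = "/tmp"
EXPIRE_AFTER = 21600


def purge_stale_sessions(session_dir=SESSION_DIR, expire_after=EXPIRE_AFTER, now=None):
    """Try once to delete each stale cgisess_* file; return (removed, skipped)."""
    now = time.time() if now is None else now
    removed = skipped = 0
    # Snapshot the listing first so nothing is unlinked while iterating the directory.
    with os.scandir(session_dir) as it:
        candidates = [e for e in it
                      if e.name.startswith("cgisess_") and e.is_file(follow_symlinks=False)]
    for entry in candidates:
        try:
            if now - entry.stat(follow_symlinks=False).st_mtime <= expire_after:
                continue  # still within the expiry window
            os.unlink(entry.path)
            removed += 1
        except FileNotFoundError:
            pass  # already gone, e.g. expired by TWiki in the meantime
        except PermissionError:
            skipped += 1  # not ours to delete: count it and move on, no retry
    return removed, skipped


if __name__ == "__main__":
    removed, skipped = purge_stale_sessions()
    print(f"removed {removed} stale session files, skipped {skipped}")
```

A nonzero skipped count on a run as nobody is itself a hint that some other user, such as root, is creating the files.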
Environment
--
BrianVetter - 11 Jan 2007
Answer
The problem is not with running TWiki through Apache, but with calling the tools directly from the command line or from cron. In this particular case, the mailnotify script was being called every three minutes from root's crontab. mailnotify uses the contributed Mailer module, which indirectly creates a CGI session when it calls new TWiki. Because the script runs as root, each of those session files is owned by root, and the web scripts running as nobody are not permitted to delete them from /tmp.
We'll run the notifier as nobody and see how things work.
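The root cause is ordinary Unix file ownership: a new file belongs to the effective user of the process that creates it, so sessions created by a cron job running as root are owned by root. The short Python sketch below is an illustration, not TWiki code (the owner_demo_ file name prefix is arbitrary): it creates a throwaway file and reports the uid that owns it, which matches the uid the script runs as.

```python
import os
import tempfile


def owner_of_new_file(directory):
    """Create a throwaway file in `directory` and return the uid that owns it."""
    # The prefix is arbitrary; it only keeps the demo file recognizable.
    fd, path = tempfile.mkstemp(prefix="owner_demo_", dir=directory)
    try:
        return os.fstat(fd).st_uid
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    uid = owner_of_new_file(tempfile.gettempdir())
    print(f"effective uid {os.geteuid()} created a file owned by uid {uid}")
```

Invoked from root's crontab it would report uid 0; run as nobody it would report nobody's uid. Moving the mailnotify entry into nobody's crontab (or otherwise launching it as nobody) should therefore leave session files that the web processes can expire themselves.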
--
BrianVetter - 11 Jan 2007