So I’ve had some issues with duplicate processes running, which confused me. Looking at the code, I’m using Proc::Pid::File to control the processes and to exit if the old process is still running. Late last week I finally found some time to dig into this further. The logic the module currently has you use is essentially:
    if pidAlive then
        exit
    else
        updatePid
    end if
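For illustration, here is that check-then-update pattern sketched in Python rather than Perl (POSIX only). It is not Proc::Pid::File's code: `pid_alive`, `read_pid` and `start_racy` are hypothetical names. Liveness is tested with `os.kill(pid, 0)`, which sends no signal but reports whether the process exists.

```python
import os


def pid_alive(pid):
    """Return True if a process with this pid appears to exist (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_pid(path):
    """Return the positive pid stored in the file, or None if there is none."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None
    return pid if pid > 0 else None


def start_racy(path):
    """Hypothetical check-then-act startup; True means 'keep running'."""
    pid = read_pid(path)
    if pid is not None and pid_alive(pid):
        return False  # "if pidAlive then exit"
    # Race window: another process can pass the same check right here,
    # before either of us has written the file.
    with open(path, "w") as f:  # "else updatePid"
        f.write(f"{os.getpid()}\n")
    return True
```

Each step is correct on its own; the bug is only in the gap between the read and the write.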
The problem is a race condition between checking whether the old process is alive and updating the pid file to take control. What is happening on our system is:
process1: is pidAlive? no.
process2: is pidAlive? no.
process1: update pidFile.
process2: update pidFile.
Now both process1 and process2 are running. The pid file says process2 is alive and nothing records that process1 exists, so the restart and shutdown systems talk to process2 while process1 keeps on trucking. Once process2 has been stopped, process3 can check the pid file, find that the recorded process is not alive, and start up, perpetuating the problem.
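The interleaving above can be replayed deterministically with a toy model, again in Python. The pid file is a single slot and the running processes are a set; `World` and `replay` are hypothetical names, and no real files or processes are involved.

```python
class World:
    """Toy model: one pid-file slot plus the set of pids that are running."""

    def __init__(self):
        self.pid_file = None
        self.running = set()

    def pid_alive(self):
        # "is pidAlive?": the file names a pid and that pid is still running
        return self.pid_file is not None and self.pid_file in self.running

    def start(self, pid):
        # the process starts working and then does "update pidFile"
        self.running.add(pid)
        self.pid_file = pid


def replay():
    w = World()
    p1_sees_alive = w.pid_alive()  # process1: is pidAlive? no.
    p2_sees_alive = w.pid_alive()  # process2: is pidAlive? no.
    if not p1_sees_alive:
        w.start(1)                 # process1: update pidFile.
    if not p2_sees_alive:
        w.start(2)                 # process2: update pidFile.
    both_running = w.running == {1, 2}

    # Shutdown talks to the pid in the file (process2); process1 keeps going.
    w.running.discard(w.pid_file)

    # process3 finds a dead pid in the file and starts as well.
    if not w.pid_alive():
        w.start(3)
    return both_running, sorted(w.running), w.pid_file
```

`replay()` returns `(True, [1, 3], 3)`: both early processes ran, process1 survived the shutdown of process2, and process3 joined it.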
What is really needed is a single method that eliminates the race condition by doing the check and the update while holding one lock:
    begin getPid
        create pid file if it does not exist
        exclusive lock pid file
        if pid and pid is running then
            result is false
        else
            result is true
            update pid file to our pid
        end if
        release exclusive lock
        return result
    end getPid
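A minimal Python sketch of `getPid`, assuming a POSIX system where every cooperating process takes an advisory `fcntl.flock` lock on the same pid file. `get_pid` and `_pid_alive` are hypothetical names, not part of Proc::Pid::File. Opening with `O_CREAT` but not `O_TRUNC` creates the file if needed without wiping the old pid before it has been checked.

```python
import fcntl
import os


def _pid_alive(pid):
    """True if a process with this pid appears to exist (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0: existence check, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def get_pid(path):
    """Check and claim the pid file under one exclusive lock.

    Returns True if this process now owns the pid file, False otherwise.
    """
    # create pid file if it does not exist (keep any existing contents)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # exclusive lock pid file (blocks)
        try:
            data = os.read(fd, 64).decode().strip()
            try:
                old = int(data)
            except ValueError:  # empty or garbage file: no pid recorded
                old = 0
            if old > 0 and _pid_alive(old):
                return False  # pid is running: result is false
            # result is true: update pid file to our pid
            os.lseek(fd, 0, os.SEEK_SET)
            os.ftruncate(fd, 0)
            os.write(fd, f"{os.getpid()}\n".encode())
            return True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)  # release exclusive lock
    finally:
        os.close(fd)
```

Because `flock` is advisory, this only protects against processes that also lock the file before touching it, and it assumes nobody deletes the pid file while it is in use. Like the pseudocode, a second call from the same process returns false, since its own pid is running.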
The problem with this solution? Proc::Pid::File has been around for quite a while and a lot of people are using it, so we can’t just change the API like that. We need a way to keep the same API but change the logic to match what I’ve proposed.
I’ll need to give more thought to how to work around this without changing the API. Perhaps the answer is to keep supporting the current API and add a new method that does both at once, cleanly, instead of a wrapper that just calls the existing two in sequence.
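One way to read that idea, sketched in Python: keep separate check and update calls for existing users, and add one method that performs both while holding a single lock, rather than calling the other two in sequence. The class and its method names (`PidFile`, `is_alive`, `update`, `acquire`) are hypothetical and do not mirror Proc::Pid::File's actual methods.

```python
import fcntl
import os
from contextlib import contextmanager


class PidFile:
    """Hypothetical pid-file object: old two-step calls plus one atomic call."""

    def __init__(self, path):
        self.path = path

    @contextmanager
    def _locked(self):
        # Creates the file if missing, then holds an exclusive advisory lock.
        fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)
            yield fd
        finally:
            os.close(fd)  # closing our only descriptor also drops the lock

    @staticmethod
    def _read(fd):
        os.lseek(fd, 0, os.SEEK_SET)
        try:
            return int(os.read(fd, 64).decode().strip())
        except ValueError:  # empty or unreadable contents: no pid recorded
            return 0

    @staticmethod
    def _write(fd):
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        os.write(fd, f"{os.getpid()}\n".encode())

    @staticmethod
    def _running(pid):
        if pid <= 0:
            return False
        try:
            os.kill(pid, 0)
        except ProcessLookupError:
            return False
        except PermissionError:
            return True
        return True

    # Existing-style API: two separate calls, each locked on its own.
    def is_alive(self):
        with self._locked() as fd:
            return self._running(self._read(fd))

    def update(self):
        with self._locked() as fd:
            self._write(fd)

    # New API: check and update under one lock, so nothing can interleave.
    def acquire(self):
        with self._locked() as fd:
            if self._running(self._read(fd)):
                return False
            self._write(fd)
            return True
```

Calling `is_alive()` followed by `update()` still leaves a gap between the two locks, so only `acquire()` closes the race; the older calls simply keep working for code that already uses them.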