[rt-users] Can't fork at Mailer.pm
Martin Drasar
drasar at ics.muni.cz
Mon Oct 8 05:54:54 EDT 2012
On 8.10.2012 11:29, Tim Cutts wrote:
> It's only for a split second, and the memory isn't used. UNIX doesn't just start an application in one go, it uses two steps:
>
> 1) fork
>
> This duplicates the current process, and both processes continue executing.
> UNIX does this with so-called copy-on-write, so the pages aren't actually duplicated until they're modified. Nevertheless, both copies could end up needing all of their pages, so if you have vm.overcommit_memory set to 2, all that virtual memory has to be available just in case.
>
> 2) exec
>
> The newly forked process usually calls the exec() system call immediately, which replaces the current process's virtual memory image with the desired program. At this point, the virtual memory requirements go back down again (assuming the new program is something small, which sendmail is).
>
> Apologies if there are errors and oversimplifications in the above, I'm not exactly a total UNIX beardy type.
>
> We've seen this bite us on some of our HPC clusters at work - there, we do have vm.overcommit_memory set to 2, because we want programs allocating too much memory to die immediately, without risk of the kernel's out of memory killer zapping the wrong thing. A common problem on such machines is very large computational jobs trying to start sub-processes - as soon as they do, their virtual memory requirement doubles and the fork() fails with an out of memory error.
>
> It just seemed to me that it might be the case with your system. Try the output of:
>
> cat /proc/sys/vm/overcommit_memory
>
> if it's 2, then this is probably your problem.
> Tim
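To make the two steps above concrete, here is a minimal sketch in Python, assuming a Linux/UNIX host and only the standard os module; fork_exec is a hypothetical helper name, not part of RT or Perl. Under vm.overcommit_memory=2 with a large parent, step 1 is the one that can fail, and in Python it would typically surface as an OSError (errno ENOMEM) raised by os.fork().

```python
import os


def fork_exec(argv):
    """Run argv with an explicit fork() + exec(); return the child's exit code."""
    # Step 1: fork. The child starts as a copy-on-write duplicate of this process.
    pid = os.fork()
    if pid == 0:
        # Child, step 2: exec replaces this memory image with the new program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed (e.g. command not found).
            os._exit(127)
    # Parent: wait for the child and decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # negative signal number if killed by a signal
```

For example, fork_exec(["sh", "-c", "exit 3"]) returns 3. Modern high-level helpers such as subprocess may use vfork() or posix_spawn() internally to sidestep the duplicated commit; the explicit version above is only for illustration.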
I have checked the overcommit setting and it is zero. It seems that the
problem was simply RT and postgres running with mostly default settings
on a single machine without enough memory. That was good enough for
RT 3.8.7, but it is not for 4.0.6. Now that I think about it, I do
remember a few times back then when processes were killed for lack of
memory, but it happened so rarely that we did not pay attention. I will
not make the same mistake again...
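For anyone repeating the check Tim suggested, here is a rough Python sketch that reads the same setting and, for strict mode 2, compares the CommitLimit and Committed_AS fields of /proc/meminfo. The helper names (overcommit_mode, parse_meminfo, fork_would_fit) are hypothetical, and the idea that a fork charges roughly the parent's private writable memory again is a simplification of what the kernel does, which is why the requirement appears to "double".

```python
from pathlib import Path

# Meanings of vm.overcommit_memory as described in the Linux kernel's
# overcommit-accounting documentation.
MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit, no check",
    2: "strict: total commit capped at CommitLimit",
}


def overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Same value that `cat /proc/sys/vm/overcommit_memory` prints."""
    return int(Path(path).read_text().strip())


def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def fork_would_fit(parent_commit_kb, meminfo):
    """Rough mode-2 check: can a fork add parent_commit_kb more commit?

    Ignores the small reserves the kernel also holds back.
    """
    headroom = meminfo["CommitLimit"] - meminfo["Committed_AS"]
    return parent_commit_kb <= headroom
```

On a live machine, MODES[overcommit_mode()] describes the current setting, and parse_meminfo(Path("/proc/meminfo").read_text()) supplies the figures; CommitLimit is only enforced in mode 2, so with the value zero found here the memory shortage lies elsewhere.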
I will ask some greybeard to check the settings for me, or throw memory
at it until the problem goes away :-)
Anyway, thanks a lot for your help.
Martin