When using NWChem you will eventually run into a shmmax problem:
******************* ARMCI INFO ************************
The application attempted to allocate a shared memory segment of 44498944 bytes in size. This might be in addition to segments that were allocated succesfully previously. The current system configuration does not allow enough shared memory to be allocated to the application.
This is most often caused by:
1) system parameter SHMMAX (largest shared memory segment) being too small or
2) insufficient swap space.
Please ask your system administrator to verify if SHMMAX matches the amount of memory needed by your application and the system has sufficient amount of swap space. Most UNIX systems can be easily reconfigured to allow larger shared memory segments,
see http://www.emsl.pnl.gov/docs/global/support.html
In some cases, the problem might be caused by insufficient swap space.
*******************************************************
0:allocate: failed to create shared region : -1
(rank:0 hostname:boron pid:17222):ARMCI DASSERT fail. shmem.c:armci_allocate():1082 cond:0
I hadn't seen that in a while, since I increased shmmax to 6572498432, but running a frequency calculation on a large molecule with unrestricted DFT triggered it again on my 32 GB node. So I hit Google. These posts were informative:
http://www.pythian.com/news/245/the-mysterious-world-of-shmmax-and-shmall/
http://padmavyuha.blogspot.com.au/2010/12/configuring-shmmax-and-shmall-for.html
http://yuji.wordpress.com/2011/11/03/what-is-shmmax-shmall-shmmni-shared-memory-max/
me@neon:~$ cat /proc/sys/kernel/shmall
2097152
me@neon:~$ cat /proc/sys/kernel/shmmni
4096
me@neon:~$ cat /proc/sys/kernel/shmmax
6572498432
shmall is counted in pages, so with 4096-byte pages that works out to 2097152 pages * 4096 bytes/page = 8589934592 bytes, i.e. 8 GiB (about 8.59 GB in decimal units). The values are the same on all my nodes, even though the amount of memory available varies.
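The page arithmetic is easy to check in the shell. This is only a small sketch: the function names `pages_to_bytes` and `bytes_to_gib` are made up for illustration, and the 4096-byte page size is an assumption that you can verify with `getconf PAGE_SIZE`.

```shell
# Convert a shmall value (counted in pages) to bytes.
# $1 = number of pages, $2 = page size in bytes (assumed 4096 if omitted)
pages_to_bytes() {
    echo $(( $1 * ${2:-4096} ))
}

# Convert bytes to whole GiB (integer division, rounds down).
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

total=$(pages_to_bytes 2097152 4096)
echo "shmall = $total bytes = $(bytes_to_gib "$total") GiB"
# prints: shmall = 8589934592 bytes = 8 GiB
```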
Another way of looking at it:
ipcs -lm
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 6418455
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1
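Your shmall is the total amount of shared memory allowed system-wide, counted in pages; shmmni is the maximum number of shared memory segments (it is not the page size, which just happens to be 4096 bytes here as well); and shmmax is the largest size, in bytes, of a single shared memory segment.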
So if I get things right, and parroting what's said on the pages above, your shmall should approach but not exceed your total physical memory, your shmmni is better left alone, and your shmmax can be anywhere up to your total RAM.
The links above cite Oracle recommendations which state that (for 32-bit systems) shmmax should be 4 GB - 1 byte OR half your RAM, whichever is smaller. I'll show the half-your-RAM case here, but will be testing with 80% of my RAM for my calcs.
So for my boxes:
32 GB RAM => shmmax=16 GB, shmall=(32-2 GB)/4096, shmmni=4096
sudo sysctl -w kernel.shmmax=17179869184
sudo sysctl -w kernel.shmall=7340032
ipcs -lm
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 16777216
max total shared memory (kbytes) = 29360128
min seg size (bytes) = 1

(Note that 7340032 pages is 28 GB, so this leaves 4 GB rather than 2 GB out of shmall; (32-2 GB)/4096 would be 7864320.)

16 GB RAM => shmmax=8 GB, shmall=(16-2 GB)/4096, shmmni=4096
sudo sysctl -w kernel.shmmax=8589934592
sudo sysctl -w kernel.shmall=3670016
If you're happy with those values, make them permanent by editing your sysctl.conf and adding the relevant lines:
kernel.shmmax=17179869184
kernel.shmall=7340032
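If you'd rather script that step than open an editor, something along these lines might do. It's a rough sketch, not a tested admin tool: `add_sysctl_line` is a hypothetical helper name, and the demo writes to a scratch file from `mktemp` as a stand-in for the real config file.

```shell
# Append "key=value" to a sysctl config file, unless the key already appears
# anywhere in that file (even in a comment), in which case nothing is changed.
# $1 = config file, $2 = key, $3 = value
add_sysctl_line() {
    file=$1
    key=$2
    value=$3
    if grep -qF -- "$key" "$file" 2>/dev/null; then
        echo "$key is already mentioned in $file; edit that line by hand" >&2
        return 1
    fi
    echo "${key}=${value}" >> "$file"
}

# Demonstrate on a scratch file rather than the real /etc/sysctl.conf.
conf=$(mktemp)
add_sysctl_line "$conf" kernel.shmmax 17179869184
add_sysctl_line "$conf" kernel.shmall 7340032
cat "$conf"
```

Writing to the real /etc/sysctl.conf needs root, and once the lines are in there `sudo sysctl -p` reloads the file without a reboot.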
So here are the formulae (assuming that you set shmmax to half your RAM and leave 2 GB out of shmall):
shmmax=RAM (bytes)/2
shmmni=4096
shmall=(RAM (bytes)-2147483648)/page size (bytes), where the page size is 4096
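The same rule of thumb can be turned into a few lines of shell. Again just a sketch under the assumptions stated above (half the RAM for shmmax, 2 GB held back from shmall); `shm_settings` is a made-up name, and the 4096-byte page size is assumed unless you pass another one.

```shell
# Print suggested sysctl lines for a given amount of RAM:
# shmmax = half of RAM in bytes, shmall = (RAM - 2 GiB) counted in pages.
# $1 = RAM in bytes, $2 = page size in bytes (assumed 4096 if omitted)
shm_settings() {
    ram=$1
    page=${2:-4096}
    echo "kernel.shmmax=$(( ram / 2 ))"
    echo "kernel.shmall=$(( (ram - 2147483648) / page ))"
}

shm_settings $(( 16 * 1024 * 1024 * 1024 ))   # the 16 GB box
# prints:
# kernel.shmmax=8589934592
# kernel.shmall=3670016
```

On a real machine the RAM figure could be taken from the MemTotal line of /proc/meminfo (which is in kB, and a bit less than the installed RAM), for example with `awk '/^MemTotal:/ {printf "%.0f\n", $2 * 1024}' /proc/meminfo`.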