Sunday, 23 September 2012

My own personal benchmarks for NWChem, gromacs with atlas, openblas, acml on AMD and Intel


The title says it all, really. Since I'm back to exploring ways of improving performance for my little cluster, I figured I'd break this out as a separate post. Most of this data has appeared here before: http://verahill.blogspot.com.au/2012/09/new-compute-node-using-amd-fx-8150.html

All units are running up-to-date debian testing (wheezy), except Vanadium, which runs CentOS (see the configuration below).

Configuration:
Boron (B): Phenom II X6 2.8 GHz, 8 GB RAM (2.8*6=16.8 GFLOPS predicted)
Neon (Ne): FX-8150 X8 3.6 GHz, 16 GB RAM (3.6*8=28.8 GFLOPS predicted (int), 3.6*4=14.4 GFLOPS predicted (fpu))
Tantalum (Ta): Quad-core i5-2400 3.1 GHz, 8 GB RAM (3.1*4=12.4 GFLOPS predicted)
Vanadium (V): Dual-socket 2x quad-core Xeon X3480 3.06 GHz, 8 GB RAM. CentOS (ROCKS 5.4.3)/openblas.
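The predicted figures in the configuration list are simply clock speed times the number of units, i.e. they assume one floating-point operation per unit per cycle. A minimal Python sketch of that arithmetic follows. The function name and the `flops_per_cycle` parameter are my own, for illustration only, and Vanadium is left out because no prediction is listed for it.

```python
def predicted_gflops(clock_ghz, units, flops_per_cycle=1):
    """Naive peak estimate: clock (GHz) x number of units x FLOPs per cycle."""
    return clock_ghz * units * flops_per_cycle

# (clock in GHz, number of cores or FPUs) as listed in the configuration
nodes = {
    "B": (2.8, 6),
    "Ne (int)": (3.6, 8),
    "Ne (fpu)": (3.6, 4),
    "Ta": (3.1, 4),
}

for name, (clock, units) in nodes.items():
    print(f"{name}: {predicted_gflops(clock, units):.1f} GFLOPS predicted")
```

Since this ignores SIMD width and memory bandwidth, it is only a rough yardstick.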

Results

Gromacs --double (1 ns 6x6x6 nm tip4p water box; dynamic load balancing, double precision, 500k steps)
B  :  10.662 ns/day (11.8 GFLOPS, runtime 8104 seconds)***
B  :   9.921 ns/day (10.9 GFLOPS, runtime 8709 seconds)**
Ne :  10.606 ns/day (11.7 GFLOPS, runtime 8146 seconds)*
Ne :  12.375 ns/day (13.7 GFLOPS, runtime 6982 seconds)**
Ne :  12.385 ns/day (13.7 GFLOPS, runtime 6976 seconds)****
Ta :  10.825 ns/day (11.9 GFLOPS, runtime 7981 seconds)***
V  :  10.560 ns/day (11.7 GFLOPS, runtime 8182 seconds)***
*no external blas/lapack.
**using ACML libs
*** using openblas
**** using ATLAS
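Because each run simulates 1 ns, the ns/day figure is essentially the number of seconds in a day divided by the runtime. The sketch below (the function name is mine) reproduces the double-precision values to within rounding. The numbers in the list are the ones gromacs reported, so small differences in the last digit are expected because the runtimes are rounded to whole seconds.

```python
SECONDS_PER_DAY = 86400

def ns_per_day(simulated_ns, runtime_s):
    """Throughput in ns/day for a run that simulated `simulated_ns` in `runtime_s` seconds."""
    return simulated_ns * SECONDS_PER_DAY / runtime_s

# (label, runtime in seconds) taken from the double-precision results
runs = [
    ("B  openblas", 8104),
    ("B  ACML", 8709),
    ("Ne no external blas", 8146),
    ("Ne ACML", 6982),
    ("Ne ATLAS", 6976),
    ("Ta openblas", 7981),
    ("V  openblas", 8182),
]

for label, runtime in runs:
    print(f"{label}: {ns_per_day(1.0, runtime):.3f} ns/day")
```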

Gromacs --single (1 ns 6x6x6 nm tip4p water box; dynamic load balancing, single precision, 500 k steps)
B  :  17.251 ns/day (19.0 GFLOPS, runtime 5008 seconds)***
Ne :  21.874 ns/day (24.2 GFLOPS, runtime 3950 seconds)**
Ne :  21.804 ns/day (24.1 GFLOPS, runtime 3963 seconds)****
Ta :  17.345 ns/day (19.2 GFLOPS, runtime 4982 seconds)***
V  :  17.297 ns/day (19.1 GFLOPS, runtime 4995 seconds)***
*no external blas/lapack.
**using ACML libs
*** using openblas
**** using ATLAS

NWChem (opt biphenyl cation, cp-md/pspw):
B  :   5951 seconds**
B  :   4084 seconds ***
B  :   5782 seconds ***xy
Ne:    3689 seconds**
Ta :   4102 seconds***
Ta :   4230 seconds***xy
V :    5396 seconds***

*no external blas/lapack.
**using ACML libs
*** using openblas
x Reconfigured using getmem.nwchem
y NWChem 6.1.1

NWChem (opt biphenyl cation, geovib, 6-31G**/ub3lyp):
B  :  2841 seconds **
B  :  2410 seconds***
B  :  2101 seconds ***x
B  :  2196 seconds ***xy
Ne: 1665 seconds **
Ta : 1785 seconds***
Ta : 1789 seconds***xy
V  : 2600 seconds***

*no external blas/lapack.
**using ACML libs
*** using openblas
x Reconfigured using getmem.nwchem
y NWChem 6.1.1
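To put the NWChem timings in perspective, it can help to express them as speed-ups relative to one baseline. The sketch below does that for the geovib job on Boron, using the ACML run as the baseline. The helper name and the choice of baseline are mine, not part of NWChem.

```python
def speedup(baseline_s, other_s):
    """How many times faster `other_s` is than `baseline_s` (values > 1 mean faster)."""
    return baseline_s / other_s

# Boron, geovib 6-31G**/ub3lyp job, runtimes in seconds from the list above
acml = 2841
variants = {
    "openblas": 2410,
    "openblas + getmem.nwchem": 2101,
    "openblas + getmem.nwchem + NWChem 6.1.1": 2196,
}

for label, runtime in variants.items():
    print(f"{label}: {speedup(acml, runtime):.2f}x vs ACML")
```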

A Certain Commercial Ab Initio Package (Freq calc of pre-optimised H14C19O3 at 6-31+G*/rb3lyp):
B  :    2h 00 min (CPU time 10h 37 min)
Ne:   1h 37 min (CPU time: 11h 13 min)
Ta:   1h 26 min (CPU time: 5h 27 min)
V  :   2h 15 min (CPU time 15h 50 min)
Using precompiled binaries.
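Dividing the CPU time by the wall time gives a rough idea of how many cores were kept busy on average. That interpretation is my assumption; the package itself only reports the two times. A small sketch with hypothetical helper names:

```python
def minutes(hours, mins):
    """Convert an 'Xh Ymin' time to minutes."""
    return hours * 60 + mins

def cpu_to_wall(cpu_min, wall_min):
    """Ratio of CPU time to wall time, roughly the average number of busy cores."""
    return cpu_min / wall_min

# node: (wall time, CPU time), both in minutes, from the list above
runs = {
    "B": (minutes(2, 0), minutes(10, 37)),
    "Ne": (minutes(1, 37), minutes(11, 13)),
    "Ta": (minutes(1, 26), minutes(5, 27)),
    "V": (minutes(2, 15), minutes(15, 50)),
}

for node, (wall, cpu) in runs.items():
    print(f"{node}: CPU/wall = {cpu_to_wall(cpu, wall):.1f}")
```

The ratios come out at about 5.3, 6.9, 3.8 and 7.0, which is in line with 6, 8, 4 and 8 cores respectively.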


Gamess:
(I'm still learning how to run gamess efficiently, so take these values with a huge saucer of salt for now.) bn.inp does a geometry optimisation of a biphenyl cation (mult 2) at ub3lyp/6-31G**. bn.inp has no $STATPT card while bn3.inp does, and it makes a huge difference. But is this just because bn3.inp does 20 steps (NSTEP=20) and then kills the run? The default is 50 steps, and it does seem like all the runs do the maximum number of steps, then exit.

Again, still learning. See below for input files. I will fix this post as I learn what the heck I'm doing. The run times on the different nodes can still be compared with each other, but don't use the numbers to compare the run speed of e.g. nwchem vs gamess.
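One way to settle whether a run converged or simply hit the step limit is to look at the gamess log. The sketch below assumes that the log prints a line containing `NSERCH:` followed by the step number for each optimisation step, and the text `EQUILIBRIUM GEOMETRY LOCATED` on convergence. Both marker strings are assumptions on my part and should be checked against your own output files. The function name and the sample text are made up for illustration.

```python
import re

def summarise_optimisation(log_text):
    """Return (converged, last_step) for the text of a gamess optimisation log.

    Assumes 'NSERCH: <n>' marks each step and that
    'EQUILIBRIUM GEOMETRY LOCATED' appears only on convergence.
    """
    converged = "EQUILIBRIUM GEOMETRY LOCATED" in log_text
    steps = [int(n) for n in re.findall(r"NSERCH:\s*(\d+)", log_text)]
    last_step = max(steps) if steps else None
    return converged, last_step

# Made-up fragment, not real output
sample = """\
 NSERCH:   0  E=  -1.0
 NSERCH:   1  E=  -1.1
 ***** EQUILIBRIUM GEOMETRY LOCATED *****
"""
print(summarise_optimisation(sample))
```

If `converged` is False and `last_step` equals the NSTEP value (20 in bn3.inp, 50 by default), the run most likely stopped at the step limit rather than at a stationary point.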

Gamess using bn.inp with atlas
B:    9079 seconds
Ne: 7252 seconds
Ta: 9283 seconds

Gamess using bn.inp with openblas
B:   9071 seconds
Ta: 9297 seconds


Gamess using bn.inp with acml
Ne: 7062 seconds

Gamess using bn3.inp with atlas
B:  4016 seconds
Ne: 3162 seconds
Ta: 4114 seconds



bn.inp:
 $CONTRL COORD=CART UNITS=ANGS scftyp=uhf dfttyp=b3lyp runtyp=optimize
 ICHARG=1 MULT=2 maxit=100 $END
 $system mwords=2000 $end
 $BASIS gbasis=n31 ngauss=6 ndfunc=1 npfunc=1 $END
 $guess guess=huckel $end
 $DATA
biphenyl
C1
C      6.0      0.0000000000   -3.5630100000    0.0000000000
C      6.0     -1.1392700000   -2.8592800000   -0.3938400000
C      6.0     -1.1387900000   -1.4654500000   -0.3941500000
C      6.0      0.0000000000   -0.7428100000    0.0000000000
C      6.0      1.1387900000   -1.4654500000    0.3941500000
C      6.0      1.1392700000   -2.8592800000    0.3938400000
C      6.0      0.0000000000    0.7428100000    0.0000000000
C      6.0      1.1387900000    1.4654500000   -0.3941500000
C      6.0      1.1392700000    2.8592800000   -0.3938400000
C      6.0     -1.1387900000    1.4654500000    0.3941500000
C      6.0      0.0000000000    3.5630100000    0.0000000000
C      6.0     -1.1392700000    2.8592800000    0.3938400000
H      1.0      0.0000000000   -4.6489600000    0.0000000000
H      1.0     -2.0282700000   -3.3966200000   -0.7116100000
H      1.0     -2.0214800000   -0.9282700000   -0.7279300000
H      1.0      2.0282700000   -3.3966200000    0.7116100000
H      1.0      2.0282700000    3.3966200000   -0.7116100000
H      1.0     -2.0214800000    0.9282700000    0.7279300000
H      1.0      0.0000000000    4.6489600000    0.0000000000
H      1.0     -2.0282700000    3.3966200000    0.7116100000
H      1.0      2.0214800000    0.9282700000   -0.7279300000
H      1.0      2.0214800000   -0.9282700000    0.7279300000
 $END

bn3.inp:
 $CONTRL COORD=CART UNITS=ANGS scftyp=uhf dfttyp=b3lyp runtyp=optimize
 ICHARG=1 MULT=2 maxit=100 $END
 $system mwords=2000 $end
 $BASIS gbasis=n31 ngauss=6 ndfunc=1 npfunc=1 $END
 $STATPT OPTTOL=0.0001 NSTEP=20 HSSEND=.TRUE. $END
 $guess guess=huckel $end
 $DATA
biphenyl
C1
C      6.0      0.0000000000   -3.5630100000    0.0000000000
C      6.0     -1.1392700000   -2.8592800000   -0.3938400000
C      6.0     -1.1387900000   -1.4654500000   -0.3941500000
C      6.0      0.0000000000   -0.7428100000    0.0000000000
C      6.0      1.1387900000   -1.4654500000    0.3941500000
C      6.0      1.1392700000   -2.8592800000    0.3938400000
C      6.0      0.0000000000    0.7428100000    0.0000000000
C      6.0      1.1387900000    1.4654500000   -0.3941500000
C      6.0      1.1392700000    2.8592800000   -0.3938400000
C      6.0     -1.1387900000    1.4654500000    0.3941500000
C      6.0      0.0000000000    3.5630100000    0.0000000000
C      6.0     -1.1392700000    2.8592800000    0.3938400000
H      1.0      0.0000000000   -4.6489600000    0.0000000000
H      1.0     -2.0282700000   -3.3966200000   -0.7116100000
H      1.0     -2.0214800000   -0.9282700000   -0.7279300000
H      1.0      2.0282700000   -3.3966200000    0.7116100000
H      1.0      2.0282700000    3.3966200000   -0.7116100000
H      1.0     -2.0214800000    0.9282700000    0.7279300000
H      1.0      0.0000000000    4.6489600000    0.0000000000
H      1.0     -2.0282700000    3.3966200000    0.7116100000
H      1.0      2.0214800000    0.9282700000   -0.7279300000
H      1.0      2.0214800000   -0.9282700000    0.7279300000
 $END
