Implementation of the NAS Parallel Benchmarks in Java


Michael Frumkin, Matthew Schultz, Haoqiang Jin, and Jerry Yan
NAS Division, NASA Ames Research Center, Moffett Field, CA

Abstract

Several features make Java an attractive choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for CFD applications.

1 Introduction

The portability, expressiveness, and safety of the Java language, supported by rapid progress in Java compiler technology, have created an interest in the HPC community to evaluate Java on computationally intensive problems [11]. Java threads, RMI, and networking capabilities position Java well for programming on Shared Memory Parallel (SMP) computers and on computational grids. On the other hand, issues of safety, the lack of lightweight objects, intermediate byte code interpretation, and array access overheads create challenges in achieving high performance for Java codes. These challenges are being addressed by work on implementation of efficient Java compilers [12, 13] and by extending Java with classes implementing the data types used in HPC [12].

In this paper, we describe an implementation of the NAS Parallel Benchmarks (NPB) [1] in Java. The benchmark suite is accepted by the HPC community as an instrument for evaluating the performance of parallel computers, compilers, and tools. We quote from [10]: "Parallel Java versions of Linpack and NAS Parallel Benchmarks would be particularly interesting". The implementation of the NPB in Java builds a base for tracking the progress of Java technology, for evaluating Java as a choice for programming scientific applications, and for identifying the areas where improvements in Java compilers would make the strongest impact on the performance of scientific codes written in Java.

Our implementation of the NPB in Java is derived from the optimized NPB2.3-serial version [9] written in Fortran (except IS, written in C). The NPB2.3-serial version was previously used by us for the development of the HPF [5] and OpenMP [9] versions of the NPB. We start with an evaluation of Fortran-to-Java conversion options by comparing the performance of basic Computational Fluid Dynamics (CFD) operations. The most efficient options are then used to translate the Fortran to Java. We then parallelize the resulting Java code by using Java threads and the master-workers load distribution model. Finally, we profile the benchmarks and analyze the performance on five different machines: IBM p690, SGI Origin2000, SUN Enterprise10000, an Intel Pentium-III based PC, and an Apple Xserve (G4).

2 NAS Parallel Benchmarks

The NAS Parallel Benchmarks (NPB) were derived from CFD codes [1]. They were designed to compare the performance of parallel computers and are recognized as a standard indicator of computer performance. The NPB suite consists of five kernels and three simulated CFD applications derived from important classes of numerical methods routinely used in CFD applications.

The simulated CFD applications mimic the data traffic and computations found in full CFD codes. An algorithmic description of the benchmarks (a pencil and paper specification) was given in [1] and is referred to as NPB-1. A source code implementation of most benchmarks (NPB-2) was described in [2]. The latest release, NPB-2.3, contains MPI source code for all the benchmarks and a stripped-down serial version (NPB-2.3-serial). The serial version was intended to be used as a starting point for parallelization tools and compilers and for other types of parallel implementations. Recently, OpenMP [9] and HPF [5] versions of the benchmarks have been developed. These were released, along with an optimized serial version, as a separate package called Programming Baselines for NPB (PBN). For completeness of this paper, we outline the seven benchmarks that have been implemented in Java.

BT is a simulated CFD application that uses an implicit algorithm to solve the 3-dimensional (3-D) compressible Navier-Stokes equations. The finite differences solution to the problem is based on an Alternating Direction Implicit (ADI) approximate factorization that decouples the x, y, and z dimensions. The resulting system is Block Tridiagonal of 5x5 blocks and is solved sequentially along each dimension.

SP is a simulated CFD application. It differs from BT in the factorization of the discrete Navier-Stokes operator. It employs the Beam-Warming approximate factorization that decouples the x, y, and z dimensions. The resulting system of Scalar Pentadiagonal linear equations is solved sequentially along each dimension.

LU is also a simulated CFD application. It uses the symmetric successive over-relaxation (SSOR) method to solve the discrete Navier-Stokes equations by splitting them into block Lower and Upper triangular systems.

FT contains the computational kernel of a 3-D Fast Fourier Transform (FFT). Each FT iteration performs three series of one-dimensional FFTs, one series for each dimension.

MG uses a V-cycle MultiGrid method to compute the solution of the 3-D scalar Poisson equation. The algorithm works iteratively on a set of grids that are made between the coarsest and the finest grids. It tests both short and long distance data movement.

CG uses a Conjugate Gradient method to compute approximations to the smallest eigenvalues of a sparse unstructured matrix. This kernel tests unstructured computations and communications by manipulating a diagonally dominant matrix with randomly generated locations of entries.

IS performs sorting of integer keys using a linear-time Integer Sorting algorithm based on computation of the key histogram. IS is the only benchmark written in C.

3 Fortran to Java Translation

Java is a more expressive language than Fortran, which eases the task of translating Fortran code to Java. However, matching the performance of the Fortran versions is still a challenge. In the literal translation, the procedural structure of the application is kept intact: the arrays are translated to Java arrays, complex numbers are translated into (Re,Im) pairs, and no objects are used except the objects having the methods corresponding to the original Fortran subroutines. The object oriented translation translates multidimensional arrays, complex numbers, matrices, and grids into appropriate classes and changes the code structure from the procedural style to the object oriented style.
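For example, under the literal translation a Fortran COMPLEX*16 array becomes a pair of double arrays, and complex arithmetic is written out on the (Re,Im) pairs. The sketch below illustrates this; the names are ours, not those of the NPB3.0-JAV sources.

    class ComplexLiteral {
        // Literal translation of the Fortran statement  x(i) = x(i) * w,
        // where x is declared COMPLEX*16 x(n): the complex array becomes
        // two double arrays holding the real and imaginary parts.
        static void complexScale(double[] xRe, double[] xIm, int i,
                                 double wRe, double wIm) {
            double tRe = xRe[i] * wRe - xIm[i] * wIm;  // Re(x(i)*w)
            double tIm = xRe[i] * wIm + xIm[i] * wRe;  // Im(x(i)*w)
            xRe[i] = tRe;
            xIm[i] = tIm;
        }
    }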
The advantage of the literal translation is that the mapping of the original code to the translated code is direct, and the potential overhead for access and modification of the corresponding data is smaller than in the object oriented translation. On the other hand, the object oriented translation results in better structured code and allows advising the compiler of special treatment of particular classes, for example, using semantic expansion [12, 13]. Since we are interested in high performance code, we chose the literal translation.

In order to compare the efficiency of different options in the literal translation, and to form a baseline for estimation of the quality of our implementation of the benchmarks, we chose a few basic CFD operations and implemented them in Java. The relative performance of different implementations of the basic operations gives us a guide for the Fortran-to-Java translation. As the basic operations we chose the operations that we have used to build the HPF performance model [6]:

- loading/storing array elements;
- filtering an array with a local kernel (the kernel can be a first or second order star-shaped stencil, as in BT, SP, and LU, or a compact 3x3x3 stencil, as in the smoothing operator in MG);
- a matrix vector multiplication of a 3-D array of 5x5 matrices by a 3-D array of 5-D vectors (a routine CFD operation);
- a reduction sum of 4-D array elements.

We implemented these operations in Java both with arrays that preserve the number of array dimensions and with linearized arrays. The version that preserves the number of array dimensions was significantly slower on the JVM available to us at the time (Java 1.1.3). So we decided to translate Fortran arrays into linearized Java arrays; hence, we present the profiling data for the linearized translation only. The results on the SGI Origin2000 are summarized in Table 1.

Table 1. The execution times in seconds of the basic CFD operations on the SGI Origin2000 (f77 vs. Java 1.1.8, serial and with varying numbers of threads) for the five operations: 1. Assignment (10 iterations); 2. First Order Stencil; 3. Second Order Stencil; 4. Matrix vector multiplication; 5. Reduction Sum.

We can offer some conclusions from the profiling data: the Java code is a factor of 3.3 (Assignment) to 12.4 (Second Order Stencil) slower than the Fortran code; the thread synchronization (the difference between the serial and the 1-thread columns) contributes no more than 20% to the execution time; and the speedup with 16 threads is around 7 for the computationally expensive operations (2, 3 and 4) and around 5-6 for the less intensive operations (1 and 5).

For a more detailed analysis of the basic operations we used an SGI profiling tool called perfex. The perfex tool uses 32 hardware counters to count issued/graduated integer and floating point instructions, load/stores, primary/secondary cache misses, etc. The profiling with perfex shows that the Java/Fortran performance ratio correlates well with the ratio of the total number of executed instructions in the two codes. Also, the Java code executes twice as many floating point instructions as the Fortran code, confirming that the Just-In-Time (JIT) compiler does not use the "madd" instruction, since that instruction is not compatible with the Java rounding error model [11].

Once we chose the literal translation with array linearization, we automated the Fortran-to-Java translation by using emacs regular expressions. For example, to translate the Fortran array

    REAL*8 u(5,nx,ny,nz)
    ...
    u(m,i,j,k)

into the Java array

    double u[] = new double[5*nx*ny*nz];
    int usize1 = 5, usize2 = usize1*nx, usize3 = usize2*ny;
    ...
    u[(m-1) + (i-1)*usize1 + (j-1)*usize2 + (k-1)*usize3]

we translated the declaration by hand and then translated the references to array elements by using the macro

    arrayname\(([^,]*),([^,]*),([^,]*),([^)]*)\)
    arrayname[(\1-1)+(\2-1)*usize1+(\3-1)*usize2+(\4-1)*usize3]

Similarly, DO loops were converted to Java for loops using the macro

    do[ ]*([-+a-z0-9]*)[ ]*=[ ]*([-+a-z0-9]*)[ ]*,[ ]*(.*)
    for(\1=\2; \1<=\3; \1++) {
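To show concretely what the linearized translation produces, here is a small self-contained sketch in the spirit of basic operations 4 and 5 above (the 5x5 matrix-vector multiplication and the reduction sum); the class and helper names are ours, not those of the NPB3.0-JAV sources.

    class LinearizedOps {
        // u(5,nx,ny,nz) and a(5,5,nx,ny,nz) are stored in Fortran
        // column-major order in linearized Java arrays, as described above.
        static double[] matvec(double[] a, double[] u, int nx, int ny, int nz) {
            int usize1 = 5, usize2 = usize1 * nx, usize3 = usize2 * ny;
            int asize1 = 5, asize2 = asize1 * 5, asize3 = asize2 * nx,
                asize4 = asize3 * ny;
            double[] r = new double[5 * nx * ny * nz];
            for (int k = 0; k < nz; k++)
                for (int j = 0; j < ny; j++)
                    for (int i = 0; i < nx; i++)
                        for (int m = 0; m < 5; m++) {
                            // r(m,i,j,k) = sum over n of a(m,n,i,j,k)*u(n,i,j,k)
                            double s = 0.0;
                            for (int n = 0; n < 5; n++)
                                s += a[m + n*asize1 + i*asize2 + j*asize3 + k*asize4]
                                   * u[n + i*usize1 + j*usize2 + k*usize3];
                            r[m + i*usize1 + j*usize2 + k*usize3] = s;
                        }
            return r;
        }

        // Reduction sum over all elements of a linearized 4-D array.
        static double reductionSum(double[] u) {
            double s = 0.0;
            for (int i = 0; i < u.length; i++)
                s += u[i];
            return s;
        }

        public static void main(String[] args) {
            int nx = 8, ny = 8, nz = 8;
            double[] a = new double[5 * 5 * nx * ny * nz];
            double[] u = new double[5 * nx * ny * nz];
            java.util.Arrays.fill(a, 0.5);
            java.util.Arrays.fill(u, 1.0);
            System.out.println("sum = " + reductionSum(matvec(a, u, nx, ny, nz)));
        }
    }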

Several Fortran constructs were changed to Java via context free replacement. These include all boolean operators, all type declarations (except character arrays, which were converted to Java strings), some if-then-else statements, comments, and the call statement. The semiautomatic translation allowed us to translate about 70% of the Fortran code to Java. In general, even the literal translation requires parsing the Fortran code and translating the parse tree to a Java equivalent, for example, for labeled DO loops and for common, format, and IO statements.

We structured the code in the following way. Each benchmark has a base class and derived main and worker classes. The base class contains all global and common variables as members. The main class contains one method for each Fortran subroutine, including main. There is one worker class for each parallelizable Fortran subroutine (see the discussion in the next section). The main class has two additional methods: runBenchmark(), executed in the serial mode, and run(), executed in the parallel mode. The runBenchmark() method calls all methods in exactly the same sequence as in the original Fortran code. The run() method is used to start threads in the parallel mode. The commonly used functions Timer, Random, and PrintResults are implemented as separate classes and are imported into each benchmark. All the benchmarks are packaged in the NPB3.0-JAV package.
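As an illustration, a minimal sketch of this class structure might look as follows; we use BT as the example, and all names except runBenchmark() and run() are illustrative rather than taken from the NPB3.0-JAV sources.

    // Base class: holds what were Fortran global/common variables.
    class BTBase extends Thread {
        int problemSize;
        double[] u;      // a linearized solution array, as in Section 3
    }

    // Main class: one method per Fortran subroutine, plus the two drivers.
    class BT extends BTBase {
        void adi() {
            // body of a translated Fortran subroutine would go here
        }

        // Serial mode: call the translated subroutines in the original order.
        void runBenchmark() {
            adi();
        }

        // Parallel mode: create and start the worker threads (see Section 4).
        public void run() {
        }
    }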
4 Using Java Threads for Parallelization

A significant appeal of Java for parallel computing stems from the presence of threads as part of the Java language. On a shared memory multiprocessor, the Java Virtual Machine (JVM) can assign different threads to different processors and speed up execution of the job if the work is well divided among the threads. Conceptually, Java threads are close to OpenMP threads, so we used the OpenMP version of the benchmarks [9] as a prototype for the multithreading.

The base class (and, hence, all other classes) of each benchmark was derived from class java.lang.Thread, so all benchmark objects are implemented as threads. The instance of the main class is designated to be the master that controls the synchronization of the worker objects. The workers are switched between blocked and runnable states with the wait() and notify() methods.

In all benchmarks except MG, the work per thread is the same in all iterations. Hence, in these benchmarks, the initialization of the threads and the partitioning of the work among them is performed in the main class. The partitioning is accomplished by specifying the starting and ending iterations of the outer loop for each worker. The master thread dispatches the job to each worker, starts the workers, and then waits until all workers are finished (see Figure 1). Each worker thread is started and immediately goes into a blocked state on the condition that the variable done is true; once dispatched, it performs the assigned work and notifies the master that the work is done. The while loop around the wait call prevents an arbitrary notify call from waking a thread before its time. All CFD code is placed in the worker's step method. In MG, the load per thread depends on the size of the grid used in the current iteration. Hence, in MG, each thread uses the GetWork() method to obtain the loop boundaries before it performs the step method.

Master's code:

    for (i = 0; i < num_threads; i++)
        worker[i].done = false;
    for (i = 0; i < num_threads; i++)
        synchronized (worker[i]) {
            worker[i].notify();
        }
    for (i = 0; i < num_threads; i++)
        while (!worker[i].done) {
            try { wait(); } catch (InterruptedException ie) {}
        }

Worker's code:

    while (true) {
        while (done)
            try { wait(); } catch (InterruptedException ie) {}
        step();
        done = true;
        synchronized (master) {
            master.notify();
        }
    }

Figure 1. The master-worker model of thread synchronization. This code is applicable only to independent workers: each worker processes the job dispatched to it by the master independently of the other workers.
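The figure shows only the loop bodies. Below is a self-contained, runnable sketch of the same master-workers pattern with our own illustrative names (the class MasterWorkers and the toy step() body are not from the NPB3.0-JAV sources); the synchronized blocks that the figure leaves implicit are written out, since wait() and notify() must be called while holding the monitor of the object they are invoked on.

    public class MasterWorkers {
        static final int NUM_THREADS = 4;
        static final int N = 1 << 20;
        static final double[] data = new double[N];

        static class Worker extends Thread {
            final int begin, end;    // iteration range assigned by the master
            boolean done = true;     // true = waiting for the master's dispatch

            Worker(int begin, int end) {
                this.begin = begin;
                this.end = end;
                setDaemon(true);     // let the JVM exit while workers idle
            }

            // Master side: hand the job to this worker.
            synchronized void dispatch() {
                done = false;
                notify();
            }

            public synchronized void run() {
                while (true) {
                    while (done)     // guard against spurious wake-ups
                        try { wait(); } catch (InterruptedException ie) { return; }
                    step();
                    done = true;
                    synchronized (MasterWorkers.class) {  // wake the master
                        MasterWorkers.class.notify();
                    }
                }
            }

            void step() {            // stands in for the CFD code
                for (int i = begin; i < end; i++)
                    data[i] = Math.sin(i);
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Worker[] worker = new Worker[NUM_THREADS];
            int chunk = N / NUM_THREADS;
            for (int i = 0; i < NUM_THREADS; i++) {
                worker[i] = new Worker(i * chunk, (i + 1) * chunk);
                worker[i].start();
            }
            for (int i = 0; i < NUM_THREADS; i++)
                worker[i].dispatch();
            synchronized (MasterWorkers.class) {  // wait for all workers
                for (int i = 0; i < NUM_THREADS; i++)
                    while (!worker[i].done)
                        MasterWorkers.class.wait();
            }
            double s = 0.0;
            for (double x : data) s += x;
            System.out.println("checksum = " + s);
        }
    }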

There is a case where a worker cannot process the job dispatched by the master independently of the other workers: in the pipelined computations of LU, a worker may start its portion of a plane only after the preceding worker has finished its portion (a relay-race type of synchronization). In this case the workers synchronize themselves with each other, as shown in Figure 2.

Worker's code:

    while (true) {
        while (done)
            try { wait(); } catch (InterruptedException ie) {}
        for (k = 1; k < nz; k++) {
            if (id > 0)
                while (todo <= 0)
                    try { wait(); } catch (InterruptedException ie) {}
            step(k);
            todo--;
            if (id < num_threads - 1)
                synchronized (worker[id+1]) {
                    worker[id+1].todo++;
                    worker[id+1].notify();
                }
        }
        done = true;
        if (id == num_threads - 1)
            synchronized (master) {
                master.notify();
            }
    }

Figure 2. Thread synchronization in pipelined computations. Here id is the thread number.
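The relay-race pattern of Figure 2 can likewise be fleshed out into a self-contained, runnable sketch. The names here are ours, and for brevity the master's dispatch logic of Figure 1 is replaced by Thread.start() and join(); the essential point, that worker id may process plane k only after worker id-1 has finished it, is preserved.

    public class Pipeline {
        static final int NUM_THREADS = 4;
        static final int NZ = 8;    // number of k-planes to pipeline over

        static class Worker extends Thread {
            final int id;
            final Worker[] worker;
            int todo = 0;           // planes released by worker id-1

            Worker(int id, Worker[] worker) {
                this.id = id;
                this.worker = worker;
            }

            // Called by worker id-1 when it finishes a plane.
            synchronized void release() {
                todo++;
                notify();
            }

            public void run() {
                for (int k = 0; k < NZ; k++) {
                    if (id > 0)
                        synchronized (this) {   // wait until plane k is released
                            while (todo <= 0)
                                try { wait(); }
                                catch (InterruptedException ie) { return; }
                            todo--;
                        }
                    step(k);
                    if (id < worker.length - 1) // pass the baton downstream
                        worker[id + 1].release();
                }
            }

            void step(int k) {      // stands in for the CFD work on plane k
                System.out.println("worker " + id + " finished plane " + k);
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Worker[] w = new Worker[NUM_THREADS];
            for (int i = 0; i < NUM_THREADS; i++)
                w[i] = new Worker(i, w);
            for (Worker x : w) x.start();
            for (Worker x : w) x.join();  // wait for the pipeline to drain
        }
    }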

5 Performance

We have tested the class A benchmarks, using Java 1.3.0 on the IBM p690 (1.3 GHz, 32 processors), Java 1.1.8 on the SGI Origin2000 (250 MHz, 32 processors), and Java 1.1.3 on the SUN Enterprise10000 (333 MHz, 16 processors); the results are shown in Tables 2, 3 and 4. Also for reference we ran the benchmarks on a Linux PC (933 MHz, 2 Pentium III processors, Java 1.3.0), Table 5, and on one node of an Apple Xserve (1 GHz, 2 G4 processors), Table 6. For comparison we include the results of the Fortran-OpenMP version of the benchmarks.

Table 2. Benchmark times in seconds on the IBM p690 (1.3 GHz, 32 processors) for varying numbers of threads: BT.A, SP.A, LU.A, FT.A, IS.A, CG.A, and MG.A, each in Java 1.3.0 and in f77-OpenMP.

5.1 Comparison with Fortran

We offer the following conclusions from the comparison. The benchmarks may be split into two groups: BT, SP, LU, FT, and MG, which are dominated by regular-stride computations of the type profiled in Section 3, and IS and CG, which are dominated by data movement and irregular access. For the first group the ratio of Java to Fortran execution times is within the 3.1-7.2 interval (reaching 12.5 on the Origin2000); for IS and CG the ratio is within 2-4, indicating that the advantage of the f77 compiler is largest on the regular-stride CFD computations. A number of improvements can be made. First, the JIT compiler needs to reduce the ratio of Java to Fortran executed instructions (which for Java 1.1.8 is a factor of 10) for the basic operations. Second, the Java rounding error model should allow the "madd" instruction to be used. Third, in all benchmarks working on structured grids, the array sizes and loop bounds are constants, and a simple compiler optimization should be able to lift bounds checking out of the loop [11] without compromising code safety.

Our performance results apparently are in sharp contrast with the results reported by the Java Grande Benchmarking Group [3]. In that paper it was reported that on almost all Java Grande Benchmarks the performance of the Java version is within a factor of 2 of the corresponding C or Fortran version. To resolve the discrepancy, we obtained jgf2.0 from the www.epcc.ed.ac.uk/javagrande website. Since the Fortran version was not available on the website, we literally translated the Java version to Fortran and ran both versions on multiple platforms. The results are summarized in Table 7. We have also included results for the LINPACK version of the LU decomposition. From the table we can conclude that the algorithm used in the lufact benchmark performs very poorly relative to LINPACK. The reason is that lufact is based on BLAS1 and has poor cache reuse. As a result, the computations always wait for data (cache misses), which obscures the performance comparison between the Java and Fortran versions. Note that our Assignment basic operation exhibits about the same Java/Fortran performance ratio as the lufact benchmark.

5.2 Scalability of Multithreaded Codes

Single threaded Java benchmarks sometimes run faster than the serial versions. That can be explained by the fact that in the single threaded version the data layout is more cache friendly. Overall, the multithreading introduces an overhead of about 10%-20%. The speedup of BT, SP, and LU with 16 threads is in the range of 6-12 (efficiency 0.38-0.75). The low efficiency of FT on the SUN Enterprise is explained by the inability of the JVM to use more than 4 processors to run applications requiring significant amounts of memory (FT.A uses about 350 MB). An artificial increase in the memory use of other benchmarks also resulted in a drop of scalability beyond 4 threads. The lower scalability of LU can be explained by the fact that it performs the thread synchronization inside a loop over one grid dimension, thus introducing higher overhead due to the thread relay-racing mechanism. The low scalability of IS was expected, since the amount of work performed by each thread is small relative to the other benchmarks; hence, the data movement overheads eclipse the gain in processing time.

Our tests of the CG benchmark on the SGI Origin2000 showed virtually no performance gain until 8 processors were used (similar observations are valid for IS). Even with a large number of threads (10-16), only a few processors were used (2-4). To investigate this problem, we used "top -T", which allows monitoring the individual Posix threads of an application.

Table 3. Benchmark times in seconds on the SGI Origin2000 (250 MHz, 32 processors) for varying numbers of threads, Java 1.1.8 vs. f77-OpenMP.

Table 4. Benchmark times in seconds on the SUN Enterprise10000 (333 MHz, 16 processors), Java 1.1.3, serial and for varying numbers of threads.

With this utility, we found that the JVM seemed to be ignoring our thread creation and was running all the threads in one or two Posix threads. The fact that all the other benchmarks ran each thread in a separate Posix thread suggested that the problem was peculiar to CG. CG's work load is much smaller than the work load of the computationally intensive benchmarks. Based on this, we hypothesized that the JVM was attempting to optimize CPU usage by running the threads serially on a few processors instead of using one processor per thread. In order to test this, we put an initialization section into the benchmark which performed a large number of floating point operations in each thread, in the hope that the JVM would create more Posix threads to handle the high CPU load. With this change in the code, the JVM created all threads for executing the initialization section. When the actual computations did start, the JVM used a separate CPU for each thread. As the number of threads increased, the work load on each CPU decreased somewhat. However, by initializing the thread load, we were able to get a visible speedup of CG, see Table 2. On the Linux PIII PC we did not obtain any speedup when using 2 threads. The reason for this will be investigated further.

Table 5. Benchmark times in seconds on a Linux PC (933 MHz, 2 PIII processors), Java 1.3.0, serial and for 1 and 2 threads.

Table 6. Benchmark times in seconds on one node of an Apple Xserve (1 GHz, 2 G4 processors).

Table 7. The Java Grande lufact benchmark [4]; the Fortran version was obtained from lufact by literal translation. The performance of the LINPACK version of the LU decomposition (DGETRF, based on matrix-matrix multiplication and having good cache reuse) is shown for reference. The execution times are in seconds, on SUN UltraSparc/Java 1.4.0, SGI Origin2000/Java 1.1.8, Sun E10000/Java 1.1.3, and IBM POWER4/Java 1.3.0.

6 Related Work

In our implementation we parallelized the NAS Parallel Benchmarks using Java threads. The University of Westminster's Performance Engineering Group at the School of Computer Science used the Java JNI (Java Native Interface) to create a system dependent Java MPI library, javaMPI [8]. They also used this library to implement the NAS benchmarks FT and IS.

The Westminster version of javaMPI can be compiled on any system with Java 1.0.2 and LAM 6.1. The Distributed and High Performance Computing Group of the University of Adelaide has also released the NAS benchmarks EP and IS (with FT, CG and MG under development) [10], along with many other benchmarks, in order to test Java's suitability for grand challenge applications. The Java Grande Forum has developed a set of benchmarks [4] reflecting various computationally intensive applications which will likely benefit from the use of Java. The performance results reported in [3] relative to C and Fortran are significantly more favorable to Java than ours.

7 Conclusions

Although the performance of the implemented NAS Parallel Benchmarks in Java lags far behind Fortran and C at this time, the serial performance can be improved to near Fortran-like performance by using the performance enhancing methods detailed in [11, 13]. From our performance results it follows that the IBM Java compiler is the leader in this direction. The efficiency of parallelization with threads is about 0.5 for up to 16 threads and is lower than the efficiency of parallelization with OpenMP, MPI, and HPF on the SGI and SUN machines. However, on the IBM machine the scalability of the Java code is as good as that of OpenMP, and on average the performance of the Java code is within a factor of 3 of that of Fortran. With several groups working on MPI and OpenMP for Java, improvements in parallel performance and scalability seem likely as well.

The attraction of Java as a language for numerically intensive applications is primarily driven by its ease of use, universal portability, and high expressiveness which, in particular, allows expressing parallelism. If Java code is made to run faster through methods that have already been researched extensively, such as high order loop transformations, semantic expansion, and a wider availability of traditionally optimized native compilers, together with an implementation of multidimensional arrays and complex numbers, it could become an attractive programming environment for HPC applications. The NPB3.0-JAV package is available from www.nas.nasa.gov/Software/NPB.

Acknowledgment. The authors appreciate the help of IBM staff members Charles Grassl and Luiz DeRose with the runs on the IBM servers.

References

[1] D. Bailey, J. Barton, T. Lasinski, and H. Simon. The NAS Parallel Benchmarks. NASA Ames Research Center, 1991. http://www.nas.nasa.gov/Software/NPB/

[2] D. Bailey, T. Harris, W. Saphir, R. van der Wijngaart, A. Woo, M. Yarrow. The NAS Parallel Benchmarks 2.0. Report NAS-95-020, NASA Ames Research Center, Dec. 1995. http://science.nas.nasa.gov/Software/NPB

[3] J.M. Bull, L.A. Smith, L. Pottage, R. Freeman. Benchmarking Java against C and Fortran for Scientific Applications. Joint ACM Java Grande - ISCOPE 2001 Conference, Palo Alto, CA, 2001.

[4] Java Grande Benchmarks. http://www.epcc.ed.ac.uk/computing/research_activities/java_grande/index_1.html

[5] M. Frumkin, H. Jin, J. Yan. The HPF Implementation of NPB2.3. Proceedings of the ISCA 11th International Conference on Parallel and Distributed Computing Systems, Chicago, IL, September 2-4, 1998, 8 pp.

[6] M. Frumkin, H. Jin, J. Yan. Implementation of NAS Parallel Benchmarks in High Performance Fortran. CDROM version of IPPS/SPDP 1999 Proceedings, San Juan, Puerto Rico, April 12-16, 1999, 10 pp.

[7] GNU Emacs. http://www.gnu.org/software/emacs/

[8] V. Getov, S. Flynn-Hummel, S. Mintchev. High-Performance Parallel Programming in Java: Exploiting Native Libraries. Proceedings of the 1998 ACM Workshop on Java for High-Performance Network Computing. cs.wmin.ac.uk/JavaMPI

[9] H. Jin, M. Frumkin, J. Yan. The OpenMP Implementation of NAS Parallel Benchmarks and Its Performance. NAS Technical Report NAS-99-011, NASA Ames Research Center, Moffett Field, CA, 1999. http://www.nas.nasa.gov/Software/NPB/

[10] J.A. Mathew, P.D. Coddington, K.A. Hawick. Analysis and Development of Java Grande Benchmarks. Proceedings of the ACM 1999 Java Grande Conference, San Francisco, June 1999, 9 pp.

[11] S.P. Midkiff, J.E. Moreira, M. Snir. Java for Numerically Intensive Computing: from Flops to Gigaflops. Proceedings of FRONTIERS'99, Annapolis, Maryland, February 21-25, 1999, pp. 251-257.

[12] P. Wu, S. Midkiff, J. Moreira, M. Gupta. Efficient Support for Complex Numbers in Java. Proceedings of the ACM 1999 Java Grande Conference, San Francisco, June 1999. http://www.cs.ucsb.edu/conferences/java99/program.html

[13] J.E. Moreira, S.P. Midkiff, M. Gupta, P.V. Artigas, M. Snir, R.D. Lawrence. Java Programming for High-Performance Numerical Computing. IBM Systems Journal, Vol. 39, No. 1, 2000.

