Fujitsu's Lustre Contributions - EOFS


Lustre Administrators and Developers Workshop 2014
Fujitsu's Lustre Contributions - Policy and Roadmap -
Shinji Sumimoto, Kenichiro Sakai
Fujitsu Limited, a member of OpenSFS
Copyright 2014 FUJITSU LIMITED

Outline of This Talk
- Current Status of Fujitsu’s Supercomputer Development
  - Past and Current Product Development
  - The Next Step towards Exa-scale Development
- Fujitsu’s Contribution Policy to Lustre Community
  - Contribution Policy
  - Current Contribution and the Next Step
- Introduction of Contribution Feature
  - IB Multi-rail, Directory Quota etc.

Fujitsu Joins OpenSFS, Oct. 14, 2013

CURRENT STATUS OF FUJITSU’S SUPERCOMPUTER DEVELOPMENT

History of Fujitsu Supercomputers
Fujitsu has been developing HPC file system for customers
[Figure: timeline of Fujitsu HPC systems, 1977 to 2010]
- Vector: F230-75APU, VP-100, VP-200, NWT*1, VPP500, VPP300/700, VPP5000
- Scalar: PRIMEPOWER HPC2500, SPARC Enterprise, PRIMEQUEST
- Massively Parallel: AP1000, AP3000, FX1, The K computer*2, PRIMEHPC FX10
- Cluster nodes: PRIMERGY RX200, HX600, PRIMERGY BX900, PRIMERGY CX1000
- Milestones labelled in the figure: Japan’s First Supercomputer (1977); No.1 in Top500 (Nov. 1993); Gordon Bell Prize (1994, 95, 96); World’s Fastest Vector Processor (1999); Japan’s Largest Cluster in Top500 (June 2004); Most Performance Efficient in Top500 (Nov. 2008); Japan’s Largest Cluster in Top500 (June 2010); No.1 in Top500 (June 2011, Nov 2011)
*1 NWT: Numerical Wind Tunnel, joint development with JAXA
*2 Joint development with Riken

K computer and the Next Step
- K computer: Still TOP500 Rank #4 system in the world.
- FEFS on K computer is the first 1 TB/s sustained IOR performance file system in the world.
- We are now developing FEFS for the next Post-FX10 system.
- The next target is Exa-scale system
[Figure: processor roadmap]
- 2008: 4 core, VISIMPACT, 8 DDR2-DIMM
- 2010: 8 core, HPC-ACE, 8 DDR3-DIMM, Tofu interconnect
- 2012: 16 core, HPC-ACE, 8 DDR3-DIMM, Tofu interconnect
- 2015: 32 core, HPC-ACE2, 8 Hybrid Memory Cube, Tofu interconnect 2

SDHPC Activities for Exascale System
- Japanese researchers wrote roadmap papers for the exascale system (2010/8 - )
  (Japanese) 012/03/FutureHPCI-Report.pdf
  (English) -3-kondo.pdf

Storage and System Requirement from the Architecture Roadmap

Fujitsu’s FEFS Development towards Exascale
- Fujitsu will continue to develop Lustre based FEFS to realize the next generation exa-scale systems.
  - Needs to continue to enhance Lustre
- FEFS already supports Exa-byte class file system size
  - However, several issues remain before a real Exa-scale file system can be realized
- One of the issues is Exa-scale storage design
  - Electric power and footprint including computing system and storage: Electric Power: 20-30MW, Footprint: 2000m2 (SDHPC)
  - Electric power for the storage system must be minimized because most of the power should be used for computing.
  - Power consumption of Exa-byte class storage system: should be less than 1MW (as assumption)
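To put the 1MW assumption in proportion, the following minimal sketch computes what share of the quoted 20-30MW total budget the storage system could take at most. The variable and function names are illustrative and not from the talk; the numbers are the ones on the slide.

```python
# Storage share of total system power, using the figures quoted on the slide.
# Names are illustrative; the 1 MW storage cap is the slide's own assumption.
TOTAL_POWER_MW = (20.0, 30.0)   # total system power range (SDHPC)
STORAGE_CAP_MW = 1.0            # assumed upper bound for exabyte-class storage


def storage_share(storage_mw: float, total_mw: float) -> float:
    """Return storage power as a fraction of total system power."""
    return storage_mw / total_mw


shares = [storage_share(STORAGE_CAP_MW, total) for total in TOTAL_POWER_MW]
for total, share in zip(TOTAL_POWER_MW, shares):
    print(f"{total:.0f} MW total -> storage at most {share:.1%}")
```

Under these figures the storage system is limited to roughly 3 to 5 percent of the total power budget.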

Exascale File System Design
- K computer File System Design
  - How should we realize High Speed and Redundancy together?
  - How do we avoid I/O conflicts between Jobs?
  - These are not realized in single file system.
  - Therefore, we have introduced Integrated Layered File System.
- Exascale File System/Storage Design
  - Another trade off targets: Power, Capacity, Footprint
  - Difficult to realize single 1EB and 10TB/s class file system in limited power consumption and footprint.
  - Third Storage layer for Capacity is needed: Three Layered File System
    - Local File System for Performance
    - Global File System for Easy to Use
    - Archive File System for Capacity
[Figure: Local File System, Global File System, Archive File System]

The Next Integrated Layered File System Architecture for Post-peta scale System (Feasibility Study 2012-2013)
- Local File System o(10PB): Memory, SSD, HDD Based
  - Application Specific, Existing FS, Object Based, etc.
- Global File System o(100PB): HDD Based
  - Lustre Based, Ext[34], Object Based, Application Specific etc.
- Archive System o(1EB): HSM(Disk Tape), Grid, Cloud Based
  - HSM, Lustre, other file system
[Figure labels: Compute Nodes; Login Server; Thousands of Users; Job Scheduler; Transparent Data Access; High Speed; Shared Usability, Lustre Based; High Capacity & Redundancy & Interoperability, HSM, Other Shared FS, Grid or Cloud Based; Other Organization; Other Systems; /data]

Issues for Post-Petascale File/Storage System
- Power Saving Storage Architecture for 1EB Class storage
  - 20MW-30MW: Total System Power including Computing System
  - Required Total (Compute and Storage) power management
- Lustre is not ready for EXA byte size systems
  - FEFS and GPFS are ready, so current Lustre needs to expand its limits.
  - It also limits specification of Lustre 2.x based FEFS
- Issues for Realizing Post-Petascale File System:
  - How to realize application specific high speed file access to the local file system? Needs to investigate storage access pattern of target applications.
  - How to realize transparent file access among three file systems? Lustre HSM is one of options.

FUJITSU’S CONTRIBUTION POLICY TO LUSTRE COMMUNITY

Fujitsu’s Lustre Contribution Policy
- Fujitsu will open its development plan and feed back its enhancements to Lustre community
  - LAD is the most suitable place to present and discuss.
- Fujitsu’s basic contribution policy:
  - Opening development plan
  - Feeding back its enhancements to Lustre community no later than a certain period after our product is shipped.
[Figure: Contribution & Production Cycle. OpenSFS: Lustre Development, Release. Fujitsu: Dev. Plan, Enhancement, Feedback, Product Shipment to Fujitsu Customers]

Fujitsu’s Activities to Lustre Community
- Step 1 (2012-2013): Basic Enhancement for Core Lustre Modules with Whamcloud/Intel
- Step 2 (2014- ): Advanced Function Contribution by Fujitsu.

Fujitsu Contributions of Basic Enhancement
- Fujitsu ported our enhancements into Lustre 2.x with Intel
Jira     Function                                  Landing
LU-2467  Ability to disable pinging                Lustre 2.4
LU-2466  LNET networks hashing                     Lustre 2.4
LU-2934  LNET router priorities                    Lustre 2.5
LU-2950  LNET read routing list from file          Lustre 2.5
LU-2924  Reduce ldlm_poold execution time          Lustre 2.5
LU-3221  Endianness fixes (SPARC support)          Lustre 2.5
LU-2743  Errno translation tables (SPARC Support)  Lustre 2.5
LU-4665  lfs setstripe to specify OSTs             Lustre 2.7
Bug-fixes are not included

Fujitsu Contributions of Advanced Functions
- Fujitsu has been porting our enhancements into Lustre 2.x
  - These features were implemented in Lustre 1.8 based FEFS
  - They’ve been used in our customer’s HPC system, including K computer
- We’ll start submitting patches for Lustre in 2015
Functions                                Submitting Schedule
IB multi-rail                            Jan. 2015
Automated Evict Recovery                 Apr. 2015
Directory Quota                          2nd half of 2015
Improving Single Process IO Performance  2nd half of 2015
Client QoS                               2nd half of 2015
Server QoS                               TBD
Memory Usage Management                  TBD

Fujitsu’s Contribution Roadmap
- Fujitsu’s development and community feedback plan
  - Schedule may change by Fujitsu’s development/marketing strategy
[Figure: roadmap for CY2014 to CY2016. Fujitsu: porting FEFS features into Lustre 2.x (Directory Quota, IB Multi-rail, Single process I/O, Evict recovery, Client QoS), followed by feed-back to OpenSFS; further enhancement TBD (Snapshot, etc.). OpenSFS Lustre releases: 2.6, 2.7, 2.8]

Advanced Function (1)
- InfiniBand (IB) Multi-rail
  - Multiple InfiniBand (IB) interfaces as a single Lustre NID
  - Improving data transfer bandwidth on a single Lustre node
  - Improving redundancy against failures of IB.
  - Achieved about 11GB/s read/write performance with two FDR IB HCAs (Single 6GB/s)
  - Tested with up to four IB HCA devices
- With Directory Quota, users are able to:
  - Use the Directory Quota (DQ for short) feature in the same way as Lustre’s UID/GID quota function
  - Limit the number of inodes and disk blocks for each directory specified by the user
  - Manage it with the lfs command, like the UID/GID quota of Lustre.

Advanced Function (2)
- Improvement of single process IO performance
  - Our prototype results: over 2GB/s bandwidth, twice as fast as Lustre 2.5.
- Client QoS
  - Provides fair share access among users on a single Lustre client
  - On a multi user client, when one user issues a large amount of IO, the IO performance of the other users is terribly degraded.
  - Client QoS feature prevents this performance issue by controlling the number of IO requests issued by each user.
- Automated Evict Recovery
  - When a Lustre server evicts a client, the server notifies the client to reconnect to the server.
  - An eviction causes IO errors in user applications.
  - Minimizes the time Lustre clients stay evicted, especially when the disable pinging feature is enabled
  - Reduces the occurrence of IO errors in user applications.
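The Client QoS idea of controlling the number of IO requests issued by each user can be illustrated with a small model. This is only a sketch under assumptions: the class `PerUserIoLimiter`, its methods and the fixed per-user cap are hypothetical and do not describe the FEFS implementation, which the talk does not detail.

```python
from collections import defaultdict, deque


class PerUserIoLimiter:
    """Cap the number of in-flight I/O requests per user (hypothetical model)."""

    def __init__(self, max_inflight_per_user: int):
        if max_inflight_per_user < 1:
            raise ValueError("max_inflight_per_user must be >= 1")
        self.max_inflight = max_inflight_per_user
        self.inflight = defaultdict(int)    # uid -> requests currently issued
        self.waiting = defaultdict(deque)   # uid -> requests held back

    def submit(self, uid: int, request: str) -> bool:
        """Issue the request if the user is under the cap, else queue it."""
        if self.inflight[uid] < self.max_inflight:
            self.inflight[uid] += 1
            return True
        self.waiting[uid].append(request)
        return False

    def complete(self, uid: int):
        """Finish one request; return the queued request issued in its place, if any."""
        if self.inflight[uid] == 0:
            raise RuntimeError(f"no in-flight request for uid {uid}")
        if self.waiting[uid]:
            # The freed slot is reused immediately by the same user's next request.
            return self.waiting[uid].popleft()
        self.inflight[uid] -= 1
        return None


limiter = PerUserIoLimiter(max_inflight_per_user=2)
heavy = [limiter.submit(1000, f"write-{i}") for i in range(5)]  # one busy user
light = limiter.submit(2000, "read-0")                          # another user
print(heavy, light)
```

In this model the busy user can never hold more than two request slots, so the second user's request is issued at once instead of waiting behind the backlog.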

INTRODUCTION OF CONTRIBUTION FEATURES
- InfiniBand (IB) Multi-rail
- Directory Quota
- Improving Single Process IO Performance

Issue of Current Lustre IB Multi-rail
- Client, MDS and OSS can not use multiple IB I/F.
  - A single IB I/F failure in a server (MDS/OSS) causes failover.
  - A client can use only one IB I/F when accessing a server.
[Figure: Clients A and Clients B, each with ib0 and ib1, use a single IB I/F]

FEFS IB Multi-rail
- FEFS Approach: Add IB multi-rail function into Lustre network driver (o2iblnd).
  - All IB I/F on the client can be used to communicate with a server.
  - All IB connections are used in round-robin order, request by request.
  - Communication continues when a single point of IB failure occurs.
[Figure: multi-rail by o2iblnd; Clients A and Clients B connect through the IB SW to MDS/OSS; on an IB failure, no failover required]
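The round-robin use of IB connections with continued operation after a single failure can be modelled in a few lines. The sketch below is an assumption-laden illustration, not the o2iblnd code: `MultiRailSelector` and its methods are hypothetical names, and real failure detection is reduced to an explicit `mark_failed` call.

```python
class MultiRailSelector:
    """Pick an IB interface per request in round-robin order, skipping failed ones."""

    def __init__(self, interfaces):
        if not interfaces:
            raise ValueError("at least one interface is required")
        self.interfaces = list(interfaces)
        self.healthy = set(self.interfaces)
        self._next = 0  # index of the interface to try first on the next request

    def mark_failed(self, name: str) -> None:
        """Take an interface out of rotation (hypothetical failure notification)."""
        self.healthy.discard(name)

    def mark_recovered(self, name: str) -> None:
        """Put a known interface back into rotation."""
        if name in self.interfaces:
            self.healthy.add(name)

    def pick(self) -> str:
        """Return the interface for the next request."""
        for _ in range(len(self.interfaces)):
            candidate = self.interfaces[self._next]
            self._next = (self._next + 1) % len(self.interfaces)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy IB interface left")


rails = MultiRailSelector(["ib0", "ib1"])
before = [rails.pick() for _ in range(4)]   # both rails alternate
rails.mark_failed("ib1")
after = [rails.pick() for _ in range(3)]    # traffic continues on ib0 only
print(before, after)
```

The point of the model is that a rail failure only shrinks the rotation; requests keep flowing on the remaining interface, which matches the "no failover required" note in the figure.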

Variation of Multi-Rail
- Not only symmetric connection but also asymmetric connection for every node pair.
- User can realize flexible configuration

IB Multi-Rail: How to Use
- Combining a single NID with multiple IB interfaces
  Client:           HCA0 ib0 (192.168.0.10), HCA1 ib1 (192.168.0.11), NID 192.168.0.10@o2ib0
  Server (MDS/OSS): HCA0 ib0 (192.168.0.12), HCA1 ib1 (192.168.0.13), NID 192.168.0.12@o2ib0
  Single LNET (network o2ib0)
- LNET setting (modprobe.conf)
  options lnet networks=o2ib0(ib0,ib2)
- NID/IPoIB definition
  # lctl --net o2ib0 add_o2ibs 192.168.0.10@o2ib0 192.168.0.10 192.168.0.11   (Client)
  # lctl --net o2ib0 add_o2ibs 192.168.0.12@o2ib0 192.168.0.12 192.168.0.13   (Server)
- Display multi-rail information
  # lctl --net o2ib0 show_o2ibs
  192.168.0.10@o2ib0 192.168.0.10 192.168.0.11
  192.168.0.12@o2ib0 192.168.0.12 192.168.0.13
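The multi-rail listing shown on this slide pairs each NID with the IPoIB addresses of its interfaces. As a small illustration, the sketch below turns listing text of that shape into a lookup table. The function name `parse_o2ibs` is hypothetical, and the format is assumed to be exactly what the slide shows (one NID followed by its addresses per line).

```python
def parse_o2ibs(text: str) -> dict[str, list[str]]:
    """Map each NID to its interface addresses, one 'NID addr addr ...' line each."""
    table: dict[str, list[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        nid, addresses = fields[0], fields[1:]
        if "@" not in nid:
            raise ValueError(f"not a NID: {nid!r}")
        table[nid] = addresses
    return table


# The two lines displayed on the slide.
sample = """\
192.168.0.10@o2ib0 192.168.0.10 192.168.0.11
192.168.0.12@o2ib0 192.168.0.12 192.168.0.13
"""

rails_by_nid = parse_o2ibs(sample)
for nid, addresses in rails_by_nid.items():
    print(f"{nid}: {len(addresses)} rails -> {', '.join(addresses)}")
```

Each NID resolves to two addresses here, which is the single-NID, multiple-interface binding the slide describes.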

IB Multi-Rail: LNET Performance
- Server
  - CPU: Xeon E5520 2.27GHz x2
  - IB: QDR x2 or FDR x2
- Result
  - B/W almost scales by #IBs
  - Achieves nearly HW performance
[Figure: two servers, each connected with IB x2 to the IB SW; Concurrency 32]

IB Multi-Rail: IO Throughput of Single OSS
- OSS/Client
  - CPU: Xeon E5520 2.27GHz x2
  - IB: QDR x2
  - OST: ramdisk x8 ( 6GB/s)
  - IOR 32-process (8 clients x4)
- Result
  - Throughput almost scales by #IBs
  - Measurement of FDR is planned
[Figure: Client x8, IB SW (o2ib0), MDS, OSS with ramdisk x8]

Directory Quota: Features
- What is Directory Quota?
  - Restricting #inodes & blocks by individual directories
  - All files/directories under the DQ-enabled directory are under quota accounting
- Fujitsu is now implementing Directory Quota (DQ) function into Lustre 2.x
  - DQ of FEFS based on Lustre 1.8 has been used in production systems for more than two years.
  - Will be implemented on top of the disk quota framework
  - DQ can be used along with disk quota

Directory Quota (DQ): Use Image
- Use Case 1: for Job Directory
  - DQ can control file system usage for each job
  [Figure: / (Root) with Job1, Job2, Job3 and subdirectories SubD1, SubD2, SubD3]
- Use Case 2: for Shared Directory
  - DQ can also control the usage of shared directories
  [Figure: / (Root) with tmp, bin, data and subdirectories SubD1, SubD2, SubD3]
- Limitation
  - Nested DQ directories are not permitted, for simplicity of implementation and performance
  [Figure: / (Root) with dir1, dir2, dir3 and subdirectories SubD1, SubD2, SubD3]
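The accounting rule (everything under a DQ directory is charged to it) and the no-nesting limitation can be sketched as follows. This is a toy model under assumptions, not the FEFS or Lustre quota code: `DirectoryQuotaTable` and its methods are hypothetical, and usage is tracked in memory only.

```python
from pathlib import PurePosixPath


class DirectoryQuotaTable:
    """Toy model of per-directory block/inode limits without nested DQ directories."""

    def __init__(self):
        self.limits = {}  # DQ directory -> (block_limit, inode_limit)
        self.usage = {}   # DQ directory -> [blocks_used, inodes_used]

    def set_quota(self, directory: str, block_limit: int, inode_limit: int) -> None:
        path = PurePosixPath(directory)
        for existing in self.limits:
            # Reject a DQ directory above or below an existing one.
            if existing != path and (existing in path.parents or path in existing.parents):
                raise ValueError(f"nested DQ directories are not permitted: {path} vs {existing}")
        self.limits[path] = (block_limit, inode_limit)
        self.usage.setdefault(path, [0, 0])

    def quota_root(self, file_path: str):
        """Return the DQ directory that accounts for file_path, or None."""
        path = PurePosixPath(file_path)
        for candidate in (path, *path.parents):
            if candidate in self.limits:
                return candidate
        return None

    def charge(self, file_path: str, blocks: int, inodes: int = 1) -> bool:
        """Account for a new file; return False if a limit would be exceeded."""
        root = self.quota_root(file_path)
        if root is None:
            return True  # not under any DQ directory
        block_limit, inode_limit = self.limits[root]
        used = self.usage[root]
        if used[0] + blocks > block_limit or used[1] + inodes > inode_limit:
            return False
        used[0] += blocks
        used[1] += inodes
        return True


dq = DirectoryQuotaTable()
dq.set_quota("/Job1", block_limit=100, inode_limit=2)
ok_first = dq.charge("/Job1/SubD1/a.dat", blocks=60)    # fits within /Job1
ok_second = dq.charge("/Job1/SubD2/b.dat", blocks=60)   # would exceed 100 blocks
ok_other = dq.charge("/Job2/c.dat", blocks=500)         # /Job2 has no DQ
print(ok_first, ok_second, ok_other)
```

Because nesting is rejected when the quota is set, every file has at most one accounting directory, so a charge never has to be applied at several levels. That is one plausible reading of the simplicity and performance argument on the slide.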

Directory Quota: How to Use
- Operations are the same as Lustre’s UID/GID Quota
  - Only the “quotacheck” operation differs
- Set DQ on target directory (DQ-directory)
  # lfs quotacheck -d <target dir>
  - Counts the number of inodes & blocks of existing files under DQ-directory
- Set limits of inodes and blocks
  # lfs setquota -d <target dir> -B <#blk> -I <#inode> <mountpoint>
- Enable limiting by DQ
  # lctl conf_param <fsname>.quota.<ost|mdt>=<ugd>
  # lctl set_param -P <fsname>.quota.<ost|mdt>=<ugd>
- Check status
  # lctl get_param osd-*.*.quota_slave.info

Improving Single Process IO Performance
- Comparison between Lustre 2.6.0 and prototype (Lustre 1.8 base)
  - We’ve been re-designing the implementation to suit Lustre 2.x
- OSS/Client
  - CPU: Xeon E5520 2.27GHz x2
  - IB: QDR x1
  - OST: ramdisk x4
  - IOR 1-process
- Result
  Lustre 2.6.0: 0.9 1.0GB/s
  Prototype:    2.2 2.9GB/s
[Figure: Client, QDR IB SW, MDS, OSS x4 with ramdisk]
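The slide gives two throughput figures for each system without saying, in this transcription, what distinguishes them. Without assuming a pairing, the speedup of the prototype over Lustre 2.6.0 can still be bounded, as in this short sketch (variable names are illustrative):

```python
# Throughput figures in GB/s as they appear on the slide.
lustre_260 = (0.9, 1.0)
prototype = (2.2, 2.9)

# Bounds that hold whichever prototype figure is compared with whichever baseline.
lowest_speedup = min(prototype) / max(lustre_260)
highest_speedup = max(prototype) / min(lustre_260)

print(f"speedup between {lowest_speedup:.1f}x and {highest_speedup:.1f}x")
```

Even the most conservative pairing gives a speedup above 2x for a single IOR process.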

Summary
- Fujitsu will continue to improve Lustre for exascale systems.
- Fujitsu will open its development plan and feed back its enhancements to Lustre community
  - LAD is the most suitable place to present and discuss.
- Several features are scheduled to be contributed
  - InfiniBand Multi-rail, Directory Quota etc.
[Figure: Contribution & Production Cycle. OpenSFS: Lustre Development, Release. Fujitsu: Dev. Plan, Enhancement, Feedback, Product Shipment to Fujitsu Customers]
