Geek Guide Ceph: Open-Source SDS


Table of Contents

About the Sponsor
Introduction
Overview: What Is Software-Defined Storage?
Benefits of SDS
  Single Management Interface
  Reduced CAPEX
  Scalable
  No Single Point of Failure
Ceph
  Architecture
Hardware Considerations
  RAM
  Drives
  Networks
  Other Considerations
  Minimum Hardware Recommendations
How to Get Started
SUSE Enterprise Storage
Conclusion

TED SCHMIDT is the Senior Project Manager and Product Owner of Digital Products for a consumer products development company. Ted has worked in Project and Product Management since before the agile movement began in 2001. He has managed project and product delivery for consumer goods, medical devices, electronics and telecommunication manufacturers for more than 20 years. When he is not immersed in product development, Ted writes novels and runs a small graphic design practice at http://floatingOrange.com. Ted has spoken at PMI conferences, and he blogs at http://floatingOrangeDesign.Tumblr.com.

GEEK GUIDES: Mission-critical information for the most technical people on the planet.

Copyright Statement

Copyright © 2016 Linux Journal. All rights reserved.

This site/publication contains materials that have been created, developed or commissioned by, and published with the permission of, Linux Journal (the “Materials”), and this site and any such Materials are protected by international copyright and trademark laws.

THE MATERIALS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice and do not represent a commitment on the part of Linux Journal or its Web site sponsors. In no event shall Linux Journal or its sponsors be held liable for technical or editorial errors or omissions contained in the Materials, including without limitation, for any direct, indirect, incidental, special, exemplary or consequential damages whatsoever resulting from the use of any information contained in the Materials.

No part of the Materials (including but not limited to the text, images, audio and/or video) may be copied, reproduced, republished, uploaded, posted, transmitted or distributed in any way, in whole or in part, except as permitted under Sections 107 & 108 of the 1976 United States Copyright Act, without the express written consent of the publisher. One copy may be downloaded for your personal, noncommercial use on a single computer. In connection with such use, you may not modify or obscure any copyright or other proprietary notice.

The Materials may contain trademarks, service marks and logos that are the property of third parties. You are not permitted to use these trademarks, service marks or logos without prior written consent of such third parties.

Linux Journal and the Linux Journal logo are registered in the US Patent & Trademark Office. All other product or service names are the property of their respective owners. If you have any questions about these terms, or if you would like information about licensing materials from Linux Journal, please contact us via e-mail at info@linuxjournal.com.

About the Sponsor

SUSE, a pioneer in open-source software, provides reliable, interoperable Linux, cloud and storage infrastructure solutions that give enterprises greater control and flexibility. More than 20 years of engineering excellence, exceptional service and an unrivaled partner ecosystem power the products and support that help our customers manage complexity, reduce cost and confidently deliver mission-critical services. The lasting relationships we build allow us to adapt and deliver the smarter innovation they need to succeed—today and tomorrow.

Ceph: Open-Source SDS

TED SCHMIDT

Introduction

Despite the recent trend of businesses moving to cloud-based storage and SaaS applications, businesses of all sizes will see significant benefits from pursuing a strategy that mixes cloud-based information services for standard or back-office functions with in-house management of mission- and strategy-critical data. Data has become more important than ever, and bigger than ever, but growing data comes with increased costs and performance issues. Enterprises are looking for data management solutions that are scalable, resilient and that can be built on commodity hardware.

The current trend is toward commodity hardware and solid-state drives rather than disk, and away from legacy NAS and SAN solutions, where proprietary software and hardware keep you from realizing the cost benefits of using commodity hardware. In short, businesses are looking for a way to separate storage hardware from the software that manages it, so they can use a single, efficient software solution that will manage any vendor's hardware.

Enter software-defined storage, or SDS.

But, what is SDS? There are a lot of opinions on what exactly SDS is and how it benefits the enterprise. In this ebook, I explore the background of SDS and define what it really means. I look at some of the characteristics of SDS, examine Ceph, an open-source SDS solution, and discuss why an open-source solution that leverages commodity hardware might be your best answer.

Overview: What Is Software-Defined Storage?

Software-defined storage is a relatively new category of software storage products. Although many consider SDS to be a natural evolution of virtualization and software-defined networking, SDS is, put simply, a virtualization technique aimed at reducing the costs of managing growing data stores by decoupling storage management software from its hardware to allow centralized management of cheaper commodity hardware. Beyond this over-simplified definition are nuances that create big differences in the solution you are ultimately getting. So, it's important to understand those nuances.

Gartner puts it clearly by stating that an SDS solution will use software to separate and abstract storage capabilities that are pulled from industry-standard, commodity hardware, with the aim of delivering higher quality of service while reducing costs. IDC adds to this idea of hardware agnosticism by defining SDS as “any storage software stack that can be installed on commodity resources (x86 hardware, hypervisors, or cloud) and/or off-the-shelf computing hardware”. So, to qualify as SDS, a solution must run on generic, industry-standard hardware, without any proprietary hooks that ultimately lead to limitations. Gartner agrees with IDC's take on SDS by stating that SDS works regardless of class of storage. Use of commodity hardware is key to the ROI benefits offered by true SDS.

Now, let's look at some of the other characteristics of an SDS solution. Although SDS does provide for pooling of storage, to be true SDS, the solution also has to provide the following additional features:

- Establishment of policies for managing data services as well as storage.
- Metadata tagging for managing data services and storage.
- Dis-aggregation of data services and storage.
- Automated management of storage.
- A UI that provides self-service.

Additional features that can be part of SDS but are not required:

- Use of non-proprietary hardware, including industry-standard hardware.
- Enhancement of existing functions of specialized hardware.
- Scale-out storage.
- Incremental build-out of the data services and storage solution.

Based on this list, it's easy to see that SDS can take a number of different forms depending on your budget, requirements or other factors. However, the separation of management software and services from hardware creates a solution that becomes scalable. Additionally, simplifying the indexing of unstructured data using object services based on representational state transfer (REST) is also key, as are filesystems that improve data protection and ease of capacity optimization, and free interaction of data services to allow the separation of data and scalability.
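The "object services based on REST" point above is easiest to see with a concrete request. The following is only a minimal sketch, not a recipe from this guide: it assumes an S3-compatible REST gateway (Ceph's RADOS Gateway is one example) reachable at a hypothetical endpoint, placeholder credentials and a placeholder bucket name, and it uses the Python boto3 client to store and retrieve one object over HTTP.

    # Minimal sketch: storing and fetching an object through an
    # S3-compatible REST gateway. The endpoint, credentials and bucket
    # below are illustrative placeholders, not values from this guide.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="http://sds-gateway.example.com:7480",  # hypothetical gateway
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    s3.create_bucket(Bucket="demo-bucket")

    # Each object carries its own metadata, which is what makes indexing
    # unstructured data simpler than walking a filesystem hierarchy.
    s3.put_object(
        Bucket="demo-bucket",
        Key="reports/2016/q1.csv",
        Body=b"region,revenue\nemea,100\n",
        Metadata={"department": "finance"},
    )

    obj = s3.get_object(Bucket="demo-bucket", Key="reports/2016/q1.csv")
    print(obj["Body"].read())
    print(obj["Metadata"])

Because the interface is plain HTTP, the application sees the same API no matter whose commodity hardware sits behind the gateway, which is exactly the separation of software from hardware that SDS promises.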

Benefits of SDS

The promise of SDS is that it will enable enterprise IT to provide a more on-demand, scalable and agile experience for business users with no single point of failure, while simplifying their storage management and reducing CAPEX and OPEX.

Single Management Interface: SDS brings great flexibility to the IT organization because it provides a single software interface to potentially all storage hardware, regardless of vendor. This means functions such as creating a volume, establishing RAID protection, implementing thin provisioning and tiering of data all can be done through a single interface. IT administrators don't need to be retrained on each storage system. This flexibility allows IT organizations to purchase storage systems that are specific to a task without adding to infrastructure management.

Plenty of storage vendors make excellent storage hardware, but they have not invested in the accompanying storage software. These vendors often are classified as Tier-2 vendors, but when coupled with SDS, they can match Tier-1 hardware vendors feature for feature.

FIGURE 1. Basic SDS Architecture

Reduced CAPEX: One of the benefits offered by SDS is that it separates the purchase of management software from storage hardware, so you effectively spend less capital when you need to add or upgrade storage hardware, because you don't need to worry about the added cost of the management software that comes with proprietary solutions.

With proprietary solutions, any time you need to upgrade your storage hardware, you have to buy new software with the hardware upgrade. This leads to increased training costs and even poses potential procedural risks when there are significant differences in the releases of management software. On the other hand, if the storage software is unchanged over generations of hardware, you're paying for something you don't need. In other words, you're paying for something you already own!

Scalable: As big data continues to grow, finding cost-effective ways to gain value from all that information will be a critical deciding factor for companies that want to remain viable in the future. Because you can use commodity hardware to grow your data architecture as your data grows, SDS provides a much more flexible and lower-cost solution, no matter how massive your data storage needs become.

No Single Point of Failure: The no-single-point-of-failure design principle asserts simply that no single part can stop the entire system from working. With traditional dedicated storage solutions, a storage array can't borrow capacity from another when demand for storage increases, which leads to data bottlenecks and a single point of failure.

With SDS, however, that risk is avoided. Remember, SDS uses commodity storage devices and provides shared storage capabilities, such as mirroring and replication. SDS also eliminates the need for dedicated storage arrays and storage area networks. Because SDS distributes the workload across multiple devices, if any single device or node fails, it doesn't bring down the entire system.

I've defined SDS as a solution that separates management software from the commodity hardware it manages, and I've described the benefits of moving to an SDS solution, including cost reduction, simplification, scalability and avoidance of a single point of failure through distributed workload. Now, let's take a look at the industry leader in SDS: Ceph.

Ceph

Ceph is an open-source, distributed object store and filesystem originally designed by Sage Weil for his doctoral dissertation at UC Santa Cruz. In 2012, Weil started Inktank Storage, which Red Hat acquired in 2014. In 2015, to assist the Ceph community of developers in creating and promoting a unified vision for open-source SDS technology, individuals from organizations including Canonical, CERN, Cisco, Fujitsu, Intel, Red Hat, SanDisk and SUSE formed the Ceph Community Advisory Board.

Designed to deliver extraordinary performance, reliability and scalability, Ceph provides interfaces for object, block and filesystem storage to store data on a single, unified system. It provides unified distribution of storage without a single point of failure, is scalable to the exabyte level, and because it's open source, it's available to anyone for free.

Ceph deployment is fairly straightforward. You start by setting up your network, configuring every machine or server that will be part of the environment as a Ceph Node, and creating the Ceph Storage Cluster, which requires at least one Ceph Monitor to maintain maps of the cluster state (for added fault tolerance and reliability, Ceph supports clustering of monitors). You also need at least two Ceph Object Storage Device (OSD) Daemons to store data; they provide monitoring data to the Ceph Monitor and handle replication, recovery, rebalancing and backfilling. If you're going to run any Ceph filesystem clients, you also should plan on setting up the Ceph Metadata Server (MDS). MDS does just what it sounds like—it stores metadata for the Ceph filesystem.

Ceph then uses storage pools to store data. It calculates which placement group gets the data and which OSD should store the placement group using something called the CRUSH (Controlled, Scalable, Decentralized Placement of Replicated Data) algorithm, which enables the Ceph Storage Cluster to scale, rebalance and recover dynamically as needed.

Architecture: Let's look at the basic components of Ceph a little more closely, starting with the storage cluster. The RADOS (Reliable Autonomic Distributed Object Store)-based storage cluster in Ceph is composed of a Monitor and at least two OSD Daemons. The Ceph Monitor maintains maps of the state of the cluster to ensure high availability in case a Monitor Daemon fails, and the OSD Daemons monitor their own state, as well as each other's state, and report back to the Monitor. Instead of having a central lookup table to reference, the Daemons and storage cluster Clients use CRUSH to compute information about data placement. CRUSH basically distributes the workload by managing the data objects across the Clients and Daemons in the cluster.
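To make the pool-and-CRUSH flow described above concrete, here is a minimal sketch using the python-rados bindings that ship with Ceph. It assumes a running cluster (at least one Ceph Monitor and two OSD Daemons, as described above), the default /etc/ceph/ceph.conf and client keyring, and a pool named "demo-pool" that has already been created; the pool and object names are illustrative only.

    # Minimal sketch: writing and reading one object with librados.
    # The client never asks a central server where the object lives;
    # the placement group and OSDs are computed client-side via CRUSH.
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")  # default config path
    cluster.connect()

    try:
        ioctx = cluster.open_ioctx("demo-pool")  # assumes this pool exists
        try:
            ioctx.write_full("greeting", b"hello from librados")
            print(ioctx.read("greeting"))                 # b'hello from librados'
            ioctx.set_xattr("greeting", "owner", b"ted")  # per-object metadata
            print(ioctx.get_xattr("greeting", "owner"))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

Notice that the client code contains no replication logic at all; as discussed later in this section, the OSD Daemons handle replication among themselves once the write reaches the cluster.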

FIGURE 2. High-Level Ceph Architecture

Part of what makes Ceph work is that the Daemons and Clients have knowledge of the topology of the cluster. This topology is contained in the Cluster Map, which is composed of five maps: the Monitor Map, OSD Map, PG Map, CRUSH Map and MDS Map. The Monitor Map contains the location of each monitor. The OSD Map contains a list of OSDs and their status, in addition to a list of pools, replica sizes and PG numbers. The PG Map contains details on each placement group (PG). The CRUSH Map contains a list of storage devices, the failure domain hierarchy and hierarchy rules. The MDS Map contains the metadata pool and a list of metadata servers and their status. Every map contains its current epoch, when it was created and when it last changed. Whenever Ceph Clients want access to data, they first must obtain a copy of the Cluster Map from a Ceph Monitor.

As mentioned earlier, the OSD Daemons are aware of each other—something referred to as being "cluster aware". Because of this, OSD Daemons can interact with each other and with Ceph Monitors, while Ceph Clients can interact directly with OSD Daemons. This architectural design feature allows OSD Daemons to use the CPU and RAM of the cluster nodes to perform tasks at exabyte scale that normally would cause bottlenecks. OSD Daemons can service Clients directly, which increases performance and system capacity at the same time; Clients no longer wait on a centralized server. It also means that Ceph Monitors remain lightweight processes, because the OSD Daemons are constantly checking on one another. Data scrubbing also becomes more thorough, because OSD Daemons can compare the objects in their placement groups against the copies held by other OSD Daemons. Finally, OSD Daemons relieve Ceph Clients of the need to perform any data replication, because the OSD Daemons replicate data to as many other OSDs as needed.
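A quick way to see the Cluster Map machinery described above in action is to ask a Monitor for cluster state. This is again only a sketch under assumptions: it uses the python-rados bindings, the default /etc/ceph/ceph.conf and a client keyring that is allowed to query the Monitors, and the JSON field names shown ("epoch", "osds", "up", "in") are those returned by the osd dump command in Ceph releases current at the time of this writing.

    # Minimal sketch: pulling cluster-wide state from a Ceph Monitor.
    import json
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")  # default config path
    cluster.connect()

    # Aggregate usage across every OSD in the cluster.
    stats = cluster.get_cluster_stats()
    print("kB used: %d of %d" % (stats["kb_used"], stats["kb"]))

    # Ask a Monitor for the OSD Map: its epoch plus each OSD's up/in status.
    ret, outbuf, errs = cluster.mon_command(
        json.dumps({"prefix": "osd dump", "format": "json"}), b""
    )
    osd_map = json.loads(outbuf)
    print("OSD Map epoch:", osd_map["epoch"])
    for osd in osd_map["osds"]:
        print("osd.%d up=%d in=%d" % (osd["osd"], osd["up"], osd["in"]))

    cluster.shutdown()

The same mon_command interface accepts other map queries as well (for example, "mon dump" or "osd tree"), which is handy when you want the cluster state described above without shelling out to the ceph command-line tool.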

Hardware Considerations

Ceph was designed to run on commodity hardware, and that makes building and maintaining massive data clusters economically feasible. But, there are a few things to consider before you get started building your new cluster—things like whether to include failure domains, potential performance issues and matching hosts to OSD Daemons to gain the most efficiency (for instance, it's typically a good idea to run an OSD Daemon on hardware configured for that type of daemon). A great source of information when planning out your Ceph implementation is http://ceph.com or, at the time of this writing, the most current recommendations from SUSE in the SUSE Enterprise Storage architectural overview with recommendations guide.

SUSE recommends understanding a few basic considerations before sizing your hardware, including:

- Single and aggregate thread throughput expectations.
- Minimum, maximum and average latency expectations.
- Acceptable performance degradation during failure/rebuild events.
- Read/write ratios.
- Working data set size.
- Access protocols or methods.

With this data in hand, let's explore a few basic, but more current, hardware configuration recommendations for implementing Ceph.

Let's begin with the trickiest: sizing the gateway services. Depending on what they are doing, they could co-locate on a monitor node or require dedicated nodes, depending on cluster size. Ceph OSD Daemons need some processing power as well, and under-provisioning them can really damage performance during both normal and degraded operations. The best thing here is to consult http://www.suse.com for the latest recommendations. Because Ceph Monitors primarily are concerned only with maintenance of the Cluster Map, they can run on just a few quick cores, maybe two at 2.3GHz, with about 8GB of RAM.

All this being said, keep in mind any other processes you may have running on your hardware that could compete with Ceph processes. Make sure any other VM leaves enough resources for your Ceph processes, and run your metadata servers and
