CEPH

In computing, Ceph (pronounced /ˈsɛf/ or /ˈkɛf/) is a free-software storage platform that implements object storage on a single distributed computer cluster and provides interfaces for object-, block- and file-level storage. Ceph aims primarily for completely distributed operation without a single point of failure, scalability to the exabyte level, and free availability. [Wikipedia](https://en.wikipedia.org/wiki/Ceph_(software))

CEPH :: Philosophy

  • Open Source

  • Community Focused

CEPH :: Features

  • Software Defined Storage Solution

  • Distributed object storage

  • Redundancy

    • Replication

    • Erasure coding

    • Cache Tiering

  • Efficient scale out

  • Built on commodity hardware

  • Most popular choice of distributed storage for OpenStack: Nova (VM virtual disks), Glance (images), Cinder (block storage), RadosGW

  • Copy-On-Write clones (Glance image to Nova/Cinder; see the sketch after this list)
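
A minimal sketch of that copy-on-write flow using the python-rados and python-rbd bindings, assuming a reachable cluster and a readable /etc/ceph/ceph.conf. The pool names ("images", "vms"), image name, and snapshot name are placeholders, not values from this page; Glance/Nova/Cinder do the equivalent steps internally.

```python
# Hedged sketch: copy-on-write clone of a base RBD image (placeholder names).
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    images = cluster.open_ioctx('images')   # placeholder for the Glance pool
    vms = cluster.open_ioctx('vms')         # placeholder for the Nova/Cinder pool

    # Create a base image with layering enabled (required for cloning).
    rbd.RBD().create(images, 'base-image', 10 * 1024**3,
                     old_format=False, features=rbd.RBD_FEATURE_LAYERING)
    with rbd.Image(images, 'base-image') as img:
        img.create_snap('golden')
        img.protect_snap('golden')          # clones need a protected snapshot

    # The clone shares the parent's data blocks until they are overwritten.
    rbd.RBD().clone(images, 'base-image', 'golden', vms, 'instance-disk-0')

    vms.close()
    images.close()
finally:
    cluster.shutdown()
```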

CEPH :: Storage Cluster

  • Self-healing

  • Self-managed

  • No central bottleneck (clients locate objects via CRUSH, no central lookup)

CEPH :: 3 Interfaces

  • Object Access (like Amazon S3; see the librados sketch after this list)

  • Block Access

  • Distributed File System (cephfs)
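
A minimal object-access sketch using the librados Python bindings (python3-rados), assuming a reachable cluster, a readable /etc/ceph/ceph.conf, and an existing pool; the pool and object names below are placeholders.

```python
# Hedged sketch: native object access through librados (python3-rados).
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('mypool')       # pool must already exist
    ioctx.write_full('hello-object', b'stored via librados')
    print(ioctx.read('hello-object'))          # -> b'stored via librados'
    ioctx.close()
finally:
    cluster.shutdown()
```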

CEPH :: Architecture

  • RADOS (Reliable Autonomic Distributed Object Store) .. Documentation Ceph Storage Cluster

    • radosgw (object storage) .. Documentation Ceph Object Storage

      • RESTful Interface

      • S3 and Swift APIs (see the boto3 sketch after this list)

    • rbd (block device) .. Documentation Ceph Block Device

    • CephFS (File System) .. Documentation Ceph Filesystem

  • Head Node (Controller)

    • SQL Database

    • CMDaemon

      • Cluster Management GUI (JSON + SSL)

      • Cluster Management Shell

      • Web Based User Portal

    • Third Party Applications

    • Node-005 (Nova App)

    • Node-004 (CEPH OSD)

    • Node-003 (CEPH OSD)

    • Node-002 (Nova Compute)
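
The S3-compatible API exposed by radosgw (noted above) can be exercised with any S3 client; below is a hedged boto3 sketch where the endpoint URL, bucket name, and credentials are placeholders for whatever your radosgw deployment and its created user actually provide (7480 is only the common default port).

```python
# Hedged sketch: talking to the radosgw S3-compatible endpoint with boto3.
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.local:7480',   # placeholder radosgw endpoint
    aws_access_key_id='PLACEHOLDER_ACCESS_KEY',
    aws_secret_access_key='PLACEHOLDER_SECRET_KEY',
)

s3.create_bucket(Bucket='demo-bucket')
s3.put_object(Bucket='demo-bucket', Key='hello.txt', Body=b'stored via radosgw')
print(s3.get_object(Bucket='demo-bucket', Key='hello.txt')['Body'].read())
```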

CEPH :: Components

  • Server: CEPH OSD Node

    • Type

      • Fat Node

        • Many cores / sockets, 20+ HDDs, 1+ journal SSDs

      • Thin Node

        • Faster recovery

        • 1 Socket is enough

    • Physical Disk

    • SSD journals (fast) vs. HDDs (slow)

    • File System (btrfs, xfs)

    • One Object Storage Daemon (OSD) per disk

      • OSDs serve objects to clients

      • OSDs peer with each other to perform replication and recovery

  • Server: CEPH Monitor

    • Stores the cluster map (run at least 3 monitors)

    • Brain of the Cluster

    • Does not serve stored objects to clients (see the status sketch after this list)

  • Server: CEPH Metadata (for CephFS)
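
Because monitors hold the cluster map but never serve object data, clients only need them to bootstrap; a small librados sketch like the one below (assuming python3-rados and a readable /etc/ceph/ceph.conf) can pull overall cluster statistics, while object reads and writes go straight to the OSDs.

```python
# Hedged sketch: asking the cluster for overall usage statistics via librados.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    stats = cluster.get_cluster_stats()   # dict: kb, kb_used, kb_avail, num_objects
    print('used: %d KB of %d KB, objects: %d'
          % (stats['kb_used'], stats['kb'], stats['num_objects']))
finally:
    cluster.shutdown()
```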

CEPH :: Conceptual Components

  • Pool

    • Logical container for storage objects

    • Parameters

      • Name, ID

      • Replicas

      • CRUSH rules

    • Operations

      • Create / Read / Write Objects

  • Placement Groups (PGs)

    • Balance data across OSDs

    • 1 PG spans several OSDs

    • 1 OSD serves many PGs

    • Tunable (50-100 per OSD; see the PG-count sketch after this list)

  • CRUSH (Controlled Replication Under Scalable Hashing)

    • Monitors maintain the CRUSH map

    • Clients understand CRUSH
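
The "50-100 PGs per OSD" tuning note is commonly applied with a rule of thumb of roughly (OSDs × target PGs per OSD) / replica count, rounded up to a power of two. A small sketch, using example numbers rather than values from this page:

```python
# Hedged sketch of the common PG-count rule of thumb:
# (num_osds * target_pgs_per_osd) / replicas, rounded up to a power of two.
def suggested_pg_count(num_osds, replicas, target_pgs_per_osd=100):
    raw = num_osds * target_pgs_per_osd / replicas
    power = 1
    while power < raw:
        power *= 2
    return power

print(suggested_pg_count(num_osds=12, replicas=3))   # -> 512
print(suggested_pg_count(num_osds=40, replicas=3))   # -> 2048
```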

CEPH :: Playground

  • Standalone Storage System

  • Back End for OpenStack Block Storage

CEPH :: Implementation

Considerations

  • Network determines the number of SSDs

  • Number of SSDs determines the number of HDDs

  • Number of HDDs determines the CPU core count

  • Size and count of HDDs determine the amount of memory needed

  • Network

    • Single Fabric

      • Single Switch, VLANs

      • Problems: One broadcast domain, bandwidth

    • Multiple Fabric

      • Fabric for VLAN/VXLAN

      • CEPH Access (ceph-public)

      • CEPH Cluster (ceph-cluster)

    • NICs

      • 1 GigE, 10 GigE

    • MTUs

      • 1500 Vs 9000

  • Disks

    • SSD Journals

    • Endurance: amount of data that can be written before failure

    • 1 GigE good for SATA SSD

    • 10 GigE good for PCIe SSD

  • Hard Disk

  • Processor

    • 1 Socket but how many cores

      • Depends on SSD and networking

        • 1 CPU core per OSD daemon (i.e., per disk)

        • 1 SATA SSD Journal per ~4-6 HDD

        • 1 PCIe SSD Journal per ~6-20 HDD

        • Example: 2 SATA SSDs could handle 12 OSDs, which would require a 12-core CPU (see the sizing sketch after this list)

    • Hyper Threading Cores Vs Physical Cores

      • HT enabled

  • Memory

    • 0.5 GB - 1 GB per TB per Daemon

    • More is better (Linux VFS caching)

    • OSD node with 4 x 2 TB Disks (4 Daemons) -> 8 GB of RAM

    • OSD node with 16 x 2 TB Disks (16 Daemons) -> 32 GB of RAM
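
The sizing chain above (network → SSDs → HDDs → cores → memory) can be turned into a back-of-the-envelope calculator. The 5-HDDs-per-SATA-journal and 1 GB-RAM-per-TB constants below are assumptions picked from the ranges listed above, and the inputs mirror the two RAM examples.

```python
# Hedged back-of-the-envelope OSD node sizing, using the rules of thumb above:
# one OSD daemon and ~1 CPU core per HDD, ~1 SATA SSD journal per 4-6 HDDs,
# and 0.5-1 GB of RAM per TB per daemon.
import math

def size_osd_node(num_hdds, hdd_tb, hdds_per_journal=5, ram_gb_per_tb=1.0):
    return {
        'osd_daemons': num_hdds,                                # one OSD per disk
        'cpu_cores': num_hdds,                                  # ~1 core per daemon
        'journal_ssds': math.ceil(num_hdds / hdds_per_journal),
        'ram_gb': math.ceil(num_hdds * hdd_tb * ram_gb_per_tb),
    }

print(size_osd_node(num_hdds=4, hdd_tb=2))    # matches the 8 GB RAM example above
print(size_osd_node(num_hdds=16, hdd_tb=2))   # matches the 32 GB RAM example above
```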

CEPH :: Integration
