Cache Tiering with Ceph: A Benchmark

How does cache tiering work under Ceph?

Ambedded's experts show here how they improved the performance of an HDD pool with NVMe-based Ceph cache tiering, using a series of benchmark tests.

What is a cache tier under Ceph anyway?

Cache tiering allows a pool of fast storage to act as a cache in front of slower storage. This involves creating a pool of fast/expensive storage devices (such as NVMe SSDs) that is configured as the cache tier.

This is contrasted with the slower/cheaper backing pool of either erasure-coded or slower devices (such as HDDs) as the storage tier. The cache tier stores frequently accessed data from the backing tier and serves read and write requests from clients.

A cache tiering agent flushes and evicts objects from the cache tier according to configurable policies.

[Diagram: Ceph cache tier demo]

In the past, with SATA SSDs as cache storage, the performance improvement from cache tiering was insignificant. Nowadays, the cost of NVMe SSDs has dropped sharply, and their transfer performance is far higher than that of HDDs. It was therefore worth finding out whether NVMe SSDs as a cache tier can significantly accelerate an HDD pool.

To test the effectiveness of an NVMe cache, a demo scenario was designed to measure how much the performance of an HDD-based storage pool could be improved.

Cluster Setup

Mars500 Cluster

NVMe hosts: 3x Ambedded Mars500 Ceph appliances, each equipped with:

  • CPU: 1x Ampere Altra Arm 64-core, 3.0 GHz
  • Memory: 96 GiB DDR4
  • Network: 2 ports 25 Gbps Mellanox ConnectX-6
  • OSDs: 8x Micron 7400, 960 GB

Mars400 Cluster

HDD hosts: 3x Ambedded Mars400 Ceph appliances, each equipped with:

  • CPU: 8 microservers with quad-core Arm64, 1.2 GHz
  • Memory: 4 GiB per node, 32 GiB per appliance
  • Network: 2x 2.5 Gbps per node, 2x 10 Gbps uplink via internal switch
  • OSDs: 8x 6 TB Seagate Exos HDDs

Ceph Cluster information

  • 24x OSDs on NVMe SSDs (3x Ambedded Mars500 appliances)
  • 24x OSDs on HDDs (3x Ambedded Mars400 appliances)
  • HDD and NVMe servers are located in separate CRUSH roots (see the sketch below).
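
A minimal sketch of how such a split can be expressed with CRUSH placement rules; the root names (hdd-root, nvme-root) and rule names are assumptions, since the article does not show the actual CRUSH map:

  # One replicated rule per CRUSH root, with host as the failure domain
  ceph osd crush rule create-replicated rule-hdd  hdd-root  host
  ceph osd crush rule create-replicated rule-nvme nvme-root host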

Test Clients

  • 2x physical servers, each with a 2x 25 Gbps network card
  • Each server runs 7 VMs.
  • Each VM has 4 cores and 8 GB of memory.

 

Setting up the cache tier with the Ambedded UVS manager

 1. Create a base pool using the HDD OSDs.

 2. Create an NVMe pool using the NVMe SSD OSDs.

 3. Add the NVMe pool as the cache tier of the HDD pool (an equivalent CLI sketch follows below).
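
For reference, a minimal CLI sketch of the same three steps. The pool names cache and mars400_rbd are taken from the pool statistics shown later; the PG counts and CRUSH rule names are assumptions, and the article itself performs these steps in the UVS manager GUI.

  # 1. Base pool on the HDD OSDs (PG count assumed)
  ceph osd pool create mars400_rbd 1024 1024 replicated rule-hdd

  # 2. Cache pool on the NVMe OSDs (PG count assumed)
  ceph osd pool create cache 512 512 replicated rule-nvme

  # 3. Attach the NVMe pool as the cache tier of the HDD pool
  ceph osd tier add mars400_rbd cache
  ceph osd tier cache-mode cache writeback
  ceph osd tier set-overlay mars400_rbd cache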

 

Default cache tier configurations (a CLI sketch for applying them follows the list)

i. Cache mode: writeback

ii. hit_set_count = 12

iii. hit_set_period = 14400 sec (4 h)

iv. target_max_bytes = 2 TiB

v. target_max_objects = 1 million

vi. min_read_recency_for_promote & min_write_recency_for_promote = 2

vii. cache_target_dirty_ratio = 0.4

viii. cache_target_dirty_high_ratio = 0.6

ix. cache_target_full_ratio = 0.8

x. cache_min_flush_age = 600 sec

xi. cache_min_evict_age = 1800 sec
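
The defaults above map to the following pool settings, shown here as a sketch; it assumes the cache pool is named cache, as in the statistics output later, and that hit_set_type is the usual bloom filter (not listed among the defaults). The UVS manager applies such settings through its GUI.

  ceph osd pool set cache hit_set_type bloom              # assumed; not listed in the defaults above
  ceph osd pool set cache hit_set_count 12
  ceph osd pool set cache hit_set_period 14400            # 4 hours
  ceph osd pool set cache target_max_bytes 2199023255552  # 2 TiB
  ceph osd pool set cache target_max_objects 1000000
  ceph osd pool set cache min_read_recency_for_promote 2
  ceph osd pool set cache min_write_recency_for_promote 2
  ceph osd pool set cache cache_target_dirty_ratio 0.4
  ceph osd pool set cache cache_target_dirty_high_ratio 0.6
  ceph osd pool set cache cache_target_full_ratio 0.8
  ceph osd pool set cache cache_min_flush_age 600         # 10 minutes
  ceph osd pool set cache cache_min_evict_age 1800        # 30 minutes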

The HDD pool was tested before and after adding the cache level. Up to 14 clients were used to generate test loads. Each client mounted an RBD for the fio test. The test load started with one client and the number of clients was increased after the completion of each test run.

Each test cycle lasted five minutes and was automatically controlled by Jenkins. The performance of a test job was the sum of the results of all clients. Before cache tiering was tested, the experts wrote data to the RBDs until the cache tier pool was filled beyond the cache_target_full_ratio (0.8).
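
For illustration, a hypothetical fio job resembling one of the described workloads (the 4 kB random write test mentioned later); the device path, I/O depth and direct-I/O flag are assumptions, as the article only specifies five-minute runs against mounted RBDs:

  # Hypothetical 4 kB random-write job against a mapped RBD (device path assumed)
  fio --name=rbd-randwrite \
      --filename=/dev/rbd0 \
      --rw=randwrite --bs=4k \
      --ioengine=libaio --direct=1 \
      --iodepth=32 --numjobs=1 \
      --runtime=300 --time_based \
      --group_reporting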


Test results in detail

The diagrams show that the performance of the HDD-only pool improves significantly after adding an NVMe cache pool.

[Figures 1-4: benchmark results for the HDD pool before and after adding the NVMe cache tier]

During the cache test, the engineers observed the pool statistics using the ceph osd pool stats command. Flushing, evicting and promoting activities took place in the cache and base pools. The pool statistics were recorded at different times during the cache test.
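
For instance, the statistics can be refreshed continuously during a run; the one-second interval here is an assumption:

  # Re-run the pool statistics every second while the benchmark runs
  watch -n 1 ceph osd pool stats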

Data was written to the cache 
pool cache id 84 
 client io 21 MiB/s wr, 0 op/s rd, 5.49k op/s wr 

pool mars400_rbd id 86 
 nothing is going on

Cache was promoting and evicting 
pool cache id 84 
 client io 42 MiB/s wr, 0 op/s rd, 10.79k op/s wr 
 cache tier io 179 MiB/s evict, 17 op/s promote 

pool mars400_rbd id 86 
 client io 0 B/s rd, 1.4 MiB/s wr, 18 op/s rd, 358 op/s wr

Cache was flushing 
pool cache id 84 
 client io 3.2 GiB/s rd, 830 op/s rd, 0 op/s wr 
 cache tier io 238 MiB/s flush, 14 op/s promote, 1 PGs flushing 

pool mars400_rbd id 86 
 client io 126 MiB/s rd, 232 MiB/s wr, 44 op/s rd, 57 op/s wr

PG was evicting 
pool cache id 84 
 client io 2.6 GiB/s rd, 0 B/s wr, 663 op/s rd, 0 op/s wr 
 cache tier io 340 MiB/s flush, 2.7 MiB/s evict, 21 op/s promote, 1 PGs evicting (full) 

pool mars400_rbd id 86 
 client io 768 MiB/s rd, 344 MiB/s wr, 212 op/s rd, 86 op/s wr

PGs were flushing and client I/O went directly to the base pool (clients were writing data) 
pool cache id 84 
 client io 0 B/s wr, 0 op/s rd, 1 op/s wr 
 cache tier io 515 MiB/s flush, 7.7 MiB/s evict, 1 PGs flushing 

pool mars400_rbd id 86 
 client io 613 MiB/s wr, 0 op/s rd, 153 op/s wr

Reflections

After the endurance test, the experts let the cluster rest for a few hours and repeated the 4 kB random write test. They got much better performance afterwards, because cache space had been released for new writes.

After this test, they were also confident that using the NVMe pool as the cache tier of an HDD pool could provide a significant performance improvement.

It should be noted, however, that the performance of cache tiering cannot be guaranteed. It depends on the cache hit rate at the time, and repeating the test with the same configuration and workload will not necessarily reproduce the same results.

Conclusion: If an application requires consistent performance, it should have a pure NVMe SSD pool.

This article was first published by Ambedded

Konrad Beyer
Technical Support

Our technical manager has a comprehensive knowledge of all storage and server topics.