#55 - Storage for Hyperscalers

This week we talk to Mark Carlson, co-chair of the SNIA Technical Council, about the storage needs of hyperscalers. Mark defines hyperscalers as those companies opening multiple data centres a year, most notably Amazon, Microsoft Azure, Google and Facebook in the US and Baidu, Alibaba and Tencent in China. These companies are deploying petabytes of storage a year and have specific requirements for storage media. Hyperscaler applications are sensitive to HDD and SSD performance characteristics such as tail latency and the effects of garbage collection. As a result, drive manufacturers are building in new features to meet the needs of these companies.

What exactly are the issues? The conversation covers some of the typical problems, such as reusing partially failed media, tail latency, write amplification and why non-deterministic latency is such an issue. Some vendor solutions include Depop, Fast Fail and Open Channel. Not sure what these are? Listen in to find out!

In the podcast, Mark mentions a white paper on hyperscaler storage. You can find it here (PDF). The SNIA YouTube channel can be found here. Mark mentions his presentation at Tech Field Day, which can be found here.

Elapsed Time: 00:27:54

Timeline

00:00:00 – Intro
00:01:30 – Who do we mean by hyperscalers?
00:02:30 – ODM (Not OEM) – Original Design Manufacturers.
00:04:00 – Aren’t HPE and Dell selling to hyperscalers?
00:05:30 – Why do hyperscalers need different storage media?
00:07:00 – Differences of approach to resiliency compared to Enterprise.
00:09:00 – Don’t we want to have partial device failures?
00:10:20 – Depop – depopulate and reset drive to factory settings.
00:11:30 – Back to tail latency – extended response times.
00:12:30 – NVM Sets to logically partition a drive for each application.
00:13:40 – Fast Fail for HDD reads.
00:15:00 – Putting drives into a deterministic window – deferring maintenance.
00:16:30 – Open Channel approach – let the host do the maintenance work.
00:17:30 – How are changes introduced? ECNs and Technical Proposals.
00:22:00 – Will hyperscaler features filter down to the enterprise?
00:26:00 – Wrap Up

Mark’s Bio

Mark A. Carlson, Principal Engineer, Industry Standards at Toshiba, has more than 35 years of experience with networking and storage development and more than 20 years of experience with Java technology. Mark was one of the authors of the CDMI Cloud Storage standard. He has spoken at numerous industry forums and events. He is the co-chair of the SNIA Cloud Storage and Object Drive technical working groups, and serves as co-chair of the SNIA Technical Council.
