#105 – Introduction to VAST Data (Part I) with Howard Marks (Sponsored)

Chris Evans | All-Flash, Guest Speakers, Sponsored, Storage Hardware, Storage Unpacked Podcast

This week, Chris and Martin talk to Howard Marks, Chief Storyteller at VAST Data.  You may know Howard as an independent analyst and author for a range of online publications.  Howard recently joined VAST to help explain and promote understanding of their data platform architecture.

The VAST Data platform uses three main technologies that have only recently emerged onto the market.  QLC NAND flash provides long-term, cheap and fast permanent storage.  3D-XPoint (branded as Intel Optane) is used to store metadata and new data before it is committed to flash.  NVMe over Fabrics provides the connectivity between stateless VAST front-end servers and JBOF flash shelves.

The architecture has some subtle points of differentiation that allow the solution to scale out and remain highly efficient.  The server components are stateless because metadata isn’t cached locally, which removes the issues of cache coherency and keeping all metadata synchronised.  3D-XPoint allows data to be written as huge stripes with as little as 3% overhead on large systems.

If you want to learn more about the VAST platform, check out https://www.vastdata.com, read our blog on the VAST technology or visit the Tech Field Day website where you’ll find more in-depth videos from the founders of the company.

Elapsed Time: 00:30:49

Timeline

  • 00:00:00 – Intros
  • 00:02:00 – Who is Howard Marks?
  • 00:02:30 – Who are VAST Data?
  • 00:04:00 – Do we need hyper performance or good enough?
  • 00:07:30 – Three technologies – QLC NAND & Optane
  • 00:09:30 – Intelligent JBOFs
  • 00:11:50 – Take a breather Howard!  Let’s review!
  • 00:13:30 – Server components are stateless containers
  • 00:14:50 – Shared nothing? No DASE – Shared Everything
  • 00:17:40 – Why does Persistent Memory allow scale-out?
  • 00:19:34 – Wide stripe optimisation with erasure coding
  • 00:21:00 – Optimised deduplication with similarity hashing
  • 00:24:00 – Wide stripes with sequential I/O improve endurance for flash
  • 00:26:00 – It’s a log-based file system (not WAFL)
  • 00:29:00 – 10 year guarantee on QLC drives

Transcript

Announcer:                  This is Storage Unpacked. Subscribe at storageunpacked.com.

Chris Evans:                  In this episode of Storage Unpacked, Martin and I talk to Howard Marks from VAST Data. Yes, it’s the Howard we all know and love from his independent days, now Chief Storyteller with VAST. This episode ran for about an hour, so we’ve split it into two parts. You’re listening to the first 30 minutes, where we cover the background, the technology, hardware, and data protection.

Chris Evans:                  Hi, this is Chris Evans recording another Storage Unpacked podcast this week with Martin. Hi, Martin.

Martin G:                      Hey, Chris, how are you doing?

Chris Evans:                  I’m pretty good, thanks. Just getting back into work after a little bit of a break. How about you, what you up to?

Martin G:                      Well, I had last week off for half term and it looks like summer has arrived.

Chris Evans:                  Yes, it certainly does. The weather’s certainly picked up, which is a great thing. Another great thing happens to be the fact that we have a guest this week.

Martin G:                      Not Donald.

Chris Evans:                  No. Oh, do you know what? Yeah, he’s busy, I think. I think he’s going to meet the Queen or somebody. Somebody possibly not quite as important, but he’s going to meet somebody else anyway. So no, we don’t have Donald with us.

Martin G:                      We have somebody better, though.

Chris Evans:                  Instead, we do have an American.

Howard Marks:             I am an American, that’s for sure.

Chris Evans:                  Absolutely. We’re joined by Howard Marks who is now from VAST Data. Hi, Howard.

Howard Marks:             Hello Gentlemen, how are you?

Martin G:                      We’re good.

Chris Evans:                  Pretty good. So Howard, last time we spoke, you weren’t working for VAST Data. In fact, you were independently working for yourself as deepstorage.net or Deep Storage Net, depending on which profile we used to look you up. So what happened?

Howard Marks:             I decided that I was looking to join a team. The VAST guys came around and had a great story to tell. So I joined up as the storyteller.

Chris Evans:                  Fantastic. Now, VAST Data, as we’ll get to in a second, is a startup. So you’ve joined them as, as you said, storyteller. It’s obviously a storage company, that’s your background. Not sure whether there’s anybody on this podcast who possibly doesn’t know who you are, but how about that 10 second little background about who you are just in case people don’t know?

Howard Marks:             Sure. I started in the industry writing device drivers for hard drives 40 years ago, but spent most of my career as an independent consultant, journalist, analyst. So you’ve seen me at PC Magazine, and Network Computing, and Network World, pontificating about the storage world, and I decided to get back in it and to actually make some stuff again.

Chris Evans:                  So let’s talk about VAST Data then, Howard. As a startup, it’s a new company. What is the company about at a very high level? And what are they selling?

Howard Marks:             Well, the beginning of the company is, our founders were involved from the very beginning in some of the all-flash arrays that people are using now. And when they went off to think about what they were going to do next, having already built all-flash arrays, they talked to the customers and discovered that very few people who have an all-flash array are looking for something faster.

Howard Marks:             What people who have all-flash arrays want is an all-flash array they can afford to put the applications on that they can’t afford to put on their current all-flash array. So VAST is about building all-flash systems that are affordable like spinning-disk systems. And as you might guess from the name, part of that affordability comes from scale, so we build systems that are vast, that scale out, and that go from petabytes to exabytes.

Chris Evans:                  So that’s an interesting background that you quote there from the founders of the company, in the sense that they went out and talked to people who already had platforms. Martin, you and I have talked many times on this podcast about the iterations that come along all the time. The vendors have traditionally been producing technology that moved at one speed, but then we saw a real speed-up, where people tried to take software out of the IO path and really get down to very, very low latencies. As Howard says, that’s clearly not a market where everybody’s going to be able to put every single solution, because it would just be too expensive to do it.

Martin G:                      Yeah, we’ve talked about this a lot, haven’t we? That we strive for ever-lower latencies, and this isn’t necessary for a lot of people. With people moving their processing off into the public cloud, for example, these people obviously don’t need incredibly high-performance storage. So it’s a small market, and we could go back to the days when we used to laugh about Violin. We used to say that performance isn’t a product, and I think this is what maybe VAST is beginning to address: it’s not all about performance.

Chris Evans:                  Yes, it’s the general applicability thing isn’t it Howard?

Howard Marks:             Well, it is about performance, but it’s about sufficient performance, not the absolute best performance. Once you have sub-millisecond latency, the difference between 600 microseconds of latency and 300 microseconds of latency is very small if your application is busy doing other things. So I wouldn’t go as far as saying performance doesn’t matter, but once you’ve reached that sub-millisecond latency, then it’s much more important to be affordable and available and scalable than it is to be slightly faster.

Howard Marks:             Now, don’t get me wrong, there’s always a market for the next fastest thing. Early on in my career, I built systems using head-per-track disks. So faster is better, but how much you’re willing to pay for faster becomes an issue, and that’s where VAST comes in. We’re selling faster, but we’re not selling faster for your Oracle database that’s already on some all-flash array. We’re selling faster for all the applications and all the data you’ve tiered off to spinning disk, because you may not be able to get full use of it from the spinning disk. You know, things like analytics require a certain amount of back-end performance, and if you spool your archive off to an object store and then you want to do a full-text search because you’ve got a subpoena, you may have to bring it all back from the object store to search it. If you had one storage system fast enough for your critical applications and scalable enough to go to exabytes, then all your data would be where it was accessible for that kind of deep learning.

Chris Evans:                  Okay. So we’ve, I guess, framed that quite nicely, because what we’re talking about is a platform that brings a much more accessible use of certain technologies, which I’m about to talk about in a second, to the marketplace, and it allows you, in general, to deliver a higher level of performance to a wider range of your data than you possibly would be doing today.

Howard Marks:             Yeah, well I mean our goal is to kill off tiering. If we can be as cost effective as what people are now using for their low performance tier, why have a low performance tier?

Chris Evans:                  Killing off tiering might initially be a challenge, I think, in the sense that there’ll always be the stuff at the high end, but I would agree with you that consolidation at the lower end is without a doubt a very sensible, reasonable goal to be heading towards.

Howard Marks:             And there is some small market for the NVMe over Fabrics, 120-microsecond, high-frequency-trading level of performance. Others can monetize that; we don’t play in that market and that’s fine.

Chris Evans:                  We’re pretty clear where your technology is intending to sit. Now let’s talk about the actual componentry, and [inaudible 00:07:47] if that’s a real word, the components that have allowed you to deliver that. And I think looking at the way that the market’s gone over the last, say, 18 months to two years, we’ve seen two technologies come in that have allowed you to build a solution that can really address this.

Chris Evans:                  And the first one, which is something we’ve talked about a lot on this podcast, and I’ve written a lot about, is QLC flash. We’ll come back to that in a second. And the second one is the use of what some people like to call persistent memory and others like to call storage class memory, but what we basically mean there is byte-addressable, persistent storage that potentially is sitting on the memory bus within a server. And I think without those two technologies we would struggle to get where we are today. But those really form the component parts that your architecture builds on. Is that fair to say?

Howard Marks:             Close. We use Optane SSDs, so it’s not really persistent memory because it’s not really byte addressable. We treat it as if it were byte addressable, but we’re not using the Optane DIMMs. We’re using the PCIe SSDs.

Chris Evans:                  Okay. We can think of it in terms of the Optane side, but not necessarily the connectivity of an NVDIMM-style format.

Howard Marks:             Right. Our connectivity is NVMe over Fabrics. So the way we like to say it, there are three technologies that make our architecture possible. And when our founders were designing the system, all three were announced but not shipping yet, so you weren’t 100% sure that they would be ready when the product was ready. So we made a bet. And the first of those is QLC flash, just to bring the cost of flash down. The second is 3D XPoint, so that we can have a persistent shared store that’s accessible to all of our VAST servers, the software that acts as a controller and more in our architecture. So we have an HA enclosure that’s connected via NVMe over Fabrics to a series of VAST servers. And the VAST server is a piece of software. It runs in a Docker container. Most of our customers so far have bought it from us as an appliance, but because it’s a Docker container, we’re flexible about that.

Howard Marks:             The VAST server acts as the protocol endpoint. So when you access a VAST cluster via NFS or S3, or by the end of this year, SMB, you’re talking to a VAST server, and it’s a scale-out system. So you’re going to have as many VAST servers as you need to provide the performance that you’re looking for. The VAST servers talk via NVMe over Fabrics, over 100 gigabit per second Ethernet or InfiniBand, to the HA enclosure. The HA enclosure includes both Optane SSDs, so 3D XPoint, and QLC flash.

Howard Marks:             So the Optane SSDs are where we store the metadata. And those VAST servers that are the front end talking to the users are stateless. They don’t store metadata in DRAM, they don’t store metadata in NVDIMMs with backup capacitors; all the state of the system is immediately written to the 3D XPoint. So if a VAST server dies, another VAST server assumes its virtual IP address. Your NFS or S3 or SMB client does a retry and reconnects, and you pick up right where you left off. And we’re saved all the difficulty of keeping caches coherent between an arbitrary number of servers acting as protocol and management nodes. Every time a server wants to know something, there’s a single source of truth, and that’s the metadata in the Optane that’s shared by all of these servers.
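
A minimal sketch of that failover model, with hypothetical names throughout (SharedMetadataStore, VastServer, the VIP handover); it only illustrates why stateless front ends can take over for one another without cache-coherence traffic, not how VAST actually implements it:

```python
# Sketch only: stateless front ends with shared metadata; all names hypothetical.

class SharedMetadataStore:
    """Stands in for metadata held on shared 3D XPoint SSDs:
    the single source of truth every server reads and writes."""
    def __init__(self):
        self.metadata = {}

    def commit(self, key, value):
        # State is persisted here before a write is acknowledged,
        # so no server ever holds the only copy.
        self.metadata[key] = value


class VastServer:
    """A stateless protocol endpoint: no local metadata cache to keep coherent."""
    def __init__(self, name, store):
        self.name = name
        self.store = store            # shared by every server in the cluster
        self.virtual_ips = set()

    def take_over(self, failed):
        # Failover is just adopting the virtual IP; there is no cache to warm
        # or journal to replay, because the state was never local.
        self.virtual_ips |= failed.virtual_ips
        failed.virtual_ips.clear()


store = SharedMetadataStore()
a = VastServer("server-a", store)
b = VastServer("server-b", store)
a.virtual_ips.add("10.0.0.10")

store.commit("/fs/file1", {"extent": ("ssd7", 123456, 1700)})  # client write via server-a
b.take_over(a)                        # server-a dies; server-b adopts its VIP
assert "10.0.0.10" in b.virtual_ips
assert store.metadata["/fs/file1"]    # committed state is immediately visible
```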

Chris Evans:                  Okay. You’ve dived straight into the depths of the technology there Howard, which is great, but I’m going to step you back a little bit just so we can make sure people can build a visual picture of exactly what we’ve just been talking about. So from a physical layer in terms of storage, you’ve got HA servers, which are holding both QLC and the persistent memory, the Optane.

Howard Marks:             Server’s a little bit strong of a term. It’s really just a JBOF.

Chris Evans:                  Right. Okay.

Howard Marks:             So the VAST servers, the ones that run our software, mount every SSD in every enclosure in the cluster when they power up and they talk directly down to those SSDs.

Chris Evans:                  Right. So we can even think of them as literally just shelves of devices. Nothing more. No intelligence, no CPU, no processor involved.

Howard Marks:             There are CPUs, but they run minimal code.

Chris Evans:                  Right. Okay. So let’s think of it in terms of that. Just a bunch of flash devices sitting on the network. So there must be, I guess, some piece in place that allows those devices to be exposed to the network.

Howard Marks:             Yeah. Well, the CPUs in that box really just connect between the InfiniBand, the Ethernet, and the SSDs.

Chris Evans:                  And the SSDs. Yeah, so they do that interface piece, but after that, effectively all of those devices are just exposed on the internal storage network. I’ll call it that for now, because I would imagine a deployment will probably keep that separate. But the storage network consists of all of those devices, and it could be one of those boxes or a hundred of those boxes.

Howard Marks:             Yes.

Chris Evans:                  So I can scale up my storage physically to as much capacity as I like. But then in terms of how I access it, there’s effectively a second layer, almost like a client-server type architecture, where my host, like a server, is actually accessing that storage directly, because it can see all of the individual devices on the entire network. Now that in itself is, I think, quite an unusual design in that sense, to make every device available.

Martin G:                      Chris, I would tend to disagree with you that it’s an unusual design. I actually think it’s an evolutionary design. The way I can visualize it now, as Howard has described it, and he’d probably hit me for this: it feels like what should have happened when NetApp tried to build their scale-out system. So your VAST servers [inaudible 00:13:57] your heads. They’re stateless heads, but they’re still like heads. The VAST enclosure is your shelves, and the NVMe switches in there are the internal fabric of what would be a traditional storage array. It feels like an evolution. It feels like a clever evolution, but from my point of view, it feels quite evolutionary.

Howard Marks:             Well, it’s revolutionary because that back-end fabric didn’t exist before NVMe over Fabrics, really. I mean, there were SAS switches, but I don’t remember ever seeing them in a storage system where eight controllers could connect to 22 shelves directly. SAS was almost always done in loops, and so one drive in a SAS shelf could only be talked to by two controllers, because it had two ports.

Martin G:                      Yeah. This is the thing, so you’re moving away from a dual head, but actually if you were to draw the architecture, in some ways it would feel very familiar to somebody.

Howard Marks:             So most scale-out systems today are shared-nothing: each node has its own media and CPU, and they coordinate amongst themselves to build a storage system. Our model is shared-everything; all the media in the system is shared by all the VAST servers. And so no one server has to be responsible for managing any particular device; it can all be parallelised and shared.

Chris Evans:                  I think it’s quite an interesting analogy to try and look at the two, and you just highlighted the shared-nothing versus shared-everything distinction. It might be that in some scenarios you might think shared-everything was bad, because you might have dependencies on devices that are connected to each other. But in terms of the way that your architecture is built, that shared-everything doesn’t actually create any dependencies in that sense.

Howard Marks:             No, because the NVMe over Fabrics fabric essentially makes it a mesh. All of the media is equally accessible to all of the VAST servers.

Chris Evans:                  Whereas Martin, if you look at the architecture you were thinking of, in certain respects some of that dependency might be that some [inaudible 00:15:53] might be closely coupled with at least one controller, and as a result there would be some dependencies existing in the architecture somewhere. This effectively abstracts that away completely. So I guess in terms of what you mean by evolution, I can see how that is an evolution of the way things have gone.

Martin G:                      Okay. So from my point of view, it looks very similar to how I would design and build a GPFS cluster, taking it back to what I do in the day job. So if I’m building a GPFS cluster, I have a number of scale-out front-end nodes. I know it’s not identical, Howard, so don’t get too upset. But then I’d have a Fibre Channel mesh, so all the servers could be talking to all the disk controllers, and that’s how we build it. And then we’d have protocol nodes coming out of the front, which would be separate servers. So from my point of view, this does look very familiar. It feels architecturally familiar, comfortable. It may not be so familiar for people who haven’t dealt with parallel file systems and how we build these. You talk about it being revolutionary, but apart from the NVMe switches, it doesn’t feel revolutionary, which isn’t a bad thing. It means that immediately, from an architectural point of view, I’m very comfortable with it.

Howard Marks:             And there are disadvantages to being too revolutionary; being something that customers understand in a 10-minute conversation, not a three-hour conversation, is a good thing.

Chris Evans:                  Yeah, definitely. And I think that’s why it was worth just re-emphasizing some of those hardware components, because immediately, from my memory, I got one bit wrong, didn’t I? The fact that you’re not using SCM in an NVDIMM format; you’re using an SSD format, because you need to be able to present those devices on the fabric just like you do with every other device.

Howard Marks:             Because they have to be shared, because we share everything.

Chris Evans:                  Yeah, absolutely. It just shows you, even I’m forgetting the detail of what I’ve seen in the past. As part of the discussion of exactly how you put these things together, it does help to go into that next level of detail.

Howard Marks:             Right.

Chris Evans:                  But I’d like to just touch quickly on the system memory size again, because if we were doing what Martin was just saying, building another type of design, we potentially would build out controllers, or some sort of control aspect within the architecture, that managed the IO at the front end and somehow held metadata describing exactly what was going on in terms of IO progress, as well as the way that stuff was laid out on disk. And one of the classic issues with any scale-out architecture, or even just a dual-controller architecture, is that somehow you have to keep that [inaudible 00:18:06] coherent. Now in your architecture, you’re not having to do that, because the server component is stateless, isn’t it?

Howard Marks:             Right. So new data comes into a VAST server. That VAST server writes that data to multiple 3D XPoint SSDs in one or more enclosures. And once that write transaction is complete, that data, that state, is coherent; it’s stored and it’s available to all the other VAST servers immediately. So we don’t have all that east-west coherence-management traffic.

Chris Evans:                  That’s the bit I was just coming to. I think that is important in understanding how this could scale, because without that you can’t scale the front-end server component. Whether that’s deployed as a piece of software, or even if it’s deployed as an appliance, it would be very hard to scale that across, let’s say, tens of thousands of clients. Unless you had somehow abstracted that away, it just wouldn’t be possible.

Howard Marks:             Exactly.

Chris Evans:                  Okay, so we’ve got an idea of the way that you’ve basically, I guess, disaggregated everything, and hence the reason the architecture is [inaudible 00:19:10].

Howard Marks:             Disaggregated, shared everything.

Chris Evans:                  Which is exactly what we’ve already just said. Once we’ve done that, we’ve got an idea of physically how we’re laying the data out and how we’re using both SCM and then QLC storage. But you didn’t really finish the discussion as to exactly how the two pieces of the persistent storage are being used. You said everything gets written into the Optane layer, but then how does that get passed on to the QLC?

Howard Marks:             Ah, here’s the next big secret, right? So new data comes in and it gets written to the 3D XPoint by the VAST server that’s processing that transaction. Asynchronously, a VAST server collects up data from the XPoint and destages it to the QLC flash. And we do a couple of extra special things on the way. So QLC is good because it’s less expensive; you get four bits per cell, and the flash vendors charge less for it, so we can be more cost-effective. The problem with QLC is that the penalty for squeezing that extra bit in there is that you can overwrite that flash fewer times before it wears out. So we do lots of things, in what we call a global flash translation layer, to treat that QLC flash gently.

Howard Marks:             So when a VAST server decides it’s time to destage data from XPoint to flash, first we reduce the data, and we use a slightly different reduction method from everybody else. We use what we call global compression. Traditional data reduction uses two techniques: local compression, where you take a chunk of data and you compress it, and deduplication, where you identify two chunks of data that are identical and you only store one. The problem with that combination is that if there are changes that are small but still make the block different, that block still gets saved. So a one-byte difference on a system that’s doing deduplication on 64K blocks means you’re storing 64K, or, reduced, maybe 40K of data. What we do is like deduplication. We break data up into chunks and then we use what we call a similarity hash. So we hash the chunk, not with a strong collision-resistant hash like [inaudible 00:21:48], but with a hash that generates the same hash value for chunks that contain similar data.

Howard Marks:             And then we take multiple chunks that compress, excuse me, that hash to that same similarity value, and we compress them together. If you understand how data compression works, two chunks that generate the same similarity hash are going to compress with the same compression dictionary. So the delta, the difference between the first chunk we saw that generates that hash, which we call the reference chunk, and a new chunk that generates that hash, is typically very small. And then we store those delta chunks.
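
Here is a rough illustration of the idea, using zlib’s preset-dictionary support as a stand-in for VAST’s compressor and a deliberately crude sampling hash (the real similarity hash isn’t public); chunks that land in the same bucket are compressed against the reference chunk and shrink to a small delta:

```python
import os
import zlib

def similarity_hash(chunk: bytes, stride: int = 64) -> int:
    # Deliberately crude stand-in for the (unpublished) similarity hash:
    # sample every Nth byte, so chunks whose content is mostly identical
    # land in the same bucket despite small edits.
    return hash(chunk[::stride])

def reduce_chunks(chunks):
    references = {}   # similarity hash -> reference chunk (needed to decompress)
    stored = []
    for chunk in chunks:
        bucket = similarity_hash(chunk)
        if bucket not in references:
            references[bucket] = chunk
            stored.append(("reference", zlib.compress(chunk)))
        else:
            # Compress against the reference chunk as a preset dictionary:
            # similar content becomes dictionary matches, so what remains
            # to store is a small delta.
            co = zlib.compressobj(zdict=references[bucket])
            stored.append(("delta", co.compress(chunk) + co.flush()))
    return stored

base = os.urandom(4096)                 # incompressible on its own
edited = bytearray(base)
edited[100:103] = b"XYZ"                # a tiny, one-off change
for kind, blob in reduce_chunks([base, bytes(edited)]):
    print(kind, len(blob))              # the delta is far smaller than the reference
```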

Chris Evans:                  That’s a very different method from traditional data compression and deduplication [inaudible 00:22:36]. Very different. Now, is that done to save on IO cycles, because obviously all the work’s being done by a server, and that could be on the client? Or is it done to save space, or is it done for a combination of both?

Howard Marks:             It’s done for a combination of both. First of all, it gives us tighter, better reduction. And in fact, we’re just introducing a guarantee program where we guarantee to reduce data better than anybody else.

Chris Evans:                  Hurrah. We did a podcast on that a couple of weeks ago, so you could come top of the list in terms of people who were recommending the guarantee program.

Martin G:                      I like that. Better than anybody else, because it takes some things out; you’re never quite sure where you’ll get data which isn’t very compressible, so it actually becomes a bit more realistic.

Howard Marks:             The only fine print is that the data can’t be encrypted.

Chris Evans:                  I don’t think anybody would allow you to do that with their data, to be honest.

Howard Marks:             Encrypted data is very difficult to reduce, so nobody’s going to reduce it well. You know, if your data isn’t encrypted and you can demonstrate to us that somebody else’s storage system compresses your data better, then we’ll make good and make sure that you get the benefit of that better reduction. We don’t think it’s going to happen. So first of all, it gives us better reduction.

Howard Marks:             Second of all, all of this data reduction happens before the data is written to flash. So the VAST server reads the data from the write buffer that’s in 3D XPoint, compresses it, and writes it back to 3D XPoint temporarily, until it’s built a whole stripe of data, and then it writes that whole stripe of data down to the QLC sequentially. If you look at how QLC SSDs work, one vendor published on their spec sheet the endurance of the drive if you did 4K random writes, and for various mixtures of random and sequential writes, up to 128K sequential writes. Writing large sequential blocks to that SSD, that vendor says, gets you 16 times the endurance of writing 4K blocks.

Chris Evans:                  And I don’t know whether you’re referring to Micron in that particular instance, but I’ve seen some data that Micron have published quite freely that shows how, depending on how you write to their QLC [inaudible 00:24:52], you’ll get much higher endurance. They actually graph that up for you, to show that large-block sequential writes are actually much more beneficial to their platform and therefore give you higher endurance.

Howard Marks:             So we write very large stripes, substantially bigger than 128K, and we get better endurance from the QLC SSD right there. Second, we write stripes that are as big as the erase blocks the SSD’s behavior tells us it uses. Today’s QLC flash chips have erase blocks that are many megabytes in size. So if you send data smaller than that erase block, you get write amplification caused by the SSD controller, because the SSD eventually has to do garbage collection.

Howard Marks:             We write and manage all of our data in huge stripes, which means that the SSD doesn’t have to do garbage collection any more often than our log-structured element store file system has to do garbage collection.
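
As a hedged sketch of that destaging behaviour, here is a write buffer that only issues erase-block-sized sequential writes; the 8 MiB erase block and all the names are assumptions for illustration, not VAST’s actual geometry:

```python
# Hypothetical geometry: a multi-megabyte erase block, per the description above.
ERASE_BLOCK = 8 * 1024 * 1024        # 8 MiB, assumed for illustration

class StripeBuffer:
    """Accumulates reduced data (standing in for the 3D XPoint write buffer)
    and destages only full, erase-block-sized sequential writes, so the QLC
    SSD never has to garbage-collect partially valid blocks on our behalf."""
    def __init__(self, destage):
        self.pending = bytearray()
        self.destage = destage       # callback: one large sequential write

    def append(self, reduced_chunk: bytes):
        self.pending += reduced_chunk
        while len(self.pending) >= ERASE_BLOCK:
            self.destage(bytes(self.pending[:ERASE_BLOCK]))
            del self.pending[:ERASE_BLOCK]

physical_writes = []
buf = StripeBuffer(destage=physical_writes.append)
for _ in range(5000):
    buf.append(b"x" * 4096)          # thousands of small logical writes...
print(len(physical_writes), "sequential writes of", ERASE_BLOCK, "bytes each")
# ...become a handful of big, erase-block-aligned physical writes.
```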

Martin G:                      So, Howard.

Howard Marks:             Yes.

Martin G:                      Howard, if I can interrupt, this writing things sequentially, [inaudible 00:26:06], have you not just reinvented WAFL?

Howard Marks:             Yes, it’s a log-based file system, but WAFL’s based on 4K chunks.

Martin G:                      Yeah, you’ve invented Super WAFL. We’re going to call it Super WAFL. That’s what we’re going to call it now, Howard.

Howard Marks:             I cannot endorse that name.

Chris Evans:                  They’ve probably got a trademark on it anyway.

Howard Marks:             So, you know, compared to a typical log-structured file system, there are a couple of differences. The first is that our units are just much larger, and the second is that all of our metadata is byte-oriented, not block-oriented. So in a file system like WAFL, a file extent has a pointer to a 4K block. In our file system, a file extent has a pointer to a particular byte, an LBA in an SSD plus a byte offset, and then a length, so we pack data tighter together. And we don’t have the problem with compressed data where you compress 4K down to 1,700 bytes, but you only have 4K blocks, so you have to store those 1,700 bytes in 4K anyway.
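
To make the byte-granular extent point concrete, a small worked example using the 1,700-byte figure above; the field names are invented for illustration:

```python
from dataclasses import dataclass

BLOCK = 4096

@dataclass
class ByteExtent:
    device: str     # which SSD (hypothetical field names throughout)
    offset: int     # a byte address, not a block number
    length: int     # the exact compressed length

compressed = 1700   # a 4K chunk that compressed to 1,700 bytes

# Block-oriented metadata must round up to a whole 4K block:
block_rounded = ((compressed + BLOCK - 1) // BLOCK) * BLOCK

# Byte-oriented metadata stores exactly what was written:
extent = ByteExtent(device="ssd7", offset=123_456, length=compressed)

print(block_rounded - extent.length, "bytes wasted per chunk by block rounding")
# -> 2396: byte-granular extents let compressed chunks pack back-to-back.
```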

Martin G:                      Yeah, so super-dense WAFL. There you go. Super-heavy, super-dense WAFL with extra cream. Sounds great.

Howard Marks:             Well, I like the extra cream part. So, you know, we write very carefully. We understand how the SSDs work, and we wear-level across the entire system. So if you have a system with hundreds of SSDs, one application that’s doing a huge amount of writes isn’t going to wear out the SSDs in that RAID set, because that write data is going to be spread across the whole pool. And then we use a new version of erasure coding, called locally decodable codes, that lets us write very wide stripes with very high efficiency. On our smallest system we use plus-four erasure coding, so we can afford to have four device failures before data loss, but our overhead is still 10%. And as the system grows, the width of the stripe grows. On a typical VAST system, we do 150-plus-four erasure coding, and that means that our overhead is down under 3%.
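
The stripe overhead arithmetic is easy to check, treating overhead as parity strips divided by total strips:

```python
def overhead(data_strips: int, parity_strips: int) -> float:
    return parity_strips / (data_strips + parity_strips)

print(f"{overhead(150, 4):.1%}")   # 2.6%, i.e. "down under 3%" for a 150+4 stripe
print(f"{overhead(36, 4):.0%}")    # 10%: a plus-four stripe at 10% overhead implies 36 data strips
```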

Chris Evans:                  So it’s fair to say that the intention, like we said at the very beginning of this discussion, is the reason the company is called VAST. The intention here is that this isn’t going to be a system for somebody who wants to put in something at 100 or 200 or 300 terabytes; we’re talking about deployments of tens or hundreds of petabytes, and it’s designed specifically for that.

Howard Marks:             At a minimum, we start at a couple of hundred terabytes, but most customer … Today we’re selling into the data intensive applications, media and entertainment, hedge funds, oil and gas, those kinds of industries where they have a lot of data. So we’re selling systems that are multiple petabytes.

Chris Evans:                  Right. Just to summarize then: you’re using a different style of data protection across all of the media, which is a form of erasure coding. You’re writing the data to SCM, or at least to Optane, before you then stripe it and write it to the QLC, to minimize the write amplification. That saves a huge amount of wear on the actual QLC itself, which allows you to have a product with a much longer lifetime than it would get any other way. So what are you offering in terms of a guarantee on that QLC technology?

Howard Marks:             10 years.

Chris Evans:                  10 years.

Howard Marks:             If you pay your maintenance for 10 years, we’ll replace your SSDs for 10 years.

Chris Evans:                  That’s a nice offer. That’s a nice guarantee, and it is guaranteed, is it?

Howard Marks:             It is. And we missed one more thing about the erasure codes. When I say erasure codes, I’m sure you guys, because I know you well, think Reed-Solomon. And the math in Reed-Solomon would allow 150-plus-four coding, but when an SSD failed in a Reed-Solomon stripe of 150 plus four, to rebuild you’d have to read 149 data strips and the parity. Our locally decodable codes allow us to rebuild with only a quarter of the data strips. So the rebuild has a lot lower impact on the system, so we can make our stripes wider and therefore be more efficient.
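
And a worked comparison of that rebuild-read claim, taking “a quarter of the data strips” at face value; this computes read counts only and doesn’t implement the codes themselves:

```python
DATA, PARITY = 150, 4

# Classic Reed-Solomon: rebuilding one lost strip reads every surviving
# data strip plus one parity strip.
rs_reads = (DATA - 1) + 1

# Locally decodable codes, per the stated claim, read about a quarter
# of the data strips instead.
ldc_reads = DATA // 4

print("Reed-Solomon rebuild reads:", rs_reads)        # 150
print("Locally decodable rebuild reads:", ldc_reads)  # ~37
```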

Announcer:                    You’ve been listening to Storage Unpacked. For show notes and more, subscribe at storageunpacked.com. Follow us on Twitter at Storage Unpacked, or join our LinkedIn group by searching for Storage Unpacked Podcast. You can find us on all good podcatchers, including Apple Podcasts, Google Podcasts, and Spotify. Thanks for listening.


Howard’s Bio

Howard Marks is VAST Data’s Technologist Extraordinary and Plenipotentiary, helping customers realize the advantages of Universal Storage. Before joining VAST, Howard spent 40 years as an independent consultant and storage industry analyst at DeepStorage. He is a frequent and highly rated speaker at industry events and a Storage Field Day delegate.


Copyright (c) 2016-2019 Storage Unpacked.  No reproduction or re-use without permission. Podcast Episode FC04.