In this episode, returning guest Douglas Fallstrom from Hammerspace takes Chris and Martin through the details of the Hammerspace global file system platform. The solution is software-defined, running either on virtual machines or in the public cloud. Customers can choose to leave data on existing hardware platforms and simply abstract the data into the Hammerspace platform, or use block storage to build out a distributed file system.
The ability to abstract the physical placement of data from metadata allows customers to choose exactly how to manage their content. Traditional storage platforms have implemented data protection or other management features at the system or hardware level. With Hammerspace, this can now be applied directly to individual files through policies.
Extensible and rich metadata also allows workflows to be applied to data in ways that couldn't be achieved before. Douglas highlights some examples in the recording, including customers with over 100 billion files.
You can find more information on the Hammerspace platform at https://hammerspace.com/ or watch the Tech Field Day recordings at https://techfieldday.com/companies/hammerspace/. Follow Hammerspace on Twitter at https://twitter.com/Hammerspace_Inc and Douglas at https://twitter.com/dfsweden.
Elapsed Time: 00:42:20
Timeline
- 00:00:00 – Intros
- 00:02:45 – Hammerspace is an SDS solution, VMs or the cloud
- 00:04:10 – Hammerspace can use existing data in place or block storage
- 00:06:25 – How is existing data put under Hammerspace management?
- 00:09:30 – Hammerspace supports multi-protocol – NFS & SMB
- 00:11:00 – Metadata management is a key differentiator in Hammerspace
- 00:16:00 – Does Hammerspace use underlying physical storage attributes?
- 00:20:25 – Protection is assigned at the data level, not storage
- 00:23:00 – Searches generate no I/O load on the underlying storage platform
- 00:28:10 – What type of data is deployed on the Hammerspace platform?
- 00:32:15 – What if I want to take Hammerspace out of the environment?
- 00:36:00 – Migration is one use-case for the Hammerspace platform
- 00:37:00 – What container-based support does Hammerspace offer?
- 00:39:30 – What is the lightbulb moment? Demos!
- 00:41:00 – Wrap Up
Transcript
Speaker 1:
This is Storage Unpacked, subscribe at storageunpacked.com.
Chris Evans:
Hi, this is Chris Evans recording a Storage Unpacked Podcast. I’m here with Martin again, how are you Martin?
Martin G:
Yeah, not too bad, still in lockdown.
Chris Evans:
Yeah, me too. But then again, I’m always in lockdown, I never leave the house.
Martin G:
Well, you should have a word with Mrs. E about that.
Chris Evans:
Yeah. I'm afraid working from home, you don't get to go out very often, you spend most of the time sitting in the house. But to be honest, it's nice to get out when you can, and hopefully this lockdown will not continue for too long.
Martin G:
Yeah, hopefully not too much longer.
Chris Evans:
Absolutely, yeah. So this week we’re joined again by a guest that’s been on before actually, we’ve got Douglas Fallstrom from Hammerspace joining us again. Hi, Douglas.
Douglas Fallstrom:
Hey, Chris, Martin.
Chris Evans:
How are you?
Douglas Fallstrom:
I’m doing good, thank you. We’re still in lockdown as well, and I expect us to go on for quite a while I guess.
Chris Evans:
Yeah, we’ll just have to sit and wait to see how long. It’ll take as long as it takes.
Douglas Fallstrom:
Yeah. It's nice to be home with the kids all day long, but it's also challenging. In the end, slowing this pandemic down is going to be worth it, so it's a small price to pay.
Chris Evans:
Absolutely. So this week we’re going to dig down and talk about your product, your platform in a bit more detail, so it would be great if you could just give people a bit of background as to what Hammerspace does, origins of the company, and what the product itself is actually… what problems it’s typically solving.
Douglas Fallstrom:
Sure. Hammerspace is now seven years in the making, two years as Hammerspace. The origin of our technology is something that the team has been working on for a very long, long time. The technology was formerly known as Primary Data, as some of you may have been using a company called Primary Data, and when we decided to start Hammerspace we reused the technology from the Primary Data investment. We took the core engineering team, as well as the product and marketing team, and rebuilt it as Hammerspace. And then we spent the last two-plus years building a global file system on top of it.
Douglas Fallstrom:
And the purpose for that, and there are a few other things we'll talk about on this podcast, is the global file system; it's probably the biggest feature in Hammerspace today. The purpose is to support a hybrid cloud storage environment, where we can help customers present data in the location where they need it, whether it's in the Cloud, across data centers, and so forth, essentially making their data agile and portable across distance.
Chris Evans:
Great, so that gives us a bit of a background there as to exactly what you’re doing, let’s dig down and sort of just talk about what it looks like from the customer’s perspective. Is this a distributive global file system, is it running commodity hardware, how typically would the customer consume this?
Douglas Fallstrom:
Yeah, we're a software-defined platform, so we run in virtual machines, we run in bare metal deployments, and we run in the three major Clouds today: Azure, Google, and obviously Amazon being the biggest of the three. And our software runs as a metadata server and a data services layer, which are two distinct components. When a customer installs us on prem, for example, it takes about 20 minutes to install us. We ship an appliance-based form factor ISO to the customer, they pop it in, or virtually attach it to a VM, go through the installation, and then they're up and running. So typically, within 30 minutes or so, depending on how fast they click through the screens, we're up and running with a global file system on top of any technology they have.
Douglas Fallstrom:
And we also run, I should say, in nearly any virtual environment. We've tested [inaudible 00:03:43], Hyper-V, KVM, Xen Server, Nutanix, and so forth. So it truly is a virtual, bare metal, and Cloud experience with us.
Chris Evans:
I'd just like to qualify a bit of that. You sort of implied that it was easy to get running, and you run in VMs. Do you consume native storage, or are you using storage from other underlying hardware platforms? Is it a mixture of both of those models?
Douglas Fallstrom:
We're a mixture. So we have customers that have existing data on existing platforms, and we can leverage that without having to move the data; we call that process assimilation. A la the Borg, I guess. There are a couple of Star Trek fans in the company who came up with that. So being able to leverage existing data in place allows us to build a global file system without actually moving a single byte anywhere, until they use it.
Douglas Fallstrom:
We're also able to take block storage, NVMe, SSDs, hard drives, SAN and [inaudible 00:04:38] storage, and use that as well. And we can use that as storage in the global file system.
Chris Evans:
So you're creating, in some respects, almost an abstraction layer here. You're allowing people to use existing physical storage and abstracting it into a view that looks like a consistent single file system. I guess that's one of the ways you must describe the product.
Douglas Fallstrom:
Yes. We think of it as our file system, but I guess some people think of it as a global file system. We took global to the next level, where we actually made it geographically, physically separate. What's interesting we've found is that customers have a lot of existing environments on prem, and they have nothing in the Cloud, or very little in the Cloud. So what we found is, often we layer over existing storage on prem, and we end up being the brand new storage for them in the Cloud, leveraging obviously technologies like EBS for Amazon, or Azure managed disks and so forth for the various Clouds.
Chris Evans:
Martin, we’ve seen a lot of virtualization solutions before, haven’t we, in storage, but typically these have been block-based virtualization solutions.
Martin G:
Yes, [inaudible 00:05:44] virtualization has been tried before; we've had things like Rainfinity from EMC in the past, and I think FI had something called the FAN, for File Area Network. So [inaudible 00:05:54] another one as well. One of the problems is, it's always been difficult to get your existing storage into the virtualization layer. So I'm very interested to hear how Hammerspace have actually managed to overcome the sheer amount of heavy lifting there's been in the past, to actually get existing storage and existing file systems into the metadata server.
Chris Evans:
Well, why don’t we dig down and let’s find out. Douglas, let’s dig down into the technology and understand it a bit more then. Let’s start with that, that question about how you actually take potentially existing platforms and put them underneath your management.
Douglas Fallstrom:
Yeah, so we have the ability to assimilate existing shares, or directories, or even just individual files. And assimilating files is simply an extraction, a copy of the metadata, the existing SMB and NFS metadata that's specific to the existing storage, and putting that into Hammerspace. That is done in a highly parallel, background and on-demand way. So let me give you an example. Let's say I have an existing... I'm going to use NetApp and EMC, because they're two leading NAS vendors in the environment. You can hand us a share and we can do an inline assimilation, or a side-core assimilation of that share, and present that same namespace in Hammerspace in less than 60 seconds.
Douglas Fallstrom:
We can do that independent of the amount of data you have, and the number of files. And the way we achieve that is by allowing assimilation to be both a background process and an on-demand process. On-demand means that you may have just assimilated the initial set of directories and files from that share, but 30 levels down, you have no idea what's there. And if the user navigates that far down, we fold in the directory structure and the file structure on the fly as they're navigating. And these are just a few metadata calls to the backend storage systems that the original data sits on. It takes me longer to explain this than it does to do it for thousands of files. So we have a system that can scale very well, even with a very large number of files.
Douglas Fallstrom:
We have a customer today that's in the process of assimilating over 100 billion files into our namespace, which is an insane number of files, from a very, very large scale-out NAS vendor estate.
Chris Evans:
I just want to make sure I understand that. So if I had a hierarchy of files structures, and I went down to say a third level of a directory structure, you’re saying that potentially you might not have actually assimilated that directory structure at that point, but as soon as the user goes in and actually requests to access that directory, you would then go off and do that if it hadn’t already been assimilated. Is that the way you’re describing it?
Douglas Fallstrom:
That's right. And the way you practically get there, let's use a file in your Windows Explorer for example: you click on your drive letter, you navigate to the share, you click on a directory and then you get there. The mere fact that you're clicking on a directory actually causes us to assimilate the metadata for those directories, if it's not done already in the background, all right? So let's pretend the background piece hasn't gone there yet; you're clicking there and we do this essentially on-demand assimilation of the metadata. If you typed in the path in the title bar, then [inaudible 00:09:06] down, that works too.
Douglas Fallstrom:
What happens in the background is that Explorer actually opens up every path all the way down there, and again, the mere fact that that happens causes us to assimilate things on demand. And the assimilation of, let's say, a directory with 1,000 files, 10,000 files, happens easily in sub-second time. So it's a really, really quick experience. The only difference you would see between clicking that the first time and, let's say, a day later, is that a day later it's probably a little bit faster, because the metadata already exists in our management server. But I'd be hard-pressed to say whether you could actually tell the difference or not.
Douglas Fallstrom:
And we do this metadata assimilation for both NFS and SMB at the same time. It's an important point that we handle multi-protocol assimilation, and so we do multi-protocol presentation at the same time.
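The on-demand assimilation Douglas describes can be sketched roughly like this. This is a minimal illustration of the lazy-loading idea only; the class and structures are invented for the example and are not Hammerspace APIs.

```python
# Hypothetical sketch of on-demand ("lazy") metadata assimilation: directory
# metadata is copied from the source NAS only when a client first navigates
# into it. All names here are illustrative, not Hammerspace APIs.

class LazyNamespace:
    def __init__(self, source):
        # 'source' maps a path to its child entries on the existing NAS,
        # e.g. {"/share": ["a", "deep"], "/share/deep": ["file.txt"]}
        self.source = source
        self.assimilated = {}   # path -> child metadata, filled in lazily

    def list_dir(self, path):
        # First access: pull just this directory's metadata from the backend.
        # This is the only operation that touches the original storage.
        if path not in self.assimilated:
            self.assimilated[path] = list(self.source.get(path, []))
        return self.assimilated[path]

nas = {"/share": ["projects", "archive"], "/share/projects": ["plan.docx"]}
ns = LazyNamespace(nas)
print(ns.list_dir("/share"))                # assimilates /share on first click
print("/share/projects" in ns.assimilated)  # deeper levels remain untouched
```

The point of the sketch is that only the directories a user actually opens generate metadata calls against the backend, which is why the initial presentation can be independent of total file count.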
Chris Evans:
So let’s talk a bit more about that whole metadata thing, because obviously it’s fascinating how you’re managing to do that side of it. What about the way that you’re managing metadata in general? And one of the reasons I say that, is because clearly when you’re doing things like searching, when you want to find files within a file system, actually the ability to do clever searching, maybe that’s using sort of regular expressions or other types of logic, can be really powerful, so how are you building your metadata within the product?
Douglas Fallstrom:
Yeah, so some of that is a little bit of our secret sauce, but we’ll share how much we feel comfortable with here.
Chris Evans:
I wasn’t expecting you to tell me your algorithms, I was hoping… Well, that might be pushing it a little bit too far, maybe just understanding the sort of things that you’re doing that makes your product different, I guess is what I’m after.
Douglas Fallstrom:
Sure. Yeah, we use a very robust enterprise key-value store in the back end to store the metadata, allowing us to do extremely fast lookups, searches and so forth. Having worked with the engineering team now for what feels like forever, probably four or five years, it also allows us to extend the namespace with new metadata information in a very, very easy way, versus traditional file systems, where that means a rewrite of the inode layout, for example. I used to work for Veritas for a long time, and obviously [inaudible 00:11:17] went through that transition.
Douglas Fallstrom:
Anyhow, so I built file systems for many years, and anytime you wanted to add something to a traditional file system, you had to consider the inode layout, the space the inode takes on disk, and everything else. And then when you did a layout change, you typically had to go back and rewrite the whole structure, and various file systems manage that slightly differently. But overall, it's extremely cumbersome, and it's a very, very sensitive part of the file system. By separating that out into our own file system, which really is just the metadata, we can add new components without any kind of risk to the old ones, and also in a very, very creative way. We don't have to worry about things like inode layouts that take up space on disk, that may no longer have room to grow, and so forth.
Douglas Fallstrom:
So there are new features like WORM and other things that we've added in there, that would traditionally have been extremely hard to get into an existing file system, that have simply been weeks of work versus months and years. So that part, separating it out, has been really key for us to evolve very, very fast.
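To make the contrast concrete: a key-value metadata record can gain a brand new attribute at any time with no on-disk layout change, unlike a fixed inode structure. A toy sketch, with invented field names (this is not Hammerspace's actual schema):

```python
# Illustrative sketch: file metadata held as key-value records means adding a
# new attribute, e.g. a WORM expiry, requires no layout rewrite. It also shows
# why searches can be served from metadata alone, with no I/O to the backend
# storage that holds the actual file data.

metadata_store = {}  # file id -> dict of attributes

def set_attr(file_id, key, value):
    metadata_store.setdefault(file_id, {})[key] = value

def search(key, predicate):
    # Metadata-only search: never touches the data's backend storage.
    return [fid for fid, attrs in metadata_store.items()
            if key in attrs and predicate(attrs[key])]

set_attr("f1", "size", 4096)
set_attr("f1", "worm_expiry", "2031-01-01")   # new field, added at any time
set_attr("f2", "size", 10)
print(search("size", lambda s: s > 1000))     # -> ['f1']
```

A fixed inode layout would have required reserving space for `worm_expiry` up front, or rewriting every inode to add it later; the key-value shape sidesteps both.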
Chris Evans:
This sounds like something, Martin, that would be incredibly useful in your environment, if you had that ability to extend metadata, around some sort of way of managing it, to understand what the content was, for example. A bit like adding metadata to object store items.
Martin G:
Enriched metadata is one of those things that gets talked about a lot in media. The way we tend to do it is, we'd abstract it even away from the file and we'd have a database, normally done with a media asset manager, or digital asset manager; that's where you hold all your metadata. And then it depends, it might get stored in a traditional relational database, or something like a Mongo or a Cassandra database, so it becomes searchable. But metadata enrichment for files in the media world is a very hot topic.
Chris Evans:
I do think that's something that may be being underplayed in some respects across the requirements of file systems, in that the more we work with analytics-type data, where we're analyzing content and modifying it, we're looking at new and different ways to search it. That ability to tag and use metadata more effectively, Douglas, I think will probably become more of a requirement, and it hasn't necessarily become a requirement so far because customers just couldn't do it.
Douglas Fallstrom:
It's interesting because, what if the metadata can also drive the behavior of the data? So for example, what if there are certain metadata tags that dictate that the data's now actually read only? It doesn't matter what the [inaudible 00:13:44] or the NFS permissions on a file say, it's a WORM file at that point. What if you can't even read a file until you have the right metadata tags on it? So metadata for us is at the forefront of doing a lot of things with the system, so it's really interesting to see it.
Douglas Fallstrom:
Obviously, our system stores the regular file system metadata, like [inaudible 00:14:06] information, it stores a lot of SMB metadata in there as well, and we have a lot of telemetry metadata, or performance metadata, built in there. We also have things like key-value pairs. We have tags, and keywords, and so forth. So now you have all the object store-style metadata in the system, you have things like inheritance, or no inheritance, and you have metadata that can only ever increase in value. For example, take the expiration date for WORM: that metadata field isn't simply immutable, it's mutable only in the sense that it can only ever be increased. So you can't shorten the expiration of the WORM file, but you can extend it.
Douglas Fallstrom:
So there are different types of metadata that need to be handled differently, and there is metadata in our case that can also control the behavior of the file. So for example, you can control the IO error behavior in our system by changing some of the metadata on the file. So a system can truly be tailored to fit the application that is using it, at a level that has not been seen before. Everything we do in our system is file granular, which means that I can protect all of Martin's media files with a WORM attribute after five minutes, or let's say after 90 seconds, or whatever you'd like the number to be, but only if the owner is Martin, and only if it's 90 seconds later; but after 8:00 PM at night, I'm going to allow open access for two hours.
Douglas Fallstrom:
So you can configure a system to be extremely flexible at the file granular level, which can be very, very useful, especially when you work with data across many sites.
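The file-granular rule Douglas just described (WORM after 90 seconds, only for Martin's files, with a two-hour open window after 8 PM) can be written down as a small predicate. This is purely a toy restatement of his spoken example; the function name and rule encoding are invented:

```python
# Toy sketch of a file-granular policy: WORM-lock a file 90 seconds after
# creation, but only when the owner is "martin", and allow open access for
# two hours starting at 8 PM. Names and rules are illustrative only.

from datetime import datetime

def is_worm_locked(owner, age_seconds, now):
    if owner != "martin":
        return False                        # policy applies to Martin only
    if 20 <= now.hour < 22:                 # 8 PM to 10 PM: open access window
        return False
    return age_seconds >= 90                # locked 90s after creation

print(is_worm_locked("martin", 120, datetime(2020, 5, 1, 14, 0)))  # True
print(is_worm_locked("martin", 120, datetime(2020, 5, 1, 21, 0)))  # False
```

The point is that the policy is evaluated per file, from that file's own metadata (owner, age), rather than being a property of the share or volume it happens to sit on.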
Chris Evans:
That's difficult to get your head around, and we'll come back to some customer examples in a little while, but that gives you such a range of things you could possibly do. I don't remember hearing of anybody else who offers something similar to that. That seems quite unique in my experience.
Martin G:
I’ve not come across anything which does anything quite so ambitious. It’s very interesting to hear.
Chris Evans:
Let's talk about the actual physical side of it then. So we've got our metadata, we're separating data from metadata, we've got the ability to use existing hardware and technology. So are you adding additional protection features into your layer of the software, are you just relying on some of the underlying technology, or is it a mixture of the two?
Douglas Fallstrom:
We do both. So our metadata can dictate what kind of protection you want, and what kind of performance you want for the data. So for example, let's say you enter another NetApp volume into the storage configuration: we automatically detect what kind of resiliency and performance that volume is offering, we do realtime performance management, all of that. Anyway, we detect these things at the time they're added to the system. The same goes for an Isilon, or a DSX node, which is our own data services node, as well as for other systems along the way.
Douglas Fallstrom:
So that piece is really key, because if you know that the data has, let's say, two nines, versus five nines, versus nine nines availability and durability, that means we can treat the data differently. We have some customers that are really comfortable with their NetApp environment, for example, and that's fantastic, they like to store their data on NetApp, but they need to do things that NetApp can't do; let's say, archive things to the Cloud, span multiple systems to create a bigger share, and all that. So the data is always stored on the NetApp, because that's where they want the data to be, but the system has the ability to create additional copies [inaudible 00:17:30] off the NetApp, let's say a snapshot, and now store it in Amazon, or Azure, or both.
Douglas Fallstrom:
So it allows our system to not only extend the protection of the underlying storage, but also extend the reliability of it. Because now if, let's say, a node is down for maintenance or whatever reason, and somebody is opening a file that sits on that node, and that node is no longer there for, let's say, 10 minutes, or a day, or forever, we will automatically rehydrate the data from another location, let's say the Cloud, assuming it's the same copy, and put that on another surviving system.
Douglas Fallstrom:
So not only can we automatically determine how data’s protected, you can add additional protection to it, and you can automatically remediate when things go wrong. And that last part is really key, because designing a system that takes the best snapshots in the world, is terrible if you can’t restore from the snapshot. And we’ve taken that an extra step, we’re saying, “We not only take the best snapshots, we also copy the data the best way, and we automate the restore process so you as an end user don’t ever realize there was an actual outage,” for example. And we hope obviously nobody ever has an outage, but it happens all the time.
Douglas Fallstrom:
But that's just one aspect of it. The other aspect is, how do you protect users from their own mistakes? We have built into the system things like undelete, so now suddenly you have an undelete function for your existing data that protects you from yourself, or maybe from some evil user who decides to go and delete all the data in your office share, for example. So you don't have to go back to a snapshot to restore it; you can go back to the version that was written just two minutes earlier, and simply undelete that file.
Douglas Fallstrom:
So protection comes, Chris, at various different levels, and they're really broad, from just protecting or controlling the number of copies of data you have, to managing the metadata on top of that.
Chris Evans:
I'm just thinking that it seems to me you're mixing some attributes of what you might think of as an object store, and some aspects of what you might think of as either a scale-out file system, or even a parallel file system. You seem to be bringing attributes from all of those different platforms together in a single solution.
Douglas Fallstrom:
Yeah, that's kind of what we end up delivering, or trying to. So for example, let's say you have data that's really important to you; that data might be, let's say, your CEO's documents, just to make an example of the CEO. So every time he updates an Excel file, the data needs to have, let's say, nine nines or something on durability and availability. Which means it's going to end up as multiple copies, because most storage systems are not nine nines, they're only five or six nines at most. So you'll end up with two copies across two different storage systems.
Douglas Fallstrom:
But then as time goes on, he doesn't need all that data on NAS all the time, because he's got an insane amount of data, so as time goes on it gets archived into object storage. And object storage, let's say in Amazon or Azure land, has a large number of nines, so you don't need to put multiple copies into object stores; you just need one copy in the object store and no copies on NAS. And the moment someone opens the file again, he or she, then the file moves back to NAS with two copies, to make sure that the number of nines is always met. So in our system you can configure protection of data to be data specific. It doesn't rely on the volume underneath the storage file system being configured correctly, and that is really a big differentiator.
Douglas Fallstrom:
So you assign protection at the data level, versus assigning it at the storage level, where if I store data on this share it's always replicated, snapshotted once an hour and backed up weekly; that's kind of 1990s. Our protection works in a more modern way, at the data level.
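The "nines" arithmetic behind the CEO example can be sketched in a few lines. This is a hedged simplification: it assumes copies fail independently and that a store advertising n nines loses a copy with probability 10^-n, which is how the back-of-envelope reasoning above works:

```python
# Sketch of durability-objective placement: given a per-store durability in
# "nines", how many independent copies meet a data-level target? With k
# independent copies on n-nines storage, loss probability is (10**-n)**k,
# i.e. n*k nines, so we need the smallest k with n*k >= target.

import math

def copies_needed(target_nines, store_nines):
    return math.ceil(target_nines / store_nines)

print(copies_needed(9, 5))   # nine-nines target on five-nines NAS -> 2 copies
print(copies_needed(9, 11))  # one copy suffices on eleven-nines object storage
```

This matches the behavior described: two copies on NAS while the file is hot, collapsing to a single object-store copy once archived, because the objective is met either way.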
Chris Evans:
I was just about to say that to Martin, that we tend to have come from a history of protecting data by looking at it from the physical aspect of RAID groups, and even to a certain degree distributing data these days using [inaudible 00:21:23]; we'll put it in big pools and then we'll still do that distribution of data across the technology. But Martin, this is interesting, to think that you're actually applying policy to individual files in order to set that availability level.
Martin G:
It does seem to be becoming more common. A lot of file systems these days are developing policy engines where you can define how a particular user's data gets protected, or how a particular type of data gets protected in a different way to everything else in a directory or a file system. So it's one of those things which is coming; it's very interesting to see how you do it at scale. It's okay taking the example of a CEO's Excel file, because it's fairly easy to deal with and there aren't very many, but then actually defining that lifecycle for something a bit more complex, where the resilience will change over time. So you may have a file which you want to be highly resilient for the first 90 days, and then that resilience will change, but you could be talking about millions and even billions of files which you're trying to apply policy engines to.
Douglas Fallstrom:
Yes.
Chris Evans:
Well, in that case Douglas, let’s talk about that, because that’s really quite relevant, and I think that probably leads us into a discussion about how people are using your platform to manage their data. So how do you manage that sort of level of scale, and can you give us some examples of exactly how customers are doing just that?
Douglas Fallstrom:
Yeah, and it's interesting because we have customers now with many billions of files in their systems. The way our system works is, one, we have a really fast metadata store, period, and that allows me to do a lot of updates to the metadata store without having to worry about many of the things a traditional file system has to worry about. So for example, you can do a recursive search through the file system, and it's not going to generate a single I/O on the storage system behind the scenes, and it's going to be really fast. Traditionally, if I did a find job on an old system, I would be worried about opening a file, because you'd be stuck behind some kind of metadata operation being served; in our case, your metadata is off to the side.
Douglas Fallstrom:
Having that level of speed allows us to think about and apply different solutions to the problem. So for example, when you change policies... We don't expect people to put a policy on every file, that would never scale, so you have policies that you apply at the share level, at the directory level, and so forth. They are computed, however, at each individual file, each object in the system, and that allows me to be file granular. So if I apply a policy that says, tier data out after 30 days, we have a multi-threaded... you can think of it as a sweeper, and some of these things have changed over time in the way we've implemented them, where in the background we look at all the objects in our database, and assign a value to those objects and what needs to happen to them next.
Douglas Fallstrom:
In the end, the object is either aligned or not aligned with what you're telling the system. So if you're telling a file to archive out after 30 days, and on day 30.1 the system hasn't yet tagged that file as archived, it's actually going to be flagged red. Then we have a job engine that goes and looks for these tasks that need to get done, an example being, move this file to the archive, and it schedules a job for that, which is then executed, and then the metadata is updated, again, to be in alignment.
Douglas Fallstrom:
So our metadata is really scalable because you have objectives; we don't call them policies, because they really are objectives. You state your desire for the data, versus telling it what to do at that point, and then you let the system, using machine learning, figure out the best way to achieve the stated desire. Whether it's a number of nines, or archiving after 30 days, or keeping it on fast storage for 10 days and then on slower storage, and so forth. The number of permutations here is nearly infinite, and the way we scale is simply a very traditional approach of generating a lot of jobs to move data around.
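The objectives-to-jobs loop Douglas describes can be sketched as a tiny sweeper: compare each object's current state against the stated objective, and emit a job for anything out of alignment. All names and the tiering rule are invented for illustration; this is not Hammerspace's engine:

```python
# Minimal sketch of the "objective -> alignment -> job" loop: a background
# sweeper marks objects whose placement no longer matches the objective
# (archive after N days), and returns jobs for a job engine to execute.

def sweep(objects, archive_after_days):
    jobs = []
    for obj in objects:
        should_archive = obj["age_days"] > archive_after_days
        aligned = (obj["tier"] == "archive") if should_archive \
                  else (obj["tier"] == "primary")
        if not aligned:
            # Out of alignment: schedule a move to bring it back in line.
            jobs.append(("move_to_archive" if should_archive
                         else "move_to_primary", obj["name"]))
    return jobs

objs = [{"name": "a.mov", "age_days": 31, "tier": "primary"},
        {"name": "b.mov", "age_days": 2,  "tier": "primary"}]
print(sweep(objs, 30))   # -> [('move_to_archive', 'a.mov')]
```

Because the sweep reads only metadata, it can run across billions of objects without generating I/O against the backend storage; only the resulting jobs touch data.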
Douglas Fallstrom:
And because data mobility in our system is live, completely non-disruptive between any storage systems in the backend, that allows us to move data at any point without any worry that the client may not be able to open their file while we're moving it. So we never need to worry about whether a file is open or not, or whether it's just about to come into use; we are free to schedule these jobs at any point in time. And that scales even to multi-billion file environments.
Douglas Fallstrom:
So for example, we have a law firm that has, I think, 2.3 billion files in one of their systems with Hammerspace today, backed by five to 10 NetApp systems that they're storing the data on. What they're doing, as an example, is creating a tertiary copy of the data in the Cloud for ransomware protection. And what makes us unique in that area is, one, we give them a single dedupe domain across all their backend storage systems, so they're getting really good dedupe across all their systems. And second, and this might have been the most important for these guys, is that they can use their own on-prem key management system to encrypt the data. So when the data's uploaded into the Cloud for resilience, that allows them to ensure that no one can ever read the data from the Cloud. Even if someone hacks into the S3 bucket and downloads the data, they're just going to see encrypted garbage. So that's one piece.
Douglas Fallstrom:
And the third angle that is really important for these guys is that they have multiple sites. So they can use our multi-site file system, our global file system, and if one site becomes impacted or affected, say by ransomware, they can easily turn that site off, or go and do their thing on that site, and start accessing the data from the other site, with our global file system being the frame of reference at that point. So now they can use our global file system to access data remotely. So that's one example.
Douglas Fallstrom:
We have another example where they’re taking a lot of media content, specifically media content in this case, and moving it up to a refreshed, newer hybrid environment. So they’re using our live data migration facility to move data from one scale-out NAS vendor to another. And they’re also presenting that data in Azure, they happen to be in Azure in their case. They have some 30,000-plus customers, a large number of customers consuming the output from these media files, and some of them consume it through Azure, some of them consume it through on-prem services. So they’re presenting that data out over both object and file protocols across these very different environments.
Douglas Fallstrom:
And we’re helping them abstract all the data away from the underlying infrastructure so they can go through a hardware refresh cycle, they can move the data to the Cloud, and only the data they need to and so forth. So it’s quite the dynamic environment as you can imagine, and that environment is very, very large.
Chris Evans:
That’s interesting, because one of the questions I was going to ask you was, what type of data do you see? You talked about a lot of maybe typical file data, and I was going to ask you whether you see a lot of analytics-type data, or other types of unstructured content, sitting on your platform. I guess you see the whole range, and I’ll let you answer that in a second, but before I get to that point I just wanted to add that a lot of what you’re talking about seems to be more about things like workflow, and data management, and how you actually use the content, than purely, we’re just another platform onto which you can put your data.
Douglas Fallstrom:
Right. Really, in the end, we’re agnostic to the platform you’re on, so we chose to use industry-standard interfaces to access the data stored on whatever platform the customer chooses to use underneath the hood, and that’s for a very, very good reason. We don’t believe there needs to be another vendor with a better mousetrap than the next one. We’re a storage-agnostic file system; sure, you can argue that we’re the new mousetrap, or you can just pick whatever mousetrap to put your data in. But you know what, we also built an uninstall into our system, so if someone doesn’t like us, we don’t need to copy [inaudible 00:29:09] information out of our system, you simply run our uninstall utility and it puts the metadata back where you want it to be.
Douglas Fallstrom:
Anyway, we decided to use industry-standard protocols because half our team at the company is part of the group developing the next-generation NFS standard, and has been instrumental in driving that piece forward. We believe in open standards from that perspective, so it was really important for us as we built our system out. And sure, there is some proprietariness in our metadata server. We’re a company, we’re still out to make money, so there needs to be some uniqueness to it. But other than that, data access is through standard protocols rather than a proprietary client that we sell. So I’m really happy that we’re able to deliver all this functionality across Windows clients, Linux clients, Unix clients, even frankly Android cellphones, if you’re really crazy enough to mount NFS on one, you can.
Douglas Fallstrom:
So it’s really agnostic to the platform. And with our customers, it becomes more important to understand their workflows and how data moves through their environment, than just saying, “Hey, we can store data cheaper than someone else.” The cheaper part may actually just be the fact that the customer is smart about where they place their data, and we simply enable them to do that better than someone else.
Chris Evans:
And to be fair, you could do that with other technologies. So you could replace the underlying hardware component with cheaper technology and mix and match that. So in actual fact, it almost seems irrelevant to build that into a product, but almost more sensible to have a product that supports that ability to do it.
Douglas Fallstrom:
And it’s really interesting, because one of our biggest customers, and unfortunately I’m not allowed to share their name because of some of their business, they’re not government, they’re in the telco-style business, they’re taking their existing scale-out NAS environment and adding existing servers to it that are running Hammerspace, to expand and migrate to a new environment. So not only are they doing all three, they’re leveraging the Cloud for new workloads, and they’re leveraging the fact that they can use [inaudible 00:31:13]-defined and white-box servers, ironically from the same vendor selling them the other storage solution, by the way.
Douglas Fallstrom:
They just like choice, and they like the freedom of hardware choice, they like the freedom of, “I think this vendor provides better support, lunches, dinners, whatever it might be, and therefore I want to use them. But I want to make sure that the data is not stuck when I want to move the data around, I want it to move non-disruptively so I don’t have to take a multi-hour outage.”
Douglas Fallstrom:
And I was talking to another customer the other day; they are taking 17-hour outages, only because their vendor’s asking them to restart their applications, not realizing it’s nearly a full day on the weekend for poor administrators to sit there waiting for their applications to shut down, to unmount, remount, move some data and so forth. It doesn’t happen every day, but it happens roughly every year, and it’s extremely, extremely painful and disruptive for them. So having an environment where data can move freely is really key for these guys.
Martin G:
So Douglas, I’m a customer, I put Hammerspace into my environment, I’ve imported things, I’ve started running policies, I’ve got my data moving around the global file system, and do you know what, I’ve decided I don’t particularly like this, so I can drop Hammerspace out, but potentially my data is dispersed all over the place. I might have stuff which is [inaudible 00:32:39] due to the lifecycle management, it’s now sitting in S3, so what are my options for getting out of this?
Douglas Fallstrom:
Yeah, so it’s interesting, because we originally were like, “Why do we need to build an uninstall? If people move their data in here, they should be in my mousetrap.” Then we decided, “You know what, the whole point of our product is not to be that one.” So what you would do is simply put a policy in our file system, an objective, that says: move all the data to the storage system where I intend to keep it. We’ll move everything non-disruptively in the backend. And then when you’re done, you’re going to have to stop your client I/O, you run the uninstall process, which essentially puts back all the metadata on those files, and then you start up your clients mounting from that storage system.
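The exit flow Douglas describes, drain everything to one backend via an objective, then write the metadata back as ordinary files, can be sketched as two small steps. All function and field names here are hypothetical illustrations, not the actual Hammerspace policy language or uninstall utility.

```python
# Hypothetical sketch of the uninstall flow: first consolidate every file's
# placement onto the target system, then emit native paths plus metadata so
# clients can mount the target system directly once Hammerspace is removed.

def consolidate(placement, target):
    """Objective: 'move all data to the storage system I intend to keep'."""
    return {path: target for path in placement}

def uninstall(placement, metadata):
    """Write metadata back onto the files at their native backend locations."""
    return {f"{system}:{path}": metadata[path]
            for path, system in placement.items()}

# Data currently spread across two backends by earlier policies.
placement = {"/proj/a.dat": "netapp1", "/proj/b.dat": "s3-archive"}
metadata = {"/proj/a.dat": {"owner": "alice"}, "/proj/b.dat": {"owner": "bob"}}

placement = consolidate(placement, "new-nas")   # non-disruptive backend moves
native = uninstall(placement, metadata)         # both files now on "new-nas"
print(sorted(native))
```

The one disruptive moment, as Douglas notes, is the final unmount/remount: clients stop pointing at the Hammerspace namespace and mount the consolidated backend instead.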
Douglas Fallstrom:
What we can’t do for you… well, we can, but you’re not going to like it… is spread the data out across all the other storage systems. So we consolidate all the data for you on this new storage system, and then we get out of your way. Getting out of the way is a disruptive operation, and I want to be very open about that, because you’re going to have to unmount and remount, but that’s the only thing you’re going to have to do.
Martin G:
As I said, Douglas, that sounds really good, that’s a really good answer. I mean, that’s the important thing. We’ve seen it before with file virtualization tools, where you get the promise of file virtualization, then you ask this sort of question, “I’ve decided for whatever reason it’s no longer what I want to do,” and that whole uninstall process is massively disruptive. At that point you decide, “You know what, I’m not going to go that route, because you’re basically locking me in,” so I think that’s a really good answer.
Martin G:
And I have another question about how you see this being used. Do you find that people start looking at it as a migration tool? Say I’ve got vendor X’s NAS device and I want to move to vendor Y’s; it’s quite difficult, I could do it with [inaudible 00:34:24], but that’s potentially disruptive and very time-consuming. So basically, I just want to move the data. Do you find that people’s initial conversation is sometimes around that, can I use your product as a migration tool?
Douglas Fallstrom:
Yeah, it comes up, though honestly not as often as we originally expected. We’ve seen a lot more people asking us, “Can we archive data out to the Cloud instead?” That’s more common. As I say, maybe 20% of the time I get pulled into migration conversations. Now what often happens is that migration becomes part of what they use our product for, but not the sole purpose. Almost every single customer so far has used us to move some older data to a newer system, whether it be the Cloud or another NAS system, and then they decommission the older system, but they haven’t taken us out of the mix yet. So there hasn’t been a true migration where you use us as a tool just to get the job done, but you certainly could do it that way, no doubt about that at all.
Douglas Fallstrom:
And we have ways to speed that up, by the way, so it’s less disruptive and you can make it really minimal, but those are maybe technical details. But it’s interesting, because we thought that originally it would be one of the really big ones, and it’s become one of the smaller ones. And the hybrid Cloud storage thing, where you archive data out or make data accessible across sites, has become a much bigger challenge for customers to solve.
Martin G:
Yeah, it’s interesting. I thought you’d see more migration myself. So I’m looking thinking, “Oh…”-
Douglas Fallstrom:
Yeah, and some of that-
Martin G:
… “imagine big storage devices to move around, so…”
Douglas Fallstrom:
Well, some of that might be how we take it to market. We don’t focus a ton on that when we go to market, so that might be why it’s a little biased from my side.
Chris Evans:
Just to add my two cents to that bit. I would say it depends on whether you value the features that are in there on an ongoing basis, or whether you literally wanted to use it as a migration tool. By that I mean, imagine you’ve done that migration but you think, “Actually, do you know what, this has given me more flexibility around data placement and various other things that, really, I’m not even going to get on that new platform I’m moving to. So I’d rather just leave it in, and see the benefits in both respects.” And possibly many clients can find ways to justify paying for your solution as part of the optimization of moving the data around.
Chris Evans:
So it may well be that they’ve realized, “Actually, we can just leave it where it is, and leave it in place, and it’s still not going to cost us that much, because we’re going to see additional advantages.” So maybe some of that is at play in this as well.
Douglas Fallstrom:
Yeah, I think we’ve seen several customers justify our price by simply saying, “Look, I can move 40% to 80% of my data to cold storage in the Cloud, and that’s simply going to pay for my next hardware refresh, because I’m going to refresh with much less capacity, or not grow my capacity but just use what I have because I can move all the old data out.” It’s a very common ROI conversation we end up having.
Chris Evans:
So what about the buzzword that we hear all the time, and that’s Kubernetes, how are you integrating into that migration towards a container-based environment?
Douglas Fallstrom:
We built a Container Storage Interface driver, a CSI driver, to translate the APIs from Kubernetes into Hammerspace APIs. So in Kubernetes, we’re really unique in that we can deliver a block and file (NFS) experience for persistent data in Kubernetes. And not only do we do that within the cluster, we also do it across distance with our global file system. So you can have a Kubernetes cluster running on-prem, you can have another Kubernetes cluster running in the Cloud, you can simply stop the pod on-prem, and literally seconds later start the pod in the Cloud, and if the data isn’t up there already through our policy framework, the data will be pulled on demand when you access it and the pod will just start.
Douglas Fallstrom:
Zero complexity with federation, and zero worry about, “Did I have my storage vendor replicate the right data up into the Cloud for me to use Kubernetes?” And that’s another reason why an active-active namespace is key, because you can run some pods up in the Cloud and some pods on-prem, and they all sit in the same share, or shares, and exchange data that way.
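The cross-site pod move described above, stop a pod on-prem, start it in the cloud, and have any missing data pulled on demand through the global namespace, can be modeled as a read-through cache over an authoritative metadata map. Class and method names below are illustrative only, not the Hammerspace or CSI API.

```python
# Toy model of pull-on-demand reads across sites in a global file system.

class Site:
    """One location (e.g. an on-prem or cloud Kubernetes cluster)."""
    def __init__(self, name):
        self.name = name
        self.local = {}  # path -> data cached at this site

class GlobalFS:
    """Single namespace; knows where the authoritative copy of each file is."""
    def __init__(self):
        self.authoritative = {}  # path -> (site_name, data)

    def write(self, site, path, data):
        site.local[path] = data
        self.authoritative[path] = (site.name, data)

    def read(self, site, path):
        if path not in site.local:           # cache miss at this site:
            _, data = self.authoritative[path]
            site.local[path] = data          # pull on demand, then serve
        return site.local[path]

gfs = GlobalFS()
onprem, cloud = Site("onprem"), Site("cloud")
gfs.write(onprem, "/data/model.bin", b"weights")

# Pod stops on-prem, restarts in the cloud; its first read pulls the data.
print(gfs.read(cloud, "/data/model.bin"))  # b'weights'
```

The point of the sketch is that the restarted pod never waits for a bulk replication job: the namespace is the same everywhere, and data materializes at the new site only when accessed.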
Chris Evans:
And that includes block then. So if I had a block device that I was running in an on-prem environment, and I wanted that same block device connected to a different container that was now running in the Cloud, I could make that happen?
Douglas Fallstrom:
Yeah, as long as it’s managed in a containerized environment, yeah. So we can create what Kubernetes calls raw devices, or block volumes, and they’re simply represented as a file in Hammerspace, and that file is portable. So that file can be accessed, obviously not concurrently on-prem and in the Cloud, because that would violate the rules of physics, but it can be accessed first in the Cloud and then on-prem. But yeah, that’s the beauty of us and Kubernetes.
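The idea that "a block volume is simply represented as a file" can be illustrated with a file-backed device: a flat byte region addressed in fixed-size sectors, which moves between sites like any other file. This is a generic loopback-style sketch, not Hammerspace's actual on-disk representation.

```python
# Illustrative file-backed block device: sector-addressed reads and writes
# over one contiguous byte region (the "file" the volume is stored in).

class FileBackedBlockDevice:
    SECTOR = 512  # bytes per sector, the classic block size

    def __init__(self, size_sectors):
        self.data = bytearray(size_sectors * self.SECTOR)

    def write_sector(self, lba, payload):
        if len(payload) != self.SECTOR:
            raise ValueError("payload must be exactly one sector")
        off = lba * self.SECTOR
        self.data[off:off + self.SECTOR] = payload

    def read_sector(self, lba):
        off = lba * self.SECTOR
        return bytes(self.data[off:off + self.SECTOR])

dev = FileBackedBlockDevice(size_sectors=8)   # a tiny 4 KiB "volume"
dev.write_sector(3, b"x" * 512)
print(dev.read_sector(3)[:4])  # b'xxxx'
```

Because the whole device is one byte region, moving the volume between sites is just moving that file, which is also why, as Douglas says, it can only be attached at one site at a time.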
Chris Evans:
And I guess I would see that as being useful for things like system data that might be structured, like some sort of SQL database. That might be a good example of where that could be used.
Douglas Fallstrom:
Exactly. And we also do block and file system in the same pod, so things like Jenkins and other similar tools, where you don’t need the metadata stored on an NFS endpoint, and frankly Jenkins is high-performance enough that you don’t want to do that. That’s another good use case for this kind of mix and match.
Chris Evans:
Okay. So in total then, we’ve talked about so many different things here: at the top end, the metadata and the global namespace, the support for different protocols, the data protection, the abstraction from physical storage, the ability to mix and match, and even the Kubernetes side of things. What have you found when you’ve gone to talk to customers, what have they seen that triggered the light bulb moment, where they suddenly realized why what they were doing before wasn’t working?
Douglas Fallstrom:
It’s interesting, because we present slides, and the light bulb moment nearly always happens in a demo. People theoretically get it, but until they see it they don’t believe it. And the light bulb moment always comes when I’ve assimilated existing data from an existing storage system, and I show that data either moving to another storage system with literally a single click, or being accessed in the Cloud without any data migration done along the way. That’s the light bulb moment. And the moment we show that, it tends to be a very different conversation after that.
Chris Evans:
Everybody loves a good demo.
Douglas Fallstrom:
Everybody loves a good demo.
Martin G:
Yeah, we do. We love a good demo.
Chris Evans:
Yeah.
Martin G:
Especially if done live.
Chris Evans:
Oh yeah, absolutely.
Douglas Fallstrom:
Always live, always.
Chris Evans:
Always a good thing to do them live. Well, Douglas, that was fantastic. I think we talked about some very interesting things there. I think there’s a lot of stuff to go off and look at, and try and understand in a bit more detail, so if people have listened to this and thought, “Okay, I need to understand more about your metadata process,” or even just how the system is deployed, where can we point them to, to go and find that sort of information?
Douglas Fallstrom:
So we have some good Tech Field Day presentations on the web, so if you just search for Hammerspace and then Tech Field Day, you’ll find them. It’s all linked from our website, so hammerspace.com, and the resources section will have everything in there as well.
Chris Evans:
Brilliant. I mean, we always like the Tech Field Day videos. Usually the ones that I’m not in are the better ones, but what I’ll do is put a link in to your website and to all of those presentations, so people can find them through the show notes. But for now, Douglas, great, thank you very much for joining us, and I look forward to catching up with you soon.
Douglas Fallstrom:
Yeah. Thanks both Chris and Martin, great conversation.
Martin G:
Cheers Douglas.
Speaker 1:
You’ve been listening to Storage Unpacked. For show notes and more, subscribe at storageunpacked.com, follow us on Twitter @storageunpacked, or join our LinkedIn group by searching for Storage Unpacked Podcast. You can find us on all good pod catchers, including Apple Podcasts, Google Podcasts, and Spotify. Thanks for listening.
Related Podcasts & Blogs
- #152 – Global File System Concepts
- #150 – Myriad File Systems
- #75 – It’s ILM All Over Again with Chris Mellor
- SFD7 – Primary Data and Data Virtualisation
Copyright (c) 2016-2020 Storage Unpacked. No reproduction or re-use without permission. Podcast episode #lwfj.