#65 – Challenges in Managing Unstructured Data with Shirish Phatak

Chris Evans – Data Management, Guest Speakers

In this week’s podcast we focus on the issues of managing unstructured data in a distributed world.  Chris and Martin are joined by Shirish Phatak, CEO at Talon Storage.

It’s interesting that “unstructured” proves to have a moveable definition, depending on what you want to include.  While we traditionally think of files and objects as unstructured, these so-called binary pieces of content typically do have structure within them.  In contrast, databases can be made up of unstructured data (e.g. files) that together take a structured form.

Getting past the definition, we find that data growth certainly depends on the industry, with a minimum of 20% annually, rising to as much as 100%.  As Martin points out, in his company, the assumption is that 80% of storage will be full within 6 months of deployment.
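
To put those growth rates in perspective, here is a minimal sketch of the compounding arithmetic. The function names and the 100 TB starting figure are illustrative assumptions, not numbers from the podcast; only the 20% and 100% rates come from the discussion.

```python
import math


def projected_capacity(initial_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after compounding annual_growth (0.20 means 20%) for a number of years."""
    return initial_tb * (1 + annual_growth) ** years


def years_to_double(annual_growth: float) -> float:
    """Years needed for stored data to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # Hypothetical 100 TB estate, compared at the two growth rates mentioned above.
    for rate in (0.20, 1.00):
        print(
            f"{rate:.0%} growth: doubles in {years_to_double(rate):.1f} years, "
            f"{projected_capacity(100, rate, 5):.0f} TB after 5 years"
        )
```

At 20% a year, capacity doubles in roughly 3.8 years; at 100% it doubles every year.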

With distributed data, we see processing at the edge and data management at the core.  In practical terms though, this can also mean moving data into the core for more analytics or batch processing.  The conversation highlights how important data consistency and concurrency are in a distributed environment.  It’s easy for users to simply copy and rename a file, throwing data management processes into confusion.
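
As an illustration of the copy-and-rename problem, the sketch below groups files by a hash of their content, so a renamed copy still shows up as a duplicate. This is one common approach, not something described in the podcast or attributed to Talon; the function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose content is identical, regardless of file name."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only digests seen more than once represent duplicated content.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates(Path("/some/share"))` (the path is a placeholder) returns one list per set of identical files. Note that this only catches exact copies; a file that has been copied and then edited will hash differently, which is part of why consistency across sites is hard.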

Finally, the conversation moves to the public cloud, which at present seems to be acting simply as a large, easy-to-use repository.

You can find Talon Storage here – https://www.talonstorage.com/ and Shirish on LinkedIn here.

Elapsed Time: 00:31:51

Timeline

  • 00:00:00 – Intros
  • 00:01:00 – What is unstructured data?
  • 00:04:30 – Why is unstructured the source of new data growth?
  • 00:06:30 – Automated/background tasks creating data
  • 00:07:00 – To centralise or not centralise?  What data is actually useful?
  • 00:09:00 – How can you define security rules outside the data centre?
  • 00:12:00 – Increased volumes of data result in policies, not active management
  • 00:13:30 – Consistency and concurrency – enemies of distributed data
  • 00:16:00 – One true copy – but at the risk of performance?
  • 00:17:30 – You can’t fix stupid users!
  • 00:19:00 – Are filesystems at fault?  Do we need ILM (again)?
  • 00:22:30 – How is public cloud helping manage data?
  • 00:28:30 – Are there any standards or best practices we can follow?
  • 00:30:30 – Wrap up

Shirish’s Bio

Shirish Phatak is the Founder and CEO of Talon. Shirish has over 15 years of experience building scalable, high-performance systems that solve mission-critical information technology challenges.  Shirish was Chairman of the Board and Co-founder of Velocius Networks, a creator of network performance management solutions. Shirish was instrumental in the technical direction, enterprise requirements and vision leading to its acquisition by Akamai (AKAM) in 2013. Prior to Velocius, Shirish was a lead technologist in the advanced technology group at Bluecoat.

Shirish also co-founded Tacit Networks, a pioneer in Wide Area File Services (WAFS) solutions that was sold to Packeteer (PKTR). Packeteer was later sold to Bluecoat.  Shirish earned an M.Tech in Computer Science from the Indian Institute of Technology, Mumbai and an M.Phil in Computer Science from Rutgers, The State University of New Jersey.


Copyright (c) 2016-2018 Storage Unpacked.  No reproduction or re-use without permission. Podcast Episode AXDI.