Sports Content Management Forum: The Challenge of High-Capacity Archive Storage
Both the technology and the amount of high-value content are growing rapidly
High-capacity archive storage was the focus of a panel of industry leaders at the 2019 SVG Sports Content Management Forum, held in New York City. Among the topics were the impact of data tape, SSD, NVMe, and advances in disk-drive density on the storage platform for large-scale archiving over the past year, along with cloud-based storage and where it fits into the mix.
Tab Butler, senior director, media management and post production, MLB Network, said that scale and time are the two biggest challenges MLB Network is facing.
“It seems like the hardware refresh cycle happens far too often and in too short a period of time,” he said. “And, as time goes on, we are realizing there is much more value in content we never thought of [as] being of value, and [it ends] up being just as important to the viewership as a walk-off home run. It is a situation where we now are cataloging and keeping more angles as technology makes it easier for us to create and consume content. When we started, we thought we were doing a lot by keeping four or five copies of every game, coming in [at] five hours for every hour of play. But now we are at 10 hours for every hour of play for a regular game while a showcase game will drive that up to 20 hours.”
The MLB Network archive is 1.2 million hours of content, with all of it available to the production team via proxy video. And, with everyone accustomed to searching for video content on their phones and finding it within seconds, having to wait three minutes for a video file to be pulled out of a system is unacceptable. That means that content historically stored on tape has to be moved to primary disk storage. In addition, those proxy-video–storage systems will wear out after spinning for five years, which means that the entire process needs to be started again.
Eric Bassier, senior director, products, Quantum, noted that a lot of customers say they have 100 PB that they need to keep forever. And part of that is due to the difficulty of predicting what content will have value at what time. “As an industry, we’ve made progress,” he said, “but it’s a difficult problem to solve.”
David Taylor, executive cloud architect, IBM Storage and Software Defined Storage Solutions, IBM, cited the other big problem: everything is going to 4K and even 8K. He described the content as coming in “like a fire hydrant with upwards of 6,000 gallons per minute and the hydrant never closes and you also can’t spill a drop. We’re doing a lot in AI and machine learning to identify where the valuable content is.”
Nick Gold, VP, marketing, Catalog DNA, noted the role of AI and machine learning, whereby the machines can learn to predict where certain kinds of content need to be stored given their likely use scenarios.
“With a lot of the new reality-TV shows,” said Taylor, “they have so much content that editors can’t go through it, but they need to know what content was in focus or who was talking. It’s about mining for valuable content so the editors are productive.”
Media Translation CEO Jay Yogeshwar said that, because archive technology will need to be replaced at some point, the goal is to figure out when that move will happen and whether a change will improve workflows or create new monetization opportunities.
“One of the areas I am interested in,” he explained, “is how to transition from one to another without causing disruption by doing bulk migrations in the background. I am also interested in virtualization with things like an abstraction layer for an archive-management team that emulates the current system.
“You abstract the tape libraries and then also abstract the object storage and move the burden of the technology,” he continued. “This has been a movement for a long time now, like FIMS (or Framework Interoperable Media Services). The point of that is, how can we create vendor neutral archives where tools can be created and then plugged into it?”
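The abstraction-and-migration approach Yogeshwar describes can be sketched in code. This is a hedged illustration, not FIMS itself or any vendor's API: every class and method name here is hypothetical, and an in-memory backend stands in for both the tape library and the object store behind a single interface.

```python
from abc import ABC, abstractmethod

class ArchiveBackend(ABC):
    """One interface for every storage tier (tape, object store, cloud)."""

    @abstractmethod
    def store(self, asset_id: str, data: bytes) -> None: ...

    @abstractmethod
    def retrieve(self, asset_id: str) -> bytes: ...

    @abstractmethod
    def assets(self) -> set[str]: ...


class InMemoryBackend(ArchiveBackend):
    """Stand-in for a tape library or an object store behind the interface."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def store(self, asset_id: str, data: bytes) -> None:
        self._data[asset_id] = data

    def retrieve(self, asset_id: str) -> bytes:
        return self._data[asset_id]

    def assets(self) -> set[str]:
        return set(self._data)


class ArchiveManager:
    """Abstraction layer that emulates the current system during a transition."""

    def __init__(self, current: ArchiveBackend, target: ArchiveBackend) -> None:
        self.current, self.target = current, target

    def store(self, asset_id: str, data: bytes) -> None:
        # New writes land on both tiers so the eventual cutover is seamless.
        self.current.store(asset_id, data)
        self.target.store(asset_id, data)

    def retrieve(self, asset_id: str) -> bytes:
        # Reads hit whichever tier already holds the asset.
        backend = self.target if asset_id in self.target.assets() else self.current
        return backend.retrieve(asset_id)

    def migrate_in_background(self) -> int:
        """Bulk-copy legacy assets to the target tier; returns the count moved."""
        moved = 0
        for asset_id in self.current.assets() - self.target.assets():
            self.target.store(asset_id, self.current.retrieve(asset_id))
            moved += 1
        return moved
```

Because tools talk only to `ArchiveManager`, the tape library can be swapped for object storage without disrupting workflows, which is the vendor-neutral goal the panel describes.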
Hossein ZiaShakeri, SVP, business development and strategic alliances, Spectra Logic, said that he sees two tiers: production and archive, or storage in perpetuity. The production tier, which is typically the most costly, includes editing, rendering, and other functions that need quick access to files. The archive tier is where the mass is, and the more automation brought into that area, the greater the efficiencies.
“Object-based storage is truly the way we as the platform do that,” he said. “One of the main attributes of object storage is, it is not connected to the actual storage media. It could be tape, disk, or cloud, and that is one of the great things about object storage. But it does include metadata that is abstracted from the application that brings portability and also allows for a simple API so that things can be automated when desired.”
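The object-storage attributes ZiaShakeri lists can be made concrete with a small sketch. This is illustrative only, with assumed names rather than any product's API: each object carries metadata that travels with it, independent of the medium holding the payload, and a simple API lets automation query across tape, disk, and cloud alike.

```python
from dataclasses import dataclass, field

@dataclass
class ArchiveObject:
    object_id: str
    medium: str                    # "tape", "disk", or "cloud": where the payload lives
    metadata: dict = field(default_factory=dict)  # abstracted from the application

class ObjectArchive:
    def __init__(self) -> None:
        self._objects: dict[str, ArchiveObject] = {}

    def put(self, object_id: str, medium: str, **metadata) -> None:
        self._objects[object_id] = ArchiveObject(object_id, medium, metadata)

    def find(self, **criteria) -> list[ArchiveObject]:
        # The query never touches the media layer: portability comes from
        # metadata living with the object, not with the application.
        return [o for o in self._objects.values()
                if all(o.metadata.get(k) == v for k, v in criteria.items())]
```

A caller can ask `find(kind="walk-off")` and get matches whether the payloads sit on tape, disk, or in the cloud, which is the automation hook the quote points to.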
Key is an agile environment that can change as needs change, ZiaShakeri added. “That is so important because things are changing quite a bit. And the key to automation is having the right tools. How do you learn about and catalog your data, and then how do you take advantage of it so you can apply the right automation process? That’s where we are putting a lot of our resources.”
Bassier noted that it is helpful to break out which medium is right to use and then how to manage the data on it. Data tape, hard-disk drives, and solid-state drives are the media of choice, with price making solid-state drives less attractive. Cloud-based systems also rely on those media.
“To keep content for 30 years, we believe tape is a great storage medium, but it takes time, sometimes minutes, to get files,” he said, adding, “But a lot of characteristics of tape as a storage medium are very good. And, as metadata gets tied together with the content data, it becomes easier to write software that can manage that data across different storage mediums. And that is a development that will help solve our problems.”
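A minimal sketch of the point Bassier makes: once metadata is tied to the content, software can place each asset on the medium whose cost and retrieval latency fit its access pattern. The thresholds below are illustrative assumptions, not any product's defaults.

```python
def choose_medium(days_since_access: int, retention_years: int) -> str:
    """Pick a storage medium from two metadata fields (illustrative policy)."""
    if days_since_access <= 30:
        return "disk"   # active production: fast, random access
    if retention_years >= 30:
        return "tape"   # long-term keep: cheap and durable, but first-byte
                        # latency can run to minutes, as Bassier notes
    return "cloud"      # middle ground for infrequently used assets
```

In practice such a policy would run continuously across the archive, moving assets between tiers as their metadata changes.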
Taylor noted another piece that has to be considered: how to manage the most-valuable content because customers don’t want to be beholden to a cloud vendor to get that content back. “Do you send proxies to the cloud but keep the asset under your own control?”
Butler said that MLB Network’s approach to the cloud involves AWS, which has helped greatly because AWS also serves as a CDN for the network’s needs.
“It puts our content at the right resolution close to our consumer, whoever it is,” he said. “We’ve worked with them since 2011, and all of the games are in the cloud and highly searchable for internal operations. Those files are proxy [versions], but we also have all of the proxy files on premises.”
One attraction of a big archive in the cloud is that machine learning can be applied to the content and a rich set of metadata created. That allows the content to be correlated with things like social media, and the correlations can be used to change local workflows, helping create automated tasks and speeding searches and discoverability.
ZiaShakeri noted that some of his clients have migrated to the cloud because of all the promises surrounding it but have returned to more-traditional storage.
“If you can visualize a virtual entity that encompasses on-premises storage, off-premises storage, and cloud but with the right tools,” he explained, “then there is no reason certain intelligence aspects of a storage platform cannot exist on premises and in the cloud and in sync.
“But the key is object-based storage,” he continued. “Once you have that, there are so many different things you can do because the system has visibility of all of the assets, regardless of where they are. A framework like what we are used to when [we] google something can work with a perpetual-storage platform, and then you can decide how you want to use the cloud.”
Gold added that the personnel and the processes that are developed are as important as the tech stack under it: “They need just as much focus but are often overlooked.”