Security Information and Event Management (SIEM) is an essential cybersecurity tool for organizations. According to Gartner Research, the SIEM market has been disrupted by external forces, including a growing appetite for security data and the cost bloat that comes with it. Data continues to grow exponentially around the world, and AI and microservices are only accelerating that growth. This is forcing leading organizations to reconsider the role of SIEM because its costs are becoming unsustainable, whether pricing is resource-based or ingestion-based, and that is before counting the cost of storing all that data and the compute needed to process it.
The core principles of SIEM require data to be collected from everywhere so it can be organized, correlated, and made available for downstream processes. One option for addressing these cost challenges is the data lake. Data lakes have been around for years, and they may hold the key to efficient and effective SIEMs by forcing behavioral changes based on economics. First, though, organizations must understand how data and data storage affect cost, because this will shape their cybersecurity strategy.
The Disconnect Between the Value of Data and Cost
There’s no doubt that data use is growing worldwide. Our mobile phones have more memory, our photos are bigger, and we watch more high-definition video. Managing this data is critical for security processes, and for SIEM in particular. It begins with the ability to bring in data from a variety of sources, parse it, correlate it, and route it so that it is available for use in an organization’s security workflows.
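To make the parse, correlate, and route steps concrete, here is a minimal sketch in Python. The log format, field names, threshold, and the "alert" and "archive" destinations are all hypothetical, chosen only for illustration; a real SIEM pipeline would be far more involved.

```python
from collections import defaultdict

# Hypothetical raw events in a made-up "key=value" format.
RAW_EVENTS = [
    "source=firewall ip=10.0.0.5 action=deny",
    "source=auth ip=10.0.0.5 action=login_failed",
    "source=auth ip=10.0.0.5 action=login_failed",
    "source=auth ip=10.0.0.9 action=login_ok",
]


def parse(line):
    """Turn 'key=value key=value' text into a dictionary."""
    return dict(pair.split("=", 1) for pair in line.split())


def correlate(events):
    """Group events by IP address so related activity sits together."""
    by_ip = defaultdict(list)
    for event in events:
        by_ip[event["ip"]].append(event)
    return dict(by_ip)


def route(by_ip, threshold=2):
    """Send IPs with several suspicious events to 'alert', the rest to 'archive'."""
    suspicious = {"deny", "login_failed"}
    routes = {"alert": [], "archive": []}
    for ip, events in by_ip.items():
        hits = sum(1 for e in events if e["action"] in suspicious)
        routes["alert" if hits >= threshold else "archive"].append(ip)
    return routes


routes = route(correlate(parse(line) for line in RAW_EVENTS))
print(routes)  # {'alert': ['10.0.0.5'], 'archive': ['10.0.0.9']}
```

The point of the sketch is the routing decision at the end: not every event has to land in the same, most expensive place.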
While data is required, storing that data in security systems is expensive. This disconnect can be alleviated by giving businesses more choice over when and how they manage their data so they can better balance cost and risk. The choices include whether to store data in the cloud, when or whether that data gets processed, whether it is available in real time, and, if so, for how long. Giving businesses choice changes the dynamic of the security market and increases the likelihood that security practices will be more effective.
Enter data lakes. Data can be stored in a data lake and pulled into a system or process only when it’s needed, which lessens the financial impact while preserving the valuable data. How does it work? Data lakes let security teams keep the data they don’t need right away in lower-cost storage and bring it in at a later time. Teams then don’t pay to ingest and process that data in the SIEM until it is extracted. Security operations still have the data they need to drive daily operations efficiently and effectively, without losing data or being buried in too much of it.
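The economics can be sketched with some simple arithmetic. Every number below is a hypothetical assumption, not a vendor price: the sketch assumes a high per-terabyte charge for data ingested into the SIEM, a much lower per-terabyte charge for data parked in the lake, and that recalled data is charged at the SIEM rate only when it is pulled in.

```python
# All prices and volumes are hypothetical, chosen only to show the shape
# of the trade-off. They are not real vendor pricing.
SIEM_COST_PER_TB = 2000  # assumed cost to ingest and process 1 TB in the SIEM
LAKE_COST_PER_TB = 50    # assumed cost to keep 1 TB in the data lake


def monthly_cost(total_tb, hot_tb, recalled_tb=0):
    """Cost when only hot_tb goes straight to the SIEM.

    The remainder sits in the lake. recalled_tb is the part of that
    remainder pulled into the SIEM later (for example, during an
    investigation) and charged at the SIEM rate at that point.
    """
    cold_tb = total_tb - hot_tb
    if not 0 <= recalled_tb <= cold_tb:
        raise ValueError("recalled_tb must be between 0 and the lake volume")
    siem_cost = (hot_tb + recalled_tb) * SIEM_COST_PER_TB
    lake_cost = cold_tb * LAKE_COST_PER_TB
    return siem_cost + lake_cost


everything_in_siem = monthly_cost(total_tb=10, hot_tb=10)
tiered = monthly_cost(total_tb=10, hot_tb=2, recalled_tb=1)

print(everything_in_siem)  # 20000
print(tiered)              # 6400
```

Under these assumed numbers, keeping 8 of 10 TB in the lake and recalling 1 TB on demand costs roughly a third of sending everything to the SIEM. The real ratio depends entirely on actual pricing and on how much data ends up being recalled.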
Data lakes live in the IT and security realms, but the concept is mirrored in the U.S. car dealership business. Car dealerships like to showcase their top-of-the-line cars. However, they often have more cars than they have room for and can’t afford more showroom space to house them. One solution dealerships use is to keep the other cars in a less expensive, distant parking lot and bring them to the dealership when they are needed.
The Economic Lessons that the History of Data Lakes Can Teach Us
There is a lot of chatter in the security world about data lakes. Data lakes are not new; they have been around for more than ten years. Let’s take a step back. Prior to data lakes, people had to make multiple copies of their data to avoid a single point of failure. Organizations had not just one but several hard drives, which was an expensive option. Replicating data meant double, or even triple, the cost. For every terabyte or petabyte of data, you had two or three times that amount once you factored in data backups.
Organizations used old-fashioned math to weigh the cost of these hard drives and data storage against the percentage of data that could be lost. Think of data spread over multiple hard drives, with teams calculating the total cost of the redundancy. People relied on existing CPU technology and mathematics developed in the 1950s to determine what data had been lost. They applied this thinking to data technologies, hard drives, and storage, which was the genesis of data lakes.
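The article does not name the specific math involved. As an illustration only, one familiar scheme of this general kind is XOR parity, in which a single extra block lets you rebuild any one lost block instead of keeping full second and third copies. The drive contents below are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical "drives", each holding one equal-sized block of data.
drives = [b"logs", b"auth", b"flow"]

# One extra parity block instead of a full extra copy of every drive.
parity = xor_blocks(drives)

# Simulate losing the second drive, then rebuild it from the survivors.
survivors = [drives[0], drives[2], parity]
recovered = xor_blocks(survivors)
print(recovered)  # b'auth'

# Storage needed, counted in blocks, for the same three blocks of data.
storage_with_three_copies = 3 * len(drives)  # 9 blocks
storage_with_parity = len(drives) + 1        # 4 blocks
```

In this sketch, three full copies need nine blocks while parity needs four, which is the kind of saving that makes large, cheap storage pools practical. Simple parity only survives the loss of one block at a time; production systems use stronger codes.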
Data lakes provide lower-cost storage, which lets you keep up as storage needs grow. When hard drive costs came down, people were able to store more data. Fast forward to modern times and our ability to store data and photos on our mobile phones. Most of us don't have to delete photos, think about the cost of storing a photo, or limit the number of photos we take. When the cost of storage comes down to a point where it's almost free, you naturally start to store more things.
This history has taught us that behaviors change based on economics. When cost outweighs value, the disconnect prompts behavior to adapt. In our personal lives, another example is the rising cost of streaming services at a time when there is less video content to watch. The cost does not match the value. In the security world, this is what is happening with SIEM solutions.
Future of the Role of Data Lakes in SIEM
As cybersecurity takes a front-and-center role thanks to the staggering number of cyberattacks, security information and event management can identify early signs of an attack, help teams understand its effect, and provide valuable information to mitigate it. Data and access to data are critical to this task, in which security teams must secure data from numerous applications, systems, and platforms. Data lakes have their challenges: they are often slower, and they have the potential to create data silos. They nevertheless play a critical role in the process and give organizations choices in their data routing needs, spending, and strategies so they can do what is most important, which is to protect their business and customers. Having a choice changes the dynamic of the security market and increases the likelihood that security practices will be seamless and more effective.