Integrating Streaming Data into Snowflake Snowpipe with Cribl Stream

Cribl Stream users have successfully built security data lakes, either to supplement their traditional SIEM systems or to replace them. Whatever the architecture, their key objectives are to decrease latency and reduce costs. Snowflake has emerged as a preferred choice for constructing security data lakes due to its scalability and ease of use. Snowflake recently rolled out a new feature for streaming data ingestion, which Stream is equipped to use.

Cribl’s Integration with Snowflake 

Snowflake’s new Snowpipe Streaming feature simplifies data loading by eliminating the file staging that the original Snowpipe requires. Instead, the data is written directly into Snowflake as queryable rows. Currently, Apache Kafka serves as the intermediary between Stream and Snowflake, and AWS’s Managed Streaming for Apache Kafka (MSK) service is supported.

Shifting from batch processing to real-time data streaming reduces latency dramatically, from minutes to seconds, and also lowers costs significantly. Snowflake’s internal tests show that ingesting 1 TB of data through Snowpipe Streaming costs under $50.
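To get a feel for what that test figure could mean for a given workload, here is a minimal Python sketch that scales it to a daily ingest volume. It assumes cost grows linearly with volume, which the test result alone does not establish. The function name and the 2 TB/day workload are hypothetical, so treat the output as a rough upper bound rather than a quote.

```python
# Upper-bound figure from the Snowflake internal test cited above: under $50 per TB.
COST_PER_TB_UPPER_BOUND_USD = 50.0


def estimate_ingest_cost_upper_bound(tb_per_day: float, days: int = 30) -> float:
    """Return a rough upper-bound ingest cost in USD.

    Assumes cost scales linearly with volume (an assumption, not a
    documented pricing rule).
    """
    if tb_per_day < 0 or days < 0:
        raise ValueError("volume and days must be non-negative")
    return tb_per_day * days * COST_PER_TB_UPPER_BOUND_USD


if __name__ == "__main__":
    # Hypothetical workload: 2 TB/day over a 30-day month.
    estimate = estimate_ingest_cost_upper_bound(2.0)
    print(f"Estimated upper bound: ${estimate:,.2f}")
```

Actual charges depend on your Snowflake edition, region, and workload shape, so check real usage before budgeting on a number like this.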

How to Set Up Snowpipe Streaming 

Setting up streaming from Cribl Stream to Snowflake is relatively simple, aided by comprehensive documentation. You can consult the Quick Start Guide for Snowpipe Streaming with AWS MSK to utilize AWS’s managed Kafka service. Alternatively, you have the option to configure your own Kafka cluster or use Confluent’s managed service, as per Snowflake’s guidance. 

After setting up Kafka and Snowpipe Streaming to accept incoming data, configure Snowflake as the destination within Stream. In the Stream setup, you’ll need to enter Kafka’s broker and topic details to establish a connection between Cribl and Snowflake. 
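Before entering the broker and topic details, it can help to sanity-check them. The Python sketch below is a hypothetical pre-flight check, not Cribl Stream's configuration schema or API: the `KafkaTargetSettings` class, the `validate` function, and the example host name are all made up for illustration. It only confirms that each broker looks like host:port and that the topic name uses characters Kafka accepts.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class KafkaTargetSettings:
    """Hypothetical holder for the two details the text mentions."""
    brokers: Tuple[str, ...]
    topic: str


# host:port, where host is a DNS name or IPv4 address.
_BROKER_RE = re.compile(r"^[A-Za-z0-9.-]+:(\d{1,5})$")
# Kafka topic names may use letters, digits, '.', '_' and '-' (max 249 chars).
_TOPIC_RE = re.compile(r"^[A-Za-z0-9._-]{1,249}$")


def validate(settings: KafkaTargetSettings) -> List[str]:
    """Return a list of human-readable problems; empty means it looks sane."""
    problems = []
    if not settings.brokers:
        problems.append("at least one broker is required")
    for broker in settings.brokers:
        match = _BROKER_RE.match(broker)
        if not match or not 1 <= int(match.group(1)) <= 65535:
            problems.append(f"broker must be host:port, got {broker!r}")
    if settings.topic in {".", ".."} or not _TOPIC_RE.match(settings.topic):
        problems.append(f"invalid topic name: {settings.topic!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical values; substitute the ones from your own cluster.
    settings = KafkaTargetSettings(
        brokers=("b-1.example.internal:9092",),
        topic="security-logs",
    )
    print(validate(settings) or "settings look well-formed")
```

A check like this catches typos only. It does not confirm that the brokers are reachable or that the topic exists.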

Validating Data Ingestion 

Once the Kafka connection is live within Cribl Stream, data will start streaming into Snowflake. You can verify the process by accessing the Snowflake UI and inspecting the destination tables to ensure data is being ingested correctly. 
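If you would rather script that check than click through the UI, one simple approach is to count rows in the destination table twice and confirm the number is growing. The Python sketch below works with any DB-API 2.0 cursor. With Snowflake, the cursor would typically come from the Snowflake Connector for Python (`snowflake.connector.connect(...).cursor()`). Here an in-memory SQLite database stands in so the example runs anywhere, and the table name `security_logs` and both function names are hypothetical.

```python
import re
import sqlite3

# Allow plain or dotted identifiers (table, schema.table, db.schema.table).
_IDENTIFIER_RE = re.compile(
    r"^[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}$"
)


def count_rows(cursor, table: str) -> int:
    """Count rows in `table` using any DB-API 2.0 cursor."""
    # Table names cannot be bound as query parameters, so validate the
    # identifier before interpolating it into the SQL text.
    if not _IDENTIFIER_RE.match(table):
        raise ValueError(f"unexpected table identifier: {table!r}")
    cursor.execute(f"SELECT COUNT(*) FROM {table}")
    return cursor.fetchone()[0]


def ingestion_is_advancing(before: int, after: int) -> bool:
    """True if the second sample saw more rows than the first."""
    return after > before


if __name__ == "__main__":
    # SQLite stand-in for the destination table (an assumption for the demo).
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE security_logs (record_content TEXT)")

    before = count_rows(cur, "security_logs")
    # Simulate one event arriving between the two samples.
    cur.execute(
        "INSERT INTO security_logs VALUES (?)", ('{"event": "login"}',)
    )
    after = count_rows(cur, "security_logs")

    print("ingestion advancing:", ingestion_is_advancing(before, after))
```

In a real check you would wait some seconds between the two samples and make sure the source is actually sending events during that window.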

With this integration, you can efficiently stream security logs from various environments into Snowflake through Stream. Some users leverage this setup for managing large data volumes from sources like forensic endpoint logs and cloud activity. Since Snowflake doesn’t impose retention limits, it works well alongside existing SIEMs. Meanwhile, other users have fully shifted away from traditional SIEMs, using Stream to direct data into SOC platforms built on Snowflake. 

Real-World Example: A Fortune 500 Company’s Transformation

A Fortune 500 consumer goods company recently overhauled its security infrastructure to enhance visibility and automate workflows after struggling with data ingestion limits in its legacy SIEM system. The security team sought to consolidate log data from cloud, SaaS, and on-premises environments, using data science to improve threat detection, especially for lateral movement across systems. By implementing Cribl Stream, they gained control over their data pipeline and routed their logs to AWS. From there, an alternative SIEM solution processed, normalized, and stored the data in Snowflake. This new SOC platform could detect threats across all normalized data sources in Snowflake, which was not possible when analyzing them separately. 

By adopting this architecture, the company tripled its security data analysis capacity, extended data retention from 90 days to one year, and cut its overall costs by half, saving over $1 million annually. These benefits are frequently experienced by organizations that transition from traditional SIEMs to an open security data lake architecture utilizing Cribl Stream and Snowflake. 
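The figures above can be unpacked with a little arithmetic. The Python sketch below treats "over $1 million" as a lower bound and "one year" as 365 days, and the function names are hypothetical. It shows what a 50% reduction implies about the prior annual spend and how much longer data is retained.

```python
# Figures quoted in the case study above.
ANNUAL_SAVINGS_USD = 1_000_000  # "over $1 million", so a lower bound
COST_REDUCTION = 0.5            # costs cut by half
RETENTION_BEFORE_DAYS = 90
RETENTION_AFTER_DAYS = 365      # "one year", assumed to mean 365 days


def implied_prior_cost(savings: float, reduction: float) -> float:
    """Prior annual cost implied by a saving that equals `reduction` of it."""
    return savings / reduction


def retention_multiple(before_days: int, after_days: int) -> float:
    """How many times longer data is retained after the change."""
    return after_days / before_days


if __name__ == "__main__":
    prior = implied_prior_cost(ANNUAL_SAVINGS_USD, COST_REDUCTION)
    multiple = retention_multiple(RETENTION_BEFORE_DAYS, RETENTION_AFTER_DAYS)
    print(f"Implied prior annual cost: over ${prior:,.0f}")
    print(f"Retention increased roughly {multiple:.1f}x")
```

In other words, the quoted savings imply a previous annual spend of more than $2 million, with roughly four times the retention window afterward.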

About Cribl 

Cribl enables organizations to transform their data strategies by offering solutions for collecting, processing, routing, and analyzing IT and security data. Cribl’s suite of products provides the flexibility and control necessary to address evolving challenges. 

Cribl Stream stands at the forefront of multi-source data integration, transforming the way organizations approach data management. By offering robust support for a wide range of data sources and flexible integration capabilities, Cribl not only simplifies data aggregation but also enhances data quality and accessibility.