The MessageGears marketing console is an enterprise Java application that connects directly to customer data sources so that our users can segment audiences and personalize messages. Traditionally, we’ve leveraged JDBC (Java Database Connectivity) drivers to standardize connectivity. These drivers work well across nearly any standard data source and give us the flexibility to connect to dozens of traditional and modern customer data stores. Today, however, I’m happy to share that MessageGears now leverages parallel processing in Snowflake extracts to support faster and more efficient data transfer to power our Segment, Message, and Engage products.
MessageGears uses live customer data to Segment, Message, and Engage with CRM-based recipients. Our customers generate thousands of segments and deliver personalized messages with rich contextual data to billions of recipients every month. During the process, large amounts of data are extracted from customer data stores. Our experience has shown that relying on JDBC drivers alone will get the job done, but not efficiently, and that approach typically doesn’t follow the design patterns data store vendors prefer.
JDBC has been a standard for the past 20 years, and every emerging data store offers a JDBC driver to support adoption, even if there are limitations. Modern cloud data stores (Snowflake, Google BigQuery, among others) are designed to “break the mold” of traditional database engines and offer methods to load, explore, manipulate, and extract the ever-growing amount of data large enterprises store today. Typically, the full capabilities of modern cloud data stores are provided via direct APIs. Additionally, cloud data store providers offer preferred design patterns for increased performance on large operations. However, to fit into the data store market, these vendors also provide JDBC drivers to ease integration and increase usage. Basic JDBC drivers offer standard connectivity and basic SQL (Structured Query Language) support.
MessageGears has always embraced modern data store technologies, and always will. In keeping with that commitment, we’ve built our Snowflake Data Cloud integration using a mixture of JDBC connections and native API patterns. Our goal was to speed up counting, extracting, and loading large amounts of data. Our results exceeded expectations and empowered our customers to manipulate tens of millions of rows of data in record time.
The key strength of a native Snowflake integration is that it leverages the true nature of a massively parallel processing (MPP) database. Snowflake stores and processes data on thousands of servers in a highly distributed manner. Typical JDBC drivers require the data store to funnel all results back through a single connection, turning something that starts off parallel back into a serial process. Furthermore, JDBC drivers funnel data through a generic data store connection that isn’t configured specifically for the data store vendor and is rarely recommended for large data extracts. Our observations showed that, as result sets grew in width (more columns returned), the overall query time grew linearly, regardless of data size.
However, MessageGears utilizes Snowflake’s MPP power through a mixture of JDBC and API connection patterns in a simple three-step process:
- Utilize Snowflake’s Amazon S3 Integration to create a Snowflake destination with S3 as the data backend
- Place data into stage storage using Snowflake’s COPY INTO command, utilizing Snowflake’s MPP to create temporary storage
- Process that resulting dataset in parallel, using MessageGears’ API Data Processing to quickly extract and prepare the recipient data set
With our Snowflake integration, MessageGears enables Snowflake to execute commands in the most efficient manner, for a net 5-10X improvement on large data extracts.
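As a rough illustration of steps two and three above, here is a minimal Java sketch. It is not MessageGears’ actual implementation: the class name, the stage name, the file prefix, the query, and the file-size option are all assumptions chosen for the example, and in-memory lists stand in for the files that a real unload would write to S3. The SQL builder also assumes trusted inputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelUnloadSketch {

    // Step 2 (assumed form): unload a query's results into a stage as several
    // compressed CSV files. Stage, prefix, and query are hypothetical inputs.
    static String buildUnloadSql(String stage, String prefix, String query) {
        return "COPY INTO @" + stage + "/" + prefix + "/"
                + " FROM (" + query + ")"
                + " FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)"
                + " MAX_FILE_SIZE = 104857600";
    }

    // Step 3 (assumed form): give each unloaded file to its own worker task and
    // combine the per-file results. Each inner list stands in for one file's rows.
    static long countRowsInParallel(List<List<String>> files, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (List<String> file : files) {
                // Count the non-empty lines of one file.
                Callable<Long> task = () -> file.stream()
                        .filter(line -> !line.isEmpty())
                        .count();
                pending.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : pending) {
                total += result.get(); // blocks until that file is processed
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing files", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a file task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUnloadSql(
                "mg_stage", "run_001", "SELECT id, email FROM recipients"));
        List<List<String>> files = List.of(
                List.of("1,a@example.com", "2,b@example.com"),
                List.of("3,c@example.com"));
        System.out.println(countRowsInParallel(files, 2));
    }
}
```

The point of the split is that no single connection has to carry the whole result set: the data store writes many files at once, and the consumer reads them with as many workers as it can spare.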
Benchmarks using the JDBC connector showed that extracting a set of 50 million recipient records with 200 attributes each took over three hours before the data could be processed to deliver personalized messages. After converting to the native integration, our team observed the exact same extract taking 12 minutes. The dramatic speed improvement gives our customers more real-time personalization information and more responsive UI operations to count and segment audiences for improved marketing.
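To put those benchmark figures in terms of throughput, the short Java sketch below works through the arithmetic. It treats "over three hours" as exactly 180 minutes, which is an assumption that makes the JDBC numbers a best case, and the class and method names are made up for the example.

```java
final class ExtractThroughputSketch {

    // Average rows extracted per second for a run of the given length.
    static double rowsPerSecond(long rows, double minutes) {
        return rows / (minutes * 60.0);
    }

    // How many times faster the second run is than the first.
    static double speedup(double beforeMinutes, double afterMinutes) {
        return beforeMinutes / afterMinutes;
    }

    public static void main(String[] args) {
        long rows = 50_000_000L;      // recipient records in the benchmark
        long attributes = 200L;       // attributes per record
        double jdbcMinutes = 180.0;   // assumption: "over three hours" taken as 180
        double nativeMinutes = 12.0;

        System.out.println("values moved:   " + rows * attributes);
        System.out.println("JDBC rows/s:    " + Math.round(rowsPerSecond(rows, jdbcMinutes)));
        System.out.println("native rows/s:  " + Math.round(rowsPerSecond(rows, nativeMinutes)));
        System.out.println("speedup:        " + speedup(jdbcMinutes, nativeMinutes));
    }
}
```

Under that assumption the extract moves 10 billion attribute values, throughput rises from roughly 4,600 to roughly 69,000 rows per second, and the speedup for this particular benchmark is at least 15X.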
Giving our users a best-in-class product is always our primary concern, and integrating newer technologies within our product is just one way we deliver value to our customers. Have a data store you’d like to see us integrate with? Let us know @Messagegears.