Bringing ETL to the Masses with APIs

I’m spending a lot of time lately thinking about emerging trends in API usage. One area I’m tracking is companies that offer simple services providing interoperability and automation between cloud platforms using APIs. The best-known examples are If This Then That (IFTTT) and Zapier, which give end users simple, icon-based tools for defining tasks that move data between SaaS platforms.

It’s tough to define this space. In enterprise speak it is really just Extract, Transform, Load (ETL), historically a process for migrating information between systems in an enterprise using data sources and web services. But with the popularity of web APIs, we need to rethink ETL in the context of the cloud and update how we approach interoperability between the growing number of API-driven platforms.
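
For anyone new to the term, the three ETL steps can be sketched in a few lines of Python. This is only a minimal illustration: the sample payload, the field names, and the in-memory list standing in for a destination platform are all assumptions made up for the example, not any provider's actual API.

```python
import json

# Hypothetical payload, standing in for the JSON a source API might return.
SOURCE_RESPONSE = json.dumps([
    {"Name": " Ada Lovelace ", "Email": "ADA@EXAMPLE.COM"},
    {"Name": "Grace Hopper", "Email": "grace@example.com"},
])


def extract(raw):
    """Extract: parse the records out of the source system's response."""
    return json.loads(raw)


def transform(records):
    """Transform: reshape each record into the form the destination expects."""
    return [
        {"full_name": r["Name"].strip(), "email": r["Email"].lower()}
        for r in records
    ]


def load(records, destination):
    """Load: append the transformed records to the destination store."""
    destination.extend(records)
    return len(records)


destination = []  # stand-in for the target platform
loaded = load(transform(extract(SOURCE_RESPONSE)), destination)
print(loaded, destination[0])
```

In a real integration, the extract and load steps would be HTTP calls against two different platforms, which is exactly the plumbing these providers hide from the end user.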

Mike Reich of Seabourne and I were discussing this the last time I was in Washington, D.C., and he currently has a great blog post on rethinking ETL for the API age. If you don’t know Seabourne, they are behind high-profile government projects like MyFCC and GovInfo. They have been doing a lot of thinking about how to acquire, process and publish content with APIs.

To help me better understand the space, I’m watching 12 service providers that I’m putting in a bucket of what I consider the next generation of ETL providers:

Cloudwork - Cloudwork is a service that allows users to automate tasks between Google Apps, Salesforce, Evernote, Zoho, Twitter, Freshbooks, MailChimp, Zendesk, Dropbox, WordPress and others.
Elastic.io - elastic.io helps you to automate routine operations and connect multiple cloud APIs.
FoxWeave - FoxWeave is a Cloud-Based Data Integration Platform that lets you easily migrate and synchronize data across all your Cloud and On-Premise applications and databases, without having to write any code.
If This Then That (IFTTT) - IFTTT is a service that enables customers to connect channels (e.g. Facebook, Evernote, Weather, Dropbox, etc.) with personally created or publicly shared Recipes.
itDuzzit - itDuzzit is a cloud integration platform like no other; simple enough for non-technical users, yet powerful enough to support the most complex integration.
MashableLogic - MashableLogic is a mashup development platform that provides a system for leveraging APIs by turning them into re-usable components that can be combined to compose software solutions.
Rules.io - Overwhelmed with too much data and too many metrics? The rules.io team drills into your data with you, giving you a concrete idea of which users you should talk to and how to engage them. At the core of rules.io is a rules and segmentation engine which captures user-centric data such as usage and behavior, purchases, and technical problems, and allows you to act on this information in real-time or via triggered automation.
SnapLogic - SnapLogic's modern web architecture, "containerized" Snaps, and thriving SnapStore ecosystem make it the only practical way to continually connect your company to the burgeoning number and diversity of data sources and cloud applications. 
SortMyBox - Like e-mail filters, for your files in the cloud. SortMyBox is a magic folder that moves files to folders based on your rules.
Wappwolf - Wappwolf is focused on deconstructing the barriers of the Cloud, by connecting your Evernote, Facebook, Flickr, and other web services / apps to Dropbox, allowing users to drag & drop files into a predefined folder on Dropbox and automatically convert and sync to your favorite places.
Yahoo Pipes - Pipes is a composition tool to aggregate, manipulate, and mashup content from around the web. Like Unix pipes, simple commands can be combined together to create output that meets your needs.
Zapier - Zapier lets SaaS users create integrations that push data between hundreds of best-in-breed web applications without having to write any code or wrangle APIs.
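
Many of the services above come down to user-defined rules that pair a condition with an action. As a rough illustration of the "e-mail filters for your files" idea, here is a minimal Python sketch. The rule list, folder names, and the `destination_for` helper are hypothetical, invented for this example, and not how SortMyBox or any other provider actually implements it.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (filename pattern, destination folder). First match wins.
RULES = [
    ("*.pdf", "Documents"),
    ("*.jpg", "Photos"),
    ("invoice*", "Finance"),
]


def destination_for(filename, rules=RULES, default="Unsorted"):
    """Return the folder of the first rule whose pattern matches the filename."""
    name = filename.lower()  # match case-insensitively
    for pattern, folder in rules:
        if fnmatchcase(name, pattern):
            return folder
    return default


for f in ["Holiday.JPG", "invoice_march.pdf", "notes.txt"]:
    print(f, "->", destination_for(f))
```

Because the first matching rule wins, rule order matters. That is the kind of detail an icon-based interface has to make obvious to a non-technical user.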

I feel that this new group of ETL providers is sufficiently focused on the new world of API-driven platforms, beyond just "data sources" and "web services". There are more providers, like Apatar and Jitterbit, but I feel they are closer to traditional ETL than to this new breed of API interoperability providers (which I don’t have a proper name for yet). One characteristic I’m looking for is ease of use. I think this new breed should be for end users and SaaS users, ultimately empowering the masses rather than IT, which I feel ETL was designed for.

This new generation of interoperability providers should be about solving the problems average users face when operating online and in the cloud, not just extracting, transforming and loading. That means making ETL something anyone can understand and put to use, further democratizing IT resources, which is a common theme with SaaS, APIs, etc.

I think the new players like Zapier and IFTTT have to be careful about getting too complex for everyday users and becoming more about ETL than about solving end-user problems. I think we will need very specialized, niche implementations, much like we are seeing with BaaS and the emergence of specialized versions like BaaS for gaming. Very niche versions of ETL like SortMyBox and Wappwolf Dropbox Automator, which focus only on solving the file management problems users are facing, will be successful because they are easy for users to understand and put to work.

I think there is an opportunity to quickly build next-generation ETL solutions that solve a set of automation, integration and interoperability problems for a niche audience, using the simple, icon-based format we are seeing from providers like IFTTT and Elastic.io. If you are considering building one of these new solutions, make sure to evaluate existing open source ETL solutions like:

AutomateIt - AutomateIt is an open source tool for automating the setup and maintenance of servers, applications and their dependencies, providing a way to manage files, packages, services, networks, accounts, roles, templates and more. 
CloverETL - CloverETL is an open source data integration platform based on Java. It can be used for data migration, data cleansing and other data transformation tasks.
Pentaho - Kettle - Delivers powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach. With an intuitive, graphical, drag and drop design environment, and a proven, scalable, standards-based architecture, Pentaho Data Integration is increasingly the choice for organizations over traditional, proprietary ETL or data integration tools.
Talend - Talend offers open source middleware solutions that address big data integration, data management and application integration needs for businesses of all sizes.

If at all possible, don’t reinvent the wheel by rolling your own ETL framework. Building on an existing one will allow you to focus on the specific acquire, transform and publish needs of your niche audience. Even some of the new breed of providers are providing open source tools, like the Geekier project from Rules.io, so do your homework. Also consider putting existing API aggregation providers like Singly to use; they provide faster acquire and publish connections with popular APIs while also standardizing the objects across all platforms.
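
To make "standardizing the objects across all platforms" concrete, here is a small Python sketch of the idea. The two record shapes, the field names, and the normalizer functions are assumptions invented for illustration. They do not reflect Singly's actual API or any real platform's schema.

```python
# Hypothetical contact records as two different platforms might return them.
platform_a = {"first_name": "Ada", "last_name": "Lovelace", "mail": "ada@example.com"}
platform_b = {"displayName": "Grace Hopper", "emails": [{"value": "grace@example.com"}]}


def from_platform_a(record):
    """Map platform A's fields onto the shared contact shape."""
    return {
        "name": f"{record['first_name']} {record['last_name']}",
        "email": record["mail"],
    }


def from_platform_b(record):
    """Map platform B's fields onto the shared contact shape."""
    return {"name": record["displayName"], "email": record["emails"][0]["value"]}


# One normalizer per source; adding a platform means adding one entry here.
NORMALIZERS = {"a": from_platform_a, "b": from_platform_b}


def normalize(source, record):
    """Convert a platform-specific record into the shared shape."""
    return NORMALIZERS[source](record)


contacts = [normalize("a", platform_a), normalize("b", platform_b)]
print(contacts)
```

Once every source is mapped to one shape, the transform and publish steps only have to be written once, which is the main appeal of leaning on an aggregation layer.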

I can envision video, photo, quantified-self and other niche services emerging that will empower any user to put APIs to use and acquire, transform and publish the valuable data, information and other content that is central to our increasingly fragmented online lives. These new solutions will be framed in easy-to-understand terms that speak to specific audiences, putting valuable API resources within reach of the masses.