Today I want to show a little bit of the project I have been working on for the last three years, called “Red Sqirl”.
Red Sqirl is a web-based big data application that simplifies the analysis of large data sets.
I’m going to talk a little bit about the architecture, but you can have a look at http://www.redsqirl.com to see all the other details.
Red Sqirl is a web application that you can install directly on top of a Hadoop cluster. It is currently available only for Tomcat.
Red Sqirl uses Tomcat as the web server, but when you log in, it creates a separate process owned by the logged-in user and exposes key components over RMI. Every action in the application runs through the user’s process, which avoids permission conflicts.
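To make the idea concrete, here is a minimal, self-contained sketch of that pattern: a component is exported over RMI by one process and called by another through the registry. The `WorkflowService` interface and its method are illustrative names of my own, not Red Sqirl’s actual classes.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Hypothetical remote interface standing in for one of the key components.
interface WorkflowService extends Remote {
    String runAction(String actionName) throws RemoteException;
}

// Implementation that would live in the per-user process.
class WorkflowServiceImpl extends UnicastRemoteObject implements WorkflowService {
    protected WorkflowServiceImpl() throws RemoteException { super(); }
    public String runAction(String actionName) {
        return "ran " + actionName + " as user process";
    }
}

public class RmiSketch {
    public static void main(String[] args) throws Exception {
        // The per-user process creates a registry and exports its component.
        Registry registry = LocateRegistry.createRegistry(1099);
        WorkflowServiceImpl impl = new WorkflowServiceImpl();
        registry.rebind("workflow", impl);

        // The web tier looks the component up and calls it over RMI;
        // the call executes in the exporting (user-owned) process.
        WorkflowService service = (WorkflowService) registry.lookup("workflow");
        System.out.println(service.runAction("pig_select"));

        // Clean up so the JVM can exit.
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(registry, true);
    }
}
```

In the real deployment both ends are separate JVMs, one per logged-in user; here they share one JVM just to keep the sketch runnable.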
Architecture – a Java web application based on the JSF framework and HTML5. You can see all the source code on GitHub: https://github.com/idiro/redsqirl
Red Sqirl provides a drag-and-drop view: the user drags objects onto a canvas to build a workflow. The technology here is KineticJS – you can see a basic intro at https://igfasouza.wordpress.com/2014/01/08/html5-the-future-kineticjs/
The canvas holds a workflow, which Red Sqirl uses to manage a job’s processes, or flow. A workflow is built up of processes chained together to produce a desired output. It is a way of managing a job so that each part of the job can be modified to use the desired parameters.
Basically, the user double-clicks an object and fills in its configuration parameters to perform a task. These tasks are submitted to Oozie, which manages the workflows so they can run in parallel with other jobs.
More about Oozie – http://oozie.apache.org/
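For a flavour of what gets submitted: Oozie workflows are defined in XML. A minimal single-action workflow might look something like the sketch below (the names, script, and properties here are illustrative, not what Red Sqirl actually generates).

```xml
<workflow-app name="example-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>select.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig action failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Oozie walks the graph from `start`, runs each action on the cluster, and follows the `ok` or `error` transition depending on the result.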
Red Sqirl runs alongside the Hadoop platform and other Hadoop technologies. The main storage technologies are Hive and HDFS. Jobs can be saved and reused later: a saved job can be opened and modified to run with different parameters. The output of these jobs is saved to the appropriate storage facility (Hive or HDFS).
Hadoop is a distributed system that allows MapReduce processes to run over the data stored in these technologies.
More about Hadoop – http://hadoop.apache.org/
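To give a flavour of the MapReduce model, here is a conceptual word count in plain Java (this is a sketch of the idea, not the actual Hadoop API): the “map” phase turns the input into per-word records, and the “reduce” phase sums the counts for each word.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class MapReduceSketch {
    public static Map<String, Long> wordCount(String text) {
        return Arrays.stream(text.toLowerCase().split("\\s+")) // map: split into words
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(w -> w,         // shuffle: group by key
                        Collectors.counting()));               // reduce: sum per key
    }

    public static void main(String[] args) {
        System.out.println(wordCount("big data big cluster"));
    }
}
```

Hadoop does the same thing, except the map and reduce steps are distributed across the cluster and the data lives in HDFS rather than in memory.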
Once you finish, you can share your canvas. We call this a Model. A Model can be shared in a marketplace with other users: http://marketplace.redsqirl.com/
You just need to fill in a form with some information about the Model and upload the zip file.
Red Sqirl is extensible, so you can create a new plug-in for a new technology. We call this a Package. Packages are groups of actions that are used to perform specific processes in Red Sqirl. You can see how to install or upload a Package at http://www.redsqirl.com/packagemanagement.html and how to create one at http://www.redsqirl.com/pckdev.html
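Just to illustrate the plug-in idea, here is a purely hypothetical sketch of what an action inside a Package does: turn the parameters the user fills in into a script for the underlying technology. These interface and class names are my own invention, not Red Sqirl’s actual API – see the pckdev link above for the real one.

```java
import java.util.Map;

// Hypothetical action contract: a Package is a group of these.
interface Action {
    String getName();
    String buildScript(Map<String, String> parameters);
}

// Illustrative action that generates the Pig Latin the user
// would otherwise have to write by hand.
class PigSelectAction implements Action {
    public String getName() { return "pig_select"; }
    public String buildScript(Map<String, String> p) {
        return "data = LOAD '" + p.get("input") + "';\n"
             + "out = FILTER data BY " + p.get("condition") + ";\n"
             + "STORE out INTO '" + p.get("output") + "';";
    }
}

public class PackageSketch {
    public static void main(String[] args) {
        Action a = new PigSelectAction();
        System.out.println(a.buildScript(Map.of(
                "input", "/user/demo/in",
                "condition", "$0 > 10",
                "output", "/user/demo/out")));
    }
}
```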
You can see all Models and Packages here: http://www.redsqirl.com/search.html
Red Sqirl supports all the trending Hadoop ecosystem APIs. The idea is an online tool where you can do data analysis in a simple way: you don’t need to know Pig syntax or Spark syntax to use them.
In the future I’ll do a first-steps Red Sqirl tutorial.