Story Horse?
I was talking with some friends in IT and found that most did not know what “SNA” means, but at the same time I have seen several articles about “Big Data” is the future. “Big Data” is a hot topic but, in many ways, we are still trying to define what the phrase “Big Data” means. Would be “Big Data” and related the future? (this is a topic for a new post.) I’ll just give a brief explanation of this new world.
Would be just another Buzzword?
Ok SNA means “Social network analysis”
A Brief Introduction
Social network analysis: A methodological introduction
Big Data
What it is and why it matters
To complete the alphabet soup we will see two more acronyms. “NoSQL” and “HADOOP”
NoSQL
Martinfowler NoSQL
Hadoop
Hadoop
Facebook claims to have over 30 petabytes information. If you were to store all of this data on 1TB hard disks and stack them on top of one another, you would have a tower twice as high as the Empire State building in New York. The height of this tower illustrates that processing and analyzing such data need to take place in a distributed process on multiple machines rather than on a single system. However, this kind of processing has always been very complex, and much time is spent solving recurring problems, like processing in parallel, distributing data to the compute nodes, and, in particular, handling errors during processing. To free developers from these repetitive tasks, Google introduced the “MapReduce” framework.
An example of MapReduce
Google developed an abstraction layer that splits the data flow into two main phases: the map phase and the reduce phase. In a style similar to functional programming, computations can take place in parallel on multiple computers in the map phase. The same thing also applies to the reduce phase, so that MapReduce applications can be massively parallelized on a computer cluster.
The popularity of MapReduce become wich Apache Hadoop open source implementation. Hadoop can be installed on standard hardware and possesses excellent scaling characteristics, which means you can run it on a cluster and then extend the cluster dynamically by purchasing more computers.
Apache Hadoop Project is an open source implementation of Google’s distributed filesystem (Google File System, GFS) and the MapReduce Framework. One of the most important supporters promoting this project over the years is Yahoo. Today, many other well-known enterprises, such as Facebook and IBM, as well as an active community, contribute to development.
The hadoop has one Ecosystem (but again this is a topic for a new post.)
More alphabet soup;
HDFS, Hive, Hbase, Cassandra, Pig, ZooKeeper, Phoenix, Oozie, Presto and more … a world without end!
The era of Polyglot Persistence has begun (again this is a topic for a new post.)
Books;
Models and Methods in Social Network Analysis – John Scott;
Social Network Analysis: Methods and Applications – Stanley Wasserman;
Social Network Analysis (Quantitative Applications in the Social Sciences) – David Knoke