In this article, we discuss E-MapReduce, the Big Data processing service on
Alibaba Cloud. E-MapReduce runs on Alibaba Cloud ECS and is based on open
source Apache Hadoop clusters and Apache Spark, an in-memory processing engine.
On E-MapReduce we can also run Apache Hive, Apache Pig, and HBase to analyse
and process big data on Alibaba Cloud. E-MapReduce also lets us import big data
from, and export it to, many other public cloud data storage systems and
database systems, and it is well connected with OSS and Cloud RDS.
E-MapReduce provides an integrated Big Data solution for managing clusters,
with tools for host selection, environment deployment, cluster building,
cluster configuration, job configuration, job execution, cluster management,
and performance monitoring. It takes care of cluster procurement, preparation,
operation, and maintenance, so that users can focus on the application and its
logic.
Big Data processing comes in different types, such as batch processing,
real-time data processing, and stream-oriented data processing. E-MapReduce
offers flexible modes for these: we can select Hadoop services for daily
statistics and batch processing, and Spark services for stream-oriented and
real-time computation.
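The rule of thumb above for choosing between Hadoop and Spark can be sketched
as a small Python function. This is only an illustration of the article's
guidance: the `pick_service` name and the workload labels are hypothetical, and
in practice Spark can also be used for batch jobs.

```python
# Illustrative sketch: maps a workload type to the service suggested in the text.
# The labels below are made up for this example, not E-MapReduce settings.
BATCH_WORKLOADS = {"batch", "daily-statistics"}
STREAM_WORKLOADS = {"stream", "real-time"}


def pick_service(workload):
    """Return "Hadoop" or "Spark" for a workload label, following the article."""
    if workload in BATCH_WORKLOADS:
        return "Hadoop"
    if workload in STREAM_WORKLOADS:
        return "Spark"
    raise ValueError(f"unknown workload type: {workload!r}")


for workload in ("daily-statistics", "real-time"):
    print(workload, "->", pick_service(workload))
```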
The central concept in E-MapReduce is the cluster. A cluster is basically a
Spark or Hadoop cluster running on Alibaba Cloud ECS. In Apache Hadoop, a
cluster combines master and slave nodes that run daemons such as the NameNode,
DataNode, ResourceManager, and NodeManager. The NameNode and ResourceManager
run on master nodes, while the DataNode and NodeManager run on slave nodes.
Image Ref: https://www.alibabacloud.com/help/doc-detail/28068.htm?spm=a2c63.p38356.b99.2.6c933d19BiCI36
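The split between master and slave roles described above can be written down as
a small lookup table. This is a minimal sketch in Python; the `NODE_ROLES`
mapping and the `daemons_for` helper are hypothetical names used for
illustration and are not part of Hadoop or E-MapReduce.

```python
# Illustrative only: maps the Hadoop daemons named in the text to their roles.
NODE_ROLES = {
    "NameNode": "master",
    "ResourceManager": "master",
    "DataNode": "slave",
    "NodeManager": "slave",
}


def daemons_for(role):
    """Return the sorted daemon names that run on nodes with the given role."""
    return sorted(name for name, r in NODE_ROLES.items() if r == role)


print("master:", daemons_for("master"))
print("slave:", daemons_for("slave"))
```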
In Alibaba Cloud, an E-MapReduce cluster is a set of layers built on Alibaba
Cloud ECS instances. Above the E-MapReduce Agent layer sits the HDFS layer,
which provides the distributed file system. YARN handles resource management.
The complete Spark core engine and other Spark libraries, HBase, Pig, Hive,
Storm, and notebooks such as Zeppelin are integrated, and the top layer is the
E-MapReduce Web User Admin for configuration and management.
Image Ref: https://www.alibabacloud.com/help/doc-detail/28068.htm?spm=a2c63.p38356.b99.2.6c933d19BiCI36
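The layering described above can be modelled as an ordered list, from the
bottom of the stack to the top. This is a rough sketch in Python: the
`EMR_LAYERS` list and `layer_below` helper are hypothetical, and placing YARN
above HDFS and the engines above YARN is an assumption based on the usual
Hadoop architecture rather than something the text states outright.

```python
# Illustrative only: layers of an E-MapReduce cluster, bottom to top.
EMR_LAYERS = [
    "ECS instances",
    "E-MapReduce Agent",
    "HDFS",
    "YARN",
    "Spark, HBase, Pig, Hive, Storm, Zeppelin",
    "E-MapReduce Web User Admin",
]


def layer_below(layer):
    """Return the layer directly beneath `layer`, or None for the bottom one."""
    index = EMR_LAYERS.index(layer)
    return EMR_LAYERS[index - 1] if index > 0 else None


for depth, layer in enumerate(EMR_LAYERS):
    print(depth, layer)
```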
So Alibaba Cloud E-MapReduce clusters are capable of handling various
scenarios, such as offline big data processing, ad-hoc data analysis queries,
and online massive-scale data processing services. E-MapReduce is deeply
integrated with other Alibaba Cloud services and offerings, so we can use them
as input or output sources. E-MapReduce is also integrated with Resource Access
and Permission management systems, so we can isolate team access using primary
and sub-accounts.