Tuesday, September 18, 2018

Big Data Processing using E-MapReduce in Alibaba Cloud


                   


In this article, we will discuss E-MapReduce, the Big Data processing service in Alibaba Cloud. E-MapReduce is a solution for Big Data processing on the Alibaba Cloud platform. It is built on Alibaba Cloud ECS and is based on open source Apache Hadoop clusters and Apache Spark, an in-memory processing engine. On E-MapReduce we can also run Apache Hive, Apache Pig and HBase queries to analyse and process big data on Alibaba Cloud. E-MapReduce also lets us import big data from, and export it to, many other public cloud data storage systems and database systems, and of course it is well connected with OSS and Cloud RDS. E-MapReduce provides an integrated Big Data solution for managing clusters, with tools for host selection, environment deployment, cluster building, cluster configuration, job configuration, job running, cluster management and performance monitoring. It takes care of the procurement, preparation, operation and maintenance of clusters, so that users can focus on the application and its logic. As we know, Big Data processing comes in different types, such as batch processing, real-time data processing and stream-oriented data processing. E-MapReduce offers flexible modes for this: we can select Hadoop services for daily statistics and batch processing, and choose Spark services for stream-oriented and real-time computation.
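To make the batch-processing idea concrete, here is a minimal sketch of the map, shuffle and reduce phases that Hadoop-style jobs follow, written in plain Python so it runs without a cluster. It is only an illustration of the programming model, not E-MapReduce code; the function names and the sample input are made up for this example.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all values that were emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run the three phases in order, as a single-process stand-in for a job."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical input; on a real cluster this would be files in HDFS or OSS.
    sample = ["big data on the cloud", "big clusters process big data"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel on many nodes and the shuffle moves data between them; here everything runs in one process only to show the flow.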


The central concept in E-MapReduce is the cluster. A cluster is basically a Spark or Hadoop cluster running on Alibaba Cloud ECS. In Apache Hadoop, as we know, a cluster combines master and slave nodes that run daemons such as NameNode, DataNode, ResourceManager and NodeManager. The NameNode and ResourceManager run on master nodes, and the DataNode and NodeManager run on slave nodes.
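The master/slave split described above can be sketched as a small lookup table. This is just an illustration in Python: the dictionary name and helper function are hypothetical, and the HDFS/YARN column reflects the standard Hadoop assignment of these daemons rather than anything specific to E-MapReduce.

```python
# Each Hadoop daemon mapped to (node role, subsystem it belongs to).
# Names are illustrative; this is not an E-MapReduce API.
HADOOP_DAEMONS = {
    "NameNode": ("master", "HDFS"),
    "ResourceManager": ("master", "YARN"),
    "DataNode": ("slave", "HDFS"),
    "NodeManager": ("slave", "YARN"),
}


def daemons_with_role(role):
    """Return the daemon names that run on nodes of the given role, sorted."""
    return sorted(
        name for name, (node_role, _subsystem) in HADOOP_DAEMONS.items()
        if node_role == role
    )


if __name__ == "__main__":
    print("master:", daemons_with_role("master"))
    print("slave:", daemons_with_role("slave"))
```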




Image Ref: https://www.alibabacloud.com/help/doc-detail/28068.htm?spm=a2c63.p38356.b99.2.6c933d19BiCI36




In Alibaba Cloud, an E-MapReduce cluster is a set of multiple layers built on Alibaba Cloud ECS instances. Above the E-MapReduce Agent layer there is an HDFS layer for the distributed file system. YARN handles resource management. The complete Spark core engine and the other Spark libraries, HBase, Pig, Hive, Storm and notebooks like Zeppelin are integrated, and the top layer is the E-MapReduce Web User Admin for configuration and management.
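As a rough sketch, the layering in the paragraph above can be written down as an ordered list, bottom layer first. The exact ordering of the middle layers is my reading of that paragraph (HDFS above the agent, YARN above HDFS, the engines above YARN), so treat it as an assumption; the names `EMR_LAYERS` and `describe_stack` are made up for this example.

```python
# Layers of an E-MapReduce cluster, listed from the bottom up.
# The order of the middle layers is assumed from the description in the text.
EMR_LAYERS = [
    "Alibaba Cloud ECS instances",
    "E-MapReduce Agent",
    "HDFS (distributed file system)",
    "YARN (resource management)",
    "Spark, HBase, Pig, Hive, Storm, Zeppelin",
    "E-MapReduce Web User Admin (configuration and management)",
]


def describe_stack(layers):
    """Return a numbered, bottom-to-top listing of the layers as one string."""
    return "\n".join(
        f"{level}. {name}" for level, name in enumerate(layers, start=1)
    )


if __name__ == "__main__":
    print(describe_stack(EMR_LAYERS))
```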





Image Ref: https://www.alibabacloud.com/help/doc-detail/28068.htm?spm=a2c63.p38356.b99.2.6c933d19BiCI36


So, Alibaba Cloud E-MapReduce clusters are capable enough to implement various scenarios like offline big data processing, ad-hoc data analysis queries and online massive-scale data processing services. E-MapReduce is deeply integrated with other Alibaba Cloud services and offerings, so we can use them as input or output sources. E-MapReduce is also integrated with the Resource Access Management (RAM) permission system, so we can isolate team access using primary and sub-accounts.
