Alibaba Cloud offers many DTplus (data intelligence) services, and among them Data Lake Analytics (DLA) is in strong demand across the industry. Alibaba Cloud Data Lake Analytics is a serverless, cloud-native, interactive analytics service, fully managed by Alibaba Cloud and built on Massively Parallel Processing (MPP) compute nodes. Because Alibaba Cloud handles all the maintenance, it is a zero-maintenance service, made available to enterprise users on a Pay-As-You-Go basis. Alibaba Cloud Data Lake Analytics provides cloud-native querying through a standard SQL interface, with broad SQL compatibility and a comprehensive set of built-in functions, and you can connect various data sources and clients using JDBC and ODBC connectors. Data Lake Analytics on Alibaba Cloud can also integrate with BI products, which turns the service into a tool for big-data insights and visualization, and it helps customers through the cloud migration process at low migration cost. Alibaba Cloud Data Lake Analytics can run complex analytics on data that comes from different sources and in different formats. Using Alibaba Cloud Data Lake Analytics we can analyze data stored in Alibaba Cloud Object Storage Service (OSS) or Table Store, and we can also join the results across them to generate new insights. Alibaba Cloud Data Lake Analytics is powered by a full Massively Parallel Processing architecture (see Fig.) and provides vectorized execution optimization, pipelined operator execution, multi-tenant resource allocation, and priority scheduling.
Using Alibaba Cloud Data Lake Analytics we can analyze raw data in OSS, such as logs and CSV, JSON, or Avro files: we can run queries against a specific OSS file folder, create tables over it, query them, and integrate BI tools as well. Using Data Lake Analytics we can also query time-series data, pipeline data, logs, and post-ETL data stored in Table Store. With DLA we can query a single Table Store table, or we can join across multiple tables. We can even join across heterogeneous data sources: if we have data in both OSS and Table Store, we can run a JOIN query across the two sources and turn the combined data into insights. Data here is isolated and visible only to its owner; once you activate Data Lake Analytics, the system grants your account the permissions needed to access the database.
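As a sketch of such a heterogeneous join, suppose we have already created an external table oss_orders over an OSS folder and a table ots_customers mapped to a Table Store table (both table and column names here are hypothetical, for illustration only). A single SQL statement can then combine the two sources:

-- Join OSS data with Table Store data in one query
-- (oss_orders and ots_customers are assumed, pre-created tables)
SELECT c.customer_name,
       SUM(o.order_amount) AS total_amount
FROM oss_orders o
JOIN ots_customers c
  ON o.customer_id = c.customer_id
GROUP BY c.customer_name;

From the query's point of view both tables look the same; DLA resolves each table to its underlying data source at execution time.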
Alibaba Cloud Data Lake Analytics offers many types of built-in functions: aggregate functions (which ignore NULL values and return NULL when there is no input), binary functions and operators, bitwise functions, conversion functions that cast numeric and character values to the required type, date and time functions and operators, JSON functions and operators, mathematical functions and operators, string functions and operators, and window functions. Every table we create in DLA must belong to a parent database (schema), and that schema name must be unique within each of your Alibaba Cloud regions. Below is a sample table creation query, whose syntax closely resembles a Hive query:
CREATE EXTERNAL TABLE nation_text_string (
    N_NATIONKEY INT COMMENT 'column N_NATIONKEY',
    N_NAME STRING,
    N_REGIONKEY INT,
    N_COMMENT STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE LOCATION 'oss://your-bucket/path/to/nation_text';
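To illustrate a few of the built-in functions mentioned earlier, here is a small sketch of a query against the nation_text_string table created above, combining a string function with a window function (the way the columns are used is illustrative only):

SELECT
    N_REGIONKEY,
    UPPER(N_NAME) AS nation_upper,                                  -- string function
    COUNT(*) OVER (PARTITION BY N_REGIONKEY) AS nations_in_region   -- window function
FROM nation_text_string;

The window function counts nations per region without collapsing rows, so each nation still appears in the output alongside its region's total.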
Alibaba Cloud Data Lake Analytics is compatible with Hive's serialization and deserialization (SerDe) mechanism for data records, including data files in CSV, Parquet, ORC, RCFile, Avro, and JSON formats. So whenever we create a table from a CSV file, we need to choose the appropriate SerDe based on the contents of the CSV file.
For example:

CREATE EXTERNAL TABLE test_csv_opencsvserde (
    id STRING,
    name STRING,
    location STRING,
    create_date STRING,
    create_timestamp STRING,
    longitude STRING,
    latitude STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    "separatorChar" = ",",
    "quoteChar"     = "\"",
    "escapeChar"    = "\\"
)
STORED AS TEXTFILE LOCATION 'oss://test-bucket-julian-1/test_csv_serde_1';
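Once created, the table can be queried like any other. One point worth noting: OpenCSVSerde reads every column as STRING, so numeric columns need an explicit cast. A small sketch (the 'Hangzhou' filter value is purely hypothetical):

SELECT
    id,
    name,
    CAST(longitude AS DOUBLE) AS lon,   -- OpenCSVSerde yields STRING; cast explicitly
    CAST(latitude  AS DOUBLE) AS lat
FROM test_csv_opencsvserde
WHERE location = 'Hangzhou';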