Alibaba Cloud offers many DTplus (data intelligence) services, and among them Data Lake Analytics (DLA) is in strong demand across the industry. Alibaba Cloud Data Lake Analytics is a serverless, cloud-native, interactive analytics service, fully managed by Alibaba Cloud and built on Massively Parallel Processing (MPP) compute nodes. Because Alibaba Cloud handles all the maintenance, it is a zero-maintenance service, made available to enterprise users on a Pay-As-You-Go basis. Alibaba Cloud Data Lake Analytics provides cloud-native querying through a standard SQL interface, with broad SQL compatibility and a comprehensive set of built-in functions, and you can connect various data sources and clients using JDBC and ODBC connectors. Data Lake Analytics on Alibaba Cloud can also integrate with BI products, which turns the service into a tool for big-data insights and visualization, and it helps customers through the cloud migration process at low migration cost. Alibaba Cloud Data Lake Analytics can run complex analytics on data that comes from different sources and in different formats. Using Alibaba Cloud Data Lake Analytics we can analyze data stored in Alibaba Cloud Object Storage Service (OSS) or Table Store, and we can also join the results across them to generate new insights. Alibaba Cloud Data Lake Analytics is powered by a full Massively Parallel Processing architecture (see Fig.) and provides vectorized execution optimization, pipelined operator execution, multi-tenant resource allocation, and priority scheduling.
Using Alibaba Cloud Data Lake Analytics we can analyze raw data in OSS, such as logs and CSV, JSON, or Avro files: we can run queries against a specific OSS file folder, create tables over it, query them, and integrate BI tools as well. Using Data Lake Analytics we can also query time-series data, pipeline data, logs, and post-ETL data stored in Table Store. With DLA we can query a single Table Store table, or we can join across multiple tables. We can even join across heterogeneous data sources: if we have data in both OSS and Table Store, we can run a JOIN query across the two sources and turn the combined data into insights. Data here is isolated and visible only to its owner; once you activate Data Lake Analytics, the system grants your account the permissions needed to access the database.
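As a sketch of such a heterogeneous join, suppose we have already created an external table oss_orders over an OSS folder and a table ots_customers mapped to a Table Store table (both table and column names here are hypothetical, for illustration only). A single SQL statement can then combine the two sources:

-- Join OSS data with Table Store data in one query
-- (oss_orders and ots_customers are assumed, pre-created tables)
SELECT c.customer_name,
       SUM(o.order_amount) AS total_amount
FROM oss_orders o
JOIN ots_customers c
  ON o.customer_id = c.customer_id
GROUP BY c.customer_name;

From the query's point of view both tables look the same; DLA resolves each table to its underlying data source at execution time.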
Alibaba Cloud Data Lake Analytics offers many types of built-in functions: aggregate functions (which ignore NULL values and return NULL when there is no input), binary functions and operators, bitwise functions, conversion functions that cast numeric and character values to the required type, date and time functions and operators, JSON functions and operators, mathematical functions and operators, string functions and operators, and window functions. Every table we create in DLA must belong to a parent database (schema), and that schema name must be unique within each of your Alibaba Cloud regions. Below is a sample table creation query, whose syntax closely resembles a Hive query:
CREATE EXTERNAL TABLE nation_text_string (
    N_NATIONKEY INT COMMENT 'column N_NATIONKEY',
    N_NAME STRING,
    N_REGIONKEY INT,
    N_COMMENT STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE LOCATION 'oss://your-bucket/path/to/nation_text';
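To illustrate a few of the built-in functions mentioned earlier, here is a small sketch of a query against the nation_text_string table created above, combining a string function with a window function (the way the columns are used is illustrative only):

SELECT
    N_REGIONKEY,
    UPPER(N_NAME) AS nation_upper,                                  -- string function
    COUNT(*) OVER (PARTITION BY N_REGIONKEY) AS nations_in_region   -- window function
FROM nation_text_string;

The window function counts nations per region without collapsing rows, so each nation still appears in the output alongside its region's total.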
Alibaba Cloud Data Lake Analytics is compatible with Hive's serialization and deserialization (SerDe) mechanism for data records, including data files in CSV, Parquet, ORC, RCFile, Avro, and JSON formats. So whenever we create a table from a CSV file, we need to choose the appropriate SerDe based on the contents of the CSV file.
For example:

CREATE EXTERNAL TABLE test_csv_opencsvserde (
    id STRING,
    name STRING,
    location STRING,
    create_date STRING,
    create_timestamp STRING,
    longitude STRING,
    latitude STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    "separatorChar" = ",",
    "quoteChar"     = "\"",
    "escapeChar"    = "\\"
)
STORED AS TEXTFILE LOCATION 'oss://test-bucket-julian-1/test_csv_serde_1';
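Once created, the table can be queried like any other. One point worth noting: OpenCSVSerde reads every column as STRING, so numeric columns need an explicit cast. A small sketch (the 'Hangzhou' filter value is purely hypothetical):

SELECT
    id,
    name,
    CAST(longitude AS DOUBLE) AS lon,   -- OpenCSVSerde yields STRING; cast explicitly
    CAST(latitude  AS DOUBLE) AS lat
FROM test_csv_opencsvserde
WHERE location = 'Hangzhou';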