![Datax-logo](https://github.com/alibaba/DataX/blob/master/images/DataX-logo.jpg)
# DataX
DataX is an offline data synchronization tool/platform widely used inside Alibaba Group. It implements efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), and DRDS.
# Features
As a data synchronization framework, DataX abstracts the synchronization of different data sources into Reader plugins that read data from a source and Writer plugins that write data to a target, so in principle the framework can synchronize data between any pair of data source types. The DataX plugin system also works as an ecosystem: each newly added data source immediately becomes interoperable with every existing one.
# DataX Detailed Introduction
##### See: [DataX-Introduction](https://github.com/alibaba/DataX/wiki/DataX-Introduction)
# Quick Start
##### Download: [DataX download](http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz)
##### See: [Quick Start](https://github.com/alibaba/DataX/wiki/Quick-Start)
* [Configuration example: read from MySQL and write to ODPS](https://github.com/alibaba/DataX/wiki/Quick-Start)
* [Configure scheduled jobs (Linux)](https://github.com/alibaba/DataX/wiki/%E9%85%8D%E7%BD%AE%E5%AE%9A%E6%97%B6%E4%BB%BB%E5%8A%A1%EF%BC%88Linux%E7%8E%AF%E5%A2%83%EF%BC%89)
* [Pass in parameters dynamically](https://github.com/alibaba/DataX/wiki/%E5%8A%A8%E6%80%81%E4%BC%A0%E5%85%A5%E5%8F%82%E6%95%B0)
# Support Data Channels
DataX now ships with a fairly complete plugin ecosystem: mainstream RDBMS databases, NoSQL stores, and big data computing systems are all supported. The currently supported data sources are listed below; for details see: [DataX data source reference guide](https://github.com/alibaba/DataX/wiki/DataX-all-data-channels)
| Type | Data Source | Reader | Writer |
| ------------ | ---------- | :-------: | :-------: |
| RDBMS (relational databases) | MySQL | √ | √ |
| | Oracle | √ | √ |
| | SqlServer | √ | √ |
| | Postgresql | √ | √ |
| | DRDS | √ | √ |
| | Dameng (DM) | √ | √ |
| Alibaba Cloud data warehouse storage | ODPS | √ | √ |
| | ADS | | √ |
| | OSS | √ | √ |
| | OCS | √ | √ |
| NoSQL data stores | OTS | √ | √ |
| | Hbase0.94 | √ | √ |
| | Hbase1.1 | √ | √ |
| | MongoDB | √ | √ |
| Unstructured data stores | TxtFile | √ | √ |
| | FTP | √ | √ |
| | HDFS | √ | √ |
# Developing a New Plugin
See: [DataX Plugin Development Guide](https://github.com/alibaba/DataX/wiki/DataX%E6%8F%92%E4%BB%B6%E5%BC%80%E5%8F%91%E5%AE%9D%E5%85%B8)
# Project Members
Core contributors: 光戈, 一斅, 祁然, 云时
Thanks to 天烬, 巴真, and 静行 for their contributions to DataX.
# License
This software is free to use under the [Apache License](https://github.com/alibaba/DataX/blob/master/license.txt).
# Issues
Please report issues to us promptly at [DataX Issues](https://github.com/alibaba/DataX/issues).
# Enterprise Users of Open-Source DataX
![Datax-logo](https://github.com/alibaba/DataX/blob/master/images/datax-enterprise-users.jpg)
```
We are hiring on an ongoing basis. Contact: hanfa.shf@alibaba-inc.com
[Java Development Position]
Title: Senior Java Developer / Expert / Senior Expert
Experience: 2+ years
Education: Bachelor's degree (negotiable for strong candidates)
Expected level: P6/P7/P8
Responsibilities:
1. Design and develop the Alibaba Cloud big data platform (数加).
2. Develop big-data products for government and enterprise customers.
3. Use large-scale machine learning algorithms to mine relationships in data and explore product applications of data mining in real-world scenarios.
4. One-stop big data development platform.
5. Big data task scheduling engine.
6. Task execution engine.
7. Task monitoring and alerting.
8. Synchronization of massive heterogeneous data.
Requirements:
1. 3+ years of Java web development experience.
2. Solid grasp of core Java technologies, including the JVM, class loading, threads, concurrency, I/O resource management, and networking.
3. Proficient with common Java frameworks and quick to evaluate new ones; deep understanding of object orientation, design principles, encapsulation, and abstraction.
4. Familiar with HTML/HTML5, JavaScript, and SQL.
5. Strong execution, teamwork, and dedication.
6. Deep understanding of design patterns and where to apply them is a plus.
7. Strong analytical and hands-on problem-solving skills; a strong drive for technology is preferred.
8. Hands-on project or product experience with high concurrency, high availability, high performance, or big data processing is preferred.
9. Experience with big-data products, cloud products, or middleware solutions is preferred.
```
DingTalk users: scan the QR code below to join the discussion:
![DataX-OpenSource-Dingding](https://raw.githubusercontent.com/alibaba/DataX/master/images/datax-opensource-dingding.png)
# DataX ADS Writer
---
## 1 Quick Introduction
<br />
Welcome ADS to the DataX ecosystem! The ADSWriter plugin writes data from other sources into ADS: every existing DataX data source can connect to ADS seamlessly and import data quickly.
ADS writing is expected to support two implementation modes:
* ADSWriter supports importing into ADS through an intermediate ODPS staging table. The advantage is that large volumes (>10 million rows) can be imported quickly; the drawback is that ODPS is introduced as a staging layer, so authentication now involves three systems (DataX, ADS, ODPS).
* ADSWriter also supports writing directly into ADS. The advantage is that small batches (<10 million rows) finish quickly; the drawback is that large imports are slow.
Note:
> When importing from ODPS into ADS, grant the ADS Build account read permission on your source table in the source ODPS project ahead of time, and make sure the creator of the ODPS source table and the ADS writer belong to the same Alibaba Cloud account.

> When importing from a non-ODPS source into ADS, grant the ADS Build account Load Data permission on the target ADS database ahead of time.
For the ADS Build account mentioned above, contact your ADS administrator.
## 2 How It Works
ADS writing is expected to support two implementation modes:
### 2.1 Load mode
DataX imports the data into the ADS project table allocated for the current import task and then notifies ADS to complete the data load. This path effectively writes into ADS to finish the synchronization; because ADS is a distributed storage cluster, the channel offers high throughput and supports TB-scale imports.
![Staged import](http://aligitlab.oss-cn-hangzhou-zmf.aliyuncs.com/uploads/cdp/cdp/f805dea46b/_____2015-04-10___12.06.21.png)
1. The underlying CDP layer obtains the plain-text jdbc://host:port/dbname plus username, password, and table, connects to ADS with them, and runs `show grants;` as a pre-check that the user has Load Data (or higher) permission on the target ADS table. Note that at this step ADSWriter authenticates with the ADS username and password supplied by the user.
2. Once the check passes, ODPS DDL is reverse-generated from the metadata of the target ADS table. In the intermediate ODPS project, an ODPS table is created under ADSWriter's account (non-partitioned, lifecycle set to 1-2 days), and ODPSWriter is invoked to write the source data into that ODPS table.
Note that writing into the staging ODPS project requires the staging ODPS account's AccessKey.
3. After the write completes, connect to ADS with the staging ODPS account and issue `Load Data From 'odps://<staging project>/<staging table>/' [overwrite] into adsdb.adstable [partition (xx,xx=xx)];`. The command returns a Job ID that must be recorded.
Note that ADS accesses the staging ODPS with its own Build account, so the staging ODPS project must grant that Build account read permission in advance.
4. Connect to ADS and poll once a minute with `select state from information_schema.job_instances where job_id like '$Job ID'` to check the status; note that during the first minute there may be no status record yet. (A minimal JDBC polling sketch is shown at the end of this subsection.)
5. Once the job reports Success or Fail, the result is returned to the user, the staging ODPS table is dropped, and the task ends.
The flow above applies to imports from non-ODPS data sources into ADS; imports from ODPS into ADS use the following flow:
![Direct import](http://aligitlab.oss-cn-hangzhou-zmf.aliyuncs.com/uploads/cdp/cdp/b3a76459d1/_____2015-04-10___12.06.25.png)
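The one-minute status polling in step 4 can be pictured as a small JDBC loop like the sketch below. This is only an illustrative sketch of the described flow, not the plugin's actual code; the JDBC URL, credentials, and the SUCCESS/FAIL state strings are assumptions.
```
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LoadStatusPoller {
    public static void main(String[] args) throws Exception {
        String jdbcUrl = "jdbc:mysql://<ads-host>:<port>/<schema>"; // hypothetical ADS connection string
        String jobId = "<job-id>"; // the Job ID returned by the Load Data command

        try (Connection conn = DriverManager.getConnection(jdbcUrl, "<username>", "<password>");
             Statement stmt = conn.createStatement()) {
            while (true) {
                // The very first poll may find no status row yet; just keep polling.
                try (ResultSet rs = stmt.executeQuery(
                        "select state from information_schema.job_instances where job_id like '" + jobId + "'")) {
                    if (rs.next()) {
                        String state = rs.getString(1);
                        System.out.println("load job state: " + state);
                        // Assumed terminal state strings; the doc only says Success or Fail.
                        if ("SUCCESS".equalsIgnoreCase(state) || "FAIL".equalsIgnoreCase(state)) {
                            break; // report the result and drop the staging ODPS table afterwards
                        }
                    }
                }
                Thread.sleep(60 * 1000L);
            }
        }
    }
}
```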
### 2.2 Insert mode
DataX connects directly to ADS and writes through the INSERT interface that ADS exposes. This path has lower write throughput and is not suited to bulk imports. Points to note:
* ADSWriter connects to ADS directly over JDBC and inserts data using plain JDBC Statement only. ADS does not support PreparedStatement, so ADSWriter can only write row by row, across multiple threads.
* ADSWriter supports selecting a subset of columns and reordering them, i.e. the user may specify the column list.
* To limit load on ADS, it is recommended to throttle Insert mode with a TPS limit of at most 10,000 TPS.
* After all Tasks have finished writing, the Job post step runs a single flush to ensure the data is updated in ADS as a whole. (A minimal single-row insert sketch follows this list.)
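To make the single-row Statement constraint concrete, here is a rough sketch of writing rows one INSERT at a time, following the same "insert into %s ( %s ) values " template shape that appears later in the writer's Constant class. It is a simplified illustration under assumed table/column names, not the actual ADSWriter implementation, which additionally handles type conversion, retries, and throttling.
```
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

public class AdsInsertSketch {
    // Same shape as the writer's template: "insert into %s ( %s ) values "
    private static final String INSERT_TEMPLATE = "insert into %s ( %s ) values ";

    /** Writes each row as its own INSERT; ADS accepts only plain Statements, not PreparedStatements. */
    public static void writeRows(Connection conn, String table,
                                 List<String> columns, List<List<String>> rows) throws Exception {
        String head = String.format(INSERT_TEMPLATE, table, String.join(",", columns));
        try (Statement stmt = conn.createStatement()) {
            for (List<String> row : rows) {
                StringBuilder sql = new StringBuilder(head).append("(");
                for (int i = 0; i < row.size(); i++) {
                    if (i > 0) {
                        sql.append(",");
                    }
                    // Values are quoted and escaped by hand because PreparedStatement is unavailable.
                    sql.append("'").append(row.get(i).replace("'", "''")).append("'");
                }
                sql.append(")");
                stmt.execute(sql.toString());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; a real job reads them from the job JSON.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://<ads-host>:<port>/<schema>", "<username>", "<password>")) {
            writeRows(conn, "table", Arrays.asList("a", "b"),
                    Arrays.asList(Arrays.asList("DataX", "test")));
        }
    }
}
```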
## 3 Functionality
### 3.1 Configuration examples
* The following job generates data in memory and imports it into ADS in Load mode.
```
{
"job": {
"setting": {
"speed": {
"channel": 2
}
},
"content": [
{
"reader": {
"name": "streamreader",
"parameter": {
"column": [
{
"value": "DataX",
"type": "string"
},
{
"value": "test",
"type": "bytes"
}
],
"sliceRecordCount": 100000
}
},
"writer": {
"name": "adswriter",
"parameter": {
"odps": {
"accessId": "xxx",
"accessKey": "xxx",
"account": "xxx@aliyun.com",
"odpsServer": "xxx",
"tunnelServer": "xxx",
"accountType": "aliyun",
"project": "transfer_project"
},
"writeMode": "load",
"url": "127.0.0.1:3306",
"schema": "schema",
"table": "table",
"username": "username",
"password": "password",
"partition": "",
"lifeCycle": 2,
"overWrite": true,
}
}
}
]
}
}
```
* The following job generates data in memory and imports it into ADS in Insert mode.
```
{
"job": {
"setting": {
"speed": {
"channel": 2
}
},
"content": [
{
"reader": {
"name": "streamreader",
"parameter": {
"column": [
{
"value": "DataX",
"type": "string"
},
{
"value": "test",
"type": "bytes"
}
],
"sliceRecordCount": 100000
}
},
"writer": {
"name": "adswriter",
"parameter": {
"writeMode": "insert",
"url": "127.0.0.1:3306",
"schema": "schema",
"table": "table",
"column": ["*"],
"username": "username",
"password": "password",
"partition": "id,ds=2015"
}
}
}
]
}
}
```
### 3.2 Parameters (user configuration spec)
* **url**
* Description: ADS connection info, in the format "ip:port".
* Required: yes <br />
* Default: none <br />
* **schema**
* Description: the ADS schema name.
* Required: yes <br />
* Default: none <br />
* **username**
* Description: the ADS username, which is currently the accessId <br />
* Required: yes <br />
* Default: none <br />
* **password**
* Description: the ADS password, which is currently the accessKey <br />
* Required: yes <br />
* Default: none <br />
* **table**
* Description: name of the destination table.
* Required: yes <br />
* Default: none <br />
* **partition**
* Description: partition name of the destination table; required when the destination table is partitioned.
* Required: no <br />
* Default: none <br />
* **writeMode**
* Description: write mode, either Load or Insert.
* Required: yes <br />
* Default: none <br />
* **column**
* Description: list of destination table columns, either ["*"] or an explicit list such as ["a", "b", "c"].
* Required: yes <br />
* Default: none <br />
* **overWrite**
* Description: whether the write overwrites the target ADS table: true overwrites, false appends. Only effective when writeMode is Load.
* Required: yes <br />
* Default: none <br />
* **lifeCycle**
* Description: lifecycle (in days) of the temporary staging table. Only effective when writeMode is Load.
* Required: yes <br />
* Default: none <br />
* **batchSize**
* Description: number of records submitted to ADS per batch. Only effective when writeMode is insert.
* Required: only meaningful when writeMode is insert <br />
* Default: 32 <br />
* **bufferSize**
* Description: size of DataX's record collection buffer. The buffer accumulates a larger chunk of records; incoming records first enter this buffer and are sorted by the ADS partition-column pattern, and only then submitted to ADS. The sorting exists purely for performance, since ordered data is friendlier to the ADS server. The buffered records are then committed to ADS in batches of batchSize, so if you set bufferSize it should normally be a multiple of batchSize (see the sketch after this list). Only effective when writeMode is insert.
* Required: only meaningful when writeMode is insert <br />
* Default: not set; the feature is disabled unless configured <br />
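To illustrate how bufferSize and batchSize interact, the sketch below buffers incoming records, sorts the full buffer with an assumed partition-order comparator, and then flushes it in batchSize-sized chunks. The class and method names are illustrative assumptions, not the writer's real internals.
```
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BufferedBatchWriter<R> {
    private final int bufferSize;                // total records to accumulate, e.g. a multiple of batchSize
    private final int batchSize;                 // records submitted to ADS per commit
    private final Comparator<R> partitionOrder;  // assumed: orders records by the ADS partition column
    private final List<R> buffer = new ArrayList<R>();

    public BufferedBatchWriter(int bufferSize, int batchSize, Comparator<R> partitionOrder) {
        this.bufferSize = bufferSize;
        this.batchSize = batchSize;
        this.partitionOrder = partitionOrder;
    }

    /** Collect one record; flush automatically once the buffer is full. */
    public void write(R record) {
        buffer.add(record);
        if (buffer.size() >= bufferSize) {
            flush();
        }
    }

    /** Sort the buffered records, then submit them in batchSize-sized chunks. */
    public void flush() {
        buffer.sort(partitionOrder);
        for (int from = 0; from < buffer.size(); from += batchSize) {
            int to = Math.min(from + batchSize, buffer.size());
            submitBatch(buffer.subList(from, to));
        }
        buffer.clear();
    }

    private void submitBatch(List<R> batch) {
        // Placeholder: the real writer issues the INSERT statements to ADS here.
        System.out.println("committing " + batch.size() + " records");
    }
}
```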
### 3.3 Type conversion
| DataX internal type | ADS data type |
| -------- | ----- |
| Long |int, tinyint, smallint, bigint|
| Double |float, double, decimal|
| String |varchar |
| Date |date |
| Boolean |bool |
| Bytes |none |
Note:
* multivalue: ADS supports a multivalue type; DataX support for it is still to be determined.
## 4 Plugin Constraints
When the Reader is ODPS and ADSWriter runs in Load mode, the ODPS partition setting supports only the following three forms (using a two-level partition as the example):
```
"partition":["pt=*,ds=*"] (read all partitions of the test table)
"partition":["pt=1,ds=*"] (read all second-level partitions under first-level partition pt=1 of the test table)
"partition":["pt=1,ds=hangzhou"] (read the data in second-level partition ds=hangzhou under first-level partition pt=1 of the test table)
```
## 5 Performance Report (measured in a production environment)
### 5.1 Environment
### 5.2 Test results
## 6 FAQ
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<groupId>com.alibaba.datax</groupId>
<artifactId>datax-all</artifactId>
<version>0.0.1-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>adswriter</artifactId>
<name>adswriter</name>
<packaging>jar</packaging>
<dependencies>
<dependency>
<groupId>com.alibaba.datax</groupId>
<artifactId>datax-common</artifactId>
<version>${datax-project-version}</version>
<exclusions>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.alibaba.datax</groupId>
<artifactId>datax-core</artifactId>
<version>${datax-project-version}</version>
</dependency>
<dependency>
<groupId>com.alibaba.datax</groupId>
<artifactId>plugin-rdbms-util</artifactId>
<version>${datax-project-version}</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-exec</artifactId>
<version>1.3</version>
</dependency>
<dependency>
<groupId>com.alibaba.datax</groupId>
<artifactId>odpswriter</artifactId>
<version>${datax-project-version}</version>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.31</version>
</dependency>
<dependency>
<groupId>commons-configuration</groupId>
<artifactId>commons-configuration</artifactId>
<version>1.10</version>
</dependency>
</dependencies>
<build>
<plugins>
<!-- compiler plugin -->
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.6</source>
<target>1.6</target>
<encoding>${project-sourceEncoding}</encoding>
</configuration>
</plugin>
<!-- assembly plugin -->
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptors>
<descriptor>src/main/assembly/package.xml</descriptor>
</descriptors>
<finalName>datax</finalName>
</configuration>
<executions>
<execution>
<id>dwzip</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
<assembly
xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
<id></id>
<formats>
<format>dir</format>
</formats>
<includeBaseDirectory>false</includeBaseDirectory>
<fileSets>
<fileSet>
<directory>src/main/resources</directory>
<includes>
<include>plugin.json</include>
<include>config.properties</include>
<include>plugin_job_template.json</include>
</includes>
<outputDirectory>plugin/writer/adswriter</outputDirectory>
</fileSet>
<fileSet>
<directory>target/</directory>
<includes>
<include>adswriter-0.0.1-SNAPSHOT.jar</include>
</includes>
<outputDirectory>plugin/writer/adswriter</outputDirectory>
</fileSet>
</fileSets>
<dependencySets>
<dependencySet>
<useProjectArtifact>false</useProjectArtifact>
<outputDirectory>plugin/writer/adswriter/libs</outputDirectory>
<scope>runtime</scope>
</dependencySet>
</dependencySets>
</assembly>
package com.alibaba.datax.plugin.writer.adswriter;
public class AdsException extends Exception {
private static final long serialVersionUID = 1080618043484079794L;
public final static int ADS_CONN_URL_NOT_SET = -100;
public final static int ADS_CONN_USERNAME_NOT_SET = -101;
public final static int ADS_CONN_PASSWORD_NOT_SET = -102;
public final static int ADS_CONN_SCHEMA_NOT_SET = -103;
public final static int JOB_NOT_EXIST = -200;
public final static int JOB_FAILED = -201;
public final static int ADS_LOADDATA_SCHEMA_NULL = -300;
public final static int ADS_LOADDATA_TABLE_NULL = -301;
public final static int ADS_LOADDATA_SOURCEPATH_NULL = -302;
public final static int ADS_LOADDATA_JOBID_NOT_AVAIL = -303;
public final static int ADS_LOADDATA_FAILED = -304;
public final static int ADS_TABLEMETA_SCHEMA_NULL = -404;
public final static int ADS_TABLEMETA_TABLE_NULL = -405;
public final static int OTHER = -999;
private int code = OTHER;
private String message;
public AdsException(int code, String message, Throwable e) {
super(message, e);
this.code = code;
this.message = message;
}
@Override
public String getMessage() {
return "Code=" + this.code + " Message=" + this.message;
}
}
package com.alibaba.datax.plugin.writer.adswriter;
import com.alibaba.datax.common.spi.ErrorCode;
public enum AdsWriterErrorCode implements ErrorCode {
REQUIRED_VALUE("AdsWriter-00", "您缺失了必须填写的参数值."),
NO_ADS_TABLE("AdsWriter-01", "ADS表不存在."),
ODPS_CREATETABLE_FAILED("AdsWriter-02", "创建ODPS临时表失败,请联系ADS 技术支持"),
ADS_LOAD_TEMP_ODPS_FAILED("AdsWriter-03", "ADS从ODPS临时表导数据失败,请联系ADS 技术支持"),
TABLE_TRUNCATE_ERROR("AdsWriter-04", "清空 ODPS 目的表时出错."),
CREATE_ADS_HELPER_FAILED("AdsWriter-05", "创建ADSHelper对象出错,请联系ADS 技术支持"),
ODPS_PARTITION_FAILED("AdsWriter-06", "ODPS Reader不允许配置多个partition,目前只支持三种配置方式,\"partition\":[\"pt=*,ds=*\"](读取test表所有分区的数据); \n" +
"\"partition\":[\"pt=1,ds=*\"](读取test表下面,一级分区pt=1下面的所有二级分区); \n" +
"\"partition\":[\"pt=1,ds=hangzhou\"](读取test表下面,一级分区pt=1下面,二级分区ds=hz的数据)"),
ADS_LOAD_ODPS_FAILED("AdsWriter-07", "ADS从ODPS导数据失败,请联系ADS 技术支持,先检查ADS账号是否已加到该ODPS Project中。ADS账号为:"),
INVALID_CONFIG_VALUE("AdsWriter-08", "不合法的配置值."),
GET_ADS_TABLE_MEATA_FAILED("AdsWriter-11", "获取ADS table元信息失败");
private final String code;
private final String description;
private String adsAccount;
private AdsWriterErrorCode(String code, String description) {
this.code = code;
this.description = description;
}
public void setAdsAccount(String adsAccount) {
this.adsAccount = adsAccount;
}
@Override
public String getCode() {
return this.code;
}
@Override
public String getDescription() {
return this.description;
}
@Override
public String toString() {
if (this.code.equals("AdsWriter-07")){
return String.format("Code:[%s], Description:[%s][%s]. ", this.code,
this.description,adsAccount);
}else{
return String.format("Code:[%s], Description:[%s]. ", this.code,
this.description);
}
}
}
package com.alibaba.datax.plugin.writer.adswriter.ads;
/**
* ADS column meta.<br>
* <p>
* select ordinal_position,column_name,data_type,type_name,column_comment <br>
* from information_schema.columns <br>
* where table_schema='db_name' and table_name='table_name' <br>
* and is_deleted=0 <br>
* order by ordinal_position limit 1000 <br>
* </p>
*
* @since 0.0.1
*/
public class ColumnInfo {
private int ordinal;
private String name;
private ColumnDataType dataType;
private boolean isDeleted;
private String comment;
public int getOrdinal() {
return ordinal;
}
public void setOrdinal(int ordinal) {
this.ordinal = ordinal;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public ColumnDataType getDataType() {
return dataType;
}
public void setDataType(ColumnDataType dataType) {
this.dataType = dataType;
}
public boolean isDeleted() {
return isDeleted;
}
public void setDeleted(boolean isDeleted) {
this.isDeleted = isDeleted;
}
public String getComment() {
return comment;
}
public void setComment(String comment) {
this.comment = comment;
}
@Override
public String toString() {
StringBuilder builder = new StringBuilder();
builder.append("ColumnInfo [ordinal=").append(ordinal).append(", name=").append(name).append(", dataType=")
.append(dataType).append(", isDeleted=").append(isDeleted).append(", comment=").append(comment)
.append("]");
return builder.toString();
}
}
package com.alibaba.datax.plugin.writer.adswriter.ads;
import java.util.ArrayList;
import java.util.List;
/**
* ADS table meta.<br>
* <p>
* select table_schema, table_name,comments <br>
* from information_schema.tables <br>
* where table_schema='alimama' and table_name='click_af' limit 1 <br>
* </p>
* <p>
* select ordinal_position,column_name,data_type,type_name,column_comment <br>
* from information_schema.columns <br>
* where table_schema='db_name' and table_name='table_name' <br>
* and is_deleted=0 <br>
* order by ordinal_position limit 1000 <br>
* </p>
*
* @since 0.0.1
*/
public class TableInfo {
private String tableSchema;
private String tableName;
private List<ColumnInfo> columns;
private String comments;
private String tableType;
private String updateType;
private String partitionType;
private String partitionColumn;
private int partitionCount;
private List<String> primaryKeyColumns;
@Override
public String toString() {
StringBuilder builder = new StringBuilder();
builder.append("TableInfo [tableSchema=").append(tableSchema).append(", tableName=").append(tableName)
.append(", columns=").append(columns).append(", comments=").append(comments).append(",updateType=").append(updateType)
.append(",partitionType=").append(partitionType).append(",partitionColumn=").append(partitionColumn).append(",partitionCount=").append(partitionCount)
.append(",primaryKeyColumns=").append(primaryKeyColumns).append("]");
return builder.toString();
}
public String getTableSchema() {
return tableSchema;
}
public void setTableSchema(String tableSchema) {
this.tableSchema = tableSchema;
}
public String getTableName() {
return tableName;
}
public void setTableName(String tableName) {
this.tableName = tableName;
}
public List<ColumnInfo> getColumns() {
return columns;
}
public List<String> getColumnsNames() {
List<String> columnNames = new ArrayList<String>();
for (ColumnInfo column : this.getColumns()) {
columnNames.add(column.getName());
}
return columnNames;
}
public void setColumns(List<ColumnInfo> columns) {
this.columns = columns;
}
public String getComments() {
return comments;
}
public void setComments(String comments) {
this.comments = comments;
}
public String getTableType() {
return tableType;
}
public void setTableType(String tableType) {
this.tableType = tableType;
}
public String getUpdateType() {
return updateType;
}
public void setUpdateType(String updateType) {
this.updateType = updateType;
}
public String getPartitionType() {
return partitionType;
}
public void setPartitionType(String partitionType) {
this.partitionType = partitionType;
}
public String getPartitionColumn() {
return partitionColumn;
}
public void setPartitionColumn(String partitionColumn) {
this.partitionColumn = partitionColumn;
}
public int getPartitionCount() {
return partitionCount;
}
public void setPartitionCount(int partitionCount) {
this.partitionCount = partitionCount;
}
public List<String> getPrimaryKeyColumns() {
return primaryKeyColumns;
}
public void setPrimaryKeyColumns(List<String> primaryKeyColumns) {
this.primaryKeyColumns = primaryKeyColumns;
}
}
/**
* ADS meta and service.
*
* @since 0.0.1
*/
package com.alibaba.datax.plugin.writer.adswriter.ads;
package com.alibaba.datax.plugin.writer.adswriter.insert;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.common.util.ListUtil;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.writer.adswriter.AdsException;
import com.alibaba.datax.plugin.writer.adswriter.AdsWriterErrorCode;
import com.alibaba.datax.plugin.writer.adswriter.ads.ColumnInfo;
import com.alibaba.datax.plugin.writer.adswriter.ads.TableInfo;
import com.alibaba.datax.plugin.writer.adswriter.load.AdsHelper;
import com.alibaba.datax.plugin.writer.adswriter.util.AdsUtil;
import com.alibaba.datax.plugin.writer.adswriter.util.Constant;
import com.alibaba.datax.plugin.writer.adswriter.util.Key;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.tuple.ImmutablePair;
import org.apache.commons.lang3.tuple.Pair;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class AdsInsertUtil {
private static final Logger LOG = LoggerFactory
.getLogger(AdsInsertUtil.class);
public static TableInfo getAdsTableInfo(Configuration conf) {
AdsHelper adsHelper = AdsUtil.createAdsHelper(conf);
TableInfo tableInfo= null;
try {
tableInfo = adsHelper.getTableInfo(conf.getString(Key.ADS_TABLE));
} catch (AdsException e) {
throw DataXException.asDataXException(AdsWriterErrorCode.GET_ADS_TABLE_MEATA_FAILED, e);
}
return tableInfo;
}
/*
* The returned column order follows the ADS table definition order.
* */
public static List<String> getAdsTableColumnNames(Configuration conf) {
List<String> tableColumns = new ArrayList<String>();
AdsHelper adsHelper = AdsUtil.createAdsHelper(conf);
TableInfo tableInfo= null;
String adsTable = conf.getString(Key.ADS_TABLE);
try {
tableInfo = adsHelper.getTableInfo(adsTable);
} catch (AdsException e) {
throw DataXException.asDataXException(AdsWriterErrorCode.GET_ADS_TABLE_MEATA_FAILED, e);
}
List<ColumnInfo> columnInfos = tableInfo.getColumns();
for(ColumnInfo columnInfo: columnInfos) {
tableColumns.add(columnInfo.getName());
}
LOG.info("table:[{}] all columns:[\n{}\n].", adsTable, StringUtils.join(tableColumns, ","));
return tableColumns;
}
public static Map<String, Pair<Integer,String>> getColumnMetaData
(Configuration configuration, List<String> userColumns) {
Map<String, Pair<Integer,String>> columnMetaData = new HashMap<String, Pair<Integer,String>>();
List<ColumnInfo> columnInfoList = getAdsTableColumns(configuration);
for(String column : userColumns) {
if (column.startsWith(Constant.ADS_QUOTE_CHARACTER) && column.endsWith(Constant.ADS_QUOTE_CHARACTER)) {
column = column.substring(1, column.length() - 1);
}
for (ColumnInfo columnInfo : columnInfoList) {
if(column.equalsIgnoreCase(columnInfo.getName())) {
Pair<Integer,String> eachPair = new ImmutablePair<Integer, String>(columnInfo.getDataType().sqlType, columnInfo.getDataType().name);
columnMetaData.put(columnInfo.getName(), eachPair);
}
}
}
return columnMetaData;
}
public static Map<String, Pair<Integer,String>> getColumnMetaData(TableInfo tableInfo, List<String> userColumns){
Map<String, Pair<Integer,String>> columnMetaData = new HashMap<String, Pair<Integer,String>>();
List<ColumnInfo> columnInfoList = tableInfo.getColumns();
for(String column : userColumns) {
if (column.startsWith(Constant.ADS_QUOTE_CHARACTER) && column.endsWith(Constant.ADS_QUOTE_CHARACTER)) {
column = column.substring(1, column.length() - 1);
}
for (ColumnInfo columnInfo : columnInfoList) {
if(column.equalsIgnoreCase(columnInfo.getName())) {
Pair<Integer,String> eachPair = new ImmutablePair<Integer, String>(columnInfo.getDataType().sqlType, columnInfo.getDataType().name);
columnMetaData.put(columnInfo.getName(), eachPair);
}
}
}
return columnMetaData;
}
/*
* The returned column order follows the ADS table definition order.
* */
public static List<ColumnInfo> getAdsTableColumns(Configuration conf) {
AdsHelper adsHelper = AdsUtil.createAdsHelper(conf);
TableInfo tableInfo= null;
String adsTable = conf.getString(Key.ADS_TABLE);
try {
tableInfo = adsHelper.getTableInfo(adsTable);
} catch (AdsException e) {
throw DataXException.asDataXException(AdsWriterErrorCode.GET_ADS_TABLE_MEATA_FAILED, e);
}
List<ColumnInfo> columnInfos = tableInfo.getColumns();
return columnInfos;
}
public static void dealColumnConf(Configuration originalConfig, List<String> tableColumns) {
List<String> userConfiguredColumns = originalConfig.getList(Key.COLUMN, String.class);
if (null == userConfiguredColumns || userConfiguredColumns.isEmpty()) {
throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE,
"您的配置文件中的列配置信息有误. 因为您未配置写入数据库表的列名称,DataX获取不到列信息. 请检查您的配置并作出修改.");
} else {
if (1 == userConfiguredColumns.size() && "*".equals(userConfiguredColumns.get(0))) {
LOG.warn("您的配置文件中的列配置信息存在风险. 因为您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.");
// Backfill the value; it needs to be handed to downstream processing as Strings
originalConfig.set(Key.COLUMN, tableColumns);
} else if (userConfiguredColumns.size() > tableColumns.size()) {
throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE,
String.format("您的配置文件中的列配置信息有误. 因为您所配置的写入数据库表的字段个数:%s 大于目的表的总字段总个数:%s. 请检查您的配置并作出修改.",
userConfiguredColumns.size(), tableColumns.size()));
} else {
// make sure the user-configured columns contain no duplicates
ListUtil.makeSureNoValueDuplicate(userConfiguredColumns, false);
// check that every column is actually a valid column of the table (by running a select column from table once)
// ListUtil.makeSureBInA(tableColumns, userConfiguredColumns, true);
// support keywords and reserved words; ADS columns are case-insensitive
List<String> removeQuotedColumns = new ArrayList<String>();
for (String each : userConfiguredColumns) {
if (each.startsWith(Constant.ADS_QUOTE_CHARACTER) && each.endsWith(Constant.ADS_QUOTE_CHARACTER)) {
removeQuotedColumns.add(each.substring(1, each.length() - 1));
} else {
removeQuotedColumns.add(each);
}
}
ListUtil.makeSureBInA(tableColumns, removeQuotedColumns, false);
}
}
}
}
package com.alibaba.datax.plugin.writer.adswriter.insert;
public enum OperationType {
// i: insert uo:before image uu:before image un: after image d: delete
// u:update
I("i"), UO("uo"), UU("uu"), UN("un"), D("d"), U("u"), UNKNOWN("unknown"), ;
private OperationType(String type) {
this.type = type;
}
private String type;
public String getType() {
return this.type;
}
public static OperationType asOperationType(String type) {
if ("i".equalsIgnoreCase(type)) {
return I;
} else if ("uo".equalsIgnoreCase(type)) {
return UO;
} else if ("uu".equalsIgnoreCase(type)) {
return UU;
} else if ("un".equalsIgnoreCase(type)) {
return UN;
} else if ("d".equalsIgnoreCase(type)) {
return D;
} else if ("u".equalsIgnoreCase(type)) {
return U;
} else {
return UNKNOWN;
}
}
public boolean isInsertTemplate() {
switch (this) {
// after merge there should be only the I and U kinds
case I:
case UO:
case UU:
case UN:
case U:
return true;
case D:
return false;
default:
return false;
}
}
public boolean isDeleteTemplate() {
switch (this) {
// after merge there should be only the I and U kinds
case I:
case UO:
case UU:
case UN:
case U:
return false;
case D:
return true;
default:
return false;
}
}
public boolean isLegal() {
return this != UNKNOWN;
}
@Override
public String toString() {
return this.name();
}
}
package com.alibaba.datax.plugin.writer.adswriter.load;
import com.alibaba.datax.plugin.writer.adswriter.ads.ColumnDataType;
import com.alibaba.datax.plugin.writer.adswriter.ads.ColumnInfo;
import com.alibaba.datax.plugin.writer.adswriter.ads.TableInfo;
import com.alibaba.datax.plugin.writer.adswriter.odps.DataType;
import com.alibaba.datax.plugin.writer.adswriter.odps.FieldSchema;
import com.alibaba.datax.plugin.writer.adswriter.odps.TableMeta;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
/**
* Table meta helper for ADS writer.
*
* @since 0.0.1
*/
public class TableMetaHelper {
private TableMetaHelper() {
}
/**
* Create temporary ODPS table.
*
* @param tableMeta table meta
* @param lifeCycle for temporary table
* @return ODPS temporary table meta
*/
public static TableMeta createTempODPSTable(TableInfo tableMeta, int lifeCycle) {
TableMeta tempTable = new TableMeta();
tempTable.setComment(tableMeta.getComments());
tempTable.setLifeCycle(lifeCycle);
String tableSchema = tableMeta.getTableSchema();
String tableName = tableMeta.getTableName();
tempTable.setTableName(generateTempTableName(tableSchema, tableName));
List<FieldSchema> tempColumns = new ArrayList<FieldSchema>();
List<ColumnInfo> columns = tableMeta.getColumns();
for (ColumnInfo column : columns) {
FieldSchema tempColumn = new FieldSchema();
tempColumn.setName(column.getName());
tempColumn.setType(toODPSDataType(column.getDataType()));
tempColumn.setComment(column.getComment());
tempColumns.add(tempColumn);
}
tempTable.setCols(tempColumns);
tempTable.setPartitionKeys(null);
return tempTable;
}
private static String toODPSDataType(ColumnDataType columnDataType) {
int type;
switch (columnDataType.type) {
case ColumnDataType.BOOLEAN:
type = DataType.STRING;
break;
case ColumnDataType.BYTE:
case ColumnDataType.SHORT:
case ColumnDataType.INT:
case ColumnDataType.LONG:
type = DataType.INTEGER;
break;
case ColumnDataType.DECIMAL:
case ColumnDataType.DOUBLE:
case ColumnDataType.FLOAT:
type = DataType.DOUBLE;
break;
case ColumnDataType.DATE:
case ColumnDataType.TIME:
case ColumnDataType.TIMESTAMP:
case ColumnDataType.STRING:
case ColumnDataType.MULTI_VALUE:
type = DataType.STRING;
break;
default:
throw new IllegalArgumentException("columnDataType=" + columnDataType);
}
return DataType.toString(type);
}
private static String generateTempTableName(String tableSchema, String tableName) {
int randNum = 1000 + new Random(System.currentTimeMillis()).nextInt(1000);
return tableSchema + "__" + tableName + "_" + System.currentTimeMillis() + randNum;
}
}
package com.alibaba.datax.plugin.writer.adswriter.load;
import com.alibaba.datax.common.util.Configuration;
/**
* Created by xiafei.qiuxf on 15/4/13.
*/
public class TransferProjectConf {
public final static String KEY_ACCESS_ID = "odps.accessId";
public final static String KEY_ACCESS_KEY = "odps.accessKey";
public final static String KEY_ACCOUNT = "odps.account";
public final static String KEY_ODPS_SERVER = "odps.odpsServer";
public final static String KEY_ODPS_TUNNEL = "odps.tunnelServer";
public final static String KEY_ACCOUNT_TYPE = "odps.accountType";
public final static String KEY_PROJECT = "odps.project";
private String accessId;
private String accessKey;
private String account;
private String odpsServer;
private String odpsTunnel;
private String accountType;
private String project;
public static TransferProjectConf create(Configuration adsWriterConf) {
TransferProjectConf res = new TransferProjectConf();
res.accessId = adsWriterConf.getString(KEY_ACCESS_ID);
res.accessKey = adsWriterConf.getString(KEY_ACCESS_KEY);
res.account = adsWriterConf.getString(KEY_ACCOUNT);
res.odpsServer = adsWriterConf.getString(KEY_ODPS_SERVER);
res.odpsTunnel = adsWriterConf.getString(KEY_ODPS_TUNNEL);
res.accountType = adsWriterConf.getString(KEY_ACCOUNT_TYPE, "aliyun");
res.project = adsWriterConf.getString(KEY_PROJECT);
return res;
}
public String getAccessId() {
return accessId;
}
public String getAccessKey() {
return accessKey;
}
public String getAccount() {
return account;
}
public String getOdpsServer() {
return odpsServer;
}
public String getOdpsTunnel() {
return odpsTunnel;
}
public String getAccountType() {
return accountType;
}
public String getProject() {
return project;
}
}
package com.alibaba.datax.plugin.writer.adswriter.odps;
/**
* ODPS data types.
* <p>
* The following types are currently defined:
* <ul>
* <li>INTEGER
* <li>DOUBLE
* <li>BOOLEAN
* <li>STRING
* <li>DATETIME
* </ul>
* </p>
*
* @since 0.0.1
*/
public class DataType {
public final static byte INTEGER = 0;
public final static byte DOUBLE = 1;
public final static byte BOOLEAN = 2;
public final static byte STRING = 3;
public final static byte DATETIME = 4;
public static String toString(int type) {
switch (type) {
case INTEGER:
return "bigint";
case DOUBLE:
return "double";
case BOOLEAN:
return "boolean";
case STRING:
return "string";
case DATETIME:
return "datetime";
default:
throw new IllegalArgumentException("type=" + type);
}
}
/**
* Converts a string data type name to one of the data types defined by the byte constants.
* <p>
* Conversion rules:
* <ul>
* <li>tinyint, int, bigint, long - {@link #INTEGER}
* <li>double, float - {@link #DOUBLE}
* <li>string - {@link #STRING}
* <li>boolean, bool - {@link #BOOLEAN}
* <li>datetime - {@link #DATETIME}
* </ul>
* </p>
*
* @param type the data type as a string
* @return the data type as one of the byte constants
* @throws IllegalArgumentException
*/
public static byte convertToDataType(String type) throws IllegalArgumentException {
type = type.toLowerCase().trim();
if ("string".equals(type)) {
return STRING;
} else if ("bigint".equals(type) || "int".equals(type) || "tinyint".equals(type) || "long".equals(type)) {
return INTEGER;
} else if ("boolean".equals(type) || "bool".equals(type)) {
return BOOLEAN;
} else if ("double".equals(type) || "float".equals(type)) {
return DOUBLE;
} else if ("datetime".equals(type)) {
return DATETIME;
} else {
throw new IllegalArgumentException("unkown type: " + type);
}
}
}
package com.alibaba.datax.plugin.writer.adswriter.odps;
/**
* ODPS column attributes: the column name and type, matching the name and type shown by SQL DESC on the table or partition.
*
* @since 0.0.1
*/
public class FieldSchema {
/** column name */
private String name;
/** column type, e.g. string, bigint, boolean, datetime */
private String type;
private String comment;
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getType() {
return type;
}
public void setType(String type) {
this.type = type;
}
public String getComment() {
return comment;
}
public void setComment(String comment) {
this.comment = comment;
}
@Override
public String toString() {
StringBuilder builder = new StringBuilder();
builder.append("FieldSchema [name=").append(name).append(", type=").append(type).append(", comment=")
.append(comment).append("]");
return builder.toString();
}
/**
* @return "col_name data_type [COMMENT col_comment]"
*/
public String toDDL() {
StringBuilder builder = new StringBuilder();
builder.append(name).append(" ").append(type);
String comment = this.comment;
if (comment != null && comment.length() > 0) {
builder.append(" ").append("COMMENT \"" + comment + "\"");
}
return builder.toString();
}
}
package com.alibaba.datax.plugin.writer.adswriter.odps;
import java.util.Iterator;
import java.util.List;
/**
* ODPS table meta.
*
* @since 0.0.1
*/
public class TableMeta {
private String tableName;
private List<FieldSchema> cols;
private List<FieldSchema> partitionKeys;
private int lifeCycle;
private String comment;
public String getTableName() {
return tableName;
}
public void setTableName(String tableName) {
this.tableName = tableName;
}
public List<FieldSchema> getCols() {
return cols;
}
public void setCols(List<FieldSchema> cols) {
this.cols = cols;
}
public List<FieldSchema> getPartitionKeys() {
return partitionKeys;
}
public void setPartitionKeys(List<FieldSchema> partitionKeys) {
this.partitionKeys = partitionKeys;
}
public int getLifeCycle() {
return lifeCycle;
}
public void setLifeCycle(int lifeCycle) {
this.lifeCycle = lifeCycle;
}
public String getComment() {
return comment;
}
public void setComment(String comment) {
this.comment = comment;
}
@Override
public String toString() {
StringBuilder builder = new StringBuilder();
builder.append("TableMeta [tableName=").append(tableName).append(", cols=").append(cols)
.append(", partitionKeys=").append(partitionKeys).append(", lifeCycle=").append(lifeCycle)
.append(", comment=").append(comment).append("]");
return builder.toString();
}
/**
* @return <br>
* "CREATE TABLE [IF NOT EXISTS] table_name <br>
* [(col_name data_type [COMMENT col_comment], ...)] <br>
* [COMMENT table_comment] <br>
* [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] <br>
* [LIFECYCLE days] <br>
* [AS select_statement] " <br>
*/
public String toDDL() {
StringBuilder builder = new StringBuilder();
builder.append("CREATE TABLE " + tableName).append(" ");
List<FieldSchema> cols = this.cols;
if (cols != null && cols.size() > 0) {
builder.append("(").append(toDDL(cols)).append(")").append(" ");
}
String comment = this.comment;
if (comment != null && comment.length() > 0) {
builder.append("COMMENT \"" + comment + "\" ");
}
List<FieldSchema> partitionKeys = this.partitionKeys;
if (partitionKeys != null && partitionKeys.size() > 0) {
builder.append("PARTITIONED BY ");
builder.append("(").append(toDDL(partitionKeys)).append(")").append(" ");
}
if (lifeCycle > 0) {
builder.append("LIFECYCLE " + lifeCycle).append(" ");
}
builder.append(";");
return builder.toString();
}
private String toDDL(List<FieldSchema> cols) {
StringBuilder builder = new StringBuilder();
Iterator<FieldSchema> iter = cols.iterator();
builder.append(iter.next().toDDL());
while (iter.hasNext()) {
builder.append(", ").append(iter.next().toDDL());
}
return builder.toString();
}
}
/**
* ODPS meta.
*
* @since 0.0.1
*/
package com.alibaba.datax.plugin.writer.adswriter.odps;
/**
* ADS Writer.
*
* @since 0.0.1
*/
package com.alibaba.datax.plugin.writer.adswriter;
package com.alibaba.datax.plugin.writer.adswriter.util;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DBUtil;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.writer.adswriter.load.AdsHelper;
import com.alibaba.datax.plugin.writer.adswriter.AdsWriterErrorCode;
import com.alibaba.datax.plugin.writer.adswriter.load.TransferProjectConf;
import com.alibaba.datax.plugin.writer.adswriter.odps.FieldSchema;
import com.alibaba.datax.plugin.writer.adswriter.odps.TableMeta;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.Connection;
import java.util.ArrayList;
import java.util.List;
public class AdsUtil {
private static final Logger LOG = LoggerFactory.getLogger(AdsUtil.class);
/* Check that every required configuration item has been provided.
* */
public static void checkNecessaryConfig(Configuration originalConfig, String writeMode) {
// check required ADS parameters
originalConfig.getNecessaryValue(Key.ADS_URL,
AdsWriterErrorCode.REQUIRED_VALUE);
originalConfig.getNecessaryValue(Key.USERNAME,
AdsWriterErrorCode.REQUIRED_VALUE);
originalConfig.getNecessaryValue(Key.PASSWORD,
AdsWriterErrorCode.REQUIRED_VALUE);
originalConfig.getNecessaryValue(Key.SCHEMA,
AdsWriterErrorCode.REQUIRED_VALUE);
if(Constant.LOADMODE.equals(writeMode)) {
originalConfig.getNecessaryValue(Key.Life_CYCLE,
AdsWriterErrorCode.REQUIRED_VALUE);
Integer lifeCycle = originalConfig.getInt(Key.Life_CYCLE);
if (lifeCycle <= 0) {
throw DataXException.asDataXException(AdsWriterErrorCode.INVALID_CONFIG_VALUE, "配置项[lifeCycle]的值必须大于零.");
}
originalConfig.getNecessaryValue(Key.ADS_TABLE,
AdsWriterErrorCode.REQUIRED_VALUE);
Boolean overwrite = originalConfig.getBool(Key.OVER_WRITE);
if (overwrite == null) {
throw DataXException.asDataXException(AdsWriterErrorCode.REQUIRED_VALUE, "配置项[overWrite]是必填项.");
}
}
if (Constant.STREAMMODE.equalsIgnoreCase(writeMode)) {
originalConfig.getNecessaryValue(Key.OPIndex, AdsWriterErrorCode.REQUIRED_VALUE);
}
}
/* Create an AdsHelper instance.
* */
public static AdsHelper createAdsHelper(Configuration originalConfig){
// Get the adsUrl, userName, password, schema, etc. parameters and create the AdsHelper instance
String adsUrl = originalConfig.getString(Key.ADS_URL);
String userName = originalConfig.getString(Key.USERNAME);
String password = originalConfig.getString(Key.PASSWORD);
String schema = originalConfig.getString(Key.SCHEMA);
Long socketTimeout = originalConfig.getLong(Key.SOCKET_TIMEOUT, Constant.DEFAULT_SOCKET_TIMEOUT);
String suffix = originalConfig.getString(Key.JDBC_URL_SUFFIX, "");
return new AdsHelper(adsUrl,userName,password,schema,socketTimeout,suffix);
}
public static AdsHelper createAdsHelperWithOdpsAccount(Configuration originalConfig) {
String adsUrl = originalConfig.getString(Key.ADS_URL);
String userName = originalConfig.getString(TransferProjectConf.KEY_ACCESS_ID);
String password = originalConfig.getString(TransferProjectConf.KEY_ACCESS_KEY);
String schema = originalConfig.getString(Key.SCHEMA);
Long socketTimeout = originalConfig.getLong(Key.SOCKET_TIMEOUT, Constant.DEFAULT_SOCKET_TIMEOUT);
String suffix = originalConfig.getString(Key.JDBC_URL_SUFFIX, "");
return new AdsHelper(adsUrl, userName, password, schema,socketTimeout,suffix);
}
/* Generate the configuration required by the ODPSWriter plugin.
* */
public static Configuration generateConf(Configuration originalConfig, String odpsTableName, TableMeta tableMeta, TransferProjectConf transConf){
Configuration newConfig = originalConfig.clone();
newConfig.set(Key.ODPSTABLENAME, odpsTableName);
newConfig.set(Key.ODPS_SERVER, transConf.getOdpsServer());
newConfig.set(Key.TUNNEL_SERVER,transConf.getOdpsTunnel());
newConfig.set(Key.ACCESS_ID,transConf.getAccessId());
newConfig.set(Key.ACCESS_KEY,transConf.getAccessKey());
newConfig.set(Key.PROJECT,transConf.getProject());
newConfig.set(Key.TRUNCATE, true);
newConfig.set(Key.PARTITION,null);
// newConfig.remove(Key.PARTITION);
List<FieldSchema> cols = tableMeta.getCols();
List<String> allColumns = new ArrayList<String>();
if(cols != null && !cols.isEmpty()){
for(FieldSchema col:cols){
allColumns.add(col.getName());
}
}
newConfig.set(Key.COLUMN,allColumns);
return newConfig;
}
/* Generate the source_path used when loading data into ADS.
* */
public static String generateSourcePath(String project, String tmpOdpsTableName, String odpsPartition){
StringBuilder builder = new StringBuilder();
String partition = transferOdpsPartitionToAds(odpsPartition);
builder.append("odps://").append(project).append("/").append(tmpOdpsTableName);
if(odpsPartition != null && !odpsPartition.isEmpty()){
builder.append("/").append(partition);
}
return builder.toString();
}
public static String transferOdpsPartitionToAds(String odpsPartition){
if(odpsPartition == null || odpsPartition.isEmpty())
return null;
String adsPartition = formatPartition(odpsPartition);
String[] partitions = adsPartition.split("/");
for(int last = partitions.length; last > 0; last--){
String partitionPart = partitions[last-1];
String newPart = partitionPart.replace(".*", "*").replace("*", ".*");
if(newPart.split("=")[1].equals(".*")){
adsPartition = adsPartition.substring(0,adsPartition.length()-partitionPart.length());
}else{
break;
}
if(adsPartition.endsWith("/")){
adsPartition = adsPartition.substring(0,adsPartition.length()-1);
}
}
if (adsPartition.contains("*"))
throw DataXException.asDataXException(AdsWriterErrorCode.ODPS_PARTITION_FAILED, "");
return adsPartition;
}
public static String formatPartition(String partition) {
return partition.trim().replaceAll(" *= *", "=")
.replaceAll(" */ *", ",").replaceAll(" *, *", ",")
.replaceAll("'", "").replaceAll(",", "/");
}
public static String prepareJdbcUrl(Configuration conf) {
String adsURL = conf.getString(Key.ADS_URL);
String schema = conf.getString(Key.SCHEMA);
Long socketTimeout = conf.getLong(Key.SOCKET_TIMEOUT,
Constant.DEFAULT_SOCKET_TIMEOUT);
String suffix = conf.getString(Key.JDBC_URL_SUFFIX, "");
return AdsUtil.prepareJdbcUrl(adsURL, schema, socketTimeout, suffix);
}
public static String prepareJdbcUrl(String adsURL, String schema,
Long socketTimeout, String suffix) {
String jdbcUrl = null;
// like autoReconnect=true&failOverReadOnly=false&maxReconnects=10
if (StringUtils.isNotBlank(suffix)) {
jdbcUrl = String
.format("jdbc:mysql://%s/%s?useUnicode=true&characterEncoding=UTF-8&socketTimeout=%s&%s",
adsURL, schema, socketTimeout, suffix);
} else {
jdbcUrl = String
.format("jdbc:mysql://%s/%s?useUnicode=true&characterEncoding=UTF-8&socketTimeout=%s",
adsURL, schema, socketTimeout);
}
return jdbcUrl;
}
public static Connection getAdsConnect(Configuration conf) {
String userName = conf.getString(Key.USERNAME);
String passWord = conf.getString(Key.PASSWORD);
String jdbcUrl = AdsUtil.prepareJdbcUrl(conf);
Connection connection = DBUtil.getConnection(DataBaseType.ADS, jdbcUrl, userName, passWord);
return connection;
}
}
package com.alibaba.datax.plugin.writer.adswriter.util;
public class Constant {
public static final String LOADMODE = "load";
public static final String INSERTMODE = "insert";
public static final String DELETEMODE = "delete";
public static final String REPLACEMODE = "replace";
public static final String STREAMMODE = "stream";
public static final int DEFAULT_BATCH_SIZE = 32;
public static final long DEFAULT_SOCKET_TIMEOUT = 3600000L;
public static final int DEFAULT_RETRY_TIMES = 2;
public static final String INSERT_TEMPLATE = "insert into %s ( %s ) values ";
public static final String DELETE_TEMPLATE = "delete from %s where ";
public static final String ADS_TABLE_INFO = "adsTableInfo";
public static final String ADS_QUOTE_CHARACTER = "`";
}
package com.alibaba.datax.plugin.writer.adswriter.util;
public final class Key {
public final static String ADS_URL = "url";
public final static String USERNAME = "username";
public final static String PASSWORD = "password";
public final static String SCHEMA = "schema";
public final static String ADS_TABLE = "table";
public final static String Life_CYCLE = "lifeCycle";
public final static String OVER_WRITE = "overWrite";
public final static String WRITE_MODE = "writeMode";
public final static String COLUMN = "column";
public final static String OPIndex = "opIndex";
public final static String EMPTY_AS_NULL = "emptyAsNull";
public final static String BATCH_SIZE = "batchSize";
public final static String BUFFER_SIZE = "bufferSize";
public final static String PRE_SQL = "preSql";
public final static String POST_SQL = "postSql";
public final static String SOCKET_TIMEOUT = "socketTimeout";
public final static String RETRY_CONNECTION_TIME = "retryTimes";
public final static String JDBC_URL_SUFFIX = "urlSuffix";
/**
* Keys below belong to the odps writer
*/
public final static String PARTITION = "partition";
public final static String ODPSTABLENAME = "table";
public final static String ODPS_SERVER = "odpsServer";
public final static String TUNNEL_SERVER = "tunnelServer";
public final static String ACCESS_ID = "accessId";
public final static String ACCESS_KEY = "accessKey";
public final static String PROJECT = "project";
public final static String TRUNCATE = "truncate";
}
{
"name": "adswriter",
"class": "com.alibaba.datax.plugin.writer.adswriter.AdsWriter",
"description": "",
"developer": "alibaba"
}
{
"name": "adswriter",
"parameter": {
"url": "",
"username": "",
"password": "",
"schema": "",
"table": "",
"partition": "",
"overWrite": "",
"lifeCycle": 2
}
}
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.alibaba.datax</groupId>
<artifactId>datax-all</artifactId>
<version>0.0.1-SNAPSHOT</version>
</parent>
<artifactId>datax-common</artifactId>
<name>datax-common</name>
<packaging>jar</packaging>
<dependencies>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.4</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>fluent-hc</artifactId>
<version>4.4</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-math3</artifactId>
<version>3.1.1</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.6</source>
<target>1.6</target>
<encoding>${project-sourceEncoding}</encoding>
</configuration>
</plugin>
</plugins>
</build>
</project>
package com.alibaba.datax.common.base;
import org.apache.commons.lang3.builder.EqualsBuilder;
import org.apache.commons.lang3.builder.HashCodeBuilder;
import org.apache.commons.lang3.builder.ToStringBuilder;
import org.apache.commons.lang3.builder.ToStringStyle;
public class BaseObject {
@Override
public int hashCode() {
return HashCodeBuilder.reflectionHashCode(this, false);
}
@Override
public boolean equals(Object object) {
return EqualsBuilder.reflectionEquals(this, object, false);
}
@Override
public String toString() {
return ToStringBuilder.reflectionToString(this,
ToStringStyle.MULTI_LINE_STYLE);
}
}
package com.alibaba.datax.common.constant;
public final class CommonConstant {
/**
* Lets a plugin tag each task produced by its own split with the resource it uses, so that when the core stitches reader/writer tasks together after the split it can perform a more meaningful shuffle based on the resource tag.
*/
public static String LOAD_BALANCE_RESOURCE_MARK = "loadBalanceResourceMark";
}
package com.alibaba.datax.common.constant;
/**
* Created by jingxing on 14-8-31.
*/
public enum PluginType {
//pluginType also maps to a resource directory, which makes it hard to extend, or rather it should only be extended when truly necessary. Handler is marked here for now (it is essentially the same as transformer); to be discussed.
READER("reader"), TRANSFORMER("transformer"), WRITER("writer"), HANDLER("handler");
private String pluginType;
private PluginType(String pluginType) {
this.pluginType = pluginType;
}
@Override
public String toString() {
return this.pluginType;
}
}
package com.alibaba.datax.common.element;
import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Date;
/**
* Created by jingxing on 14-8-24.
*/
public class BoolColumn extends Column {
public BoolColumn(Boolean bool) {
super(bool, Column.Type.BOOL, 1);
}
public BoolColumn(final String data) {
this(true);
this.validate(data);
if (null == data) {
this.setRawData(null);
this.setByteSize(0);
} else {
this.setRawData(Boolean.valueOf(data));
this.setByteSize(1);
}
return;
}
public BoolColumn() {
super(null, Column.Type.BOOL, 1);
}
@Override
public Boolean asBoolean() {
if (null == super.getRawData()) {
return null;
}
return (Boolean) super.getRawData();
}
@Override
public Long asLong() {
if (null == this.getRawData()) {
return null;
}
return this.asBoolean() ? 1L : 0L;
}
@Override
public Double asDouble() {
if (null == this.getRawData()) {
return null;
}
return this.asBoolean() ? 1.0d : 0.0d;
}
@Override
public String asString() {
if (null == super.getRawData()) {
return null;
}
return this.asBoolean() ? "true" : "false";
}
@Override
public BigInteger asBigInteger() {
if (null == this.getRawData()) {
return null;
}
return BigInteger.valueOf(this.asLong());
}
@Override
public BigDecimal asBigDecimal() {
if (null == this.getRawData()) {
return null;
}
return BigDecimal.valueOf(this.asLong());
}
@Override
public Date asDate() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Bool类型不能转为Date .");
}
@Override
public byte[] asBytes() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Boolean类型不能转为Bytes .");
}
private void validate(final String data) {
if (null == data) {
return;
}
if ("true".equalsIgnoreCase(data) || "false".equalsIgnoreCase(data)) {
return;
}
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT,
String.format("String[%s]不能转为Bool .", data));
}
}
package com.alibaba.datax.common.element;
import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
import org.apache.commons.lang3.ArrayUtils;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Date;
/**
* Created by jingxing on 14-8-24.
*/
public class BytesColumn extends Column {
public BytesColumn() {
this(null);
}
public BytesColumn(byte[] bytes) {
super(ArrayUtils.clone(bytes), Column.Type.BYTES, null == bytes ? 0
: bytes.length);
}
@Override
public byte[] asBytes() {
if (null == this.getRawData()) {
return null;
}
return (byte[]) this.getRawData();
}
@Override
public String asString() {
if (null == this.getRawData()) {
return null;
}
try {
return ColumnCast.bytes2String(this);
} catch (Exception e) {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT,
String.format("Bytes[%s]不能转为String .", this.toString()));
}
}
@Override
public Long asLong() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Bytes类型不能转为Long .");
}
@Override
public BigDecimal asBigDecimal() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Bytes类型不能转为BigDecimal .");
}
@Override
public BigInteger asBigInteger() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Bytes类型不能转为BigInteger .");
}
@Override
public Double asDouble() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Bytes类型不能转为Long .");
}
@Override
public Date asDate() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Bytes类型不能转为Date .");
}
@Override
public Boolean asBoolean() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Bytes类型不能转为Boolean .");
}
}
package com.alibaba.datax.common.element;
import com.alibaba.fastjson.JSON;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Date;
/**
* Created by jingxing on 14-8-24.
* <p/>
*/
public abstract class Column {
private Type type;
private Object rawData;
private int byteSize;
public Column(final Object object, final Type type, int byteSize) {
this.rawData = object;
this.type = type;
this.byteSize = byteSize;
}
public Object getRawData() {
return this.rawData;
}
public Type getType() {
return this.type;
}
public int getByteSize() {
return this.byteSize;
}
protected void setType(Type type) {
this.type = type;
}
protected void setRawData(Object rawData) {
this.rawData = rawData;
}
protected void setByteSize(int byteSize) {
this.byteSize = byteSize;
}
public abstract Long asLong();
public abstract Double asDouble();
public abstract String asString();
public abstract Date asDate();
public abstract byte[] asBytes();
public abstract Boolean asBoolean();
public abstract BigDecimal asBigDecimal();
public abstract BigInteger asBigInteger();
@Override
public String toString() {
return JSON.toJSONString(this);
}
public enum Type {
BAD, NULL, INT, LONG, DOUBLE, STRING, BOOL, DATE, BYTES
}
}
package com.alibaba.datax.common.element;
import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import org.apache.commons.lang3.time.DateFormatUtils;
import org.apache.commons.lang3.time.FastDateFormat;
import java.io.UnsupportedEncodingException;
import java.text.ParseException;
import java.util.*;
public final class ColumnCast {
public static void bind(final Configuration configuration) {
StringCast.init(configuration);
DateCast.init(configuration);
BytesCast.init(configuration);
}
public static Date string2Date(final StringColumn column)
throws ParseException {
return StringCast.asDate(column);
}
public static byte[] string2Bytes(final StringColumn column)
throws UnsupportedEncodingException {
return StringCast.asBytes(column);
}
public static String date2String(final DateColumn column) {
return DateCast.asString(column);
}
public static String bytes2String(final BytesColumn column)
throws UnsupportedEncodingException {
return BytesCast.asString(column);
}
}
class StringCast {
static String datetimeFormat = "yyyy-MM-dd HH:mm:ss";
static String dateFormat = "yyyy-MM-dd";
static String timeFormat = "HH:mm:ss";
static List<String> extraFormats = Collections.emptyList();
static String timeZone = "GMT+8";
static FastDateFormat dateFormatter;
static FastDateFormat timeFormatter;
static FastDateFormat datetimeFormatter;
static TimeZone timeZoner;
static String encoding = "UTF-8";
static void init(final Configuration configuration) {
StringCast.datetimeFormat = configuration.getString(
"common.column.datetimeFormat", StringCast.datetimeFormat);
StringCast.dateFormat = configuration.getString(
"common.column.dateFormat", StringCast.dateFormat);
StringCast.timeFormat = configuration.getString(
"common.column.timeFormat", StringCast.timeFormat);
StringCast.extraFormats = configuration.getList(
"common.column.extraFormats", Collections.<String>emptyList(), String.class);
StringCast.timeZone = configuration.getString("common.column.timeZone",
StringCast.timeZone);
StringCast.timeZoner = TimeZone.getTimeZone(StringCast.timeZone);
StringCast.datetimeFormatter = FastDateFormat.getInstance(
StringCast.datetimeFormat, StringCast.timeZoner);
StringCast.dateFormatter = FastDateFormat.getInstance(
StringCast.dateFormat, StringCast.timeZoner);
StringCast.timeFormatter = FastDateFormat.getInstance(
StringCast.timeFormat, StringCast.timeZoner);
StringCast.encoding = configuration.getString("common.column.encoding",
StringCast.encoding);
}
static Date asDate(final StringColumn column) throws ParseException {
if (null == column.asString()) {
return null;
}
try {
return StringCast.datetimeFormatter.parse(column.asString());
} catch (ParseException ignored) {
}
try {
return StringCast.dateFormatter.parse(column.asString());
} catch (ParseException ignored) {
}
ParseException e;
try {
return StringCast.timeFormatter.parse(column.asString());
} catch (ParseException ignored) {
e = ignored;
}
for (String format : StringCast.extraFormats) {
try{
return FastDateFormat.getInstance(format, StringCast.timeZoner).parse(column.asString());
} catch (ParseException ignored){
e = ignored;
}
}
throw e;
}
static byte[] asBytes(final StringColumn column)
throws UnsupportedEncodingException {
if (null == column.asString()) {
return null;
}
return column.asString().getBytes(StringCast.encoding);
}
}
/**
 * Going forward, for maintainability, consider using Apache's DateFormatUtils directly.
 *
 * 迟南 has already fixed this issue, but for maintainability we still use Apache's built-in functions directly.
 */
class DateCast {
static String datetimeFormat = "yyyy-MM-dd HH:mm:ss";
static String dateFormat = "yyyy-MM-dd";
static String timeFormat = "HH:mm:ss";
static String timeZone = "GMT+8";
static TimeZone timeZoner = TimeZone.getTimeZone(DateCast.timeZone);
static void init(final Configuration configuration) {
DateCast.datetimeFormat = configuration.getString(
"common.column.datetimeFormat", datetimeFormat);
DateCast.timeFormat = configuration.getString(
"common.column.timeFormat", timeFormat);
DateCast.dateFormat = configuration.getString(
"common.column.dateFormat", dateFormat);
DateCast.timeZone = configuration.getString("common.column.timeZone",
DateCast.timeZone);
DateCast.timeZoner = TimeZone.getTimeZone(DateCast.timeZone);
return;
}
static String asString(final DateColumn column) {
if (null == column.asDate()) {
return null;
}
switch (column.getSubType()) {
case DATE:
return DateFormatUtils.format(column.asDate(), DateCast.dateFormat,
DateCast.timeZoner);
case TIME:
return DateFormatUtils.format(column.asDate(), DateCast.timeFormat,
DateCast.timeZoner);
case DATETIME:
return DateFormatUtils.format(column.asDate(),
DateCast.datetimeFormat, DateCast.timeZoner);
default:
throw DataXException
.asDataXException(CommonErrorCode.CONVERT_NOT_SUPPORT,
"时间类型出现不支持类型,目前仅支持DATE/TIME/DATETIME。该类型属于编程错误,请反馈给DataX开发团队 .");
}
}
}
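/**
 * Converts byte[] columns to String using the configured "common.column.encoding".
 */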
class BytesCast {
static String encoding = "utf-8";
static void init(final Configuration configuration) {
BytesCast.encoding = configuration.getString("common.column.encoding",
BytesCast.encoding);
return;
}
static String asString(final BytesColumn column)
throws UnsupportedEncodingException {
if (null == column.asBytes()) {
return null;
}
return new String(column.asBytes(), encoding);
}
}
package com.alibaba.datax.common.element;
import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Date;
/**
* Created by jingxing on 14-8-24.
*/
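/**
 * A date/time column stored internally as epoch milliseconds (Long), with a
 * subtype that distinguishes DATE, TIME and DATETIME for formatting.
 */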
public class DateColumn extends Column {
private DateType subType = DateType.DATETIME;
public static enum DateType {
DATE, TIME, DATETIME
}
/**
* Builds a DateColumn whose value is null, with Date subtype DATETIME.
* */
public DateColumn() {
this((Long)null);
}
/**
* Builds a DateColumn from stamp (a Unix timestamp in milliseconds), with Date subtype DATETIME.
* The date is actually stored as a long of milliseconds to save space.
* */
public DateColumn(final Long stamp) {
super(stamp, Column.Type.DATE, (null == stamp ? 0 : 8));
}
/**
* Builds a DateColumn from date (java.util.Date), with Date subtype DATETIME.
* */
public DateColumn(final Date date) {
this(date == null ? null : date.getTime());
}
/**
* Builds a DateColumn from date (java.sql.Date), with Date subtype DATE: date only, no time.
* */
public DateColumn(final java.sql.Date date) {
this(date == null ? null : date.getTime());
this.setSubType(DateType.DATE);
}
/**
* Builds a DateColumn from time (java.sql.Time), with Date subtype TIME: time only, no date.
* */
public DateColumn(final java.sql.Time time) {
this(time == null ? null : time.getTime());
this.setSubType(DateType.TIME);
}
/**
* Builds a DateColumn from ts (java.sql.Timestamp), with Date subtype DATETIME.
* */
public DateColumn(final java.sql.Timestamp ts) {
this(ts == null ? null : ts.getTime());
this.setSubType(DateType.DATETIME);
}
@Override
public Long asLong() {
return (Long)this.getRawData();
}
@Override
public String asString() {
try {
return ColumnCast.date2String(this);
} catch (Exception e) {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT,
String.format("Date[%s]类型不能转为String .", this.toString()));
}
}
@Override
public Date asDate() {
if (null == this.getRawData()) {
return null;
}
return new Date((Long)this.getRawData());
}
@Override
public byte[] asBytes() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Date类型不能转为Bytes .");
}
@Override
public Boolean asBoolean() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Date类型不能转为Boolean .");
}
@Override
public Double asDouble() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Date类型不能转为Double .");
}
@Override
public BigInteger asBigInteger() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Date类型不能转为BigInteger .");
}
@Override
public BigDecimal asBigDecimal() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Date类型不能转为BigDecimal .");
}
public DateType getSubType() {
return subType;
}
public void setSubType(DateType subType) {
this.subType = subType;
}
}
package com.alibaba.datax.common.element;
import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Date;
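/**
 * A numeric column that keeps its raw value as the String representation of the
 * number, so precision is preserved; conversions go through BigDecimal.
 */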
public class DoubleColumn extends Column {
public DoubleColumn(final String data) {
this(data, null == data ? 0 : data.length());
this.validate(data);
}
public DoubleColumn(Long data) {
this(data == null ? (String) null : String.valueOf(data));
}
public DoubleColumn(Integer data) {
this(data == null ? (String) null : String.valueOf(data));
}
/**
* Double cannot represent decimal values exactly; storing Double data via this constructor is not recommended. Prefer passing a String to the constructor.
*
* */
public DoubleColumn(final Double data) {
this(data == null ? (String) null
: new BigDecimal(String.valueOf(data)).toPlainString());
}
/**
* Float cannot represent decimal values exactly; storing Float data via this constructor is not recommended. Prefer passing a String to the constructor.
*
* */
public DoubleColumn(final Float data) {
this(data == null ? (String) null
: new BigDecimal(String.valueOf(data)).toPlainString());
}
public DoubleColumn(final BigDecimal data) {
this(null == data ? (String) null : data.toPlainString());
}
public DoubleColumn(final BigInteger data) {
this(null == data ? (String) null : data.toString());
}
public DoubleColumn() {
this((String) null);
}
private DoubleColumn(final String data, int byteSize) {
super(data, Column.Type.DOUBLE, byteSize);
}
@Override
public BigDecimal asBigDecimal() {
if (null == this.getRawData()) {
return null;
}
try {
return new BigDecimal((String) this.getRawData());
} catch (NumberFormatException e) {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT,
String.format("String[%s] 无法转换为Double类型 .",
(String) this.getRawData()));
}
}
@Override
public Double asDouble() {
if (null == this.getRawData()) {
return null;
}
String string = (String) this.getRawData();
// "NaN" and the infinities cannot be parsed by BigDecimal, so return them directly;
// validate() also accepts the plain "Infinity" spelling, so handle it here as well.
boolean isDoubleSpecific = string.equals("NaN")
|| string.equals("Infinity") || string.equals("-Infinity")
|| string.equals("+Infinity");
if (isDoubleSpecific) {
return Double.valueOf(string);
}
BigDecimal result = this.asBigDecimal();
OverFlowUtil.validateDoubleNotOverFlow(result);
return result.doubleValue();
}
@Override
public Long asLong() {
if (null == this.getRawData()) {
return null;
}
BigDecimal result = this.asBigDecimal();
OverFlowUtil.validateLongNotOverFlow(result.toBigInteger());
return result.longValue();
}
@Override
public BigInteger asBigInteger() {
if (null == this.getRawData()) {
return null;
}
return this.asBigDecimal().toBigInteger();
}
@Override
public String asString() {
if (null == this.getRawData()) {
return null;
}
return (String) this.getRawData();
}
@Override
public Boolean asBoolean() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Double类型无法转为Bool .");
}
@Override
public Date asDate() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Double类型无法转为Date类型 .");
}
@Override
public byte[] asBytes() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Double类型无法转为Bytes类型 .");
}
private void validate(final String data) {
if (null == data) {
return;
}
if (data.equalsIgnoreCase("NaN") || data.equalsIgnoreCase("-Infinity")
|| data.equalsIgnoreCase("Infinity")) {
return;
}
try {
new BigDecimal(data);
} catch (Exception e) {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT,
String.format("String[%s]无法转为Double类型 .", data));
}
}
}
package com.alibaba.datax.common.element;
import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
import org.apache.commons.lang3.math.NumberUtils;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Date;
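/**
 * An integer column backed by a BigInteger raw value; conversions to Long and
 * Double are range-checked through OverFlowUtil.
 */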
public class LongColumn extends Column {
/**
* Builds a LongColumn from the string representation of an integer; Java scientific notation is supported.
*
* NOTE: <br>
* If data is the string representation of a floating-point number, precision will be lost; use DoubleColumn for floating-point strings.
*
* */
public LongColumn(final String data) {
super(null, Column.Type.LONG, 0);
if (null == data) {
return;
}
try {
BigInteger rawData = NumberUtils.createBigDecimal(data)
.toBigInteger();
super.setRawData(rawData);
// When rawData is in [0, 127], rawData.bitLength() < 8 would make byteSize = 0;
// for simplicity, just take data.length() as the size.
// super.setByteSize(rawData.bitLength() / 8);
super.setByteSize(data.length());
} catch (Exception e) {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT,
String.format("String[%s]不能转为Long .", data));
}
}
public LongColumn(Long data) {
this(null == data ? (BigInteger) null : BigInteger.valueOf(data));
}
public LongColumn(Integer data) {
this(null == data ? (BigInteger) null : BigInteger.valueOf(data));
}
public LongColumn(BigInteger data) {
this(data, null == data ? 0 : 8);
}
private LongColumn(BigInteger data, int byteSize) {
super(data, Column.Type.LONG, byteSize);
}
public LongColumn() {
this((BigInteger) null);
}
@Override
public BigInteger asBigInteger() {
if (null == this.getRawData()) {
return null;
}
return (BigInteger) this.getRawData();
}
@Override
public Long asLong() {
BigInteger rawData = (BigInteger) this.getRawData();
if (null == rawData) {
return null;
}
OverFlowUtil.validateLongNotOverFlow(rawData);
return rawData.longValue();
}
@Override
public Double asDouble() {
if (null == this.getRawData()) {
return null;
}
BigDecimal decimal = this.asBigDecimal();
OverFlowUtil.validateDoubleNotOverFlow(decimal);
return decimal.doubleValue();
}
@Override
public Boolean asBoolean() {
if (null == this.getRawData()) {
return null;
}
return this.asBigInteger().compareTo(BigInteger.ZERO) != 0;
}
@Override
public BigDecimal asBigDecimal() {
if (null == this.getRawData()) {
return null;
}
return new BigDecimal(this.asBigInteger());
}
@Override
public String asString() {
if (null == this.getRawData()) {
return null;
}
return ((BigInteger) this.getRawData()).toString();
}
@Override
public Date asDate() {
if (null == this.getRawData()) {
return null;
}
return new Date(this.asLong());
}
@Override
public byte[] asBytes() {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, "Long类型不能转为Bytes .");
}
}
package com.alibaba.datax.common.element;
import java.math.BigDecimal;
import java.math.BigInteger;
import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
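/**
 * Range checks used before narrowing BigInteger/BigDecimal values to Long/Double;
 * a CONVERT_OVER_FLOW exception is thrown when a value cannot be represented.
 */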
public final class OverFlowUtil {
public static final BigInteger MAX_LONG = BigInteger
.valueOf(Long.MAX_VALUE);
public static final BigInteger MIN_LONG = BigInteger
.valueOf(Long.MIN_VALUE);
public static final BigDecimal MIN_DOUBLE_POSITIVE = new BigDecimal(
String.valueOf(Double.MIN_VALUE));
public static final BigDecimal MAX_DOUBLE_POSITIVE = new BigDecimal(
String.valueOf(Double.MAX_VALUE));
public static boolean isLongOverflow(final BigInteger integer) {
return (integer.compareTo(OverFlowUtil.MAX_LONG) > 0 || integer
.compareTo(OverFlowUtil.MIN_LONG) < 0);
}
public static void validateLongNotOverFlow(final BigInteger integer) {
boolean isOverFlow = OverFlowUtil.isLongOverflow(integer);
if (isOverFlow) {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_OVER_FLOW,
String.format("[%s] 转为Long类型出现溢出 .", integer.toString()));
}
}
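/**
 * A value overflows Double when its magnitude falls outside
 * [Double.MIN_VALUE, Double.MAX_VALUE]; zero is never treated as an overflow.
 */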
public static boolean isDoubleOverFlow(final BigDecimal decimal) {
if (decimal.signum() == 0) {
return false;
}
BigDecimal newDecimal = decimal;
boolean isPositive = decimal.signum() == 1;
if (!isPositive) {
newDecimal = decimal.negate();
}
return (newDecimal.compareTo(MIN_DOUBLE_POSITIVE) < 0 || newDecimal
.compareTo(MAX_DOUBLE_POSITIVE) > 0);
}
public static void validateDoubleNotOverFlow(final BigDecimal decimal) {
boolean isOverFlow = OverFlowUtil.isDoubleOverFlow(decimal);
if (isOverFlow) {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_OVER_FLOW,
String.format("[%s]转为Double类型出现溢出 .",
decimal.toPlainString()));
}
}
}
package com.alibaba.datax.common.element;
/**
* Created by jingxing on 14-8-24.
*/
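/**
 * A Record is an ordered collection of Columns passed from a Reader to a Writer;
 * implementations also report column count, byte size and in-memory size.
 */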
public interface Record {
public void addColumn(Column column);
public void setColumn(int i, final Column column);
public Column getColumn(int i);
public String toString();
public int getColumnNumber();
public int getByteSize();
public int getMemorySize();
}
package com.alibaba.datax.common.element;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Date;
import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
/**
* Created by jingxing on 14-8-24.
*/
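/**
 * A column backed by a raw String; numeric, boolean, date and byte conversions
 * delegate to BigDecimal/BigInteger parsing and to ColumnCast.
 */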
public class StringColumn extends Column {
public StringColumn() {
this((String) null);
}
public StringColumn(final String rawData) {
super(rawData, Column.Type.STRING, (null == rawData ? 0 : rawData
.length()));
}
@Override
public String asString() {
if (null == this.getRawData()) {
return null;
}
return (String) this.getRawData();
}
private void validateDoubleSpecific(final String data) {
if ("NaN".equals(data) || "Infinity".equals(data)
|| "-Infinity".equals(data)) {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT,
String.format("String[\"%s\"]属于Double特殊类型,不能转为其他类型 .", data));
}
return;
}
@Override
public BigInteger asBigInteger() {
if (null == this.getRawData()) {
return null;
}
this.validateDoubleSpecific((String) this.getRawData());
try {
return this.asBigDecimal().toBigInteger();
} catch (Exception e) {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, String.format(
"String[\"%s\"]不能转为BigInteger .", this.asString()));
}
}
@Override
public Long asLong() {
if (null == this.getRawData()) {
return null;
}
this.validateDoubleSpecific((String) this.getRawData());
try {
BigInteger integer = this.asBigInteger();
OverFlowUtil.validateLongNotOverFlow(integer);
return integer.longValue();
} catch (Exception e) {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT,
String.format("String[\"%s\"]不能转为Long .", this.asString()));
}
}
@Override
public BigDecimal asBigDecimal() {
if (null == this.getRawData()) {
return null;
}
this.validateDoubleSpecific((String) this.getRawData());
try {
return new BigDecimal(this.asString());
} catch (Exception e) {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT, String.format(
"String [\"%s\"] 不能转为BigDecimal .", this.asString()));
}
}
@Override
public Double asDouble() {
if (null == this.getRawData()) {
return null;
}
String data = (String) this.getRawData();
if ("NaN".equals(data)) {
return Double.NaN;
}
if ("Infinity".equals(data)) {
return Double.POSITIVE_INFINITY;
}
if ("-Infinity".equals(data)) {
return Double.NEGATIVE_INFINITY;
}
BigDecimal decimal = this.asBigDecimal();
OverFlowUtil.validateDoubleNotOverFlow(decimal);
return decimal.doubleValue();
}
@Override
public Boolean asBoolean() {
if (null == this.getRawData()) {
return null;
}
if ("true".equalsIgnoreCase(this.asString())) {
return true;
}
if ("false".equalsIgnoreCase(this.asString())) {
return false;
}
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT,
String.format("String[\"%s\"]不能转为Bool .", this.asString()));
}
@Override
public Date asDate() {
try {
return ColumnCast.string2Date(this);
} catch (Exception e) {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT,
String.format("String[\"%s\"]不能转为Date .", this.asString()));
}
}
@Override
public byte[] asBytes() {
try {
return ColumnCast.string2Bytes(this);
} catch (Exception e) {
throw DataXException.asDataXException(
CommonErrorCode.CONVERT_NOT_SUPPORT,
String.format("String[\"%s\"]不能转为Bytes .", this.asString()));
}
}
}
package com.alibaba.datax.common.exception;
import com.alibaba.datax.common.spi.ErrorCode;
/**
* Error codes shared by the DataX framework and its plugins.
*/
public enum CommonErrorCode implements ErrorCode {
CONFIG_ERROR("Common-00", "您提供的配置文件存在错误信息,请检查您的作业配置 ."),
CONVERT_NOT_SUPPORT("Common-01", "同步数据出现业务脏数据情况,数据类型转换错误 ."),
CONVERT_OVER_FLOW("Common-02", "同步数据出现业务脏数据情况,数据类型转换溢出 ."),
RETRY_FAIL("Common-10", "方法调用多次仍旧失败 ."),
RUNTIME_ERROR("Common-11", "运行时内部调用错误 ."),
HOOK_INTERNAL_ERROR("Common-12", "Hook运行错误 ."),
SHUT_DOWN_TASK("Common-20", "Task收到了shutdown指令,为failover做准备"),
WAIT_TIME_EXCEED("Common-21", "等待时间超出范围"),
TASK_HUNG_EXPIRED("Common-22", "任务hung住,Expired");
private final String code;
private final String describe;
private CommonErrorCode(String code, String describe) {
this.code = code;
this.describe = describe;
}
@Override
public String getCode() {
return this.code;
}
@Override
public String getDescription() {
return this.describe;
}
@Override
public String toString() {
return String.format("Code:[%s], Describe:[%s]", this.code,
this.describe);
}
}
package com.alibaba.datax.common.plugin;
/**
* This is just a marker interface.
* */
public interface PluginCollector {
}
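A minimal usage sketch of the column classes above, assuming the datax-common module is on the classpath and that `Configuration.newDefault()` (not shown in this commit excerpt) creates an empty configuration, so the `common.column.*` defaults apply:

```java
import com.alibaba.datax.common.element.*;
import com.alibaba.datax.common.util.Configuration;

public class ColumnDemo {
    public static void main(String[] args) throws Exception {
        // Bind the cast helpers once; with an empty configuration the defaults apply
        // (datetimeFormat "yyyy-MM-dd HH:mm:ss", timeZone "GMT+8", encoding "UTF-8").
        ColumnCast.bind(Configuration.newDefault());

        Column date = new StringColumn("2015-06-01 12:00:00");
        System.out.println(date.asDate());                            // parsed via StringCast

        System.out.println(new LongColumn("123").asLong());           // 123, backed by BigInteger
        System.out.println(new DoubleColumn("3.14").asBigDecimal());  // exact, stored as String
    }
}
```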