Commit e0d6d661
Authored Jun 07, 2018 by 云时
Merge branch 'master' of https://github.com/alibaba/DataX
Parents: e47dc1d7, 068b1db9
Showing 4 changed files with 49 additions and 50 deletions
README.md (+24 -22)
...a/com/alibaba/datax/plugin/reader/hdfsreader/DFSUtil.java (+1 -6)
introduction.md (+21 -20)
userGuid.md (+3 -2)
README.md
@@ -16,7 +16,7 @@ As a data synchronization framework, DataX abstracts synchronization between different data sources as from the source…
 # DataX in Detail
-##### See: [DataX-Introduction](https://github.com/alibaba/DataX/wiki/DataX-Introduction)
+##### See: [DataX-Introduction](https://github.com/alibaba/DataX/blob/master/introduction.md)
@@ -28,31 +28,32 @@ As a data synchronization framework, DataX abstracts synchronization between different data sources as from the source…
 # Support Data Channels
 DataX now has a fairly comprehensive plugin ecosystem: mainstream RDBMS databases, NoSQL stores, and big-data computing systems are all integrated. The currently supported data sources are listed below; for details, see: [DataX Data Source Reference Guide](https://github.com/alibaba/DataX/wiki/DataX-all-data-channels)
 | Type | Data source | Reader | Writer | Docs |
 | ------------ | ---------- | :-------: | :-------: |:-------: |
-| RDBMS (relational databases) | MySQL | √ | √ | ![Read](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md), ![Write](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md) |
+| RDBMS (relational databases) | MySQL | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md) |
-| | Oracle | √ | √ | ![Read](), ![Write]() |
+| | Oracle | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md), [Write](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md) |
-| | SQLServer | √ | √ | ![Read](), ![Write]() |
+| | SQLServer | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md), [Write](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md) |
-| | PostgreSQL | √ | √ | ![Read](), ![Write]() |
+| | PostgreSQL | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md) |
-| | DRDS | √ | √ | ![Read](), ![Write]() |
+| | DRDS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md) |
-| | Dameng | √ | √ | ![Read](), ![Write]() |
+| | Dameng | √ | √ | [Read](), [Write]() |
-| | Generic RDBMS (supports all relational databases) | √ | √ | ![Read](), ![Write]() |
+| | Generic RDBMS (supports all relational databases) | √ | √ | [Read](), [Write]() |
-| Alibaba Cloud data warehouse storage | ODPS | √ | √ | ![Read](), ![Write]() |
+| Alibaba Cloud data warehouse storage | ODPS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/odpsreader/doc/odpsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/odpswriter/doc/odpswriter.md) |
-| | ADS | | √ | ![Read](), ![Write]() |
+| | ADS | | √ | [Write](https://github.com/alibaba/DataX/blob/master/adswriter/doc/adswriter.md) |
-| | OSS | √ | √ | ![Read](), ![Write]() |
+| | OSS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/ossreader/doc/ossreader.md), [Write](https://github.com/alibaba/DataX/blob/master/osswriter/doc/osswriter.md) |
-| | OCS | √ | √ | ![Read](), ![Write]() |
+| | OCS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/ocsreader/doc/ocsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/ocswriter/doc/ocswriter.md) |
-| NoSQL data storage | OTS | √ | √ | ![Read](), ![Write]() |
+| NoSQL data storage | OTS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/otsreader/doc/otsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/otswriter/doc/otswriter.md) |
-| | Hbase0.94 | √ | √ | ![Read](), ![Write]() |
+| | Hbase0.94 | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hbase094xreader/doc/hbase094xreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase094xwriter/doc/hbase094xwriter.md) |
-| | Hbase1.1 | √ | √ | ![Read](), ![Write]() |
+| | Hbase1.1 | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hbase11xreader/doc/hbase11xreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase11xwriter/doc/hbase11xwriter.md) |
-| | MongoDB | √ | √ | ![Read](), ![Write]() |
+| | MongoDB | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/mongoreader/doc/mongoreader.md), [Write](https://github.com/alibaba/DataX/blob/master/mongowriter/doc/mongowriter.md) |
-| Unstructured data storage | TxtFile | √ | √ | ![Read](), ![Write]() |
-| | FTP | √ | √ | ![Read](), ![Write]() |
-| | HDFS | √ | √ | ![Read](), ![Write]() |
-| | Elasticsearch | | √ | ![Read](), ![Write]() |
+| | Hive | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md) |
+| Unstructured data storage | TxtFile | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/txtfilereader/doc/txtfilereader.md), [Write](https://github.com/alibaba/DataX/blob/master/txtfilewriter/doc/txtfilewriter.md) |
+| | FTP | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/ftpreader/doc/ftpreader.md), [Write](https://github.com/alibaba/DataX/blob/master/ftpwriter/doc/ftpwriter.md) |
+| | HDFS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md) |
+| | Elasticsearch | | √ | [Write](https://github.com/alibaba/DataX/blob/master/elasticsearchwriter/doc/elasticsearchwriter.md) |
 # I Want to Develop a New Plugin
 See: [DataX Plugin Development Guide](https://github.com/alibaba/DataX/blob/master/dataxPluginDev.md)
@@ -105,6 +106,7 @@ This software is free to use under the Apache License [Apache license](https://g
 ````
 DingTalk users, please scan the QR code below to join the discussion:
hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/DFSUtil.java
@@ -486,15 +486,10 @@ public class DFSUtil {
     }

     private int getAllColumnsCount(String filePath) {
-        int columnsCount;
-        final String colFinal = "_col";
         Path path = new Path(filePath);
         try {
             Reader reader = OrcFile.createReader(path, OrcFile.readerOptions(hadoopConf));
-            String type_struct = reader.getObjectInspector().getTypeName();
-            columnsCount = (type_struct.length() - type_struct.replace(colFinal, "").length())
-                    / colFinal.length();
-            return columnsCount;
+            return reader.getTypes().get(0).getSubtypesCount();
         } catch (IOException e) {
             String message = "读取orcfile column列数失败,请联系系统管理员";
             throw DataXException.asDataXException(HdfsReaderErrorCode.READ_FILE_ERROR, message);
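The commit does not say why `getAllColumnsCount` was changed. One plausible reading is that counting occurrences of `"_col"` in the type name only works when the ORC file uses default column names such as `_col0`, `_col1`, whereas asking the root struct type for its subtype count does not depend on naming. The sketch below illustrates that difference on plain strings. It does not use the ORC library: `countTopLevelFields` is a hypothetical stand-in for `reader.getTypes().get(0).getSubtypesCount()`, and the two sample type names are assumptions made up for illustration.

```java
public class ColumnCountSketch {

    // The removed heuristic: count how many times "_col" appears in the type name.
    static int countByMarker(String typeName) {
        final String marker = "_col";
        return (typeName.length() - typeName.replace(marker, "").length()) / marker.length();
    }

    // Hypothetical stand-in for asking the schema itself:
    // count the top-level fields of a "struct<...>" type name.
    static int countTopLevelFields(String typeName) {
        int start = typeName.indexOf('<');
        int end = typeName.lastIndexOf('>');
        if (!typeName.startsWith("struct") || start < 0 || end <= start) {
            throw new IllegalArgumentException("not a struct type name: " + typeName);
        }
        String body = typeName.substring(start + 1, end);
        if (body.trim().isEmpty()) {
            return 0;
        }
        int depth = 0;
        int fields = 1;
        for (char c : body.toCharArray()) {
            if (c == '<' || c == '(') {
                depth++;            // entering a nested type such as array<...> or decimal(...)
            } else if (c == '>' || c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                fields++;           // only commas at the top level separate columns
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String defaultNames = "struct<_col0:int,_col1:string>";
        String realNames = "struct<id:int,name:string,tags:array<string>>";

        System.out.println(countByMarker(defaultNames));       // marker count, default names
        System.out.println(countTopLevelFields(defaultNames)); // field count, default names
        System.out.println(countByMarker(realNames));          // marker count, real names
        System.out.println(countTopLevelFields(realNames));    // field count, real names
    }
}
```

With default names both approaches agree. With real column names the marker count drops to zero while the field count stays correct, which is consistent with the new one-line implementation being the more robust choice.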
introduction.md
@@ -34,25 +34,26 @@ As an offline data synchronization framework, DataX is built on a Framework + plugin architecture
 | Type | Data source | Reader | Writer | Docs |
 | ------------ | ---------- | :-------: | :-------: |:-------: |
-| RDBMS (relational databases) | MySQL | √ | √ | ![Read](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md), ![Write](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md) |
+| RDBMS (relational databases) | MySQL | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md) |
-| | Oracle | √ | √ | ![Read](), ![Write]() |
+| | Oracle | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md), [Write](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md) |
-| | SQLServer | √ | √ | ![Read](), ![Write]() |
+| | SQLServer | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md), [Write](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md) |
-| | PostgreSQL | √ | √ | ![Read](), ![Write]() |
+| | PostgreSQL | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md) |
-| | DRDS | √ | √ | ![Read](), ![Write]() |
+| | DRDS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md) |
-| | Dameng | √ | √ | ![Read](), ![Write]() |
+| | Dameng | √ | √ | [Read](), [Write]() |
-| | Generic RDBMS (supports all relational databases) | √ | √ | ![Read](), ![Write]() |
+| | Generic RDBMS (supports all relational databases) | √ | √ | [Read](), [Write]() |
-| Alibaba Cloud data warehouse storage | ODPS | √ | √ | ![Read](), ![Write]() |
+| Alibaba Cloud data warehouse storage | ODPS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/odpsreader/doc/odpsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/odpsswriter/doc/odpswriter.md) |
-| | ADS | | √ | ![Read](), ![Write]() |
+| | ADS | | √ | [Write](https://github.com/alibaba/DataX/blob/master/adswriter/doc/adswriter.md) |
-| | OSS | √ | √ | ![Read](), ![Write]() |
+| | OSS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/ossreader/doc/ossreader.md), [Write](https://github.com/alibaba/DataX/blob/master/osswriter/doc/osswriter.md) |
-| | OCS | √ | √ | ![Read](), ![Write]() |
+| | OCS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/ocsreader/doc/ocsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/ocswriter/doc/ocswriter.md) |
-| NoSQL data storage | OTS | √ | √ | ![Read](), ![Write]() |
+| NoSQL data storage | OTS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/otsreader/doc/otsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/otswriter/doc/otswriter.md) |
-| | Hbase0.94 | √ | √ | ![Read](), ![Write]() |
+| | Hbase0.94 | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hbase094xreader/doc/hbase094xreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase094xwriter/doc/hbase094xwriter.md) |
-| | Hbase1.1 | √ | √ | ![Read](), ![Write]() |
+| | Hbase1.1 | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hbase11xreader/doc/hbase11xreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase11xwriter/doc/hbase11xwriter.md) |
-| | MongoDB | √ | √ | ![Read](), ![Write]() |
+| | MongoDB | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/mongoreader/doc/mongoreader.md), [Write](https://github.com/alibaba/DataX/blob/master/mongowriter/doc/mongowriter.md) |
-| Unstructured data storage | TxtFile | √ | √ | ![Read](), ![Write]() |
-| | FTP | √ | √ | ![Read](), ![Write]() |
-| | HDFS | √ | √ | ![Read](), ![Write]() |
-| | Elasticsearch | | √ | ![Read](), ![Write]() |
+| | Hive | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md) |
+| Unstructured data storage | TxtFile | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/txtfilereader/doc/txtfilereader.md), [Write](https://github.com/alibaba/DataX/blob/master/txtfilewriter/doc/txtfilewriter.md) |
+| | FTP | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/ftpreader/doc/ftpreader.md), [Write](https://github.com/alibaba/DataX/blob/master/ftpwriter/doc/ftpwriter.md) |
+| | HDFS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md) |
+| | Elasticsearch | | √ | [Write](https://github.com/alibaba/DataX/blob/master/elasticsearchwriter/doc/elasticsearchwriter.md) |
 The DataX Framework exposes a simple interface for interacting with plugins and a simple mechanism for plugging them in: adding any single plugin is enough to connect seamlessly with every other supported data source. For details, see: [DataX Data Source Guide](https://github.com/alibaba/DataX/wiki/DataX-all-data-channels)
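The claim that one new plugin connects to every other data source follows from the reader/writer split: the framework only moves records from some reader to some writer, so any reader pairs with any writer. The sketch below illustrates that idea only. The `SourceReader` and `SinkWriter` interfaces and the `sync` method are hypothetical names invented for this example and are not DataX's actual plugin API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class PluginPairingSketch {

    // Hypothetical plugin interfaces (not DataX's real API).
    interface SourceReader {
        void read(Consumer<String> sink);   // push every record into the sink
    }

    interface SinkWriter {
        void write(String record);
    }

    // The "framework": it knows nothing about concrete sources or targets,
    // it just forwards each record from the reader to the writer.
    static int sync(SourceReader reader, SinkWriter writer) {
        int[] count = {0};
        reader.read(record -> {
            writer.write(record);
            count[0]++;
        });
        return count[0];
    }

    public static void main(String[] args) {
        // One reader plugin...
        SourceReader listReader = sink -> Arrays.asList("a", "b", "c").forEach(sink);

        // ...works unchanged with two different writer plugins.
        List<String> target = new ArrayList<>();
        SinkWriter listWriter = target::add;
        SinkWriter consoleWriter = System.out::println;

        sync(listReader, listWriter);
        sync(listReader, consoleWriter);
        System.out.println(target);
    }
}
```

Under this design, supporting N sources and M targets takes N + M plugins rather than N × M dedicated converters, which is the practical meaning of "seamless" here.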
@@ -147,4 +148,4 @@ The open-source DataX 3.0 release supports running sync jobs in single-machine multi-threaded mode; this…
 - After the task finishes, print the overall run summary
 ![datax_end_info](https://cloud.githubusercontent.com/assets/1067175/17850930/0484d3ac-6892-11e6-9c1d-b102ad210a32.png)
\ No newline at end of file
userGuid.md
@@ -17,7 +17,7 @@ As a data synchronization framework, DataX abstracts synchronization between different data sources as from the source…
 * Tool deployment
-  * Method 1: download the DataX package directly: [DataX](https://github.com/alibaba/DataX)
+  * Method 1: download the DataX package directly: [DataX download link](http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz)
   After downloading, extract it to a local directory and enter the bin directory to run a sync job:
@@ -25,7 +25,8 @@ As a data synchronization framework, DataX abstracts synchronization between different data sources as from the source…
 $ cd {YOUR_DATAX_HOME}/bin
 $ python datax.py {YOUR_JOB.json}
 ```
+Self-check script:
+python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json
   * Method 2: download the DataX source code and build it yourself: [DataX source code](https://github.com/alibaba/DataX)
   (1) Download the DataX source code: