data-spider / qg-dbc-spark
Commit 8cb9fb73, authored Dec 27, 2019 by data爬虫-冯 军凯
transactionLog data cleaning: main class modification 12345678912
Parent: b296ffe7
Showing 1 changed file with 3 additions and 1 deletion
src/main/java/cn/quantgroup/dbc/spark/transactionlog/CleanningTransactionLogMain.java
@@ -27,6 +27,7 @@ public class CleanningTransactionLogMain {
         System.out.println("Finished reading hdfsPath: " + JSON.toJSONString(hdfsArr));
         Dataset<String> dataset = ss.read().textFile(hdfsArr);
+        System.out.println("dataset: " + dataset.count());
         dataset.repartition(4).foreachPartition(func -> {
             System.out.println("Starting data cleaning");
             ArrayList<TransactionLog> transactionLogs = new ArrayList<>();
@@ -44,12 +45,13 @@ public class CleanningTransactionLogMain {
                     transactionLog.setUuid(split[1]);
                     transactionLog.setUrl_type(split[2]);
                     transactionLog.setUpdated_at(timestamp);
+                    transactionLogs.add(transactionLog);
                     if (transactionLogs.size() != 0 && transactionLogs.size() % 200 == 0) {
                         System.out.println("Executing SQL batch: " + transactionLogs.size());
                         JdbcExecuters.prepareBatchUpdateExecuteTransactionid(sql, transactionLogs);
                         transactionLogs.clear();
                     }
-                } catch (Exception e) {
+                } catch (Exception e) {
                     System.out.println("Failed to assemble record: " + item);
                     e.printStackTrace();
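The second hunk buffers records in transactionLogs and flushes whenever the list size reaches a multiple of 200. Because the list is cleared after each flush, that check is effectively "size equals 200". The hunks do not show whether a final partial batch (fewer than 200 records left when the partition iterator is exhausted) is flushed after the loop. The following is a minimal, Spark-free sketch of the same buffering pattern with an explicit final flush. The names BatchFlushSketch, forEachBatch and sink are hypothetical; sink merely stands in for a call such as JdbcExecuters.prepareBatchUpdateExecuteTransactionid(sql, transactionLogs), whose implementation is not part of this commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BatchFlushSketch {

    /**
     * Hypothetical helper: passes items to the sink in chunks of at most
     * batchSize elements and returns how many times the sink was called.
     */
    public static <T> int forEachBatch(Iterator<T> items, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> buffer = new ArrayList<>(batchSize);
        int flushes = 0;
        while (items.hasNext()) {
            buffer.add(items.next());
            // Flush as soon as the buffer is full, then start a new batch.
            if (buffer.size() == batchSize) {
                sink.accept(new ArrayList<>(buffer)); // copy, so clear() cannot affect the sink
                buffer.clear();
                flushes++;
            }
        }
        // Flush whatever is left; without this step a trailing partial batch is dropped.
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer));
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>();
        int flushes = forEachBatch(Arrays.asList(1, 2, 3, 4, 5).iterator(), 2,
                batch -> sizes.add(batch.size()));
        System.out.println(flushes + " flushes, sizes " + sizes); // prints "3 flushes, sizes [2, 2, 1]"
    }
}
```

Inside foreachPartition, the partition iterator would take the place of items and the batch size would be 200. Whether the real job needs the trailing flush depends on code outside the hunks shown here.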