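How do I convert Dataset<Row> to JavaRDD<User>?
I am trying to load a list of objects with a large number of columns from a database, and I ran into a problem: I have to specify the position and type of every column by hand. How can I get the result without all of that?
Here is my code: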
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrameReader;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkConf sparkConf = new SparkConf()
        .setMaster("local[*]")
        .setAppName("test");
// Create the session through the public builder API
SparkSession session = SparkSession.builder().config(sparkConf).getOrCreate();
DataFrameReader dataFrameReader = session
        .read()
        .format("jdbc")
        .option("url", "jdbc:postgresql://10.100.0.4:5432/refund_service")
        .option("driver", "org.postgresql.Driver")
        .option("dbtable", "refunds")
        .option("user", "smartplaza")
        .option("password", "smartplaza");
Encoder<Something> somethingEncoder = Encoders.bean(Something.class);
Dataset<Row> response = dataFrameReader.load();
// Every column is read by position and type: this is the part I want to avoid
JavaRDD<Something> somethingJavaRDD = response.javaRDD().map(new Function<Row, Something>() {
    @Override
    public Something call(Row row) throws Exception {
        return new Something(row.getLong(0),
                row.getTimestamp(1), row.getTimestamp(2), row.getTimestamp(3),
                row.getDouble(4),
                row.getDouble(5),
                row.getDouble(6),
                row.getLong(7),
                row.getLong(8),
                row.getDouble(9),
                row.getLong(10),
                row.getDouble(11),
                (Long) row.get(12),
                (Long) row.get(13),
                (Long) row.get(14),
                row.getBoolean(15),
                (Long) row.get(16),
                (Long) row.get(17),
                row.getDouble(18),
                row.getDouble(19),
                (Long) row.get(20),
                (Long) row.get(21));
    }
});
Dataset<Something> somethingDataset = session.createDataset(somethingJavaRDD.rdd(), somethingEncoder);
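You only need the .as(encoder) method described in the Dataset documentation. For a bean class it matches columns to bean properties by name, so no positional access is required.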
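A minimal sketch of that approach, under stated assumptions rather than a definitive implementation. The class name AsEncoderSketch, the helper toJavaRDD, and the bean properties id, amount and paid are hypothetical, since the real columns of the refunds table are not shown in the question. An in-memory DataFrame stands in for dataFrameReader.load() so the example does not need a database.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class AsEncoderSketch {

    // Hypothetical bean. Encoders.bean expects a public class with a no-arg
    // constructor and a getter/setter pair for every mapped column.
    public static class Something implements Serializable {
        private Long id;
        private Double amount;
        private Boolean paid;

        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public Double getAmount() { return amount; }
        public void setAmount(Double amount) { this.amount = amount; }
        public Boolean getPaid() { return paid; }
        public void setPaid(Boolean paid) { this.paid = paid; }
    }

    // Columns are matched to bean properties by name, so there are no
    // row.getLong(0), row.getDouble(4), ... calls.
    public static JavaRDD<Something> toJavaRDD(Dataset<Row> rows) {
        Dataset<Something> typed = rows.as(Encoders.bean(Something.class));
        return typed.javaRDD();
    }

    public static void main(String[] args) {
        SparkSession session = SparkSession.builder()
                .master("local[*]")
                .appName("test")
                .getOrCreate();

        // Assumed schema; with JDBC this would come from the table itself.
        StructType schema = new StructType()
                .add("id", DataTypes.LongType)
                .add("amount", DataTypes.DoubleType)
                .add("paid", DataTypes.BooleanType);
        List<Row> data = Arrays.asList(
                RowFactory.create(1L, 9.5, true),
                RowFactory.create(2L, 20.0, false));
        // Stand-in for: Dataset<Row> response = dataFrameReader.load();
        Dataset<Row> response = session.createDataFrame(data, schema);

        JavaRDD<Something> somethingJavaRDD = toJavaRDD(response);
        for (Something s : somethingJavaRDD.collect()) {
            System.out.println(s.getId() + " " + s.getAmount() + " " + s.getPaid());
        }
        session.stop();
    }
}
```

With the code from the question, the same idea would be response.as(somethingEncoder).javaRDD(). The column names have to match the bean property names, so a column whose name differs from its property may first need renaming, for example with withColumnRenamed.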