Spark Big Data Analysis: Spark SQL (14) Converting Between RDD, DataFrame, and Dataset


2022-02-10



Contents

  • RDD and DataFrame Conversion
  • RDD and Dataset Conversion
  • DataFrame and Dataset Conversion


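RDD and DataFrame Conversion

An RDD is converted to a DataFrame with toDF. In spark-shell, spark and import spark.implicits._ are already available; in a compiled program, add import spark.implicits._ after creating the SparkSession: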

val rddData1 = spark.sparkContext.parallelize(Array(("Alice", "18", "Female"), ("Mathew", "20", "Male")))
val df1 = rddData1.toDF("name", "age", "sex")
df1.show
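The snippet above assumes spark-shell, where spark and spark.implicits._ are already in scope. Below is a minimal standalone sketch of the same conversion, assuming Spark 2.x or 3.x on the classpath; the object RddToDataFrameDemo and its methods are hypothetical. It converts age to Int before calling toDF so the column is typed as an integer rather than a string, and it shows that toDF() without arguments names tuple columns _1, _2, _3.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical demo object; not part of the original article.
object RddToDataFrameDemo {
  // Builds a DataFrame from an RDD of tuples, converting age to Int first
  def toUsersDf(spark: SparkSession): DataFrame = {
    import spark.implicits._ // provides toDF() on RDDs of tuples and case classes
    val rdd = spark.sparkContext.parallelize(Seq(("Alice", "18", "Female"), ("Mathew", "20", "Male")))
    rdd.map { case (name, age, sex) => (name, age.toInt, sex) }
      .toDF("name", "age", "sex")
  }

  // Without explicit names, tuple fields become the columns _1, _2, _3
  def defaultColumnNames(spark: SparkSession): Array[String] = {
    import spark.implicits._
    spark.sparkContext.parallelize(Seq(("Alice", "18", "Female"))).toDF().columns
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rdd-to-df").getOrCreate()
    val df = toUsersDf(spark)
    df.printSchema() // age is reported as integer rather than string
    df.show()
    spark.stop()
  }
}
```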

A DataFrame is converted back to an RDD by calling its rdd method; the result is an RDD[Row]:

df1.rdd.collect
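Because df1.rdd yields an RDD[Row], the compiler no longer knows the column types, and values have to be read out of each Row. A minimal sketch, assuming Spark 2.x or 3.x; the object DataFrameToRddDemo and the method nameAgePairs are hypothetical. It reads fields by column name with Row.getAs.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical demo object; not part of the original article.
object DataFrameToRddDemo {
  def nameAgePairs(spark: SparkSession): Array[(String, Int)] = {
    import spark.implicits._
    val df = Seq(("Alice", 18, "Female"), ("Mathew", 20, "Male")).toDF("name", "age", "sex")
    // DataFrame -> RDD: every element is a generic Row, not a typed object
    val rows: RDD[Row] = df.rdd
    // Values are read by column name; the type parameter is a runtime cast, not a compile-time check
    rows.map(r => (r.getAs[String]("name"), r.getAs[Int]("age"))).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("df-to-rdd").getOrCreate()
    nameAgePairs(spark).foreach(println)
    spark.stop()
  }
}
```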

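RDD and Dataset Conversion

An RDD is converted to a Dataset with toDS. Spark needs an Encoder for the element type, so the tuples are first mapped to the case class User, which is declared at the top level, outside the object: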

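import org.apache.spark.sql.SparkSession

object TestSQL2 {
  def main(args: Array[String]): Unit = {
    // Hive support is not needed here, so enableHiveSupport() is omitted
    // (it fails at startup when the spark-hive dependency is missing)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("test")
      .getOrCreate()
    // Brings toDS()/toDF() and case-class encoders into scope
    import spark.implicits._
    val rddData2 = spark.sparkContext.parallelize(Array(("Alice", "18", "Female"), ("Mathew", "20", "Male")))
    // Map each tuple to a User, converting age from String to Int
    val rddData3 = rddData2.map(t => User(t._1, t._2.toInt, t._3))
    val ds1 = rddData3.toDS()
    ds1.show()
    spark.stop()
  }
}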

case class User(name: String, age: Int, sex: String)

A Dataset is converted to an RDD by calling its rdd method; for a Dataset[User] the result is an RDD[User]:

ds1.rdd.count()
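Unlike DataFrame.rdd, calling rdd on a Dataset[User] returns an RDD[User], so the elements keep their case-class type and fields can be accessed directly. A minimal sketch, assuming Spark 2.x or 3.x; the User case class mirrors the article's, while DatasetToRddDemo and namesAtLeast are hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Same shape as the article's User; top-level so Spark can derive an Encoder for it
case class User(name: String, age: Int, sex: String)

// Hypothetical demo object; not part of the original article.
object DatasetToRddDemo {
  // Returns the names of users whose age is at least minAge
  def namesAtLeast(spark: SparkSession, minAge: Int): Array[String] = {
    import spark.implicits._
    val ds: Dataset[User] = Seq(User("Alice", 18, "Female"), User("Mathew", 20, "Male")).toDS()
    // Dataset -> RDD: elements stay typed as User
    val users: RDD[User] = ds.rdd
    // Field access such as _.age is checked at compile time
    users.filter(_.age >= minAge).map(_.name).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ds-to-rdd").getOrCreate()
    println(namesAtLeast(spark, 20).mkString(", "))
    spark.stop()
  }
}
```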

DataFrame and Dataset Conversion

A DataFrame is converted to a Dataset with as[T], where T is a case class:

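// Define the case class before using it (in a compiled program, at the top level)
case class Person(name: String, age: String, sex: String)

val df2 = spark.createDataFrame(List(
  ("Alice", "Female", "20"),
  ("Tom", "Male", "25"),
  ("Boris", "Male", "18"))).toDF("name", "sex", "age")
// as[Person] needs import spark.implicits._ (automatic in spark-shell)
val ds2 = df2.as[Person]
ds2.show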
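A standalone sketch of the conversion above, assuming Spark 2.x or 3.x; DataFrameToDatasetDemo and its methods are hypothetical. Note that df2's column order (name, sex, age) differs from Person's field order (name, age, sex): as[Person] matches columns to fields by name, not by position. After the conversion, lambdas such as filter receive Person objects.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Same shape as the article's Person; declared at top level so an Encoder can be derived
case class Person(name: String, age: String, sex: String)

// Hypothetical demo object; not part of the original article.
object DataFrameToDatasetDemo {
  def toPeople(spark: SparkSession): Dataset[Person] = {
    import spark.implicits._ // provides the Encoder[Person] that as[Person] requires
    val df = spark.createDataFrame(List(
      ("Alice", "Female", "20"),
      ("Tom", "Male", "25"),
      ("Boris", "Male", "18"))).toDF("name", "sex", "age")
    // Columns are resolved by name, so the differing order is not a problem
    df.as[Person]
  }

  // Typed filter: the lambda receives a Person, not a Row
  def maleNames(spark: SparkSession): Array[String] = {
    import spark.implicits._ // provides the Encoder[String] needed by map
    toPeople(spark).filter(_.sex == "Male").map(_.name).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("df-to-ds").getOrCreate()
    toPeople(spark).show()
    println(maleNames(spark).mkString(", "))
    spark.stop()
  }
}
```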

A Dataset is converted back to a DataFrame with toDF:

ds2.toDF().show
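toDF() drops the static Person type and returns a DataFrame (a Dataset[Row]); passing names to toDF renames the columns in the same call, applied by position. A minimal sketch, assuming Spark 2.x or 3.x; DatasetToDataFrameDemo and the new column names are hypothetical. The Dataset is built from Person objects so its columns follow the field order name, age, sex.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Same shape as the article's Person
case class Person(name: String, age: String, sex: String)

// Hypothetical demo object; not part of the original article.
object DatasetToDataFrameDemo {
  def renamed(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val ds = Seq(Person("Alice", "20", "Female"), Person("Tom", "25", "Male")).toDS()
    // Dataset[Person] -> DataFrame; new names are assigned positionally (name, age, sex)
    ds.toDF("person_name", "person_age", "person_sex")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ds-to-df").getOrCreate()
    val df = renamed(spark)
    df.printSchema()
    // Untyped access: columns are referenced by name, not through Person fields
    df.select("person_name").show()
    spark.stop()
  }
}
```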

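Because a Dataset is strongly typed, converting a DataFrame to a Dataset requires the columns to match the case class: every field must have a column with the same name, and that column's type must be safely castable to the field's type. Otherwise as[T] fails with an AnalysisException.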
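A minimal sketch of the two failure modes, assuming Spark 2.x or 3.x: a missing column, and a string column that cannot be safely up-cast to an Int field. TypedPerson (with age: Int) and StrictSchemaDemo are hypothetical. In both cases the AnalysisException is raised when as[...] is called, because Spark binds the encoder to the schema at that point; casting the column explicitly first resolves the type mismatch.

```scala
import org.apache.spark.sql.{AnalysisException, DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical case class: unlike the article's Person, age is an Int here
case class TypedPerson(name: String, age: Int, sex: String)

// Hypothetical demo object; not part of the original article.
object StrictSchemaDemo {
  // true if converting df to Dataset[TypedPerson] is rejected by the analyzer
  def rejected(spark: SparkSession, df: DataFrame): Boolean = {
    import spark.implicits._
    try { df.as[TypedPerson]; false }
    catch { case _: AnalysisException => true }
  }

  // Similar to the article's df2: every column, including age, is a String
  def source(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("Alice", "Female", "20"), ("Tom", "Male", "25")).toDF("name", "sex", "age")
  }

  // Fix for the type mismatch: cast age to int before converting
  def fixed(spark: SparkSession): Dataset[TypedPerson] = {
    import spark.implicits._
    source(spark).withColumn("age", col("age").cast("int")).as[TypedPerson]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("strict-schema").getOrCreate()
    val df = source(spark)
    println(rejected(spark, df.drop("sex"))) // missing column "sex"
    println(rejected(spark, df))             // string age cannot be up-cast to Int
    fixed(spark).show()
    spark.stop()
  }
}
```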


