Storing Data Compressed with Spark

夹胡碰


2023-04-11


The file /tmp/dj/20170622.1498060818603 contains JSON data. The goal is to compress it and store it in Parquet format.

// Read the JSON file into a DataFrame
val logs = spark.read.json("/tmp/dj/20170622.1498060818603")
// Alternative: write back as gzip-compressed JSON instead of Parquet
//logs.coalesce(2).write.option("compression","gzip").json("/tmp/dj/json2")
// Coalesce to 2 partitions and write as Parquet (snappy-compressed by default)
logs.coalesce(2).write.parquet("/tmp/dj/parquet2")
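Parquet output is snappy-compressed by default, but the codec can be chosen explicitly via the same `option("compression", ...)` mechanism used for JSON above. A minimal sketch, assuming the `logs` DataFrame from the snippet above; the output path here is illustrative, not from the original article:

```scala
// Write Parquet with an explicit codec; "gzip" trades write speed
// for a smaller file than the default "snappy"
logs.coalesce(2).write
  .option("compression", "gzip")    // also accepts "snappy", "none", "zstd" (Spark 3.2+)
  .parquet("/tmp/dj/parquet_gzip")  // illustrative output path
```

Coalescing to a small partition count before writing keeps the number of output files down, at the cost of less write parallelism.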

Read the Parquet files back:

val logs1 = spark.read.parquet("/tmp/dj/parquet2/*")
// logs1 is now a DataFrame with the same fields as the original JSON data

