Storing Data in Compressed Form with Spark


/tmp/dj/20170622.1498060818603 contains JSON data. Compress it and store it as Parquet:

val logs = spark.read.json("/tmp/dj/20170622.1498060818603")
// Alternative: write gzip-compressed JSON instead of Parquet
//logs.coalesce(2).write.option("compression","gzip").json("/tmp/dj/json2")
// Coalesce to 2 output files and write as Parquet (snappy-compressed by default)
logs.coalesce(2).write.parquet("/tmp/dj/parquet2")
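
Parquet output is snappy-compressed by default; the codec can be chosen explicitly with the `compression` option on the writer, just as the commented-out line does for JSON. A sketch (the output path is illustrative, not from the original post):

```scala
// Sketch: choosing an explicit Parquet codec. gzip trades write speed
// for a smaller footprint than the default snappy.
logs.coalesce(2)
  .write
  .option("compression", "gzip")   // other values include "snappy" and "none"
  .parquet("/tmp/dj/parquet_gzip")
```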

Read the Parquet files back:

val logs1 = spark.read.parquet("/tmp/dj/parquet2/*")
// logs1 is now a DataFrame with the same fields as the original JSON records
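
Because Parquet is a columnar format, a query that touches only a few fields reads only those columns from disk, which is one of the main payoffs of this conversion. A sketch of such a query; the column names here are hypothetical, since the original post does not show the JSON schema:

```scala
// Sketch: column pruning — Spark reads only the referenced columns
// ("ts" and "uid" are hypothetical field names) from the Parquet files.
logs1.select("ts", "uid")
  .where($"uid".isNotNull)
  .show(5)
```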

