Spark: edit-distance join on two columns of a DataFrame

慕犹清

2022-07-27


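import org.apache.spark.sql.functions

// sourceDF is assumed to be an existing DataFrame with string columns "word1" and "word2".
// levenshtein computes the edit distance between the two columns, row by row.
val actualDF = sourceDF.withColumn(
  "word1_word2_levenshtein",
  functions.levenshtein(sourceDF.col("word1"), sourceDF.col("word2"))
)

actualDF.show()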

+------+-------+-----------------------+
| word1|  word2|word1_word2_levenshtein|
+------+-------+-----------------------+
|  blah|   blah|                      0|
|   cat|    bat|                      1|
|  phat|    fat|                      2|
|kitten|sitting|                      3|
+------+-------+-----------------------+
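
The snippet above compares word1 and word2 within the same row. The title also mentions a join, so here is a minimal sketch of one way to pair every word1 with every word2 whose edit distance is within a threshold. The object name LevenshteinJoinSketch, the helper fuzzyPairs, the column name distance, and the threshold of 1 are illustrative assumptions, not part of the original post; the sample rows are taken from the output shown above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, levenshtein}

// Hypothetical helper object for illustration only.
object LevenshteinJoinSketch {

  // Joins `left` (column "word1") with `right` (column "word2") and keeps
  // the pairs whose edit distance is at most `maxDistance`.
  // This is a non-equi join, so Spark compares every left row with every
  // right row; it is only practical for small inputs.
  def fuzzyPairs(left: DataFrame, right: DataFrame, maxDistance: Int): DataFrame = {
    val distance = levenshtein(col("word1"), col("word2"))
    left
      .join(right, distance <= maxDistance)
      .withColumn("distance", distance)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("levenshtein-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the output table above.
    val sourceDF = Seq(
      ("blah", "blah"),
      ("cat", "bat"),
      ("phat", "fat"),
      ("kitten", "sitting")
    ).toDF("word1", "word2")

    // Split the two columns into separate DataFrames, then join them back
    // on the edit-distance condition (threshold 1 is an arbitrary choice).
    val left = sourceDF.select("word1")
    val right = sourceDF.select("word2")

    fuzzyPairs(left, right, maxDistance = 1).show()

    spark.stop()
  }
}
```

With this sample data and a threshold of 1, the pairs kept are (blah, blah) at distance 0, and (cat, bat) and (cat, fat) at distance 1. Because the join condition is not an equality, the cost grows with the product of the two row counts, so for larger tables it is usually worth narrowing the candidates first, for example with an equi-join on a cheaper key.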

