ScalaNote18 - Applications of Collections - CFANZ Programming Community

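Earlier notes covered the various collections and their create, update, delete and lookup operations. This note covers the other operations available on collections.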

The map operation

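Start with a concrete task: multiply every element of List(3,5,7) by 2 and return the results in a new collection, that is, a new List(6,10,14). In Python a list comprehension does this directly. In Scala it can also be done with a loop; here is a cumbersome version first: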

import scala.collection.mutable.ListBuffer
val list1 = List(3,5,7)
val list2 = ListBuffer[Int]() // mutable buffer that collects the results
list1.foreach(i => list2.append(i*2))
println("list2 = "+list2)

list2 = ListBuffer(6, 10, 14)

import scala.collection.mutable.ListBuffer
list1: List[Int] = List(3, 5, 7)
list2: scala.collection.mutable.ListBuffer[Int] = ListBuffer(6, 10, 14)

The same thing with an immutable List held in a var:

val list1 = List(3, 5, 7)
var list2 = List[Int]()
for (item <- list1) { // iterate over list1
    list2 = list2 :+ item * 2
}
println(list2)

List(6, 10, 14)

list1: List[Int] = List(3, 5, 7)
list2: List[Int] = List(6, 10, 14)

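In Scala the map operation solves this: it passes every element of a collection through a given function and collects the results into a new collection. This is what is meant by passing a function as an argument to another function, a hallmark of functional programming.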

def f1(x:Int)={
    x*2
}
val list1 = List(3, 5, 7)
val list2 = list1.map(f1)
println(list2)

List(6, 10, 14)

f1: (x: Int)Int
list1: List[Int] = List(3, 5, 7)
list2: List[Int] = List(6, 10, 14)

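pandas has a map method as well (for example Series.map).
map is a higher-order function, that is, a function that accepts another function as an argument.
In the example above, note that the parameter type of f1 must match the element type of the List.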
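As a small sketch of the same idea without a named function, map also accepts an anonymous function, and the result may have a different element type from the input. The object name MapSketch and the value names below are illustrative only and do not come from the original note.

```scala
object MapSketch {
  val list1 = List(3, 5, 7)

  // Anonymous function instead of a named def
  val doubled: List[Int] = list1.map(x => x * 2)

  // Placeholder syntax, equivalent to the line above
  val doubledShort: List[Int] = list1.map(_ * 2)

  // map may change the element type: Int => String
  val labels: List[String] = list1.map(x => s"n=$x")

  def main(args: Array[String]): Unit = {
    println(doubled)      // List(6, 10, 14)
    println(doubledShort) // List(6, 10, 14)
    println(labels)       // List(n=3, n=5, n=7)
  }
}
```

The input list is never modified; each call returns a new List.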

The flatMap operation

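flatMap: flat means to flatten. flatMap applies a function that returns a collection to every element, then concatenates those results into a single new collection. A demo: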

val names = List("Alice", "Bob", "Nick")
def upper( s : String ) : String = {
    s.toUpperCase // convert the whole string to upper case
}
println(names.map(upper))
// Note: every String is also a collection of Char
println(names.flatMap(upper))

List(ALICE, BOB, NICK)
List(A, L, I, C, E, B, O, B, N, I, C, K)

names: List[String] = List(Alice, Bob, Nick)
upper: (s: String)String

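One thing to note: upper takes a String (the element type of names) and returns a String. Because a String can be treated as a sequence of Char, flatMap flattens each returned string into its characters, so the result is a List[Char] rather than a List[String].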
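A more typical use of flatMap is splitting lines into words. The sketch below compares map and flatMap on the same input and shows that flatMap behaves like map followed by flatten. The object name FlatMapSketch and the sample strings are made up for illustration.

```scala
object FlatMapSketch {
  val lines = List("hello world", "scala rocks")

  // map keeps one result per element, so the result is nested
  val nested: List[List[String]] = lines.map(_.split(" ").toList)

  // flatMap applies the same function and then flattens one level
  val words: List[String] = lines.flatMap(_.split(" ").toList)

  // flatMap(f) gives the same result as map(f).flatten
  val sameAsFlatten: Boolean = nested.flatten == words

  def main(args: Array[String]): Unit = {
    println(nested)        // List(List(hello, world), List(scala, rocks))
    println(words)         // List(hello, world, scala, rocks)
    println(sameAsFlatten) // true
  }
}
```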

Filtering

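Filtering puts the elements that satisfy a condition into a new collection. An element satisfies the condition when the predicate function returns true for it. Two small demos: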

// Keep the words that start with "A"
val names = List("Alice", "Bob", "Nick")
def startA(s:String): Boolean = {
    s.startsWith("A")
}
val names2 = names.filter(startA)
println("names=" + names2)
// Keep the odd numbers
val num = List(1,2,3,4,5)
def isOdd(x:Int)={
    x%2!=0 // != 0 so that negative odd numbers also count (-3 % 2 is -1)
}
println(num.filter(isOdd))

names=List(Alice)
List(1, 3, 5)

names: List[String] = List(Alice, Bob, Nick)
startA: (s: String)Boolean
names2: List[String] = List(Alice)
num: List[Int] = List(1, 2, 3, 4, 5)
isOdd: (x: Int)Boolean
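For reference, here is a short sketch of filter with anonymous predicates, together with two closely related standard-library methods, filterNot and partition. The object name FilterSketch and the sample numbers (which include negatives on purpose) are illustrative assumptions.

```scala
object FilterSketch {
  val num = List(-3, -2, 1, 2, 3, 4, 5)

  // Keep elements for which the predicate is true
  val odds: List[Int] = num.filter(_ % 2 != 0)

  // filterNot keeps elements for which the predicate is false
  val evens: List[Int] = num.filterNot(_ % 2 != 0)

  // partition returns both groups in a single pass
  val (small, large) = num.partition(_ < 3)

  def main(args: Array[String]): Unit = {
    println(odds)  // List(-3, 1, 3, 5)
    println(evens) // List(-2, 2, 4)
    println(small) // List(-3, -2, 1, 2)
    println(large) // List(3, 4, 5)
  }
}
```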

Reducing

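Reducing applies a binary function to the elements of a collection. The usual methods are reduceLeft and reduceRight, which start from the left or the right end. reduceLeft first combines the first and second elements, then passes that result as the first argument together with the next element, and so on. reduceRight works the same way from the right, with the accumulated result passed as the second argument. A demo: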

val arr = Array(1,3,5)
def sum(x1:Int,x2:Int)={
    x1+x2
}
def min(x1:Int,x2:Int)={
    if(x1>x2) x2 else x1
}
println("arr.reduceLeft(sum) = "+arr.reduceLeft(sum))
println("arr.reduceLeft(min) = "+arr.reduceLeft(min))

arr.reduceLeft(sum) = 9
arr.reduceLeft(min) = 1

arr: Array[Int] = Array(1, 3, 5)
sum: (x1: Int, x2: Int)Int
min: (x1: Int, x2: Int)Int
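With sum and min the direction does not matter, so the difference between reduceLeft and reduceRight is easy to miss. The sketch below uses subtraction, which is not associative, to make the grouping visible. The object name ReduceSketch is illustrative; reduceLeftOption is the standard-library variant that returns an Option instead of throwing on an empty collection.

```scala
object ReduceSketch {
  val nums = List(1, 3, 5)

  // reduceLeft groups from the left: (1 - 3) - 5
  val left: Int = nums.reduceLeft(_ - _)

  // reduceRight groups from the right: 1 - (3 - 5)
  val right: Int = nums.reduceRight(_ - _)

  // reduceLeft throws on an empty collection; reduceLeftOption returns None
  val onEmpty: Option[Int] = List.empty[Int].reduceLeftOption(_ - _)

  def main(args: Array[String]): Unit = {
    println(left)    // -7
    println(right)   // 3
    println(onEmpty) // None
  }
}
```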

Folding

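Folding is very similar to reducing, except that it takes an explicit initial value. foldLeft passes the value returned by the previous step as the function's first argument on the next step, until every element has been visited (foldRight passes it as the second argument). The usual methods are foldLeft and foldRight. A demo: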

val arr = Array(1,3,5)
def sum(x1:Int,x2:Int)={
    x1+x2
}

println("arr.foldLeft(5)(sum) = "+arr.foldLeft(5)(sum))

arr.foldLeft(5)(sum) = 14

arr: Array[Int] = Array(1, 3, 5)
sum: (x1: Int, x2: Int)Int

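Here 5 is the initial value: it is passed to sum as the first argument on the first call.
The call above is equivalent to (5 /: arr)(sum).
Likewise, arr.foldRight(5)(sum) is equivalent to (arr :\ 5)(sum).
Both operators are deprecated as of Scala 2.13 in favour of foldLeft and foldRight. These fancy operators are a bit much anyway.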
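To illustrate how fold differs from reduce, the sketch below shows the direction of foldLeft and foldRight with subtraction, the behaviour on an empty collection, and an accumulator whose type differs from the element type. The object name FoldSketch and the value names are made up for this example.

```scala
object FoldSketch {
  val nums = List(1, 3, 5)

  // Same result as the demo above: 5 + 1 + 3 + 5
  val sumFrom5: Int = nums.foldLeft(5)(_ + _)

  // foldLeft: ((0 - 1) - 3) - 5
  val left: Int = nums.foldLeft(0)(_ - _)

  // foldRight: 1 - (3 - (5 - 0))
  val right: Int = nums.foldRight(0)(_ - _)

  // On an empty collection the initial value is returned unchanged
  val onEmpty: Int = List.empty[Int].foldLeft(5)(_ + _)

  // The accumulator may have a different type than the elements
  val joined: String = nums.foldLeft("")((acc, x) => acc + x)

  def main(args: Array[String]): Unit = {
    println(sumFrom5) // 14
    println(left)     // -9
    println(right)    // 3
    println(onEmpty)  // 5
    println(joined)   // 135
  }
}
```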

Scanning

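Scanning performs a fold over all elements of a collection but keeps every intermediate result in a collection. This is handy for computing cumulative frequencies.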

val arr = Array(1,3,5)
def sum(x1:Int,x2:Int)={
    x1+x2
}
arr.scanLeft(0)(sum)

arr: Array[Int] = Array(1, 3, 5)
sum: (x1: Int, x2: Int)Int
res45: Array[Int] = Array(0, 1, 4, 9)
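The cumulative-frequency use mentioned above might look like the following sketch. The counts are invented sample data and the object name ScanSketch is illustrative. Note that the result of scanLeft has one more element than the input, because it starts with the initial value.

```scala
object ScanSketch {
  // Hypothetical per-bucket counts
  val counts = List(2, 5, 3)

  // Running totals, starting with the initial value 0
  val withInitial: List[Int] = counts.scanLeft(0)(_ + _)

  // Drop the leading initial value to get one total per bucket
  val cumulative: List[Int] = withInitial.tail

  // scanRight accumulates from the right end instead
  val fromRight: List[Int] = counts.scanRight(0)(_ + _)

  def main(args: Array[String]): Unit = {
    println(withInitial) // List(0, 2, 7, 10)
    println(cumulative)  // List(2, 7, 10)
    println(fromRight)   // List(10, 8, 3, 0)
  }
}
```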

Zipping

When two collections need to be merged element by element into pairs (two-element tuples), use zip. Python has a built-in zip as well. A demo:

// zip
val list1 = List(1, 2, 3)
val list2 = List(4, 5, 6)
val list3 = list1.zip(list2) // (1,4),(2,5),(3,6)
println("list3=" + list3)

list3=List((1,4), (2,5), (3,6))

list1: List[Int] = List(1, 2, 3)
list2: List[Int] = List(4, 5, 6)
list3: List[(Int, Int)] = List((1,4), (2,5), (3,6))

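zip is essentially a merge of two collections in which every resulting element is a pair.
If the two collections differ in length, the extra elements of the longer one are dropped.
This is not limited to List; other collections such as Array work too.
To get at the data in each pair, iterate over the result: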

list3.foreach(i => print(s"$i,"))
println()
list3.foreach(i => print(s"${i._1 + i._2},"))

(1,4),(2,5),(3,6),
5,7,9,

Each element produced by the traversal is a tuple; use ._1 and ._2 to access its components.
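The data loss on unequal lengths, and some ways around it, can be sketched as follows. zipAll pads the shorter side with default values, unzip reverses a zip, and a pattern match is an alternative to ._1 and ._2. The object name ZipSketch, the sample lists, and the padding values 0 and "?" are assumptions chosen for illustration.

```scala
object ZipSketch {
  val nums = List(1, 2, 3)
  val letters = List("a", "b")

  // zip stops at the shorter collection, so 3 is dropped
  val zipped: List[(Int, String)] = nums.zip(letters)

  // zipAll pads the shorter side: 0 for missing Ints, "?" for missing Strings
  val padded: List[(Int, String)] = nums.zipAll(letters, 0, "?")

  // unzip splits a list of pairs back into two lists
  val (ns, ls) = zipped.unzip

  // Destructure each pair with a pattern instead of ._1 and ._2
  val repeated: List[String] = zipped.map { case (n, s) => s * n }

  def main(args: Array[String]): Unit = {
    println(zipped)   // List((1,a), (2,b))
    println(padded)   // List((1,a), (2,b), (3,?))
    println(ns)       // List(1, 2)
    println(ls)       // List(a, b)
    println(repeated) // List(a, bb)
  }
}
```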

Iterators

Python has iterators too, so go straight to how they are used in Scala.

val iterator = List(1, 2, 3, 4, 5).iterator // get an iterator
println("-------- 1 -----------------")
while (iterator.hasNext) {
    println(iterator.next())
}
println("-------- 2 -----------------")
val iterator1 = List(1, 2, 3, 4, 5).iterator // get a fresh iterator
for (item <- iterator1) {
    println(item)
}

-------- 1 -----------------
1
2
3
4
5
-------- 2 -----------------
1
2
3
4
5

iterator: Iterator[Int] = empty iterator
iterator1: Iterator[Int] = empty iterator

hasNext returns true while elements remain and false otherwise.
Once an iterator has been consumed it cannot be reused; create a new one.
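The single-use nature of an iterator can be checked with a sketch like this one. The object name IteratorSketch and the value names are illustrative only.

```scala
object IteratorSketch {
  val source = List(1, 2, 3)

  val it: Iterator[Int] = source.iterator

  // Consuming the iterator once drains it
  val firstPass: List[Int] = it.toList

  // Nothing is left afterwards
  val hasMore: Boolean = it.hasNext

  // Ask the underlying collection for a new iterator to traverse again
  val secondPass: List[Int] = source.iterator.toList

  def main(args: Array[String]): Unit = {
    println(firstPass)  // List(1, 2, 3)
    println(hasMore)    // false
    println(secondPass) // List(1, 2, 3)
  }
}
```

This matches the REPL echo above, where both iterators are reported as empty after the loops have run.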

Views

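The view method produces a collection that is always evaluated lazily. A view does not cache its data: it is recomputed every time, for example on each traversal. A practical example: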

def double2(x:Int)={x*2}
val viewSquares2 = (1 to 5).view.map(double2)
println(viewSquares2)
viewSquares2.foreach(i=>println("item = "+ i))

SeqViewM(...)
item = 2
item = 4
item = 6
item = 8
item = 10

double2: (x: Int)Int
viewSquares2: scala.collection.SeqView[Int,Seq[_]] = SeqViewM(...)

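The println does not actually run the mapping; it only prints the view itself.
The mapping function is executed only when the view is traversed.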
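One way to observe both the laziness and the lack of caching is to count how often the mapping function runs. In this sketch the counter calls and the object name ViewSketch are additions for illustration; double2 mirrors the function from the example above.

```scala
object ViewSketch {
  // Counts how many times double2 has been invoked
  var calls = 0

  def double2(x: Int): Int = {
    calls += 1
    x * 2
  }

  // Building the view does not run double2
  val lazyDoubled = (1 to 5).view.map(double2)
  val callsAfterBuild: Int = calls

  // The first traversal runs double2 once per element
  val firstSum: Int = lazyDoubled.sum
  val callsAfterFirst: Int = calls

  // The second traversal runs it again, because nothing was cached
  val secondSum: Int = lazyDoubled.sum
  val callsAfterSecond: Int = calls

  def main(args: Array[String]): Unit = {
    println(callsAfterBuild)  // 0
    println(firstSum)         // 30
    println(callsAfterFirst)  // 5
    println(callsAfterSecond) // 10
  }
}
```

If the results are needed more than once, converting the view to a strict collection (for example with toList) computes them a single time.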

Parallel collections

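To make full use of multi-core CPUs, Scala provides parallel collections (as opposed to the sequential collections covered so far) for parallel computation. This is similar to multithreading: the work is split across a pool of threads. From Scala 2.13 on, .par lives in the separate scala-parallel-collections module. An example: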

(1 to 5).foreach(println(_))
println("--------------------")
(1 to 5).par.foreach(println(_))

1
2
3
4
5
--------------------
1
2
3
4
5

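Parallel printing is not guaranteed to follow element order, although in this particular run the output happened to come out in order. The next example shows which threads process the elements of a parallel collection: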

val result1 = (0 to 100).map{case _ => Thread.currentThread.getName}.distinct
val result2 = (0 to 100).par.map{case _ => Thread.currentThread.getName}.distinct
println(result1,result1.length)
println(result2,result2.length)

(Vector(Thread-3),1)
(ParVector(ForkJoinPool-1-worker-7, ForkJoinPool-1-worker-1, ForkJoinPool-1-worker-25, ForkJoinPool-1-worker-13, ForkJoinPool-1-worker-31, ForkJoinPool-1-worker-17, ForkJoinPool-1-worker-27, ForkJoinPool-1-worker-19, ForkJoinPool-1-worker-23, ForkJoinPool-1-worker-3, ForkJoinPool-1-worker-29),11)

result1: scala.collection.immutable.IndexedSeq[String] = Vector(Thread-3)
result2: scala.collection.parallel.immutable.ParSeq[String] = ParVector(ForkJoinPool-1-worker-7, ForkJoinPool-1-worker-1, ForkJoinPool-1-worker-25, ForkJoinPool-1-worker-13, ForkJoinPool-1-worker-31, ForkJoinPool-1-worker-17, ForkJoinPool-1-worker-27, ForkJoinPool-1-worker-19, ForkJoinPool-1-worker-23, ForkJoinPool-1-worker-3, ForkJoinPool-1-worker-29)

My machine appears to have 12 cores.

2020-03-14, Qixia District, Nanjing