[Python Core Techniques in Practice] Writing Text to a File Line by Line in Python (csv, txt) (collections.Counter) - CFANZ Programming Community


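Contents

1. Writing to a csv file line by line
2. Writing to a txt file line by line
3. Writing to a json file line by line
4. Introduction to the collections package (Counter)

4.1. Introduction to collections.Counter
4.2. Example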

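1. Writing to a csv file line by line

Using the csv package: if a very large amount of data is written in one go, the system may run out of memory. The code below instead writes the data to the file one row at a time.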

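import csv

import numpy as np

flag = False  # flag: becomes True once the header row has been written
a = np.array([[1, 2, 3], [2, 3, 4], [4, 5, 6], [7, 8, 9]])
# newline="" prevents blank lines between rows on Windows
with open("./one.csv", "w", newline="") as fw:
    writer = csv.writer(fw)
    for i in range(a.shape[0]):
        result = {"col1": a[i, 0],
                  "col2": a[i, 1],
                  "col3": a[i, 2]}
        print(result)
        if not flag:
            # Write the column names once, before the first data row
            keys = result.keys()
            writer.writerow(keys)
            flag = True
        writer.writerow(result.values())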

Output (contents of one.csv):

col1,col2,col3
1,2,3
2,3,4
4,5,6
7,8,9
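The same row-by-row write can also be sketched with csv.DictWriter, which handles the header row itself, so no flag variable is needed. This is a minimal sketch rather than the author's code: the file name dict_rows.csv and the plain nested list standing in for the NumPy array are assumptions made for illustration.

```python
import csv

# Plain nested list standing in for the NumPy array used above (assumption).
rows = [[1, 2, 3], [2, 3, 4], [4, 5, 6], [7, 8, 9]]
fieldnames = ["col1", "col2", "col3"]

# newline="" lets the csv module control line endings itself.
with open("./dict_rows.csv", "w", newline="") as fw:
    writer = csv.DictWriter(fw, fieldnames=fieldnames)
    writer.writeheader()  # writes the header row once
    for row in rows:
        # Only one row is turned into a dict and written at a time.
        writer.writerow(dict(zip(fieldnames, row)))

# Read the file back to check what was written.
with open("./dict_rows.csv", newline="") as fr:
    written = list(csv.reader(fr))
print(written[:2])
```

Because DictWriter maps each dict onto fieldnames, the column order stays fixed even if a row dict is built in a different key order.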

2. Writing to a txt file line by line

The code below writes to a txt file (or a csv file) line by line using plain file writes.

import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4], [4, 5, 6], [7, 8, 9]])
with open("./out.txt", "w", encoding="utf-8") as fw:  # write to a txt file
# with open("./out.csv", "w", encoding="utf-8") as fw:  # or write to a csv file
    for i in range(a.shape[0]):
        # Tab-separated alternative:
        # fw.write(str(a[i, 0]) + "\t" + str(a[i, 1]) + "\t" + str(a[i, 2]) + "\n")
        fw.write(str(a[i, 0]) + "," + str(a[i, 1]) + "," + str(a[i, 2]) + "\n")
# No fw.close() is needed: the with block closes the file on exit

Output (contents of out.txt):

1,2,3
2,3,4
4,5,6
7,8,9
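When a row has more than a few columns, concatenating each field by hand gets tedious. A possible variant, sketched here under the assumption that the rows are a plain nested list and that the output file is called out_joined.txt (both names are illustrative), builds each line with str.join:

```python
# Plain nested list standing in for the NumPy array used above (assumption).
rows = [[1, 2, 3], [2, 3, 4], [4, 5, 6], [7, 8, 9]]
delimiter = "\t"  # use "," for comma-separated output

with open("./out_joined.txt", "w", encoding="utf-8") as fw:
    for row in rows:
        # Convert every field to str, join with the delimiter, end the line.
        fw.write(delimiter.join(str(value) for value in row) + "\n")

# Read the file back to check what was written.
with open("./out_joined.txt", encoding="utf-8") as fr:
    lines = fr.read().splitlines()
print(lines)
```

This works for any number of columns, and switching between tab-separated and comma-separated output only means changing delimiter.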

3. Writing to a json file line by line

See the code in my earlier post.
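Since the referenced code is not reproduced here, the following is only a rough sketch of one common way to write JSON line by line, not the earlier post's method. It assumes the "JSON Lines" layout (one JSON object per line); the file name out.jsonl and the sample records are made up for illustration.

```python
import json

# Hypothetical sample records, one dict per output line.
records = [
    {"col1": 1, "col2": 2, "col3": 3},
    {"col1": 2, "col2": 3, "col3": 4},
]

with open("./out.jsonl", "w", encoding="utf-8") as fw:
    for record in records:
        # json.dumps produces a single-line string by default;
        # ensure_ascii=False keeps non-ASCII text readable in the file.
        fw.write(json.dumps(record, ensure_ascii=False) + "\n")

# Reading back is also line by line: each line is a complete JSON document.
with open("./out.jsonl", encoding="utf-8") as fr:
    loaded = [json.loads(line) for line in fr]
print(loaded)
```

Writing one object per line means the file never has to be held in memory as a whole, which is the same motivation as in the csv and txt sections.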

4. Introduction to the collections package (Counter)

4.1. Introduction to collections.Counter

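The official Python documentation (Python 2.7) describes collections as "High-performance container datatypes". In that version it provides five data types: namedtuple(), deque, Counter, OrderedDict, and defaultdict. (Python 3 adds ChainMap, UserDict, UserList, and UserString.)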

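Of these, Counter is, as the name suggests, a counter: a data type commonly used for tallying. Using Counter makes counting code shorter and easier to read.
Let's first look at a simple example that counts with a plain dict: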

# Count word frequencies
colors = ['red', 'blue', 'red', 'green', 'blue', 'blue']
result = {}
for color in colors:
    # Approach 1
    result[color] = result.get(color, 0) + 1  # get() returns the value for the key, or the default if the key is missing
    # Approach 2
    # if result.get(color) is None:
    #     result[color] = 1
    # else:
    #     result[color] += 1
print(result)
# {'red': 2, 'blue': 3, 'green': 1}
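A third way to write the same manual count, shown here only as a sketch for comparison, uses defaultdict, another type from the collections package. With defaultdict(int), a missing key starts at 0, so neither get() nor an if/else is needed.

```python
from collections import defaultdict

colors = ['red', 'blue', 'red', 'green', 'blue', 'blue']
result = defaultdict(int)  # a missing key is created with int() == 0
for color in colors:
    result[color] += 1
print(dict(result))
# {'red': 2, 'blue': 3, 'green': 1}
```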

Now let's see how to do the same thing with Counter:

from collections import Counter

colors = ['red', 'blue', 'red', 'green', 'blue', 'blue']
c = Counter(colors)
print(c)
# Counter({'blue': 3, 'red': 2, 'green': 1})
print(dict(c))
# {'red': 2, 'blue': 3, 'green': 1}

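Counter(a).most_common(n): returns the n most frequent elements together with their exact counts, ordered from most to least frequent. For example, most_common(2) returns the two most frequent elements and their counts.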

from collections import Counter

colors = ['red', 'blue', 'red', 'green', 'blue', 'blue']
c = Counter(colors)
print(c)
print(c.most_common(2))  # print the result; a bare expression shows nothing in a script

# Counter({'blue': 3, 'red': 2, 'green': 1})
# [('blue', 3), ('red', 2)]
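Two other Counter behaviours are useful when counts are built up in pieces, as in the file example later on. The sketch below reuses the same color data, split into two batches purely for illustration: update() adds counts in place, most_common() with no argument returns every element, and looking up a missing key returns 0 instead of raising KeyError.

```python
from collections import Counter

c = Counter(['red', 'blue', 'red'])    # first batch
c.update(['green', 'blue', 'blue'])    # second batch: counts are added in place

print(c.most_common())  # no argument: all elements, most frequent first
# [('blue', 3), ('red', 2), ('green', 1)]
print(c['yellow'])      # a key that was never seen counts as 0
# 0
```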

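4.2. Example

Example: read a file, count word frequencies, and sort by number of occurrences. The file consists of many sentences whose words are separated by spaces: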

from collections import Counter

# The with block closes the file automatically
with open("./data/input.txt", "r") as f:
    lines = f.read().splitlines()
words = []
for line in lines:
    # split() with no argument also skips repeated spaces and empty lines
    words.extend(line.split())
result = Counter(words)
print(result.most_common(10))

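When the file is large and cannot comfortably be read with a single read() call, read it in fixed-size chunks and accumulate the counts: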

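from collections import Counter

result = Counter()
leftover = ""  # partial word carried over from the previous chunk
with open("./data/input.txt", "r") as f:
    while True:
        chunk = f.read(1024)  # read at most 1024 characters
        if not chunk:         # empty string means end of file
            break
        chunk = leftover + chunk
        words = chunk.split()
        # A chunk boundary can fall in the middle of a word. If the chunk
        # does not end in whitespace, hold its last word back for the next chunk.
        if not chunk[-1].isspace() and words:
            leftover = words.pop()
        else:
            leftover = ""
        result.update(words)
if leftover:
    result[leftover] += 1  # count the final word of the file

print(result.most_common(10))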
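A simpler alternative for large text files is to iterate over the file object, which yields one line at a time, so only the current line is in memory and no word can be cut in half. The sketch below is self-contained: it first writes a small temporary file as a stand-in for ./data/input.txt (the sample text is made up), then counts its words.

```python
import os
import tempfile
from collections import Counter

# Small stand-in input file so the sketch runs on its own (assumption).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("a b a\nc a b\n")
    path = tmp.name

result = Counter()
with open(path, "r", encoding="utf-8") as f:
    for line in f:                   # the file object yields one line at a time
        result.update(line.split())  # add this line's words to the running counts

os.remove(path)  # clean up the temporary file
print(result.most_common(2))
# [('a', 3), ('b', 2)]
```

This relies on the file having line breaks at reasonable intervals; for a file that is one enormous line, reading in fixed-size chunks as above remains the safer choice.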