Writing text to files line by line in Python (csv, txt) (collections.Counter)
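Table of contents
- 1. Writing line by line to a csv file
- 2. Writing line by line to a txt file
- 3. Writing line by line to a json file
- 4. Introduction to the collections package (Counter)
- 4.1. Introduction to collections.Counter
- 4.2. Example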
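1. Writing line by line to a csv file
- Using the csv package: writing a very large amount of data in a single call can exhaust system memory, so the code below writes the data to the file one row at a time.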
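import csv

import numpy as np

flag = False  # flag: has the header row been written yet?
a = np.array([[1, 2, 3], [2, 3, 4], [4, 5, 6], [7, 8, 9]])
# newline="" lets the csv module control line endings (avoids blank rows on Windows)
with open("./one.csv", "w", newline="") as fw:
    writer = csv.writer(fw)
    for i in range(a.shape[0]):
        result = {"col1": a[i, 0],
                  "col2": a[i, 1],
                  "col3": a[i, 2]}
        print(result)
        if not flag:
            # write the column names once, before the first data row
            writer.writerow(result.keys())
            flag = True
        writer.writerow(result.values())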
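- Output: one.csv contains the header row col1,col2,col3 followed by one comma-separated row for each row of a.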
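The header flag can be avoided by letting the csv module write the header itself. The following is a minimal sketch, assuming the rows are available as dicts with identical keys; the function name write_rows, the sample rows, and the output path are hypothetical and not part of the original code.

```python
import csv

# Hypothetical sample rows; any iterable of dicts with the same keys works.
rows = [
    {"col1": 1, "col2": 2, "col3": 3},
    {"col1": 2, "col2": 3, "col3": 4},
    {"col1": 4, "col2": 5, "col3": 6},
    {"col1": 7, "col2": 8, "col3": 9},
]


def write_rows(path, rows, fieldnames):
    """Write dict rows to a csv file one at a time, header first."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as fw:
        writer = csv.DictWriter(fw, fieldnames=fieldnames)
        writer.writeheader()  # the header row is written exactly once
        for row in rows:      # rows may be a generator, so memory use stays flat
            writer.writerow(row)


write_rows("./one_dictwriter.csv", rows, ["col1", "col2", "col3"])
```

Because write_rows only iterates over rows, it can be fed from a generator that produces one record at a time, which keeps the row-by-row behaviour described above.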
2. Writing line by line to a txt file
- The code below uses plain file writes, so it works for either a txt file or a csv file:
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4], [4, 5, 6], [7, 8, 9]])
with open("./out.txt", "w", encoding="utf-8") as fw:  # write to a txt file
# with open("./out.csv", "w", encoding="utf-8") as fw:  # or write to a csv file
    for i in range(a.shape[0]):
        # tab-separated alternative:
        # fw.write(str(a[i, 0]) + "\t" + str(a[i, 1]) + "\t" + str(a[i, 2]) + "\n")
        fw.write(str(a[i, 0]) + "," + str(a[i, 1]) + "," + str(a[i, 2]) + "\n")
# no fw.close() needed: the with block closes the file on exit
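- Output: out.txt contains one comma-separated line for each row of a (1,2,3 then 2,3,4 then 4,5,6 then 7,8,9).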
3. Writing line by line to a json file
- See the code in my earlier post.
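Since the referenced code is not reproduced here, the following is only a minimal sketch of one common way to write JSON line by line: one JSON object per line (often called JSON Lines). It is not the author's earlier code; the names write_jsonl and read_jsonl, the sample records, and the path are assumptions made for illustration.

```python
import json

# Hypothetical records, used only for illustration.
records = [
    {"col1": 1, "col2": 2, "col3": 3},
    {"col1": 2, "col2": 3, "col3": 4},
]


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as fw:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text readable in the file
            fw.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back, parsing one line at a time."""
    with open(path, "r", encoding="utf-8") as fr:
        return [json.loads(line) for line in fr if line.strip()]


write_jsonl("./out.jsonl", records)
```

Note that json.dumps cannot serialise NumPy integers directly, so values taken from an array such as a[i, 0] would need to be converted with int(...) first.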
4. Introduction to the collections package (Counter)
4.1. Introduction to collections.Counter
- The official Python documentation describes collections as "High-performance container datatypes" (the title used in the Python 2.7 docs). That version lists five data types in total: namedtuple, deque, Counter, OrderedDict, and defaultdict.
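- Counter is, as the name says, a counter: a data type commonly used for tallying. Using Counter makes the code simpler and easier to read.
- Let's first look at a simple example that does not use Counter: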
# count word frequencies
colors = ['red', 'blue', 'red', 'green', 'blue', 'blue']
result = {}
for color in colors:
    # approach 1
    result[color] = result.get(color, 0) + 1  # get() returns the value for the key, or the default (0) if the key is not in the dict
    # approach 2
    # if result.get(color) is None:
    #     result[color] = 1
    # else:
    #     result[color] += 1
print(result)
# {'red': 2, 'blue': 3, 'green': 1}
- Now let's see how to do the same thing with Counter:
from collections import Counter
colors = ['red', 'blue', 'red', 'green', 'blue', 'blue']
c = Counter(colors)
print(c)
print(dict(c))
# Counter({'blue': 3, 'red': 2, 'green': 1})
# {'red': 2, 'blue': 3, 'green': 1}
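- Counter(a).most_common(n): returns the n most frequent elements together with their exact counts, ordered from most to least common. For example, most_common(2) returns the two most frequent elements and their counts.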
from collections import Counter
colors = ['red', 'blue', 'red', 'green', 'blue', 'blue']
c = Counter(colors)
print(c)
# print(dict(c))
print(c.most_common(2))
# Counter({'blue': 3, 'red': 2, 'green': 1})
# [('blue', 3), ('red', 2)]
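Counters can also be merged, which is useful when the data arrives in several pieces and partial counts have to be combined. The sketch below uses two small hypothetical lists to show addition with +, in-place merging with update(), and the fact that a key that was never counted returns 0.

```python
from collections import Counter

first = Counter(['red', 'blue', 'red'])
second = Counter(['green', 'blue', 'blue'])

# "+" builds a new Counter whose counts are the sums of both operands
merged = first + second

# update() adds counts in place, so no new Counter is created per batch
running = Counter()
running.update(['red', 'blue', 'red'])
running.update(['green', 'blue', 'blue'])

# a key that was never counted returns 0 instead of raising KeyError
missing = merged['yellow']

print(merged.most_common(2))
# [('blue', 3), ('red', 2)]
```

For accumulating counts over many batches, update() is usually the more economical choice, because + allocates a new Counter on every call.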
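4.2. Example
- Example: read a file, count word frequencies, and sort by number of occurrences. The file consists of many sentences whose words are separated by spaces: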
from collections import Counter

# read the whole file at once and split it into lines
with open("./data/input.txt", "r") as f:
    lines = f.read().splitlines()
# split each line into words on single spaces
lines = [line.split(" ") for line in lines]
words = []
for line in lines:
    words.extend(line)
result = Counter(words)
print(result.most_common(10))
- When the file to be counted is large and cannot be read in one go with read():
from collections import Counter

result = Counter()
with open("./data/input.txt", "r") as f:
    # Iterate over the file one line at a time instead of calling f.read(1024):
    # a fixed-size chunk can end in the middle of a word and split it in two.
    for line in f:
        words = line.rstrip("\n").split(" ")
        result.update(words)  # add this line's counts to the running total
print(result.most_common(10))
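When a file is read in fixed-size chunks with f.read(n), a chunk boundary can fall in the middle of a word, so the unfinished word has to be carried over to the next chunk. The following is a minimal sketch of that idea; the function name count_words_chunked and the chunk_size parameter are hypothetical, and it splits on any whitespace with split() rather than on single spaces, which is an assumption about the input.

```python
from collections import Counter


def count_words_chunked(path, chunk_size=1024):
    """Count whitespace-separated words while reading fixed-size chunks."""
    result = Counter()
    tail = ""  # possibly incomplete word left over from the previous chunk
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty string means end of file
                break
            chunk = tail + chunk
            words = chunk.split()
            # if the chunk does not end in whitespace, its last word
            # may continue in the next chunk, so hold it back
            if words and not chunk[-1].isspace():
                tail = words.pop()
            else:
                tail = ""
            result.update(words)
    if tail:  # count the final word once the file is exhausted
        result[tail] += 1
    return result
```

With the file layout used in this section, count_words_chunked("./data/input.txt").most_common(10) would give the ten most frequent words. This approach is mainly worth the extra bookkeeping when individual lines are themselves very long.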