Reading and Writing Excel Files with openpyxl

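Reading and writing Excel files is a frequent task in day-to-day work. The openpyxl module reads and writes Excel files with the .xlsx extension. Note that it does not support the older .xls format.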

Whatever module you use, an Excel file is handled through the following hierarchy:

1. workbook

2. sheet

3. row

4. column

5. cell

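A workbook is the starting point for reading or writing Excel: one Excel file corresponds to one workbook. A sheet is one of the worksheets inside the file, and a single file can contain several sheets. Row and column refer to working with a sheet by rows or by columns, and a cell is one specific cell.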

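Following this hierarchy, the common operations for reading an Excel file are as follows.

1. Create a workbook

Use the load_workbook function to read an Excel file and create a workbook: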

>>> from openpyxl import load_workbook
>>> wb = load_workbook('input.xlsx')
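
Since `input.xlsx` may not exist on your machine, the following sketch first writes a small workbook to a temporary directory and then loads it back. The file name `demo.xlsx` and the cell contents are made up for illustration. `read_only` is an optional `load_workbook` parameter that is worth knowing about for large files.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Build a tiny workbook so the example does not depend on an existing file.
# The file name and the cell contents are illustrative assumptions.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.xlsx")

wb_out = Workbook()
ws_out = wb_out.active
ws_out.title = "raw_genotype"
ws_out.append(["Sample", "Allele 1", "Allele 2"])
wb_out.save(path)

# Default mode: the whole file is loaded into memory and can be modified.
wb = load_workbook(path)
print(wb.sheetnames)

# read_only=True reads the sheet lazily, which helps with large files.
wb_ro = load_workbook(path, read_only=True)
first_row = next(wb_ro["raw_genotype"].iter_rows(values_only=True))
wb_ro.close()  # a read-only workbook keeps the file open until closed
print(first_row)
```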

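2. Read a sheet

Given a workbook, the sheetnames attribute returns the names of all its sheets. Each sheet can then be accessed by name, the same way you would look up a key in a dictionary: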

>>> wb.sheetnames
['raw_genotype', 'format_genotype', 'input_NST']

>>> wb['raw_genotype']
<Worksheet "raw_genotype">
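
Besides looking sheets up by name, you can also loop over them. The sketch below builds an in-memory workbook that reuses the three sheet names shown above (the workbook itself is an assumption for illustration) and shows `wb.worksheets`, iteration over the workbook, and `wb.active`.

```python
from openpyxl import Workbook

# An in-memory workbook with the three sheet names used above.
wb = Workbook()
wb.active.title = "raw_genotype"
wb.create_sheet("format_genotype")
wb.create_sheet("input_NST")

# wb.worksheets is a list of Worksheet objects in tab order.
titles = [ws.title for ws in wb.worksheets]
print(titles)

# Iterating over the workbook yields the same worksheets.
for ws in wb:
    print(ws.title, ws.max_row, ws.max_column)

# wb.active is the sheet that is selected when the file is opened.
active_title = wb.active.title

# Checking the name first avoids a KeyError for a sheet that does not exist.
has_missing = "missing_sheet" in wb.sheetnames
```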

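3. Read rows

Index the worksheet with a row number to access a whole row. Row numbers start at 1, and a slice such as ws[1:3] includes both ends: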

>>> ws = wb['raw_genotype']

>>> ws[1]
(<Cell 'raw_genotype'.A1>, <Cell 'raw_genotype'.B1>, <Cell 'raw_genotype'.C1>)
# Slicing
>>> ws[1:3]
((<Cell 'raw_genotype'.A1>, <Cell 'raw_genotype'.B1>, <Cell 'raw_genotype'.C1>), (<Cell 'raw_genotype'.A2>, <Cell 'raw_genotype'.B2>, <Cell 'raw_genotype'.C2>), (<Cell 'raw_genotype'.A3>, <Cell 'raw_genotype'.B3>, <Cell 'raw_genotype'.C3>))

To iterate row by row, use the iter_rows method. By default it visits every row, and its parameters can restrict the range of rows and columns:

>>> for row in ws.iter_rows():
...     print(row)

>>> for row in ws.iter_rows(min_row=1, max_row=3):
...     print(row)

>>> for row in ws.iter_rows(min_row=1, max_row=3, min_col=2, max_col=4):
...     print(row)
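
The loops above print tuples of Cell objects. When only the values matter, `iter_rows` accepts `values_only=True`. The following sketch uses a small in-memory sheet with invented sample data to show this, together with the inclusive 1-based bounds and the `max_row` / `max_column` attributes.

```python
from openpyxl import Workbook

# Invented sample data, only for illustration.
wb = Workbook()
ws = wb.active
ws.append(["Sample", "Allele 1", "Allele 2"])
ws.append(["S1", "A", "G"])
ws.append(["S2", "C", "T"])

# values_only=True yields tuples of values instead of Cell objects.
rows = list(ws.iter_rows(min_row=2, values_only=True))
print(rows)

# Row and column bounds are 1-based and inclusive on both ends.
sub = list(
    ws.iter_rows(min_row=2, max_row=3, min_col=2, max_col=3, values_only=True)
)
print(sub)

# max_row and max_column describe the extent of the used area.
extent = (ws.max_row, ws.max_column)
print(extent)
```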

4. Read columns

Likewise, index the worksheet with a column letter to access a whole column:

>>> ws['A']
(<Cell 'raw_genotype'.A1>, <Cell 'raw_genotype'.A2>, <Cell 'raw_genotype'.A3>)
# Slicing
>>> ws['A':'C']
((<Cell 'raw_genotype'.A1>, <Cell 'raw_genotype'.A2>, <Cell 'raw_genotype'.A3>), (<Cell 'raw_genotype'.B1>, <Cell 'raw_genotype'.B2>, <Cell 'raw_genotype'.B3>), (<Cell 'raw_genotype'.C1>, <Cell 'raw_genotype'.C2>, <Cell 'raw_genotype'.C3>))

The method for iterating over columns is iter_cols:

>>> for col in ws.iter_cols():
...     print(col)

>>> for col in ws.iter_cols(min_col=1, max_col=3):
...     print(col)

>>> for col in ws.iter_cols(min_col=1, max_col=3, min_row=2, max_row=4):
...     print(col)
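
A typical use of column iteration is a per-column calculation. The sketch below, on invented numeric data, treats the first row as a header and sums the rest of each column. As far as I know, `iter_cols` is not available on workbooks opened with `read_only=True`, so this pattern assumes a normally loaded or in-memory workbook.

```python
from openpyxl import Workbook

# Invented numeric data: a header row followed by three data rows.
wb = Workbook()
ws = wb.active
ws.append(["x", "y"])
ws.append([1, 10])
ws.append([2, 20])
ws.append([3, 30])

# Each column arrives as one tuple; the first item is the header.
sums = {}
for col in ws.iter_cols(values_only=True):
    header, *values = col
    sums[header] = sum(values)

print(sums)
```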

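5. Read cells

There are two ways to read a cell: index the worksheet with the cell coordinate in square brackets, or call the cell method with numeric row and column indices: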

# Access a cell by coordinate
>>> ws['E2']
<Cell 'raw_genotype'.E2>
# Access a cell by row and column number
>>> ws.cell(row=2, column=5)
<Cell 'raw_genotype'.E2>

The most common operations on a cell are getting and setting its value:

# Get the value of a cell
>>> ws['E2'].value
'Allele 2'
>>> ws.cell(row=2, column=5).value
'Allele 2'
# Set the value of a cell
>>> ws['E2'] = 3
>>> ws.cell(row=2, column=5).value = 10
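
The two addressing styles refer to the same cell, and converting between column letters and column numbers is a common need. The following sketch, on an empty in-memory workbook, shows the position attributes of a Cell and the two conversion helpers in `openpyxl.utils`. The values written are placeholders.

```python
from openpyxl import Workbook
from openpyxl.utils import column_index_from_string, get_column_letter

wb = Workbook()
ws = wb.active

# Write by coordinate, then read the same cell back by row and column number.
ws["E2"] = "Allele 2"
cell = ws.cell(row=2, column=5)
print(cell.value, cell.coordinate, cell.row, cell.column, cell.column_letter)

# Convert between column numbers and column letters.
letter = get_column_letter(5)
index = column_index_from_string("E")

# cell() can also set the value in the same call.
ws.cell(row=3, column=5, value=10)
print(ws["E3"].value)
```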

Combining these basic operations lets you read an Excel file quickly and pull out the information you need. A template:

from openpyxl import load_workbook
wb = load_workbook('input.xlsx')
ws = wb['sheet1']

for row in ws.iter_rows():
    for cell in row:
        print(cell.value)

wb.close()
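
In practice the first row is often a header. One possible variation on the template above, sketched here on invented in-memory data rather than a real `input.xlsx`, turns each data row into a dictionary keyed by the header.

```python
from openpyxl import Workbook

# Invented sample data standing in for a sheet loaded from disk.
wb = Workbook()
ws = wb.active
ws.append(["Sample", "Allele 1", "Allele 2"])
ws.append(["S1", "A", "G"])
ws.append(["S2", "C", "T"])

# Take the first row as the header, then pair it with every later row.
rows = ws.iter_rows(values_only=True)
header = next(rows)
records = [dict(zip(header, row)) for row in rows]

for record in records:
    print(record)
```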

Writing an Excel file needs only a small change in how the workbook and sheet are created. A template:

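from openpyxl import Workbook

wb = Workbook()
# create_sheet() adds a new sheet next to the default one that Workbook() creates
ws = wb.create_sheet()
# Append ten rows, each holding the integers 0 through 9
for _ in range(10):
    ws.append(list(range(10)))

wb.save('out.xlsx')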

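To write an Excel file, first create a workbook, then create a sheet with create_sheet. The append method adds one row of content at the bottom of the sheet, though you can also set cells one by one. Finally, the save method writes the workbook out as an Excel file. Note that Workbook() already contains one default sheet, available as wb.active, so create_sheet adds a second sheet alongside it.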
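
If a single sheet is enough, one option is to reuse the default sheet through `wb.active` instead of calling `create_sheet`. The sketch below does that, mixes `append` with a direct cell assignment, and reloads the file to check what was written. The sheet title, column names, and temporary path are assumptions for illustration.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.active      # reuse the default sheet instead of creating a new one
ws.title = "data"   # illustrative sheet name

# append() adds one row at a time below the existing content.
ws.append(["id", "square"])
for i in range(1, 4):
    ws.append([i, i * i])

# Individual cells can be set directly as well.
ws["D1"] = "note"

# Save to a temporary location so the example leaves no file in the cwd.
path = os.path.join(tempfile.mkdtemp(), "out.xlsx")
wb.save(path)

# Reload the file to confirm the contents of the first two columns.
check = load_workbook(path)
saved = list(check["data"].iter_rows(max_col=2, values_only=True))
print(check.sheetnames, saved)
```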

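Those are the basics of reading and writing Excel files. Beyond them, the module also supports merging and unmerging cells, inserting images, setting cell styles, and other customizations. See the official documentation for details.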
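
As a small taste of those extras, the following sketch merges a range of cells, applies a bold centered style to the merged title, and then unmerges the range again. The title text and the range `A1:C1` are arbitrary choices. Inserting images is left out here because it needs an image file and the Pillow package.

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment, Font

wb = Workbook()
ws = wb.active

# Put a title in A1 and merge it across three columns.
ws["A1"] = "Genotype summary"
ws.merge_cells("A1:C1")

# Styles are set by assigning style objects to the top-left cell.
ws["A1"].font = Font(bold=True)
ws["A1"].alignment = Alignment(horizontal="center")

merged = [r.coord for r in ws.merged_cells.ranges]
print(merged)

# unmerge_cells() reverses the merge; the value stays in A1.
ws.unmerge_cells("A1:C1")
after = [r.coord for r in ws.merged_cells.ranges]
print(after)
```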
