Pandas Data Cleaning: Handling Missing Values - CFANZ Programming Community

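In Pandas, the dropna method filters out missing values according to a condition, isnull marks which values are missing, notnull marks which values are not missing, and fillna fills in missing values.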

import pandas as pd

frame = pd.DataFrame([[1,2,3,None], [4,7,None,3], [None, None, None, None]])
frame.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 3 entries, 0 to 2
# Data columns (total 4 columns):
#  #   Column  Non-Null Count  Dtype  
# ---  ------  --------------  -----  
#  0   0       2 non-null      float64
#  1   1       2 non-null      float64
#  2   2       1 non-null      float64
#  3   3       1 non-null      float64
# dtypes: float64(4)
# memory usage: 224.0 bytes

# Dropping
# Drop the whole row if any of its elements is missing
frame.dropna(how='any')
# Empty DataFrame
# Columns: [0, 1, 2, 3]
# Index: []

# Drop a row only if every value in it is NaN
frame.dropna(how='all')
#      0    1    2    3
# 0  1.0  2.0  3.0  NaN
# 1  4.0  7.0  NaN  3.0
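
Beyond how='any' and how='all', dropna accepts a few other conditions. The sketch below reuses the same sample frame and shows the thresh, subset and axis parameters; the variable names are made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame([[1, 2, 3, None], [4, 7, None, 3], [None, None, None, None]])

# thresh=3: keep only rows that have at least 3 non-missing values
at_least_three = frame.dropna(thresh=3)

# subset=[3]: look only at column 3 when deciding which rows to drop
col3_present = frame.dropna(subset=[3])

# axis=1: drop columns instead of rows (here, columns that are entirely NaN)
non_empty_cols = frame.dropna(axis=1, how='all')

print(at_least_three.index.tolist())    # [0, 1]
print(col3_present.index.tolist())      # [1]
print(non_empty_cols.columns.tolist())  # [0, 1, 2, 3]
```

No column in this frame is entirely NaN, so the last call keeps all four columns.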

# Which values are missing
frame.isnull()
#        0      1      2      3
# 0  False  False  False   True
# 1  False  False   True  False
# 2   True   True   True   True
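
The introduction also mentions notnull, which returns the inverse mask. A minimal sketch on the same sample frame shows how the two masks can be used to count missing values and to filter rows; the variable names are illustrative only.

```python
import pandas as pd

frame = pd.DataFrame([[1, 2, 3, None], [4, 7, None, 3], [None, None, None, None]])

# notnull() is the element-wise inverse of isnull()
present = frame.notnull()

# Summing a boolean mask counts the True values in each column
missing_per_column = frame.isnull().sum()

# Boolean indexing: keep only the rows where column 0 holds a value
rows_with_col0 = frame[frame[0].notnull()]

print(missing_per_column.tolist())    # [1, 1, 2, 2]
print(rows_with_col0.index.tolist())  # [0, 1]
```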

# Fill with a constant
frame.fillna(1)
#      0    1    2    3
# 0  1.0  2.0  3.0  1.0
# 1  4.0  7.0  1.0  3.0
# 2  1.0  1.0  1.0  1.0

# Fill with a dict
# Column 0 is filled with 1, column 1 with 5, column 2 with 10, column 3 with 20
frame.fillna({0:1, 1:5, 2:10, 3:20})
#      0    1     2     3
# 0  1.0  2.0   3.0  20.0
# 1  4.0  7.0  10.0   3.0
# 2  1.0  5.0  10.0  20.0

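# Forward fill, column by column: each missing value takes the last valid value above it
# (fillna(method='ffill') is deprecated since pandas 2.1; ffill() is the equivalent call)
frame.ffill()
#      0    1    2    3
# 0  1.0  2.0  3.0  NaN
# 1  4.0  7.0  3.0  3.0
# 2  4.0  7.0  3.0  3.0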

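# method='pad' is an alias of 'ffill': each missing value is filled with the previous non-missing value,
# so the result is the same as frame.ffill()
frame.ffill()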

# Modify the original data in place
frame.fillna(1, inplace=True)
frame
#      0    1    2    3
# 0  1.0  2.0  3.0  1.0
# 1  4.0  7.0  1.0  3.0
# 2  1.0  1.0  1.0  1.0
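
A constant is not always a sensible replacement. As one possible variation, fillna also accepts a Series indexed by column label, so each column can be filled with its own mean. The sketch rebuilds the sample frame because the previous step modified it in place; the variable names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame([[1, 2, 3, None], [4, 7, None, 3], [None, None, None, None]])

# mean() skips NaN by default and returns one value per column
column_means = frame.mean()

# Each column's missing values are filled with that column's mean;
# without inplace=True the original frame is left untouched
filled = frame.fillna(column_means)

print(filled.loc[2].tolist())  # [2.5, 4.5, 3.0, 3.0]
```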

Reference: https://www.pypandas.cn/docs/user_guide/missing_data.html#values-considered-missing