[lightGBM] Using LightGBM for Malware Detection


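I have been working with tree models recently, so I took the opportunity to put together some example code that you can adapt for your own use. The dataset is tabular malware data, available at:

https://github.com/chihebchebbi/Mastering-Machine-Learning-for-Penetration-Testing/blob/master/Chapter03/MalwareData.csv.gz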
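The file is distributed as a gzip archive. A minimal sketch of unpacking it with Python's standard library, assuming the archive has already been saved locally (the helper name gunzip_to and the file paths in the comment are illustrative, not part of the original post):

```python
import gzip
import shutil
from pathlib import Path


def gunzip_to(archive_path, output_path):
    """Decompress a .gz file to output_path, creating parent directories."""
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Stream the decompressed bytes so the whole file is never held in memory
    with gzip.open(archive_path, 'rb') as src, open(output_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)
    return output_path


# Example call (assumed local file names):
# gunzip_to('MalwareData.csv.gz', 'data/MalwareData.csv')
```

Running gunzip from a shell achieves the same result; the Python version is only convenient when the whole workflow stays in one script.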

After downloading the data, decompress it and place it in the data directory. The LightGBM example code (lgb_demo.py) is:

# pip install lightgbm
# Build a LightGBM classifier for the malware dataset
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# The CSV is pipe-delimited
MalwareDataset = pd.read_csv('data/MalwareData.csv', sep='|')

# Split by row position: the first 41323 rows are treated as legitimate, the rest as malware
Legit = MalwareDataset[0:41323].drop(['legitimate'], axis=1)
Malware = MalwareDataset[41323:].drop(['legitimate'], axis=1)

# Only the label column is dropped here, so this count still includes 'Name' and 'md5'
print('The Number of important features is %i \n' % Legit.shape[1])

# Model input: drop the two identifier columns and the label
Data = MalwareDataset.drop(['Name', 'md5', 'legitimate'], axis=1).values
Target = MalwareDataset['legitimate'].values

# Hold out 20% for testing; no random_state is set, so results vary slightly between runs
X_train, X_test, y_train, y_test = train_test_split(Data, Target, test_size=0.2)

clf = lgb.LGBMClassifier()
clf.fit(X_train, y_train)

# Predict on the test set
y_pred = clf.predict(X_test)

# Overall accuracy
accuracy = accuracy_score(y_test, y_pred)
print('LightGBM Model accuracy score: {0:0.4f}'.format(accuracy))

# Per-class precision, recall and F1
print(classification_report(y_test, y_pred))

LightGBM trains quickly. The output is:

The Number of important features is 56 

LightGBM Model accuracy score: 0.9935
              precision    recall  f1-score   support

           0       1.00      0.99      1.00     19296
           1       0.99      0.99      0.99      8314

    accuracy                           0.99     27610
   macro avg       0.99      0.99      0.99     27610
weighted avg       0.99      0.99      0.99     27610
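Each row of the report is derived from counts of correct and incorrect predictions for that class. A minimal pure-Python sketch of that arithmetic, for illustration only (the function name per_class_metrics and the toy labels are made up here; in practice sklearn's classification_report does this work):

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1, support) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    support = tp + fn  # number of true samples of this class
    return precision, recall, f1, support


# Toy labels, not taken from the malware dataset
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]
print(per_class_metrics(y_true, y_pred, label=1))
```

The support column is simply the number of true samples of each class, which is why the two class rows above (19296 and 8314) sum to the test-set size of 27610.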

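The accuracy is quite high: about 99.35% on the 20% held-out test split, with recall of 0.99 for both classes.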

References

LightGBM Classifier in Python

