[lightGBM] Malware Detection with LightGBM

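I have been working with tree-based models lately, so I put together some example code that you can adapt for your own use. The dataset used here is tabular malware data, available for download at: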

https://github.com/chihebchebbi/Mastering-Machine-Learning-for-Penetration-Testing/blob/master/Chapter03/MalwareData.csv.gz

After downloading the data, extract it into the data directory. The LightGBM example code (lgb_demo.py) is as follows:

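# lgb_demo.py: train and evaluate a LightGBM classifier on MalwareData.csv
# Install the dependency first: pip install lightgbm
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# The file is '|'-separated; 'legitimate' is the label column.
MalwareDataset = pd.read_csv('data/MalwareData.csv', sep='|')

# The script treats the first 41323 rows as legitimate samples and the rest as malware.
Legit = MalwareDataset[0:41323].drop(['legitimate'], axis=1)
Malware = MalwareDataset[41323:].drop(['legitimate'], axis=1)

# Column count without the label; it still includes 'Name' and 'md5',
# which are dropped below before training.
print('The Number of important features is %i \n' % Legit.shape[1])

# Features: everything except the two identifier columns and the label.
Data = MalwareDataset.drop(['Name', 'md5', 'legitimate'], axis=1).values
Target = MalwareDataset['legitimate'].values

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(Data, Target, test_size=0.2)

# Build and train the LightGBM model with default hyperparameters.
clf = lgb.LGBMClassifier()
clf.fit(X_train, y_train)

# Predict on the test split.
y_pred = clf.predict(X_test)

# Report accuracy (y_true first, then y_pred) and the per-class metrics.
accuracy = accuracy_score(y_test, y_pred)
print('LightGBM Model accuracy score: {0:0.4f}'.format(accuracy))
print(classification_report(y_test, y_pred))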
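The slices `[0:41323]` and `[41323:]` in the script assume that the legitimate samples come first in the file. One way to check the class counts before relying on those offsets is sketched below using only the standard library. `count_labels` and `count_labels_in_file` are hypothetical helpers, not part of the original script, and they assume the `|` separator and the `legitimate` column used above. The sample rows are made up for illustration.

```python
import csv
import gzip
import io
from collections import Counter


def count_labels(lines, label_column="legitimate", sep="|"):
    """Count how often each label value occurs in a delimited text stream."""
    reader = csv.DictReader(lines, delimiter=sep)
    return Counter(row[label_column] for row in reader)


def count_labels_in_file(path, label_column="legitimate", sep="|"):
    """Same as count_labels, but reads a plain or gzip-compressed file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return count_labels(handle, label_column, sep)


# Tiny in-memory sample with the same layout (illustrative rows, not real data).
sample = io.StringIO(
    "Name|md5|Machine|legitimate\n"
    "a.exe|aa|332|1\n"
    "b.exe|bb|332|0\n"
    "c.exe|cc|332|0\n"
)
print(count_labels(sample))
```

On the real file, `count_labels_in_file('data/MalwareData.csv')` should report a count for label `1` that matches the hard-coded 41323 if the offsets are right. Note that the label values come back as strings here, because the `csv` module does not convert types.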

LightGBM trains quickly. Here is the output of lgb_demo.py:

The Number of important features is 56 

LightGBM Model accuracy score: 0.9935
              precision    recall  f1-score   support

           0       1.00      0.99      1.00     19296
           1       0.99      0.99      0.99      8314

    accuracy                           0.99     27610
   macro avg       0.99      0.99      0.99     27610
weighted avg       0.99      0.99      0.99     27610

The accuracy is quite high: 0.9935 on the 20% held-out test split (27610 samples).
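To make the columns of the classification report concrete, here is a minimal sketch of how per-class precision, recall and F1 follow from the predictions. `per_class_metrics` is a hypothetical helper written for illustration, not scikit-learn's implementation, and the labels are toy values rather than the dataset's.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Return (precision, recall, f1), treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels: 1 = legitimate, 0 = malware (illustrative values only).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]

for label in (0, 1):
    precision, recall, f1 = per_class_metrics(y_true, y_pred, positive=label)
    print(f"class {label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Class 0 makes up about 70% of the test split in the report (19296 of 27610 samples), so the per-class recall values are more informative than the accuracy figure alone.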

