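This article shows how to build a captcha recognition model with Java and the Deeplearning4j (DL4J) framework. It covers data generation, model training, and prediction.
1. Project setup
Dependencies (Maven):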
<dependencies>
<dependency>
<groupId>org.deeplearning4j</groupId>
<artifactId>deeplearning4j-core</artifactId>
<version>1.0.0-beta7</version>
</dependency>
<dependency>
<groupId>org.nd4j</groupId>
<artifactId>nd4j-native-platform</artifactId>
<version>1.0.0-beta7</version>
</dependency>
<dependency>
<groupId>org.datavec</groupId>
<artifactId>datavec-api</artifactId>
<version>1.0.0-beta7</version>
</dependency>
</dependencies>
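2. Generating captcha data
Generate the captcha images with a third-party library. Java options include JCaptcha and Kaptcha; alternatively, pre-generate the images with a Python script:
python generate_captcha.py # for example, a Python script that writes PNG images
Name each file so that it starts with its label, for example A2B9_1.png.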
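If you would rather stay in Java without adding a captcha library, the following is a minimal sketch that uses only the JDK. The class name SimpleCaptchaGenerator, the 36-character alphabet, and the drawing style are assumptions made for illustration; only the "label first" file naming and the 160 x 60 image size come from this article.

```java
import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Random;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of JCaptcha, Kaptcha, or DL4J.
final class SimpleCaptchaGenerator {
    // Assumed alphabet: 10 digits + 26 uppercase letters = 36 classes.
    static final String CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /** Draws a random label of the given length from CHARSET. */
    static String randomLabel(Random rng, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(CHARSET.charAt(rng.nextInt(CHARSET.length())));
        }
        return sb.toString();
    }

    /** Builds the "<label>_<index>.png" file name used in this article. */
    static String fileName(String label, int index) {
        return label + "_" + index + ".png";
    }

    /** Renders the label as dark text plus a few noise lines on a white background. */
    static BufferedImage render(String label, int width, int height, Random rng) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);
        g.setFont(new Font(Font.SANS_SERIF, Font.BOLD, height * 2 / 3));
        int cell = width / label.length();
        for (int i = 0; i < label.length(); i++) {
            g.setColor(new Color(rng.nextInt(128), rng.nextInt(128), rng.nextInt(128)));
            int y = height * 3 / 4 + rng.nextInt(7) - 3; // small vertical jitter
            g.drawString(String.valueOf(label.charAt(i)), i * cell + cell / 5, y);
        }
        g.setColor(Color.GRAY);
        for (int i = 0; i < 5; i++) {
            g.drawLine(rng.nextInt(width), rng.nextInt(height), rng.nextInt(width), rng.nextInt(height));
        }
        g.dispose();
        return img;
    }

    /** Writes `count` images of 160 x 60 pixels into `dir`, creating the directory if needed. */
    static void generate(File dir, int count, int labelLength, Random rng) throws IOException {
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Cannot create " + dir);
        }
        for (int i = 0; i < count; i++) {
            String label = randomLabel(rng, labelLength);
            ImageIO.write(render(label, 160, 60, rng), "png", new File(dir, fileName(label, i)));
        }
    }
}
```

A call such as SimpleCaptchaGenerator.generate(new File("captcha_samples"), 5000, 4, new Random(42)) would fill the sample directory; the count of 5000 is only an example. Real captchas are usually noisier than this, so treat the output as a starting point.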
3. Loading and preprocessing the data
int height = 60;
int width = 160;
int channels = 3;
int captchaLength = 4;
int batchSize = 32;
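// Needs java.io.File, org.datavec.api.split.FileSplit, org.datavec.image.loader.NativeImageLoader,
// org.datavec.image.recordreader.ImageRecordReader, org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator,
// org.nd4j.linalg.dataset.api.iterator.DataSetIterator, org.nd4j.linalg.dataset.api.preprocessor.ImagePreProcessingScaler
File dataDir = new File("captcha_samples");
FileSplit fileSplit = new FileSplit(dataDir, NativeImageLoader.ALLOWED_FORMATS);
// Labels come from the file name, not the parent directory, so ParentPathLabelGenerator is not used.
ImageRecordReader recordReader = new ImageRecordReader(height, width, channels, new CaptchaLabelExtractor());
recordReader.initialize(fileSplit);
// Column 1 holds the whole 36 * captchaLength label vector; regression = true passes it through unchanged.
// This assumes CaptchaLabelExtractor implements PathMultiLabelGenerator and returns one NDArrayWritable.
DataSetIterator dataIter = new RecordReaderDataSetIterator(recordReader, batchSize, 1, 1, true);
dataIter.setPreProcessor(new ImagePreProcessingScaler(0, 1)); // scale pixel values from 0..255 to 0..1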
You need to write the CaptchaLabelExtractor class yourself. It parses the captcha characters from the file name and converts them to a one-hot encoding.
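The parsing and encoding logic such a class needs can be sketched in plain Java. CaptchaLabels is a hypothetical helper, and the alphabet order (digits first, then uppercase letters) is an assumption; whatever order you pick must also be used when decoding predictions. A DataVec label generator could wrap the resulting array, but that wiring is left out here.

```java
// Hypothetical helper class; only the file-name convention comes from this article.
final class CaptchaLabels {
    // Assumed alphabet and order: index 0..9 are digits, 10..35 are A..Z.
    static final String CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /** "captcha_samples/A2B9_1.png" -> "A2B9": the text before the first underscore of the file name. */
    static String parseLabel(String path) {
        String name = path.substring(Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\')) + 1);
        int cut = name.indexOf('_');
        if (cut <= 0) {
            throw new IllegalArgumentException("No label prefix in " + name);
        }
        return name.substring(0, cut);
    }

    /** Encodes a label as one block of 36 values per character, with a single 1.0 in each block. */
    static float[] encode(String label) {
        int n = CHARSET.length();
        float[] out = new float[n * label.length()];
        for (int i = 0; i < label.length(); i++) {
            int idx = CHARSET.indexOf(label.charAt(i));
            if (idx < 0) {
                throw new IllegalArgumentException("Unsupported character: " + label.charAt(i));
            }
            out[i * n + idx] = 1.0f;
        }
        return out;
    }
}
```

For the label A2B9 this yields a vector of length 144 with ones at positions 10, 38, 83, and 117, which is the layout the 36 * 4 output layer is trained against.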
4. Building the model (CNN + Dense)
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.updater(new Adam(0.001))
.list()
.layer(new ConvolutionLayer.Builder(5, 5)
.nIn(channels)
.nOut(32)
.activation(Activation.RELU)
.build())
.layer(new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
.kernelSize(2,2).build())
.layer(new ConvolutionLayer.Builder(3, 3)
.nOut(64)
.activation(Activation.RELU)
.build())
.layer(new DenseLayer.Builder().nOut(256).activation(Activation.RELU).build())
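// Four characters are active at once, so the 144 outputs are treated as independent sigmoid units
// with binary cross-entropy; a single softmax over all 144 would force them to sum to 1.
.layer(new OutputLayer.Builder(LossFunctions.LossFunction.XENT)
.nOut(36 * captchaLength)
.activation(Activation.SIGMOID).build())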
.setInputType(InputType.convolutional(height, width, channels))
.build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
Note that the output layer's nOut is 36 * 4: one block of 36 scores for each of the four character positions.
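It is also worth checking how large the layers in this configuration become. The sketch below reproduces the shape arithmetic in plain Java, assuming no padding, stride 1 for both convolutions, and stride 2 for the 2x2 pooling layer (which should be the DL4J default for SubsamplingLayer, but verify it for your version). ShapeCheck is a hypothetical helper name.

```java
// Hypothetical helper for checking layer sizes by hand; not part of DL4J.
final class ShapeCheck {
    /** Output size of a convolution or pooling window along one axis, without padding. */
    static int outSize(int in, int kernel, int stride) {
        return (in - kernel) / stride + 1;
    }

    /** Returns {height, width} after conv 5x5, max-pool 2x2 (stride 2), conv 3x3. */
    static int[] featureMapSize(int height, int width) {
        int h = outSize(height, 5, 1);
        int w = outSize(width, 5, 1);   // first convolution
        h = outSize(h, 2, 2);
        w = outSize(w, 2, 2);           // pooling
        h = outSize(h, 3, 1);
        w = outSize(w, 3, 1);           // second convolution
        return new int[] {h, w};
    }

    public static void main(String[] args) {
        int[] hw = featureMapSize(60, 160);
        long flattened = 64L * hw[0] * hw[1]; // 64 channels from the second convolution
        System.out.println("feature map: " + hw[0] + " x " + hw[1] + " x 64");
        System.out.println("dense layer inputs: " + flattened);
        System.out.println("dense layer weights: " + flattened * 256);
    }
}
```

Under these assumptions the feature map entering the dense layer is 26 x 76 x 64, which is 126,464 inputs and roughly 32 million weights for the 256-unit layer. Adding a second pooling layer before the dense layer would shrink this considerably.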
5. Training the model
model.fit(dataIter, 10); // train for 10 epochs
After each epoch you can run the model on a captcha image and check the prediction.
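A single image only gives a rough impression. If you decode predictions for a held-out set of files, two numbers are more informative: the share of captchas that are entirely correct and the share of individual characters that are correct. The following sketch computes both from plain strings; CaptchaAccuracy is a hypothetical helper and does not depend on DL4J.

```java
import java.util.List;

// Hypothetical evaluation helper; not part of DL4J.
final class CaptchaAccuracy {
    /** Fraction of captchas whose predicted string equals the true label exactly. */
    static double exactMatch(List<String> predicted, List<String> actual) {
        checkSizes(predicted, actual);
        int hits = 0;
        for (int i = 0; i < actual.size(); i++) {
            if (predicted.get(i).equals(actual.get(i))) {
                hits++;
            }
        }
        return (double) hits / actual.size();
    }

    /** Fraction of individual character positions predicted correctly. */
    static double perCharacter(List<String> predicted, List<String> actual) {
        checkSizes(predicted, actual);
        int hits = 0;
        int total = 0;
        for (int i = 0; i < actual.size(); i++) {
            String p = predicted.get(i);
            String a = actual.get(i);
            if (p.length() != a.length()) {
                throw new IllegalArgumentException("Length mismatch at index " + i);
            }
            for (int j = 0; j < a.length(); j++) {
                if (p.charAt(j) == a.charAt(j)) {
                    hits++;
                }
            }
            total += a.length();
        }
        return (double) hits / total;
    }

    private static void checkSizes(List<String> predicted, List<String> actual) {
        if (actual.isEmpty() || predicted.size() != actual.size()) {
            throw new IllegalArgumentException("Lists must be non-empty and of equal size");
        }
    }
}
```

Whole-captcha accuracy is always at most the per-character accuracy, because one wrong character makes the whole captcha wrong. A large gap between the two usually means the model is close but still confuses a few look-alike characters.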
6. Predicting a single image
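// Static import assumed: org.nd4j.linalg.indexing.NDArrayIndex.interval
NativeImageLoader loader = new NativeImageLoader(height, width, channels);
INDArray image = loader.asMatrix(new File("captcha_samples/Z8F1_0.png")); // shape [1, channels, height, width]
// Apply the same pixel scaling as during training. NormalizerStandardize must be fitted before use,
// so a stateless 0..1 scaler is used here; remove it if training used no pre-processor.
ImagePreProcessingScaler scaler = new ImagePreProcessingScaler(0, 1);
scaler.transform(image);
INDArray output = model.output(image); // shape [1, 36 * captchaLength]
// Index-to-character table; the order is an assumption and must match the label encoding.
char[] characters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ".toCharArray();
int[] predicted = new int[captchaLength];
for (int i = 0; i < captchaLength; i++) {
    // Keep the row dimension so that argMax along dimension 1 is valid.
    INDArray slice = output.get(interval(0, 1), interval(i * 36, (i + 1) * 36));
    predicted[i] = Nd4j.argMax(slice, 1).getInt(0);
}
for (int idx : predicted) {
    System.out.print(characters[idx]);
}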
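The block-wise argmax used above can also be written against a plain float array, which makes it easy to unit-test without a trained model. This is a sketch under the assumption that the alphabet is ordered digits first, then A to Z; CaptchaDecoder is a hypothetical name. In DL4J the array could come from output.toFloatVector() for a single-image batch, though that call is worth checking against your version.

```java
// Hypothetical decoding helper; works on raw scores and does not depend on DL4J.
final class CaptchaDecoder {
    // Assumed alphabet and order; it must match the one used to encode the labels.
    static final String CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /** Picks the highest-scoring character in each consecutive block of 36 scores. */
    static String decode(float[] scores, int captchaLength) {
        int n = CHARSET.length();
        if (scores.length != n * captchaLength) {
            throw new IllegalArgumentException(
                "Expected " + (n * captchaLength) + " scores but got " + scores.length);
        }
        StringBuilder sb = new StringBuilder(captchaLength);
        for (int pos = 0; pos < captchaLength; pos++) {
            int best = 0;
            for (int k = 1; k < n; k++) {
                if (scores[pos * n + k] > scores[pos * n + best]) {
                    best = k;
                }
            }
            sb.append(CHARSET.charAt(best));
        }
        return sb.toString();
    }
}
```

Because each block is decoded independently, the result does not depend on whether the scores are normalised per block or across the whole vector; only their relative order within a block matters.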