The Most Complete Collection of Deep Learning and NLP Core Concepts, Models, Strategies, and Latest Papers


    This resource collects the core deep learning concepts relevant to natural language processing (NLP), along with the latest (2019) papers for each concept. It covers optimization algorithms (Adam, Adagrad, AMSGrad, mini-batch SGD, etc.), parameter initialization (Glorot initialization, He initialization), model regularization (dropout, word dropout, patience, weight decay, etc.), normalization, loss functions, network training methods, activation functions, and CNN/RNN architectures, among other core concepts.

    The resource covers two aspects: (1) a systematic overview of the core concepts behind deep learning and NLP techniques; (2) a collection of the latest papers related to each concept. Highly recommended.

    The resource is compiled from the web. Original source: https://github.com/neulab/nn4nlp-concepts/blob/master/concepts.md

    Download link for the version with paper links:

    Link: https://pan.baidu.com/s/1lC8DiPJnyzbxtvns-HXr_w

    Extraction code: yv6g

Parameter Optimization / Learning

    Optimizers and optimization strategies (a minimal sketch follows this list)

    •Mini-batch SGD: optim-sgd

    •Adam: optim-adam (implies optim-sgd)

    •Adagrad: optim-adagrad (implies optim-sgd)

    •Adadelta: optim-adadelta (implies optim-sgd)

    •Adam with Specialized Transformer Learning Rate ("Noam" Schedule): optim-noam (implies optim-adam)

    •SGD with Momentum: optim-momentum (implies optim-sgd)

    •AMSGrad: optim-amsgrad (implies optim-sgd)

    •Projection / Projected Gradient Descent: optim-projection (implies optim-sgd)
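
    As a rough illustration of how these optimizer tags map to code, here is a minimal PyTorch sketch; the toy model, learning rates, and the Noam-schedule helper are illustrative placeholders, not taken from the source list.

        import torch
        import torch.nn as nn

        model = nn.Linear(100, 10)   # toy model, only needed to have parameters to optimize

        # Mini-batch SGD (optim-sgd) and SGD with momentum (optim-momentum)
        sgd = torch.optim.SGD(model.parameters(), lr=0.1)
        momentum = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

        # Adaptive-gradient variants
        adagrad = torch.optim.Adagrad(model.parameters(), lr=0.01)
        adadelta = torch.optim.Adadelta(model.parameters())
        adam = torch.optim.Adam(model.parameters(), lr=1.0)  # base lr 1.0 so the schedule below gives the effective rate
        amsgrad = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)  # AMSGrad variant (optim-amsgrad)

        # "Noam" schedule (optim-noam): warm up linearly, then decay with the inverse
        # square root of the step count, scaled by the model dimension.
        def noam_factor(step, d_model=512, warmup=4000):
            step = max(step, 1)
            return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

        scheduler = torch.optim.lr_scheduler.LambdaLR(adam, lr_lambda=noam_factor)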

    Parameter initialization (see the sketch after this list)

    •Glorot/Xavier Initialization: init-glorot

    •He Initialization: init-he
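
    A minimal sketch of the two initialization schemes using PyTorch's built-in initializers; the layer sizes are arbitrary.

        import torch.nn as nn
        import torch.nn.init as init

        layer = nn.Linear(256, 256)

        # Glorot/Xavier initialization (init-glorot): variance scaled by fan_in + fan_out,
        # originally derived for tanh/sigmoid activations.
        init.xavier_uniform_(layer.weight)

        # He initialization (init-he): variance scaled by fan_in, derived for ReLU networks.
        init.kaiming_uniform_(layer.weight, nonlinearity='relu')

        init.zeros_(layer.bias)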

    Regularization strategies (see the sketch after this list)

    •Dropout: reg-dropout

    •Word Dropout: reg-worddropout (implies reg-dropout)

    •Norm (L1/L2) Regularization: reg-norm

    •Early Stopping: reg-stopping

    •Patience: reg-patience (implies reg-stopping)

    •Weight Decay: reg-decay

    •Label Smoothing: reg-labelsmooth
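
    The sketch below shows how several of these regularizers typically appear in a PyTorch training setup; the model, hyper-parameters, and the fake validation loss are placeholders, and the label_smoothing argument requires a recent PyTorch version.

        import torch
        import torch.nn as nn

        # Dropout (reg-dropout) inside the model; word dropout (reg-worddropout) would
        # instead zero out whole input tokens before the embedding lookup.
        model = nn.Sequential(nn.Linear(300, 128), nn.ReLU(),
                              nn.Dropout(p=0.5), nn.Linear(128, 10))

        # Weight decay (reg-decay) is passed to the optimizer; label smoothing
        # (reg-labelsmooth) is an argument of the cross-entropy loss.
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
        criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

        # Early stopping with patience (reg-stopping, reg-patience): stop once the
        # validation loss has not improved for `patience` consecutive epochs.
        best, patience, bad_epochs = float('inf'), 3, 0
        for epoch in range(100):
            val_loss = 1.0 / (epoch + 1)      # placeholder for a real validation pass
            if val_loss < best:
                best, bad_epochs = val_loss, 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:
                    break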

    Normalization strategies (see the sketch after this list)

    •Layer Normalization: norm-layer

    •Batch Normalization: norm-batch

    •Gradient Clipping: norm-gradient
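
    A small PyTorch sketch of the three normalization-related operations; tensor shapes are arbitrary.

        import torch
        import torch.nn as nn

        x = torch.randn(32, 20, 256)                  # (batch, sequence length, features)

        # Layer normalization (norm-layer): normalize over the feature dimension per position.
        y = nn.LayerNorm(256)(x)

        # Batch normalization (norm-batch): normalize each feature over the batch;
        # BatchNorm1d expects (batch, features, length).
        z = nn.BatchNorm1d(256)(x.transpose(1, 2)).transpose(1, 2)

        # Gradient clipping (norm-gradient): rescale gradients whose global norm exceeds 1.0,
        # called between loss.backward() and optimizer.step().
        model = nn.Linear(256, 10)
        model(x).sum().backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)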

    Loss functions (see the sketch after this list)

    •Canonical Correlation Analysis (CCA): loss-cca

    •Singular Value Decomposition (SVD): loss-svd

    •Margin-based Loss Functions: loss-margin

    •Contrastive Loss: loss-cons

    •Noise Contrastive Estimation (NCE): loss-nce (implies loss-cons)

    •Triplet Loss: loss-triplet (implies loss-cons)
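
    For the margin-based family, here is a minimal sketch with random embeddings standing in for real anchor/positive/negative examples.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        anchor, positive, negative = (torch.randn(16, 128) for _ in range(3))

        # Triplet loss (loss-triplet): pull the anchor towards the positive and push it
        # away from the negative by at least the margin.
        loss_triplet = nn.TripletMarginLoss(margin=1.0)(anchor, positive, negative)

        # A hand-written margin-based / contrastive loss (loss-margin, loss-cons):
        # the positive pair's similarity must beat the negative pair's by the margin.
        pos_score = F.cosine_similarity(anchor, positive)
        neg_score = F.cosine_similarity(anchor, negative)
        loss_margin = torch.clamp(1.0 - pos_score + neg_score, min=0.0).mean()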

    Training methods (a sketch of multi-task learning follows this list)

    •Multi-task Learning (MTL): train-mtl

    •Multi-lingual Learning (MLL): train-mll (implies train-mtl)

    •Transfer Learning: train-transfer

    •Active Learning: train-active

    •Data Augmentation: train-augment

    •Curriculum Learning: train-curriculum

    •Parallel Training: train-parallel
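
    As a sketch of multi-task learning (the other training regimes differ mainly in how data and parameters are shared), here is a toy model with one shared encoder and one head per task; all names and sizes are illustrative.

        import torch
        import torch.nn as nn

        class MultiTaskModel(nn.Module):
            def __init__(self, vocab_size=10000, dim=128, n_tags=10, n_classes=3):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, dim)
                self.encoder = nn.LSTM(dim, dim, batch_first=True)    # shared across tasks
                self.tagger = nn.Linear(dim, n_tags)                   # sequence labeling head
                self.classifier = nn.Linear(dim, n_classes)            # text classification head

            def forward(self, tokens):
                hidden, _ = self.encoder(self.embed(tokens))
                return self.tagger(hidden), self.classifier(hidden.mean(dim=1))

        model = MultiTaskModel()
        tag_logits, class_logits = model(torch.randint(0, 10000, (4, 12)))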

Sequence Model Architectures

    Activation functions

    •Hyperbolic Tangent (tanh): activ-tanh

    •Rectified Linear Units (ReLU): activ-relu

    Pooling operations (activations and pooling are illustrated together in the sketch after this list)

    •Max Pooling: pool-max

    •Mean Pooling: pool-mean

    •k-Max Pooling: pool-kmax
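
    Activations and pooling in one short sketch, applied to a random batch of hidden states.

        import torch
        import torch.nn.functional as F

        h = torch.randn(8, 20, 256)               # (batch, sequence length, features)

        # Activation functions: tanh (activ-tanh) and ReLU (activ-relu).
        h_tanh = torch.tanh(h)
        h_relu = F.relu(h)

        # Pooling over the sequence dimension gives one vector per sentence.
        max_pooled, _ = h.max(dim=1)               # max pooling (pool-max)
        mean_pooled = h.mean(dim=1)                # mean pooling (pool-mean)
        kmax_pooled, _ = h.topk(k=3, dim=1)        # k-max pooling (pool-kmax); note that the
                                                   # original ordering of positions is not kept here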

    Recurrent architectures (see the sketch after this list)

    •Recurrent Neural Network (RNN): arch-rnn

    •Bi-directional Recurrent Neural Network (Bi-RNN): arch-birnn (implies arch-rnn)

    •Long Short-term Memory (LSTM): arch-lstm (implies arch-rnn)

    •Bi-directional Long Short-term Memory (BiLSTM): arch-bilstm (implies arch-birnn, arch-lstm)

    •Gated Recurrent Units (GRU): arch-gru (implies arch-rnn)

    •Bi-directional Gated Recurrent Units (BiGRU): arch-bigru (implies arch-birnn, arch-gru)
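
    A minimal BiLSTM sketch; swapping in nn.GRU or nn.RNN, or setting bidirectional=False, gives the other variants in this list.

        import torch
        import torch.nn as nn

        x = torch.randn(4, 20, 100)                # (batch, sequence length, embedding size)

        bilstm = nn.LSTM(input_size=100, hidden_size=64, num_layers=1,
                         batch_first=True, bidirectional=True)
        outputs, (h_n, c_n) = bilstm(x)            # outputs: (4, 20, 128), forward + backward states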

    Other sequential/structured architectures (see the sketch after this list)

    •Bag-of-words, Bag-of-embeddings, Continuous Bag-of-words (BOW): arch-bow

    •Convolutional Neural Networks (CNN): arch-cnn

    •Attention: arch-att

    •Self Attention: arch-selfatt (implies arch-att)

    •Recursive Neural Network (RecNN): arch-recnn

    •Tree-structured Long Short-term Memory (TreeLSTM): arch-treelstm (implies arch-recnn)

    •Graph Neural Network (GNN): arch-gnn

    •Graph Convolutional Neural Network (GCNN): arch-gcnn (implies arch-gnn)
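
    Two of these building blocks, self-attention and a 1-D convolution over the sequence, sketched without the usual learned projections.

        import torch
        import torch.nn.functional as F

        h = torch.randn(2, 10, 64)                 # (batch, sequence length, hidden size)

        # Scaled dot-product self-attention (arch-att, arch-selfatt): queries, keys and
        # values all come from the same sequence.
        scores = h @ h.transpose(1, 2) / h.size(-1) ** 0.5    # (2, 10, 10) attention scores
        context = F.softmax(scores, dim=-1) @ h                # (2, 10, 64) attended representation

        # 1-D convolution over the sequence (arch-cnn); Conv1d expects (batch, channels, length).
        conv = torch.nn.Conv1d(in_channels=64, out_channels=32, kernel_size=3, padding=1)
        features = conv(h.transpose(1, 2)).transpose(1, 2)     # (2, 10, 32)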

    Architectural techniques

    •Residual Connections (ResNet): arch-residual

    •Gating Connections, Highway Connections: arch-gating

    •Memory: arch-memo

    •Copy Mechanism: arch-copy

    •Bilinear, Biaffine Models: arch-bilinear

    •Coverage Vectors/Penalties: arch-coverage

    •Subword Units: arch-subword

    •Energy-based, Globally-normalized Models: arch-energy

    Standard composite architectures (see the sketch after this list)

    •Transformer: arch-transformer (implies arch-selfatt, arch-residual, arch-layernorm, optim-noam)
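
    PyTorch ships a reference encoder layer that bundles self-attention, residual connections, and layer normalization, which makes the implied tags concrete; the dimensions are arbitrary and batch_first requires a recent PyTorch version.

        import torch
        import torch.nn as nn

        layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
        encoder = nn.TransformerEncoder(layer, num_layers=2)

        x = torch.randn(4, 20, 256)                # (batch, sequence length, model dim)
        out = encoder(x)                           # same shape, contextualized representations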

Model Combination

    •Ensembling: comb-ensemble

Search Algorithms (a short sketch follows the list)

    •Greedy Search: search-greedy

    •Beam Search: search-beam

    •A* Search: search-astar

    •Viterbi Algorithm: search-viterbi

    •Ancestral Sampling: search-sampling

    •Gumbel Max: search-gumbel (implies search-sampling)
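
    Greedy decoding, ancestral sampling, and the Gumbel-max trick in a few lines; beam search keeps the top-k partial hypotheses instead of a single one and is omitted here, and the logits are random placeholders.

        import torch

        logits = torch.randn(5, 1000)              # scores over a 1000-word vocabulary, 5 positions

        # Greedy search (search-greedy): take the highest-scoring word at each position.
        greedy_ids = logits.argmax(dim=-1)

        # Ancestral sampling (search-sampling): draw from the softmax distribution.
        probs = torch.softmax(logits, dim=-1)
        sampled_ids = torch.multinomial(probs, num_samples=1).squeeze(-1)

        # Gumbel-max trick (search-gumbel): adding Gumbel noise to the logits and taking
        # the argmax is equivalent to sampling from the categorical distribution above.
        gumbel = -torch.log(-torch.log(torch.rand_like(logits)))
        gumbel_ids = (logits + gumbel).argmax(dim=-1)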

Prediction Tasks

    •Text Classification (text -> label): task-textclass

    •Text Pair Classification (two texts -> label): task-textpair

    •Sequence Labeling (text -> one label per token): task-seqlab

    •Extractive Summarization (text -> subset of text): task-extractive (implies task-seqlab)

    •Span Labeling (text -> labels on spans): task-spanlab

    •Language Modeling (predict probability of text): task-lm

    •Conditioned Language Modeling (some input -> text): task-condlm (implies task-lm)

    •Sequence-to-sequence Tasks (text -> text, including MT): task-seq2seq (implies task-condlm)

    •Cloze-style Prediction, Masked Language Modeling (right and left context -> word): task-cloze

    •Context Prediction (as in word2vec) (word -> right and left context): task-context

    •Relation Prediction (text -> graph of relations between words, including dependency parsing): task-relation

    •Tree Prediction (text -> tree, including syntactic and some semantic parsing): task-tree

    •Graph Prediction (text -> graph not necessarily between nodes): task-graph

    •Lexicon Induction/Embedding Alignment (text/embeddings -> bi- or multi-lingual lexicon): task-lexicon

    •Word Alignment (parallel text -> alignment between words): task-alignment

Pre-trained Embedding Techniques

    •word2vec: pre-word2vec (implies arch-cbow, task-cloze, task-context)

    •fasttext: pre-fasttext (implies arch-cbow, arch-subword, task-cloze, task-context)

    •GloVe: pre-glove

    •Paragraph Vector (ParaVec): pre-paravec

    •Skip-thought: pre-skipthought (implies arch-lstm, task-seq2seq)

    •ELMo: pre-elmo (implies arch-bilstm, task-lm)

    •BERT: pre-bert (implies arch-transformer, task-cloze, task-textpair)

    •Universal Sentence Encoder (USE): pre-use (implies arch-transformer, task-seq2seq)

Structured Models / Algorithms

    •Hidden Markov Models (HMM): struct-hmm

    •Conditional Random Fields (CRF): struct-crf

    •Context-free Grammar (CFG): struct-cfg

    •Combinatory Categorial Grammar (CCG): struct-ccg

Training Methods for Non-differentiable Functions (a short sketch follows the list)

    •Complete Enumeration: nondif-enum

    •Straight-through Estimator: nondif-straightthrough

    •Gumbel Softmax: nondif-gumbelsoftmax

    •Minimum Risk Training: nondif-minrisk

    •REINFORCE: nondif-reinforce
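
    The Gumbel-Softmax and its straight-through variant are available directly in PyTorch; a small sketch with random logits and a toy downstream objective.

        import torch
        import torch.nn.functional as F

        logits = torch.randn(4, 10, requires_grad=True)

        # Gumbel-Softmax (nondif-gumbelsoftmax): a differentiable relaxation of drawing a
        # one-hot sample from the categorical distribution defined by the logits.
        soft_sample = F.gumbel_softmax(logits, tau=0.5, hard=False)

        # Straight-through variant (nondif-straightthrough): the forward pass is a hard
        # one-hot sample, the backward pass uses the soft relaxation's gradients.
        hard_sample = F.gumbel_softmax(logits, tau=0.5, hard=True)

        value = (hard_sample * torch.arange(10.0)).sum()   # toy downstream objective
        value.backward()                                   # gradients still reach `logits`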

Adversarial Methods (a short sketch follows the list)

    •Generative Adversarial Networks (GAN): adv-gan

    •Adversarial Feature Learning: adv-feat

    •Adversarial Examples: adv-examp

    •Adversarial Training: adv-train (implies adv-examp)
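
    A compact sketch of the fast gradient sign method, one common way to construct adversarial examples; in NLP the perturbation is usually applied to embeddings rather than raw tokens, and the model here is a placeholder.

        import torch
        import torch.nn.functional as F

        model = torch.nn.Linear(100, 2)
        x = torch.randn(8, 100, requires_grad=True)
        y = torch.randint(0, 2, (8,))

        loss = F.cross_entropy(model(x), y)
        loss.backward()

        # Adversarial example (adv-examp): perturb the input in the direction that increases
        # the loss; adversarial training (adv-train) mixes such inputs into the training data.
        epsilon = 0.01
        x_adv = (x + epsilon * x.grad.sign()).detach()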

Latent Variable Models

    •Variational Auto-encoder (VAE): latent-vae

    •Topic Model: latent-topic

Meta-learning

    •Meta-learning Initialization: meta-init

    •Meta-learning Optimizers: meta-optim

    •Meta-learning Loss functions: meta-loss

    •Neural Architecture Search: meta-arch
