刘知远 (Liu Zhiyuan): Large Models
Basic tasks of NLP
- Part-of-speech tagging
- Named entity recognition
- Co-reference resolution
- Basic dependencies (dependency parsing)
Word representation
- Compute word similarity
- Infer word relations
Problems of synonym/hypernym representation
- Missing nuance
- Missing new meanings of words
- Subjective
- Data sparsity
- Requires human labor to create and adapt
- One-hot representation
- Count-based representation
  - Increase in size with vocabulary
  - Require a lot of storage
  - Sparsity issues for less frequent words
  - Subsequent classification models will be less robust
- Word embedding
- Distributed representation: dense, low-dimensional vectors (see the sketch below)
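As a concrete contrast between count-based vectors and dense embeddings, here is a minimal sketch; the toy corpus, the window size of 1, and the 8-dimensional random vectors (standing in for learned embeddings) are illustrative assumptions, not from the course.

```python
import numpy as np

# Toy corpus; words, window size, and embedding dimension are illustrative.
corpus = [
    "i like deep learning".split(),
    "i like nlp".split(),
    "i enjoy flying".split(),
]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Count-based representation: co-occurrence counts within a window of 1.
# Each vector's length equals the vocabulary size, so it grows with the
# vocabulary and stays sparse for infrequent words.
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if j != i:
                counts[idx[w], idx[sent[j]]] += 1

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

# Word embedding / distributed representation: dense, low-dimensional vectors.
# Random vectors are used here as placeholders for learned embeddings.
emb = np.random.randn(len(vocab), 8)
print(cosine(counts[idx["like"]], counts[idx["enjoy"]]))
print(cosine(emb[idx["like"]], emb[idx["enjoy"]]))
```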
Model pre-training: large-scale unlabeled data
Model fine-tuning: task-specific training data
- Sigmoid: binary classification
- Softmax: multi-class classification
- Cross-entropy: negative log probability of the correct class (see the sketch after this list)
- Stochastic gradient descent
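To make these classification pieces concrete, here is a minimal numpy sketch of softmax, cross-entropy as the negative log probability of the correct class, and one stochastic gradient descent step; the 3-class linear model, the input values, and the 0.1 learning rate are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability.
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(probs, y):
    # Negative log of the probability assigned to the correct class y.
    return -np.log(probs[y])

# Toy 3-class linear classifier; weights, input, and label are illustrative.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # 3 classes, 4 input features
x = rng.normal(size=4)        # one training example
y = 2                         # index of the correct class

probs = softmax(W @ x)
loss = cross_entropy(probs, y)

# One SGD step: the gradient of the loss w.r.t. the logits is
# (probs - one_hot(y)); the chain rule gives the gradient w.r.t. W.
grad_logits = probs.copy()
grad_logits[y] -= 1.0
W -= 0.1 * np.outer(grad_logits, x)            # learning rate 0.1
print(loss, cross_entropy(softmax(W @ x), y))  # the loss should decrease
```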
Sub-sampling: frequent words carry less information than rare words, so occurrences of very frequent words are randomly discarded during training
Soft sliding window: the window size is sampled randomly, so words farther from the center word are included less often (i.e., receive less weight)
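Here is a rough word2vec-style sketch of both ideas; the threshold t = 1e-5, the toy frequencies, the example sentence, and the helper names are assumptions for illustration.

```python
import numpy as np

def discard_prob(freq, t=1e-5):
    # Sub-sampling: each occurrence of a word is discarded with a probability
    # that grows with its corpus frequency, so frequent words contribute less.
    return max(0.0, 1.0 - np.sqrt(t / freq))

print(discard_prob(0.05))   # very frequent word: usually discarded
print(discard_prob(1e-6))   # rare word: always kept

def context_words(sentence, center, max_window=5, rng=np.random.default_rng(0)):
    # Soft sliding window: the actual window size is sampled per center word,
    # so distant words are included less often (i.e., receive less weight).
    w = int(rng.integers(1, max_window + 1))
    left, right = max(0, center - w), min(len(sentence), center + w + 1)
    return [sentence[i] for i in range(left, right) if i != center]

print(context_words("he drinks a cup of hot tea every day".split(), center=4))
```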
- RNN: handles longer text, but suffers from vanishing gradients
- Gated Recurrent Unit: gates decide whether to keep or discard the current state
- LSTM: an extra cell-state vector for capturing long-term dependencies
- CNN: captures local patterns; better parallelization within sentences
- Transformer: self-attention (see the sketch after this list)
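Since the Transformer replaces recurrence with self-attention, here is a minimal single-head scaled dot-product attention sketch; the 4 tokens, hidden size 8, and random projection matrices are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy single-head example: 4 tokens, hidden size 8; every token attends to
# every other token in parallel (unlike an RNN's sequential steps).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                       # token representations
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)  # (4, 8): one context-mixed vector per token
```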
Byte Pair Encoding
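BPE builds a subword vocabulary by repeatedly merging the most frequent pair of adjacent symbols; below is a rough sketch of that merge loop, where the toy word-frequency dictionary and the 10 merge steps are illustrative assumptions.

```python
from collections import Counter

# Toy word frequencies with words split into characters (illustrative only).
vocab = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2,
         ("n", "e", "w", "e", "s", "t"): 6, ("w", "i", "d", "e", "s", "t"): 3}

def most_frequent_pair(vocab):
    # Count every adjacent symbol pair, weighted by word frequency.
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(vocab, pair):
    # Rewrite every word, replacing the chosen pair with one merged symbol.
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

for _ in range(10):  # the number of merges is a hyperparameter
    pair = most_frequent_pair(vocab)
    if pair is None:
        break
    vocab = merge_pair(vocab, pair)
print(vocab)         # merged symbols now span subwords and whole words
```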
Input encoding
- Feature-based approaches: word2vec
- Fine-tuning approaches: BERT/GPT
- Perplexity: lower perplexity -> more fluent text (see the sketch after this list)
- zero-shot
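Here is a minimal sketch of the perplexity computation (the per-token probabilities are made-up numbers): perplexity is the exponentiated average negative log-likelihood of the tokens, so text that the model predicts well, i.e. more fluent text, gets a lower value.

```python
import numpy as np

def perplexity(token_probs):
    # PPL = exp( -(1/N) * sum_i log p(w_i | w_<i) )
    log_probs = np.log(np.asarray(token_probs, dtype=float))
    return float(np.exp(-log_probs.mean()))

# Illustrative per-token probabilities assigned by two language models.
print(perplexity([0.4, 0.5, 0.3, 0.6]))    # lower perplexity: fluent under the model
print(perplexity([0.05, 0.1, 0.02, 0.2]))  # higher perplexity: poorly modeled text
```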
BERT: masked LM; open issues: the gap between pre-training and fine-tuning (the [MASK] token never appears during fine-tuning) and the efficiency of the masked language model (only a fraction of tokens are predicted per step); see the masking sketch below
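A rough sketch of BERT-style masked-LM corruption follows; the 15% selection rate and the 80/10/10 replacement split follow the original BERT recipe, while the sentence, the vocabulary used for random replacement, and the helper name are illustrative assumptions. The [MASK] token it introduces is exactly what never appears at fine-tuning time, and only the selected fraction of positions yields a training signal.

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15, rng=random.Random(0)):
    # Returns the corrupted input tokens and the positions to be predicted.
    inputs, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:           # select ~15% of positions
            targets[i] = tok
            r = rng.random()
            if r < 0.8:
                inputs[i] = "[MASK]"           # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
            # remaining 10%: keep the original token unchanged
    return inputs, targets

tokens = "the man went to the store to buy some milk".split()  # illustrative
print(mask_tokens(tokens, vocab=tokens))
```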