刘知远 (Liu Zhiyuan): Large Models
Basic tasks of NLP
- Part-of-speech tagging
- Named entity recognition
- Co-reference resolution
- Basic dependencies (dependency parsing)
Word representation
- Compute word similarity
- Infer word relations
Problems of synonym/hypernym representation
- Missing nuance
- Missing new meanings of words
- Subjective
- Data sparsity
- Requires human labor to create and adapt
- One-hot representation
- Count-based representation
  - Increase in size with vocabulary
  - Require a lot of storage
  - Sparsity issues for less frequent words
  - Subsequent classification models will be less robust
- Word embedding
- Distributed representation: dense, low-dimensional vectors (see the sketch below)
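As a concrete contrast between count-based vectors and dense embeddings, here is a minimal sketch; the toy corpus, the window size of 1, and the 8-dimensional random vectors (standing in for learned embeddings) are illustrative assumptions, not from the course.

```python
import numpy as np

# Toy corpus; words, window size, and embedding dimension are illustrative.
corpus = [
    "i like deep learning".split(),
    "i like nlp".split(),
    "i enjoy flying".split(),
]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Count-based representation: co-occurrence counts within a window of 1.
# Each vector's length equals the vocabulary size, so it grows with the
# vocabulary and stays sparse for infrequent words.
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if j != i:
                counts[idx[w], idx[sent[j]]] += 1

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

# Word embedding / distributed representation: dense, low-dimensional vectors.
# Random vectors are used here as placeholders for learned embeddings.
emb = np.random.randn(len(vocab), 8)
print(cosine(counts[idx["like"]], counts[idx["enjoy"]]))
print(cosine(emb[idx["like"]], emb[idx["enjoy"]]))
```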
Model pre-training: large-scale unlabeled data
Model fine-tuning: task-specific training data
- Sigmoid: binary classification
- Softmax: multi-class classification
- Cross-entropy: negative log probability of the correct class (see the sketch after this list)
- Stochastic gradient descent
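To make these classification pieces concrete, here is a minimal numpy sketch of softmax, cross-entropy as the negative log probability of the correct class, and one stochastic gradient descent step; the 3-class linear model, the input values, and the 0.1 learning rate are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability.
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(probs, y):
    # Negative log of the probability assigned to the correct class y.
    return -np.log(probs[y])

# Toy 3-class linear classifier; weights, input, and label are illustrative.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # 3 classes, 4 input features
x = rng.normal(size=4)        # one training example
y = 2                         # index of the correct class

probs = softmax(W @ x)
loss = cross_entropy(probs, y)

# One SGD step: the gradient of the loss w.r.t. the logits is
# (probs - one_hot(y)); the chain rule gives the gradient w.r.t. W.
grad_logits = probs.copy()
grad_logits[y] -= 1.0
W -= 0.1 * np.outer(grad_logits, x)            # learning rate 0.1
print(loss, cross_entropy(softmax(W @ x), y))  # the loss should decrease
```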
Sub-sampling: frequent words carry less information than rare words, so occurrences of very frequent words are randomly discarded during training
Soft sliding window: the window size is sampled randomly, so words farther from the center word are included less often (i.e., receive less weight)
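Here is a rough word2vec-style sketch of both ideas; the threshold t = 1e-5, the toy frequencies, the example sentence, and the helper names are assumptions for illustration.

```python
import numpy as np

def discard_prob(freq, t=1e-5):
    # Sub-sampling: each occurrence of a word is discarded with a probability
    # that grows with its corpus frequency, so frequent words contribute less.
    return max(0.0, 1.0 - np.sqrt(t / freq))

print(discard_prob(0.05))   # very frequent word: usually discarded
print(discard_prob(1e-6))   # rare word: always kept

def context_words(sentence, center, max_window=5, rng=np.random.default_rng(0)):
    # Soft sliding window: the actual window size is sampled per center word,
    # so distant words are included less often (i.e., receive less weight).
    w = int(rng.integers(1, max_window + 1))
    left, right = max(0, center - w), min(len(sentence), center + w + 1)
    return [sentence[i] for i in range(left, right) if i != center]

print(context_words("he drinks a cup of hot tea every day".split(), center=4))
```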
- RNN: handles longer text, but suffers from vanishing gradients
- Gated Recurrent Unit: gates decide whether to keep or discard the current state
- LSTM: an extra cell-state vector for capturing long-term dependencies
- CNN: captures local patterns; better parallelization within sentences
- Transformer: self-attention (see the sketch after this list)
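Since the Transformer replaces recurrence with self-attention, here is a minimal single-head scaled dot-product attention sketch; the 4 tokens, hidden size 8, and random projection matrices are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy single-head example: 4 tokens, hidden size 8; every token attends to
# every other token in parallel (unlike an RNN's sequential steps).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                       # token representations
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)  # (4, 8): one context-mixed vector per token
```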
Byte Pair Encoding
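BPE builds a subword vocabulary by repeatedly merging the most frequent pair of adjacent symbols; below is a rough sketch of that merge loop, where the toy word-frequency dictionary and the 10 merge steps are illustrative assumptions.

```python
from collections import Counter

# Toy word frequencies with words split into characters (illustrative only).
vocab = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2,
         ("n", "e", "w", "e", "s", "t"): 6, ("w", "i", "d", "e", "s", "t"): 3}

def most_frequent_pair(vocab):
    # Count every adjacent symbol pair, weighted by word frequency.
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(vocab, pair):
    # Rewrite every word, replacing the chosen pair with one merged symbol.
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

for _ in range(10):  # the number of merges is a hyperparameter
    pair = most_frequent_pair(vocab)
    if pair is None:
        break
    vocab = merge_pair(vocab, pair)
print(vocab)         # merged symbols now span subwords and whole words
```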
Input encoding
- Feature-based approaches: word2vec
- Fine-tuning approaches: BERT/GPT
- Perplexity: lower perplexity -> more fluent text (see the sketch after this list)
- zero-shot
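Here is a minimal sketch of the perplexity computation (the per-token probabilities are made-up numbers): perplexity is the exponentiated average negative log-likelihood of the tokens, so text that the model predicts well, i.e. more fluent text, gets a lower value.

```python
import numpy as np

def perplexity(token_probs):
    # PPL = exp( -(1/N) * sum_i log p(w_i | w_<i) )
    log_probs = np.log(np.asarray(token_probs, dtype=float))
    return float(np.exp(-log_probs.mean()))

# Illustrative per-token probabilities assigned by two language models.
print(perplexity([0.4, 0.5, 0.3, 0.6]))    # lower perplexity: fluent under the model
print(perplexity([0.05, 0.1, 0.02, 0.2]))  # higher perplexity: poorly modeled text
```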
BERT: masked LM; open issues: the gap between pre-training and fine-tuning (the [MASK] token never appears during fine-tuning) and the efficiency of the masked language model (only a fraction of tokens are predicted per step); see the masking sketch below
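A rough sketch of BERT-style masked-LM corruption follows; the 15% selection rate and the 80/10/10 replacement split follow the original BERT recipe, while the sentence, the vocabulary used for random replacement, and the helper name are illustrative assumptions. The [MASK] token it introduces is exactly what never appears at fine-tuning time, and only the selected fraction of positions yields a training signal.

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15, rng=random.Random(0)):
    # Returns the corrupted input tokens and the positions to be predicted.
    inputs, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:           # select ~15% of positions
            targets[i] = tok
            r = rng.random()
            if r < 0.8:
                inputs[i] = "[MASK]"           # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
            # remaining 10%: keep the original token unchanged
    return inputs, targets

tokens = "the man went to the store to buy some milk".split()  # illustrative
print(mask_tokens(tokens, vocab=tokens))
```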