
刘知远: Large Models (course notes)

Basic tasks of NLP

  • Part-of-speech tagging
  • Named entity recognition
  • Co-reference resolution
  • Basic dependency parsing

Word representation

Goals: compute word similarity; infer word relations.

Problems of synonym/hypernym representation

  • Missing nuance
  • Missing new meanings of words
  • Subjective
  • Data sparsity
  • Requires human labor to create and adapt

  • One-hot representation
  • Count-based representation
  • Problems with these sparse representations:
      • Increase in size with vocabulary
      • Require a lot of storage
      • Sparsity issues for less frequent words
      • Subsequent classification models will be less robust
  • Word embedding (distributed representation): a dense, low-dimensional alternative learned from data
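
A minimal sketch contrasting a sparse one-hot vector with a dense distributed representation; the toy vocabulary, dimensions, and random embeddings below are illustrative assumptions, not from the notes:

```python
import numpy as np

vocab = ["cat", "dog", "car", "drive"]              # toy vocabulary (illustrative)
word_to_id = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # One-hot: dimensionality grows with the vocabulary; vectors are sparse
    # and mutually orthogonal, so no similarity is captured.
    v = np.zeros(len(vocab))
    v[word_to_id[word]] = 1.0
    return v

# Distributed representation: each word is a dense low-dimensional vector
# (randomly initialized here; in practice learned, e.g. by word2vec).
dim = 3
embeddings = np.random.randn(len(vocab), dim)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(one_hot("cat"), one_hot("dog")))                 # always 0.0
print(cosine(embeddings[word_to_id["cat"]],
             embeddings[word_to_id["dog"]]))                  # nonzero similarity
```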

  • Model pre-training: large-scale unlabeled data
  • Model fine-tuning: task-specific training data

  • Sigmoid: binary classification
  • Softmax: multi-class classification
  • Cross-entropy: negative log-probability of the correct class
  • Stochastic gradient descent
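
A small numpy sketch of these pieces; the shapes, learning rate, and toy data are my own assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    # Binary classification: squashes a score into a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Multi-class classification: turns a score vector into a distribution.
    z = z - z.max()                       # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(probs, target):
    # Loss = -log of the probability assigned to the correct class.
    return -np.log(probs[target])

# One stochastic-gradient-descent step for a linear softmax classifier.
W = np.zeros((3, 5))                      # 3 classes, 5 features (illustrative)
x = np.random.randn(5)
y = 1                                     # index of the correct class
probs = softmax(W @ x)
grad = np.outer(probs - np.eye(3)[y], x)  # gradient of cross-entropy w.r.t. W
W -= 0.1 * grad                           # learning rate 0.1
print(cross_entropy(softmax(W @ x), y))   # loss after the update
```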

Sub-sampling: rare words are more likely to carry information than frequent ones, so frequent words are randomly discarded (sub-sampled) during training.

Soft sliding window: the context window size is sampled randomly, so more distant words are used less often and effectively receive lower weight.
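
A hedged sketch of both word2vec training tricks; the threshold t and maximum window size are common defaults assumed here, not stated in the notes:

```python
import numpy as np

def keep_prob(word_freq, t=1e-5):
    # Sub-sampling heuristic: keep an occurrence of a word with probability
    # sqrt(t / f(w)), so very frequent words are discarded most of the time.
    return min(1.0, np.sqrt(t / word_freq))

def sample_window(max_window=5):
    # Soft sliding window: draw the actual window size uniformly from
    # [1, max_window]; nearer context words are therefore used more often.
    return np.random.randint(1, max_window + 1)

print(keep_prob(0.01), keep_prob(1e-6))   # frequent word kept rarely, rare word always
print(sample_window())
```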

  • RNN: handles longer text, but suffers from vanishing gradients
  • Gated Recurrent Unit: gates decide whether to keep or discard the current state
  • LSTM: an extra cell-state vector for capturing long-term dependencies
  • CNN: captures local patterns; better parallelization within sentences
  • Transformer
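
To illustrate the gating idea mentioned above, here is a minimal single-step GRU cell in numpy; the weight shapes are illustrative, biases are omitted, and this is only a sketch of the standard update equations:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, params):
    # Update gate z controls how much new information replaces the old state;
    # reset gate r controls how much of the old state feeds the candidate.
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h_prev)
    r = sigmoid(Wr @ x + Ur @ h_prev)
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))    # candidate state
    return (1 - z) * h_prev + z * h_tilde             # interpolate old and new state

d_in, d_h = 4, 3                                      # illustrative sizes
params = [np.random.randn(d_h, d_in), np.random.randn(d_h, d_h),
          np.random.randn(d_h, d_in), np.random.randn(d_h, d_h),
          np.random.randn(d_h, d_in), np.random.randn(d_h, d_h)]
h = np.zeros(d_h)
for x in np.random.randn(5, d_in):                    # a toy 5-step sequence
    h = gru_step(x, h, params)
print(h)
```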

Byte Pair Encoding
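
A sketch of the core BPE training loop, which repeatedly merges the most frequent adjacent symbol pair; the toy corpus and number of merges are my own assumptions:

```python
from collections import Counter

def get_pair_counts(vocab):
    # vocab maps a word (as a tuple of symbols) to its corpus frequency.
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    # Replace every occurrence of the pair with a single fused symbol.
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: word -> frequency, each word split into characters plus an end marker.
vocab = {tuple("low") + ("</w>",): 5,
         tuple("lower") + ("</w>",): 2,
         tuple("newest") + ("</w>",): 6,
         tuple("widest") + ("</w>",): 3}

for _ in range(5):                                    # learn 5 merge operations
    pair = get_pair_counts(vocab).most_common(1)[0][0]
    vocab = merge_pair(pair, vocab)
    print("merged:", pair)
```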

Input encoding

  • Feature-based approaches: word2vec
  • Fine-tuning approaches: BERT/GPT

  • Perplexity: lower perplexity -> more fluent text (the model assigns it higher probability)
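
A small sketch of how perplexity is computed from a language model's per-token probabilities; the probability values below are made up for illustration:

```python
import numpy as np

def perplexity(token_probs):
    # Perplexity = exp of the average negative log-probability the model
    # assigns to each token; lower means the model finds the text more likely.
    log_probs = np.log(token_probs)
    return float(np.exp(-np.mean(log_probs)))

fluent    = [0.4, 0.5, 0.3, 0.6]      # hypothetical probabilities for fluent text
disfluent = [0.05, 0.1, 0.02, 0.08]   # hypothetical probabilities for disfluent text
print(perplexity(fluent))             # lower
print(perplexity(disfluent))          # higher
```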

  • zero-shot

BERT: masked language modeling; drawbacks: the [MASK] token creates a gap between pre-training and fine-tuning (it never appears in downstream inputs), and the masked LM is less efficient because only the masked positions provide a training signal.
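
A hedged sketch of BERT's masking recipe (select ~15% of tokens; of those, 80% become [MASK], 10% a random token, 10% stay unchanged); the toy vocabulary and sentence are my own, and the sketch also shows why only the selected positions contribute to the loss:

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15):
    # Returns the corrupted input and the positions the model must predict.
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:              # select ~15% of positions
            targets[i] = tok
            r = random.random()
            if r < 0.8:
                corrupted[i] = "[MASK]"              # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted[i] = random.choice(vocab)  # 10%: replace with a random token
            # else 10%: keep the original token unchanged
    return corrupted, targets

vocab = ["the", "cat", "sat", "on", "mat", "dog"]    # toy vocabulary
print(mask_tokens(["the", "cat", "sat", "on", "the", "mat"], vocab))
```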