Richard Socher (Salesforce)

Richard is Chief Scientist at Salesforce, which he joined through the acquisition of his startup MetaMind. Previously, Richard was a professor in the Stanford CS department.

Why Unified Multi-Task Models for NLP?

  • Multi-task learning is a blocker for general NLP systems: without it, every new task needs its own specialized model.

  • Unified models can decide how to transfer knowledge (domain adaptation, weight sharing, transfer learning, and zero-shot learning).

  • Unified AND multi-task models can:

    • More easily adapt to new tasks.

    • Make deploying to production much simpler.

    • Lower the bar for more people to solve new tasks.

    • Potentially move towards continual learning.

The 3 Major NLP Task Categories

  1. Sequence tagging: named entity recognition, aspect-specific sentiment.

  2. Text classification: dialogue state tracking, sentiment classification.

  3. Sequence-to-sequence: machine translation, summarization, question answering.

⇒ They correspond to the 3 equivalent super-tasks of NLP: Language Modeling, Question Answering, and Dialogue.
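To make the three categories above concrete, here are made-up input/output pairs (illustrative only, not taken from the lecture or any specific dataset):

```python
# Illustrative (invented) input/output pairs for the three task categories.

# 1. Sequence tagging: one label per input token.
ner_input  = ["Richard", "Socher", "joined", "Salesforce", "."]
ner_output = ["B-PER",   "I-PER",  "O",      "B-ORG",      "O"]

# 2. Text classification: one label for the whole input.
clf_input  = "The keynote was inspiring and well paced."
clf_output = "positive"

# 3. Sequence-to-sequence: an output sequence of arbitrary length.
s2s_input  = "Der Vortrag war inspirierend."
s2s_output = "The talk was inspiring."
```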

A Multi-Task Question Answering Network for decaNLP

Methodology

  • Start with a context.

  • Ask a question.

  • Generate the answer one word at a time by:

    • Pointing to the context.

    • Pointing to the question.

    • Or choosing a word from an external vocabulary.

  • The pointer switch chooses between these three options for each output word (see the sketch below).
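A minimal sketch of the pointer-switch idea, assuming the final output distribution is a gated mixture of a vocabulary distribution and the two pointer distributions. The function name, the gate names (`gamma`, `lam`), and the fixed gate values below are illustrative, not the official MQAN implementation; in the real model the gates are predicted from the decoder state at each step.

```python
import torch

def pointer_switch_distribution(p_vocab, p_context, p_question, gamma, lam):
    """Mix three word distributions into one (sketch, not the official code).

    p_vocab    -- distribution over an external vocabulary
    p_context  -- pointer distribution over context tokens
    p_question -- pointer distribution over question tokens
    gamma, lam -- scalar gates in [0, 1]; assumed fixed here, but predicted
                  per decoding step in the actual model
    All three inputs are assumed to be probability vectors over the same
    extended vocabulary (context/question tokens mapped into it).
    """
    return gamma * p_vocab + (1.0 - gamma) * (lam * p_context + (1.0 - lam) * p_question)

# Toy usage with a 5-word extended vocabulary.
p_vocab    = torch.tensor([0.2, 0.2, 0.2, 0.2, 0.2])
p_context  = torch.tensor([0.7, 0.1, 0.1, 0.1, 0.0])
p_question = torch.tensor([0.0, 0.1, 0.1, 0.1, 0.7])
p_final = pointer_switch_distribution(p_vocab, p_context, p_question, gamma=0.3, lam=0.8)
print(p_final, p_final.sum())  # sums to 1 when the inputs are valid distributions
```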

Architecture Design

decaNLP: A Benchmark for Generalized NLP

  • Train a single question answering model on multiple NLP tasks, each cast as a question over a context (see the sketch after this list).

  • Framework for tackling:

    • More general language understanding.

    • Multi-task learning.

    • Domain adaptation.

    • Transfer learning.

    • Weight-sharing, pre-training, fine-tuning.

    • Zero-shot learning.
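A rough sketch of how tasks are cast as (question, context, answer) triples in the decaNLP style. The question phrasings follow the spirit of the benchmark, but the contexts and answers here are invented for illustration and are not drawn from the decaNLP data.

```python
# Illustrative decaNLP-style examples: every task becomes QA over a context.
decanlp_style_examples = [
    {
        "task": "summarization",
        "question": "What is the summary?",
        "context": "Salesforce Research introduced a benchmark that frames ten NLP tasks as question answering ...",
        "answer": "Salesforce introduced a ten-task QA-style NLP benchmark.",
    },
    {
        "task": "machine translation",
        "question": "What is the translation from English to German?",
        "context": "The talk was inspiring.",
        "answer": "Der Vortrag war inspirierend.",
    },
    {
        "task": "sentiment classification",
        "question": "Is this review negative or positive?",
        "context": "A thoughtful, well-paced keynote.",
        "answer": "positive",
    },
]

# A single QA model (such as MQAN) is trained on all of these jointly;
# adding a task only means writing new questions, which is what enables
# the transfer and zero-shot settings listed above.
for ex in decanlp_style_examples:
    print(f"[{ex['task']}] Q: {ex['question']} -> A: {ex['answer']}")
```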

(Figure: decaNLP - A Benchmark for Generalized NLP)
(Figure: Multi-Task Question Answering Network Architecture)