Geoffrey Hinton

(164 articles)

Convolution

Mathematical operation used in signal processing and image processing to combine two functions, resulting in a third function that represents how one function modifies the other.

Generality: 870
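
As a rough illustration of the convolution definition above, the sketch below computes a full discrete 1D convolution in plain Python. The function name `convolve` and the sample values are assumptions made for this example only.

```python
def convolve(signal, kernel):
    """Full discrete 1D convolution: out[n] = sum over k of signal[k] * kernel[n - k]."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            # Each pair (i, j) contributes to output position i + j.
            out[i + j] += s * k
    return out


# Illustrative input values (not taken from the glossary).
result = convolve([1, 2, 3], [0, 1, 0.5])
```

The output has length `len(signal) + len(kernel) - 1`; libraries typically also offer truncated ("same" or "valid") variants.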

1900

Tensor

Multi-dimensional array used in mathematics and computer science, serving as a fundamental data structure in neural networks for representing data and parameters.

Generality: 920

1936

Parameterized

Model or function in AI that utilizes parameters to make predictions or decisions.

Generality: 796

1936

Loss Optimization

Process of adjusting a model's parameters to minimize the difference between the predicted outputs and the actual outputs, measured by a loss function.

Generality: 886

1939

Quantization

Process of reducing the precision of the weights and activations in neural network models to decrease their memory and computational requirements.

Generality: 673
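
One common way to reduce precision is symmetric linear quantization to signed integers. The sketch below assumes that scheme and 8-bit integers; the helper names `quantize` and `dequantize` are hypothetical, and real toolchains differ in detail.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using one shared scale (symmetric scheme, assumed)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [x * scale for x in q]


weights = [0.5, -1.0, 0.25]  # illustrative values
codes, scale = quantize(weights)
restored = dequantize(codes, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the precision traded for the smaller storage format.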

1943

ANN
Artificial Neural Networks

Computing systems inspired by the biological neural networks that constitute animal brains, designed to progressively improve their performance on tasks by considering examples.

Generality: 875

1943

Neural Network

Computing system designed to simulate the way human brains analyze and process information, using a network of interconnected nodes that work together to solve specific problems.

Generality: 932

1943

Connectionist AI

Set of computational models in AI that simulate the human brain's network of neurons to process information and learn from data.

Generality: 900

1950

Next Word Prediction

Enables language models to predict the most probable subsequent word in a text sequence using generative AI techniques.

Generality: 780

1950

NLP
Natural Language Processing

Field of AI that focuses on the interaction between computers and humans through natural language.

Generality: 931

1950

Natural Language Problem

Challenges encountered in understanding, processing, or generating human language using computational methods.

Generality: 875

1952

Speech Processing

Technology that enables computers to recognize, interpret, and generate human speech.

Generality: 870

1952

ASR
Automatic Speech Recognition

Translates spoken language into written text, enabling computers to understand and process human speech.

Generality: 830

1956

Generalization

Ability of a ML model to perform well on new, unseen data that was not included in the training set.

Generality: 891

1956

Motor Learning

Process by which robots or AI systems acquire, refine, and optimize motor skills through experience and practice.

Generality: 675

1956

Supervision

Use of labeled data to train ML models, guiding the learning process by providing input-output pairs.

Generality: 890

1956

Training

Process of teaching a ML model to make accurate predictions or decisions, by adjusting its parameters based on data.

Generality: 940

1956

Human-Level AI

AI systems that can perform any intellectual task with the same proficiency as a human being.

Generality: 945

1956

AGI
Artificial General Intelligence

AI capable of understanding, learning, and applying knowledge across a wide range of tasks, matching or surpassing human intelligence.

Generality: 905

1958

Unsupervised Learning

Type of ML where algorithms learn patterns from untagged data, without any guidance on what outcomes to predict.

Generality: 905

1958

Artificial Neuron

Computational models inspired by biological neurons, serving as the foundational units of artificial neural networks to process input and output signals.

Generality: 825

1958

MCP neuron

Early computational model of a biological neuron forming the basis for artificial neural networks.

Generality: 500

1959

ML
Machine Learning

Development of algorithms and statistical models that enable computers to perform tasks without being explicitly programmed for each one.

Generality: 965

1959

Supervised Learning

ML approach where models are trained on labeled data to predict outcomes or classify data into categories.

Generality: 882

1960

Memory Systems

Mechanisms and structures designed to store, manage, and recall information, enabling machines to learn from past experiences and perform complex tasks.

Generality: 790

1960

Internal Representation

The way information is structured and stored within an AI system, enabling the system to process, reason, or make decisions.

Generality: 845

1960

Image Recognition

Ability of AI to identify objects, places, people, writing, and actions in images.

Generality: 854

1960

Linear Separability

The ability of a dataset to be perfectly separated into two classes using a straight line in two dimensions or a hyperplane in higher dimensions.

Generality: 500

1960

Pattern Recognition

The identification and classification of patterns in data using computational algorithms, essential for enabling machines to interpret, learn from, and make decisions based on complex datasets.

Generality: 825

1961

Feed Forward

Essential structure of an artificial neural network that directs data or information from the input layer towards the output layer without looping back.

Generality: 860

1962

Function Approximation

Method used in AI to estimate complex functions using simpler, computationally efficient models.

Generality: 810

1965

Invariance

Property of a model or algorithm that ensures its output remains unchanged when specific transformations are applied to the input data.

Generality: 830

1965

Inference

Process by which a trained neural network applies learned patterns to new, unseen data to make predictions or decisions.

Generality: 861

1969

Perceptron Convergence

A phenomenon in which the perceptron learning algorithm stabilizes, finding a solution for any linearly separable dataset after a finite number of iterations.

Generality: 500
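
The convergence behaviour described above can be observed on a small linearly separable dataset. This is a minimal sketch of the classic perceptron update rule on the logical AND function; the function name `train_perceptron`, the label encoding (+1 / -1), and the epoch cap are choices made for this example.

```python
def train_perceptron(samples, max_epochs=100):
    """Run the perceptron rule until one full pass makes no mistakes."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for inputs, label in samples:  # label is +1 or -1
            score = sum(w * x for w, x in zip(weights, inputs)) + bias
            if label * score <= 0:  # misclassified (or on the boundary)
                weights = [w + label * x for w, x in zip(weights, inputs)]
                bias += label
                mistakes += 1
        if mistakes == 0:
            return weights, bias, epoch  # converged
    return weights, bias, None  # no convergence within the cap


# Logical AND is linearly separable, so the rule is expected to converge.
and_data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
weights, bias, epochs_used = train_perceptron(and_data)
```

On a dataset that is not linearly separable, such as XOR, the same loop would keep making mistakes and return `None` for the epoch count.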

1970

Regularization

Technique used in machine learning to reduce model overfitting by adding a penalty to the loss function based on the complexity of the model.

Generality: 845

1970

NLU
Natural Language Understanding

Subfield of NLP focused on enabling machines to understand and interpret human language in a way that is both meaningful and contextually relevant.

Generality: 894

1976

Overfitting

When a ML model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data.

Generality: 890

1980

CNN
Convolutional Neural Network

Deep learning algorithm that can capture spatial hierarchies in data, particularly useful for image and video recognition tasks.

Generality: 916

1980

Local Weight Sharing

Technique where the same weights are used across different positions in an input, enhancing the network's ability to recognize patterns irrespective of their spatial location.

Generality: 690

1980

Statistical AI

Utilizes statistical methods to analyze data and make probabilistic inferences, aimed at emulating aspects of human intelligence through quantitative models.

Generality: 890

1980

Generative

Subset of AI technologies capable of generating new content, ideas, or data that mimic human-like outputs.

Generality: 840

1980

Speech-to-Text Model

A computational model designed to convert spoken language into written text using AI and linguistic pattern recognition.

Generality: 805

1980

Universal Learning Algorithms

Theoretical frameworks aimed at creating systems capable of learning any task to human-level competency, leveraging principles that could allow for generalization across diverse domains.

Generality: 840

1980

Learnability

Capacity of an algorithm or model to effectively learn from data, often measured by how well it can generalize from training data to unseen data.

Generality: 847

1980

Program Induction

A process in AI where computers generate, or 'induce', programs based on provided data and specific output criteria.

Generality: 785

1980

Dualism

Theory or concept that emphasizes the division between symbolic (classical) AI and sub-symbolic (connectionist) AI.

Generality: 830

1985

Boltzmann Machine

Stochastic recurrent neural network used to learn and represent complex probability distributions over binary variables.

Generality: 790

1985

EBM
Energy-Based Model

Class of deep learning models that learn to associate lower energy levels with more probable configurations of the input data.

Generality: 625

1986

RBMs
Restricted Boltzmann Machines

Type of generative stochastic artificial neural network that can learn a probability distribution over its set of inputs.

Generality: 770

1986

Saturating Non-Linearities

Activation functions in neural networks that reach a point where their output changes very little, or not at all, in response to large input values.

Generality: 575

1986

MLP
Multilayer Perceptron

Type of artificial neural network comprised of multiple layers of neurons, with each layer fully connected to the next, commonly used for tasks involving classification and regression.

Generality: 775

1986

DL
Deep Learning

Subset of machine learning that involves neural networks with many layers, enabling the modeling of complex patterns in data.

Generality: 905

1986

Backpropagation

Algorithm used for training artificial neural networks, crucial for optimizing the weights to minimize error between predicted and actual outcomes.

Generality: 890

1986

Inductive Bias

Assumptions integrated into a learning algorithm to enable it to generalize from specific instances to broader patterns or concepts.

Generality: 827

1986

DNN
Deep Neural Networks

Advanced neural network architectures with multiple layers that enable complex pattern recognition and learning from large amounts of data.

Generality: 916

1986

Subsymbolic AI

AI approaches that do not use explicit symbolic representation of knowledge but instead rely on distributed, often neural network-based methods to process and learn from data.

Generality: 900

1986

RNN
Recurrent Neural Network

Class of neural networks where connections between nodes form a directed graph along a temporal sequence, enabling them to exhibit temporal dynamic behavior for a sequence of inputs.

Generality: 892

1986

Hidden Layer

Layer of neurons in an artificial neural network that processes inputs from the previous layer, transforming the data before passing it on to the next layer, without direct exposure to the input or output data.

Generality: 861

1986

Forward Propagation

Process in a neural network where input data is passed through layers of the network to generate output.

Generality: 830

1986

Feature Extraction

Process of transforming raw data into a set of features that are more meaningful and informative for a specific task, such as classification or prediction.

Generality: 880

1986

Function Approximator

Computational model used to estimate a target function that is generally complex or unknown, often applied in machine learning and control systems.

Generality: 806

1986

Weight Initialization

An essential process in neural network training that involves setting the initial values of the model's weights to influence learning effectiveness and convergence.

Generality: 675

1986

Prediction Error

The discrepancy between predicted outcomes by an AI model and the actual observed results in a dataset.

Generality: 675

1986

Node

A fundamental unit within a neural network or graph that processes inputs to produce outputs, often reflecting the biological concept of neurons.

Generality: 500

1986

Batch

A collection of data samples processed simultaneously in a single step of a neural network's training process.

Generality: 500

1987

Weight Decay

Regularization technique used in training neural networks to prevent overfitting by penalizing large weights.

Generality: 730
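
A minimal sketch of how the penalty on large weights can enter a gradient step, assuming plain stochastic gradient descent with an L2 penalty. The function name and the default hyperparameters are illustrative, not prescribed by the glossary.

```python
def sgd_step_with_weight_decay(weights, grads, lr=0.1, weight_decay=0.01):
    """One SGD update where each weight is also pulled toward zero.

    Equivalent to adding (weight_decay / 2) * w**2 to the loss for every weight.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


# With zero task gradient, the update only shrinks the weights slightly.
shrunk = sgd_step_with_weight_decay([1.0, -2.0], [0.0, 0.0])
```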

1987

Autoencoder

Type of artificial neural network used to learn efficient codings of unlabeled data, typically for the purpose of dimensionality reduction or feature learning.

Generality: 815

1990

Max Pooling

Downsampling technique that reduces the dimensionality of input data by selecting the maximum value from a specified subset of the data.

Generality: 695
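
To make the downsampling step concrete, the sketch below applies non-overlapping max pooling to a small 2D grid. The function name `max_pool_2d`, the window size, and the sample grid are assumptions for illustration.

```python
def max_pool_2d(grid, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            max(grid[r + dr][c + dc] for dr in range(size) for dc in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 9, 8],
    [2, 6, 7, 3],
]
pooled = max_pool_2d(feature_map)  # → [[4, 5], [6, 9]]
```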

1990

Speech-to-Speech Model

Systems that directly convert spoken language into another language through AI, enabling real-time translation and cross-lingual communication.

Generality: 809

1990

SotA
State of the Art

The highest level of performance achieved in a specific field, particularly in AI, where it denotes the most advanced model or algorithm.

Generality: 720

1990

Incremental Learning

A method where AI systems continuously acquire new data and knowledge while retaining previously learned information without retraining from scratch.

Generality: 750

1991

MoE
Mixture of Experts

ML architecture that utilizes multiple specialist models (experts) to handle different parts of the input space, coordinated by a gating mechanism that decides which expert to use for each input.

Generality: 705

1991

Meta-Learning

Techniques, also described as learning to learn, that enable AI models to adapt quickly to new tasks with minimal data.

Generality: 858

1991

Catastrophic Forgetting

Phenomenon where a neural network forgets previously learned information upon learning new data.

Generality: 686

1993

MTL
Multi-Task Learning

ML approach where a single model is trained simultaneously on multiple related tasks, leveraging commonalities and differences across tasks to improve generalization.

Generality: 761

1995

Wake Sleep

Biologically inspired algorithm used within unsupervised learning to train deep belief networks.

Generality: 540

1995

Transfer Learning

ML method where a model developed for a task is reused as the starting point for a model on a second task, leveraging the knowledge gained from the first task to improve performance on the second.

Generality: 870

1995

Continuous Learning

Systems and models that learn incrementally from a stream of data, updating their knowledge without forgetting previous information.

Generality: 870

1996

Early Stopping

A regularization technique used to prevent overfitting in ML models by halting training when performance on a validation set begins to degrade.

Generality: 675
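
The halting rule can be sketched as a scan over recorded validation losses. The `patience` parameter (how many non-improving epochs to tolerate) is a common convention assumed here rather than something the definition specifies, and the function name is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop and keep the best checkpoint.
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Illustrative loss curve: improves until epoch 2, then degrades.
best, stopped = early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.9, 0.6])  # → (2, 4)
```

Note that the later, lower loss of 0.6 is never reached in this run, which is the usual trade-off of a short patience window.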

1997

Transfer Capability

A feature of AI systems that allows acquired knowledge in one domain or task to be applied to another distinct but related domain or task.

Generality: 775

2000

ReLU
Rectified Linear Unit

Activation function commonly used in neural networks which outputs the input directly if it is positive, otherwise, it outputs zero.

Generality: 855
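
Written out directly, the definition above is a one-line function. The name `relu` is simply the conventional abbreviation.

```python
def relu(x):
    """Return x when it is positive, otherwise zero."""
    return x if x > 0 else 0.0


activations = [relu(v) for v in [-2.0, -0.5, 0.0, 1.5]]  # → [0.0, 0.0, 0.0, 1.5]
```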

2001

Classifier

ML model that categorizes data into predefined classes.

Generality: 861

2002

CD
Contrastive Divergence

Algorithm used to approximate the gradient of the log-likelihood for training probabilistic models.

Generality: 660

2003

Word Vector

Numerical representations of words that capture their meanings, relationships, and context within a language.

Generality: 690

2005

Narrow AI

Also known as Weak AI, refers to AI systems designed to perform a specific task or a narrow range of tasks with a high level of proficiency.

Generality: 760

2006

DBN
Deep Belief Network

A type of artificial neural network that is deeply structured with multiple layers of latent variables, or hidden units.

Generality: 851

2006

One-Shot Learning

ML technique where a model learns information about object categories from a single training example.

Generality: 542

2006

Model Compression

Techniques designed to reduce the size of a machine learning model without significantly sacrificing its accuracy.

Generality: 715

2006

Feature Learning

Automatically learning representations or features from raw input data in order to improve model performance and reduce dependency on manual feature engineering.

Generality: 500

2008

Denoising Autoencoder

A neural network designed to reconstruct a clean input from a corrupted version, enhancing feature extraction by learning robust data representations.

Generality: 806

2008

Sparse Autoencoder

Type of neural network designed to learn efficient data representations by enforcing sparsity on the hidden layer activations.

Generality: 625

2010

Initialization

Process of setting the initial values of the parameters (weights and biases) of a model before training begins.

Generality: 865

2011

Cognitive Computing

Computer systems that simulate human thought processes to solve complex problems.

Generality: 900

2012

AlexNet

Deep convolutional neural network that significantly advanced the field of computer vision by winning the ImageNet Large Scale Visual Recognition Challenge in 2012.

Generality: 610

2012

Dropout

Regularization technique used in neural networks to prevent overfitting by randomly omitting a subset of neurons during training.

Generality: 808
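
A minimal sketch of random omission during training, assuming the widely used "inverted" variant in which surviving activations are rescaled so their expected value is unchanged. The function name, the `rng` parameter, and the default rate are illustrative choices.

```python
import random


def dropout(activations, drop_prob=0.5, rng=None):
    """Zero each activation with probability drop_prob; rescale the rest (inverted dropout)."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


# A seeded generator makes the example repeatable.
dropped = dropout([1.0] * 8, drop_prob=0.5, rng=random.Random(0))
```

At inference time the layer is normally skipped entirely, which matches calling this function with `drop_prob=0.0`.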

2012

Data Augmentation

Techniques used to increase the size and improve the quality of training datasets for machine learning models without collecting new data.

Generality: 830

2012

Data Efficient Learning

ML approach that requires less data to train a functional model.

Generality: 791

2012

Similarity Learning

A technique in AI focusing on training models to measure task-related similarity between data points.

Generality: 675

2012

Landmarks

Key points in an image used as reference for computer vision and AI systems to understand and manipulate visual data.

Generality: 500

2013

Pretrained Model

ML model that has been previously trained on a large dataset and can be fine-tuned or used as is for similar tasks or applications.

Generality: 860

2013

Latent Space

Abstract, multi-dimensional representation of data where similar items are mapped close together, commonly used in ML and AI models.

Generality: 805

2013

Embedding Space

Mathematical representation where high-dimensional vectors of data points, such as text, images, or other complex data types, are transformed into a lower-dimensional space that captures their essential properties.

Generality: 700

2013

Embedding

Representations of items, like words, sentences, or objects, in a continuous vector space, facilitating their quantitative comparison and manipulation by AI models.

Generality: 865

2013

Gradient Clipping

A technique used to mitigate the exploding gradient problem during the training of neural networks by capping gradients to a specified value range.

Generality: 625
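
Capping gradients to a value range can be sketched element by element, as below. The function name and threshold are assumptions; clipping by overall gradient norm is a related alternative that is not shown here.

```python
def clip_gradients(grads, clip_value=1.0):
    """Clamp every gradient component to the range [-clip_value, clip_value]."""
    return [max(-clip_value, min(clip_value, g)) for g in grads]


clipped = clip_gradients([0.5, -3.0, 10.0])  # → [0.5, -1.0, 1.0]
```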

2014

Model Layer

Discrete level in a neural network where specific computations or transformations are applied to the input data, progressively abstracting and refining the information as it moves through the network.

Generality: 805

2014

End-to-End Learning

ML approach where a system is trained to directly map input data to the desired output, minimizing the need for manual feature engineering.

Generality: 800

2014

Image-to-Text Model

AI systems that convert visual information from images into descriptive textual representations, enabling machines to understand and communicate the content of images.

Generality: 755

2014

Sequence Model

Model designed to process and predict sequences of data, such as time series, text, or biological sequences.

Generality: 830

2014

Sequence Prediction

Involves forecasting the next item(s) in a sequence based on the observed pattern of prior sequences.

Generality: 825

2014

Generative AI

Subset of AI technologies that can generate new content, ranging from text and images to music and code, based on learned patterns and data.

Generality: 830

2014

Discriminative AI

Algorithms that learn the boundary between classes of data, focusing on distinguishing between different outputs given an input.

Generality: 840

2014

Recognition Model

Element of AI that identifies patterns and features in data through learning processes.

Generality: 790

2014

Autoregressive Sequence Generator

A predictive model used in AI tasks, particularly those involving time series, which feeds its own prior outputs back in as inputs for subsequent predictions.

Generality: 650

2014

Sequential Models

Type of data models in AI where the arrangement of data points or events adhere to a specific order for predictive analysis and pattern recognition.

Generality: 815

2014

Convergence

The point at which an algorithm or learning process stabilizes, reaching a state where further iterations or data input do not significantly alter its outcome.

Generality: 845

2014

Overparameterized

ML model that has more parameters than the number of data points available for training.

Generality: 555

2014

Loss Landscape

The topographical representation of a neural network's loss function showcasing the variations in loss values across different parameter settings.

Generality: 500

2015

Model Distillation

ML technique where a larger, more complex model (teacher) is used to train a smaller, simpler model (student) to approximate the teacher's predictions while maintaining similar performance.

Generality: 625
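
One way a student can be trained to approximate a teacher's predictions is to minimize the cross-entropy between temperature-softened output distributions. The sketch below assumes that formulation; the function names, the temperature value, and the logits are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student's soft predictions against the teacher's."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


# The loss is smallest when the student reproduces the teacher's distribution.
matched = distillation_loss([2.0, 1.0, 0.0], [2.0, 1.0, 0.0])
mismatched = distillation_loss([2.0, 1.0, 0.0], [0.0, 1.0, 2.0])
```

In practice this term is often combined with the ordinary loss on ground-truth labels; that combination is omitted here.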

2015

Teacher Model

Pre-trained, high-performing model that guides the training of a simpler, student model, often in the context of knowledge distillation.

Generality: 561

2015

Activation Data

Intermediate outputs produced by neurons in a neural network when processing input data, which are used to evaluate and update the network during training.

Generality: 575

2015

DQN
Deep Q-Networks

RL technique that combines Q-learning with deep neural networks to enable agents to learn how to make optimal decisions from high-dimensional sensory inputs.

Generality: 853

2015

Variance Scaling

Weight initialization technique that sets the scale of each layer's initial weights so that the variance of activations stays consistent throughout a neural network.

Generality: 525

2016

GLU
Gated Linear Unit

Neural network component that uses a gating mechanism to control information flow, improving model efficiency and performance.

Generality: 665

2016

Layer Normalization

Technique used in neural networks to normalize the inputs across the features within a layer, improving training stability and model performance, particularly in recurrent and transformer models.

Generality: 715
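
Normalizing across the features of a single example can be sketched as follows. The learned gain and bias that usually follow the normalization are left out, and the function name and `eps` value are assumptions for this example.

```python
import math


def layer_norm(features, eps=1e-5):
    """Shift and scale one example's features to zero mean and (near) unit variance."""
    mean = sum(features) / len(features)
    variance = sum((x - mean) ** 2 for x in features) / len(features)
    # eps guards against division by zero when all features are equal.
    return [(x - mean) / math.sqrt(variance + eps) for x in features]


normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
```

Because the statistics come from one example rather than from a batch, the result does not depend on batch size, which is one reason the technique suits recurrent and transformer models.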

2016

Black Box Problem

The difficulty in understanding and interpreting how an AI system, particularly ML models, makes decisions.

Generality: 850

2016

Few Shot

ML technique designed to recognize patterns and make predictions based on a very limited amount of training data.

Generality: 675

2016

Federated Learning

ML approach enabling models to be trained across multiple decentralized devices or servers holding local data samples, without exchanging them.

Generality: 805

2016

Inference Acceleration

Methods and hardware optimizations employed to increase the speed and efficiency of the inference process in machine learning models, particularly neural networks.

Generality: 775

2016

Multimodal

AI systems or models that can process and understand information from multiple modalities, such as text, images, and sound.

Generality: 837

2016

Responsible AI

Application of AI in a manner that is transparent, unbiased, and respectful of user privacy and values.

Generality: 815

2016

Federated Training

Decentralized machine learning approach where multiple devices or nodes collaboratively train a shared model while keeping their data localized, rather than aggregating it centrally.

Generality: 805

2017

Capsule Networks

Type of artificial neural network designed to improve the processing of spatial hierarchical information by encoding data into small groups of neurons called capsules.

Generality: 660

2017

Expressive Hidden States

Internal representations within a neural network that effectively capture and encode complex patterns and dependencies in the input data.

Generality: 695

2017

Neurosymbolic AI

Integration of neural networks with symbolic AI to create systems that can both understand and manipulate symbols in a manner similar to human cognitive processes.

Generality: 675

2017

Out of Distribution

Data that differs significantly from the training data used to train a machine learning model, leading to unreliable or inaccurate predictions.

Generality: 675

2017

Point-wise Feedforward Network

Neural network layer that applies a series of linear and non-linear transformations to each position (or

Generality: 625

2017

Hybrid AI

Combines symbolic AI (rule-based systems) and sub-symbolic AI (machine learning) approaches to leverage the strengths of both for more versatile and explainable AI systems.

Generality: 820

2017

Masking

Technique used in NLP models to prevent future input tokens from influencing the prediction of current tokens.

Generality: 639
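
For the causal case described above, the mask is a lower-triangular table that says which positions each token may look at. The function name `causal_mask` is an assumption for this sketch.

```python
def causal_mask(seq_len):
    """mask[row][col] is True when position `row` may attend to position `col`."""
    return [[col <= row for col in range(seq_len)] for row in range(seq_len)]


mask = causal_mask(3)
# → [[True, False, False],
#    [True, True,  False],
#    [True, True,  True]]
```

In attention layers, entries marked `False` are typically replaced with a large negative score before the softmax so that future tokens receive near-zero weight.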

2017

Ablation

Method where components of a neural network are systematically removed or altered to study their impact on the model's performance.

Generality: 650

2017

Zero-shot Capability

The ability of AI models to perform tasks or make predictions on new types of data that they have not encountered during training, without needing any example-specific fine-tuning.

Generality: 775

2018

SSL
Self-Supervised Learning

Type of ML where the system learns to predict part of its input from other parts, using its own data structure as supervision.

Generality: 815

2018

LLM
Large Language Model

Advanced AI systems trained on extensive datasets to understand, generate, and interpret human language.

Generality: 827

2018

Next Token Prediction

Technique used in language modeling where the model predicts the following token based on the previous ones.

Generality: 735

2018

xLSTM

Extended form of Long Short-Term Memory (LSTM), integrating enhancements for scalability and efficiency in DL models.

Generality: 675

2018

Base Model

Pre-trained AI model that serves as a starting point for further training or adaptation on specific tasks or datasets.

Generality: 790

2018

DLMs
Deep Language Models

Advanced ML models designed to understand, generate, and translate human language by leveraging DL techniques.

Generality: 874

2019

Self-Supervised Pretraining

ML approach where a model learns to predict parts of the input data from other parts without requiring labeled data, which is then fine-tuned on downstream tasks.

Generality: 725

2019

Adapter Layer

Neural network layer used to enable transfer learning by adding small, trainable modules to a pre-trained model, allowing it to adapt to new tasks with minimal additional training.

Generality: 625

2019

Continual Pre-Training

Process of incrementally training a pre-trained ML model on new data or tasks to update its knowledge without forgetting previously learned information.

Generality: 670

2019

Post-Training

Techniques and adjustments applied to neural networks after their initial training phase to enhance performance, efficiency, or adaptability to new data or tasks.

Generality: 650

2020

AMI
Advanced Machine Intelligence

Refers to high-level AI systems possessing the capability to perform complex cognitive tasks with or without human-like reasoning.

Generality: 873

2020

Scaling Laws

Mathematical relationships that describe how the performance of machine learning models, particularly deep learning models, improves as their size, the amount of data, or computational resources increases.

Generality: 835

2020

1-N Systems

Architectures where one input or controller manages multiple outputs or agents, applicable in fields like neural networks and robotics.

Generality: 790

2020

Scaling Hypothesis

Hypothesis that enlarging model size, data, and computational resources can consistently improve task performance up to very large scales.

Generality: 765

2020

Generative Model

A type of AI model that learns to generate new data instances that mimic the training data distribution.

Generality: 840

2021

Parametric Knowledge

Information and patterns encoded within the parameters of a machine learning model, which are learned during the training process.

Generality: 849

2021

Instruction Following Model

AI system designed to execute tasks based on specific commands or instructions provided by users.

Generality: 640

2021

Instruction-Following

Ability to accurately understand and execute tasks based on given directives.

Generality: 725

2021

Transformative AI

AI systems capable of bringing about profound, large-scale changes in society, potentially altering the economy, governance, and even human life itself.

Generality: 825

2021

Self-Correction

An AI system's ability to recognize and rectify its own mistakes or errors without external intervention.

Generality: 815

2021

VLM
Visual Language Model

AI models designed to interpret and generate content by integrating visual and textual information, enabling them to perform tasks like image captioning, visual question answering, and more.

Generality: 621

2021

Foundation Model

Type of large-scale pre-trained model that can be adapted to a wide range of tasks without needing to be trained from scratch each time.

Generality: 835

2021

MLLMs
Multimodal Large Language Models

Advanced AI systems capable of understanding and generating information across different forms of data, such as text, images, and audio.

Generality: 625

2023

LVLMs
Large Vision Language Models

Advanced AI systems designed to integrate and interpret both visual and textual data, enabling more sophisticated understanding and generation based on both modalities.

Generality: 675
