The Top 100 “AI” Terms Every Developer Needs to Know

If you’re like me, you probably have a hard time keeping up with all the new buzzwords and acronyms that are popping up in the world of technology. Machine learning, artificial intelligence, deep learning, neural networks, natural language processing… the list goes on and on. But don’t worry, you’re not alone. In fact, according to a recent survey, only 17% of Americans can correctly define what artificial intelligence is. And that’s a problem.

Why? Because AI is not just some futuristic concept that only nerds and sci-fi fans care about. It’s a reality that is transforming every industry and every aspect of our lives. Whether you realize it or not, you’re already using AI every day. When you ask Siri or Alexa a question, when you scroll through your Facebook or Instagram feed, when you shop online or watch Netflix, when you use Google Maps or Uber, you’re interacting with AI. And that’s just the tip of the iceberg.

AI is also behind some of the most important innovations and breakthroughs of our time. It’s helping doctors diagnose diseases, farmers grow crops, teachers educate students, lawyers review contracts, artists create music, and scientists discover new planets. It’s also helping us tackle some of the biggest challenges facing humanity, such as climate change, poverty, hunger, and pandemics.

So what does this mean for you? It means that if you want to succeed in the new economy of AI, you need to familiarize yourself with the basic terminology and concepts of machine learning and artificial intelligence. You don’t need to become an expert programmer or machine learning engineer, but you do need to understand what AI can and cannot do, how it works, and how it affects you and your career.

That’s why I’ve created this blog post: to give you a quick and easy introduction to the most essential terms and concepts of machine learning and artificial intelligence. By the end of this post, you’ll be able to talk confidently about AI developments and techniques. You’ll also be able to spot the opportunities and challenges that AI presents for your industry and profession. And most importantly, you’ll be able to make informed decisions about how to leverage AI for your own benefit and growth.

So let’s get started!

The List

I’ve hand-picked these terms as the most important and relevant right now, favoring ones that are general rather than specific to one area of machine learning. I may update this list as it (undoubtedly) changes. If I’ve missed any you believe should be included, please leave the term and a short definition in the comments and we’ll all be smarter for it!

I’ve tried to keep the definitions very “short and sweet” (there are entire books written on each of them), but I encourage you to dive deeper yourself if any of these catch your interest.

Algorithm: A set of rules or instructions followed by the machine learning model to learn patterns in data.
Artificial Intelligence (AI): The broad discipline of creating intelligent machines.
Backpropagation: A method used in artificial neural networks to calculate the gradient of the loss with respect to each weight, which is then used to update the weights in the network.
Bias: The simplifying assumptions made by the model to make the target function easier to approximate.
Big Data: Large amounts of data that traditional data processing software can’t manage.
Binary Classification: A type of classification task where each input sample is classified into one of two possible categories.
Boosting: A machine learning ensemble meta-algorithm for primarily reducing bias, and also variance in supervised learning.
Categorical Data: Data that can be divided into multiple categories, often with no inherent order or priority.
Classification: A type of machine learning task in which the model outputs one of a finite set of labels.
Clustering: The task of dividing the population or data points into a number of groups such that data points in the same groups are more similar to other data points in the same group than those in other groups.
Convolutional Neural Network (CNN): A type of artificial neural network that uses convolutional layers to filter inputs for useful information.
Cross-Validation: A resampling procedure used to evaluate machine learning models on a limited data sample.
Data Mining: The process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.
Data Preprocessing: The process of converting raw data into a clean, consistent format that a machine learning model can use.
Dataset: A collection of related information that is composed of separate elements but can be manipulated as a unit by a computer.
Deep Learning: A subfield of machine learning concerned with artificial neural networks, algorithms inspired by the structure and function of the brain.
Decision Trees: A decision support tool that uses a tree-like model of decisions and their possible consequences.
Dimensionality Reduction: The process of reducing the number of random variables under consideration by obtaining a set of principal variables.
Ensemble Learning: A machine learning paradigm where multiple models are trained to solve the same problem and combined to get better results.
Epoch: One complete pass through the entire training dataset while training a machine learning model.
Feature: An individual measurable property of a phenomenon being observed.
Feature Engineering: The process of using domain knowledge to extract features from raw data via data mining techniques.
Feature Extraction: The process of deriving a smaller set of informative features from raw data, reducing the resources required to describe a large set of data.
Feature Selection: The process of selecting a subset of relevant features for use in model construction.
Gradient Descent: An optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient.
Hyperparameter: A parameter whose value is set before the learning process begins.
Imbalanced Data: A situation where the number of observations is not the same for the categories in a classification problem.
K-Nearest Neighbors (K-NN): A simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.
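Kernel: A function that measures the similarity between two data points, letting an algorithm work as if the data had been transformed into a higher-dimensional space (as in kernel SVMs).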
Label: The known target value attached to a training example, which a supervised model learns to predict.
Latent Variable: Variables in a statistical model that are not directly observed but are inferred or estimated from other variables that are observed.
Linear Regression: A statistical method for predicting a real-valued output based on one or more input features.
Logistic Regression: A classification algorithm used to predict a binary outcome based on a set of independent variables.
Loss Function: A method of evaluating how well a specific algorithm models the given data.
Machine Learning (ML): The scientific study of algorithms and statistical models that computer systems use to perform tasks without explicit instructions.
Multi-Class Classification: A classification task with more than two classes.
Naive Bayes: A classification technique based on the Bayes’ Theorem with an assumption of independence among predictors.
Natural Language Processing (NLP): A field of AI that gives the machines the ability to read, understand, and derive meaning from human languages.
Neural Network: A model built from layers of interconnected nodes (“neurons”) that learns to recognize underlying relationships in a set of data.
Normalization: Adjusting values measured on different scales to a common scale.
Outlier: A data point that differs significantly from other similar points.
Overfitting: A modeling error which occurs when a function is too closely fit to a limited set of data points.
Parameter: An internal value of a model that is learned from the training data and used to make predictions.
Perceptron: The simplest form of a neural network, used for binary classification.
Precision: The number of True Positives divided by the sum of True Positives and False Positives. It is a measure of a classifier’s exactness.
Principal Component Analysis (PCA): A statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables.
Random Forest: An ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time.
Recall: The number of True Positives divided by the sum of True Positives and False Negatives. It is a measure of a classifier’s completeness.
Regression: A set of statistical processes for estimating the relationships among variables.
Reinforcement Learning (RL): An area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.
Regularization: A technique used to prevent overfitting by adding an additional penalty to the loss function.
ReLU (Rectified Linear Unit): A commonly used activation function in neural networks and deep learning models. It outputs the input if it is positive and zero otherwise.
RNN (Recurrent Neural Network): A type of artificial neural network designed to recognize patterns in sequences of data, such as text, genomes, handwriting, or the spoken word.
Semi-Supervised Learning: Machine learning techniques that involve training using a small amount of labeled data and a large amount of unlabeled data.
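SGD (Stochastic Gradient Descent): A variant of gradient descent that updates the model using one example (or a small batch) at a time. It is a simple and very efficient approach to fitting linear classifiers and regressors under convex loss functions such as (linear) Support Vector Machines and Logistic Regression.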
Supervised Learning: A type of machine learning in which a model learns to make predictions from a set of labeled examples.
Support Vector Machine (SVM): A type of machine learning model used for classification and regression analysis.
TensorFlow: An open-source software library for machine learning and artificial intelligence.
Time Series Analysis: Techniques used to analyze time series data in order to extract meaningful statistics and other characteristics of the data.
Transfer Learning: A machine learning method where a pre-trained model is used as the starting point for a different but related problem.
Underfitting: A modeling error which occurs when a function is too loosely fit to the data.
Unsupervised Learning: A type of machine learning in which a model finds patterns or structure in a set of unlabeled examples.
Validation Set: A subset of the data set aside to adjust a model’s hyperparameters or to guide model selection.
Variable: Any characteristic, number, or quantity that can be measured or counted.
Weights: The parameters in a model that the machine learning algorithm learns during training.
XGBoost: An open-source software library which provides a gradient boosting framework for C++, Java, Python, R, and Julia.
Zero-Shot Learning: A machine learning concept where a model is able to predict classes that were not seen during training.
Autoencoder: A type of artificial neural network used for learning efficient codings of input data.
Batch Normalization: A technique for improving the performance and stability of artificial neural networks by normalizing the inputs to a layer over each mini-batch.
Bias-Variance Tradeoff: The property of a model that the variance of the parameter estimates across samples can be reduced by increasing the bias in the estimated parameters.
GAN (Generative Adversarial Network): An algorithmic architecture used in unsupervised learning, particularly to generate synthetic instances of data that can pass for real data.
Genetic Algorithm: A method for solving both constrained and unconstrained optimization problems that is based on natural selection, the process that drives biological evolution.
Grid Search: An approach to parameter tuning that will methodically build and evaluate a model for each combination of algorithm parameters specified in a grid.
Imputation: The process of replacing missing data with substituted values.
LSTM (Long Short-Term Memory): A type of recurrent neural network capable of learning order dependence in sequence prediction problems.
Multilayer Perceptron (MLP): A class of feedforward artificial neural network.
One-Hot Encoding: A process of converting categorical variables into binary vectors, with one position per category, so they can be provided to machine learning algorithms.
Polynomial Regression: A type of regression analysis in which the relationship between the independent variable x and the dependent variable y is modelled as an nth degree polynomial.
Quantum Machine Learning: The interdisciplinary area combining quantum physics and machine learning.
Q-Learning: A reinforcement learning technique used to find the optimal action-selection policy by learning a Q function, which estimates the expected reward of taking a given action in a given state.
Regular Expression (RegEx): A sequence of characters that forms a search pattern. RegEx can be used to check if a string contains the specified search pattern.
Sequential Model: A type of model used in machine learning which consists of a linear stack of layers.
Softmax Function: A function that takes an N-dimensional vector of real numbers and transforms it into a vector of real numbers in the range (0, 1) that add up to 1.
State-Action-Reward-State-Action (SARSA): An algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning.
T-distributed Stochastic Neighbor Embedding (t-SNE): A machine learning algorithm for visualizing high-dimensional data, developed by Laurens van der Maaten and Geoffrey Hinton and based on Stochastic Neighbor Embedding.
Univariate Analysis: The simplest form of analyzing data. “Uni” means “one”, so in other words, your data has only one variable.
Variance: A statistical measurement of the spread between numbers in a data set.
Word2Vec: A group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words.
Yann LeCun: A computer scientist with contributions to convolutional neural networks and other areas of machine learning and computational neuroscience.
Z-score: The number of standard deviations by which the value of a raw score is above or below the mean value of what is being observed or measured.
One-shot Learning: The object categorization problem in which only a single training example per class is given.
Manifold Learning: A class of unsupervised estimators for non-linear dimensionality reduction.
Denoising Autoencoder: A type of autoencoder, which is designed to remove noise from data.
Curse of Dimensionality: A term that is used to describe the difficulty of training models on data with high dimensionality (large number of features).
Collaborative Filtering: A technique used by some recommendation systems. In collaborative filtering, algorithms are used to make automatic predictions about the interests of a user by collecting preferences from many users.
Multi-task Learning: A type of machine learning where multiple learning tasks are solved at the same time while exploiting commonalities and differences across tasks.
Perceptual Hashing (pHash): A technique to convert multimedia content (images, text, video) into a compact hash value, such that similar content produces similar hashes.
Generative Model: A type of machine learning model that generates new data that is similar to the training data.
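
Several of the terms above (Precision, Recall, Softmax Function, Z-score, One-Hot Encoding, and Gradient Descent) boil down to very small formulas, so a few lines of code may make them easier to remember. The sketch below is only an illustration in plain Python: the function names, the example numbers, and the toy function being minimized are all made up for this post, and in practice you would likely reach for a library instead.

```python
import math


def precision(tp, fp):
    # True positives divided by everything the classifier called positive.
    return tp / (tp + fp)


def recall(tp, fn):
    # True positives divided by everything that really was positive.
    return tp / (tp + fn)


def softmax(scores):
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def z_score(value, mean, std_dev):
    # How many standard deviations the value sits above or below the mean.
    return (value - mean) / std_dev


def one_hot(category, categories):
    # A vector with a 1 in the position of the matching category, 0 elsewhere.
    return [1 if c == category else 0 for c in categories]


def gradient_descent(gradient, start, learning_rate, steps):
    # Repeatedly step in the direction of the negative gradient.
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
print(precision(8, 2))
print(recall(8, 8))
print(softmax([2.0, 1.0, 0.1]))
print(z_score(85, 70, 10))
print(one_hot("green", ["red", "green", "blue"]))
# Toy example: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
print(gradient_descent(lambda x: 2 * (x - 3), 0.0, 0.1, 100))
```

The last call should land very close to 3, the minimum of the toy function. The learning rate and step count are hyperparameters in the sense defined above: they are set before the loop starts rather than learned.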

That should get you started! If you liked this, subscribe to get the latest content on AI and Engineering! Cheers!

The Curious Programmer