Unless you’re living under a rock, you’ve probably noticed Artificial Intelligence (AI) is popping up more and more in technology talks and business strategies. I’ve even noticed among my friends an increased interest in “cognifying” their applications.

It’s easy to see why. Everyone is aware of the autonomous car revolution, and, if you are in the loop, you know it’s due largely to advancements in AI, particularly “Machine Learning,” an approach to implementing AI in software applications and robots. More on this later.

Let’s first step back and discuss AI. What does it mean for a machine to possess artificial intelligence?

At its core, it’s a simple idea. Artificial Intelligence is the broader concept of machines carrying out tasks in a way that we consider “smart.”


Machine Learning is a current application of AI based upon the idea that we should give machines access to data and let them learn for themselves.

Many people have heard the term “AI” before and even have experience with AI applications dating back to the ’90s and 2000s. Most people’s familiarity with AI comes from gaming: it is common to play against an AI in a video game when you don’t have another person to play with. Other familiar AI applications include spell checkers and similar tools that seem partially smart, helping humans complete a task using well-defined rules. However, many of these “older AIs” were built as what we call “expert systems.” To codify intelligence into the software, we needed an expert: a linguist in the case of spell check, or a physician in the case of systems that help doctors diagnose patients. This type of AI system was very popular in the ’80s and ’90s.

Unfortunately, there was a problem with these expert systems. As anyone who has used one can attest, they constantly made mistakes when dealing with uncommon scenarios or situations in which even the expert was not well versed. The AI was useless in these cases, and fixing it meant reprogramming the system with new expert information. Another drawback is that expert systems are very costly to build. For each domain, you must find an expert who can articulate to programmers the intricacies of their field and when and why any given decision should be made. These kinds of decisions are hard to codify in a deterministic algorithm (a deterministic algorithm is one which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states).

For these reasons, artificial intelligence researchers needed to invent a better way to give “smarts” to a machine.

This is where Machine Learning (ML) comes into play. Many people are surprised to learn that ML is actually a relatively old topic in AI research. Researchers understood back in the ’80s that expert systems were never going to create an AI capable of driving our cars or beating the best humans at chess or Jeopardy. The parameters of problems like these are too varied, change over time, and carry different weights depending on the state of the application. Moreover, many attributes of a problem cannot be directly observed and thus cannot be directly programmed into the application logic.

Machine Learning addresses this problem by developing a program capable of learning and making decisions based on accessible data, similar to human cognition, instead of programming the machine to perform a deterministic task. So, instead of making a deterministic decision, the program relies on probability and a probability threshold to decide whether it “knows” or “doesn’t know” something. This is how the human brain makes decisions in the complex world in which we live.

For example, if you see a flower that looks like a rose and you say, “Yep that is a rose,” what you are really saying is, “Yep, based on my previous knowledge (data) on what I believe a rose to look like and what I’ve been told a rose looks like, I am 97 percent sure that this is a rose.”

On the other hand, if you saw an iris flower, and you haven’t seen many before or don’t have many representations in your memory of what an iris looks like, you might say, “I’m not sure what that flower is.” Your confidence is below the threshold of what you believe is acceptable for identifying the flower as an iris (let’s say you were only 30 percent sure it was an iris).
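This threshold idea can be sketched in a few lines of code. The probabilities and the 90 percent cutoff below are made-up illustration values, not anything a real model produced:

```python
def identify(label, confidence, threshold=0.9):
    """Answer only if our confidence in the label clears the threshold."""
    if confidence >= threshold:
        return f"Yep, that is a {label}"
    return "I'm not sure what that flower is"

print(identify("rose", 0.97))  # above the threshold: commit to an answer
print(identify("iris", 0.30))  # below the threshold: admit uncertainty
```

A real ML classifier works the same way conceptually: it outputs a probability for each possible label, and the application decides how confident is confident enough.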

Making decisions based on probability is what our brains do best. If we can model a computer program in the same conceptual way, then the implications for AI are potentially unlimited!

Okay, so the question you should ask now is, “If we knew all this in the ’70s and ’80s, then why is ML only now becoming popular?”

My answer is analogous to the scientific community only recently verifying the existence of the gravitational waves that Einstein proposed in theoretical physics so many years ago. We simply didn’t have the machinery or tools to validate or invalidate his theory; the idea was ahead of its time, and its practical use had to wait.

Even though we’ve understood the mathematical models behind machine learning for many years, the infrastructure, technology, and data needed to make machine learning a reality (as opposed to a theory) were not available in the ’70s and ’80s. However, that is no longer the case, and the “AI Winter” may soon be over.

Three basic “raw materials” are necessary to create a practical machine learning application. Let’s talk about them.

1. Cheap Parallel Computation

Thinking is not a synchronous process. At its core, thinking is just pattern recognition. We are constantly seeing, feeling, hearing, tasting, or smelling patterns that activate different clusters of neurons. Millions of these pattern-recognizing neurons communicate low-level patterns to the next layer of neurons. This process continues until we reach the highest conceptual layers and the most abstract patterns.

Each neuron can be thought of as an individual pattern-recognition unit; based on the input it receives from other neurons (recognized patterns), higher-level decisions eventually emerge. We would like to model computer programs in a similar way. A program modeled on the biological structure of the brain is called an “artificial neural network.” How fitting!
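A single artificial neuron, one of those pattern-recognition units, can be sketched in a few lines. The weights and inputs below are arbitrary illustration values; a real network learns its weights from data:

```python
import math

def neuron(inputs, weights, bias):
    """One pattern-recognition unit: a weighted sum of its inputs,
    squashed into (0, 1) to represent how strongly the pattern fired."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-activation))  # sigmoid activation

# Illustrative weights chosen so the unit fires only when both inputs are active
print(neuron([1.0, 1.0], [4.0, 4.0], -6.0))  # near 1: pattern recognized
print(neuron([0.0, 0.0], [4.0, 4.0], -6.0))  # near 0: pattern absent
```

Wire millions of these together, with the outputs of one group feeding the inputs of the next, and you have the artificial neural network described above.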

Parallelization of this type of data processing is obviously vital to building a system that simulates human thought. That kind of power was not available in the ’80s and only recently became cheap enough to be practical in machine learning solutions.

Why now?

One word. GPUs.

Okay, maybe that’s three words? Graphics Processing Units (GPUs) became popular because of their ability to render demanding graphics for video games, consoles, and eventually even cell phones. Graphics processing is inherently parallel, and GPUs were architected to take advantage of this type of computing. As GPUs became popular, they also became cheaper, because companies competed against each other to drive prices down. It didn’t take long for researchers to realize that GPUs might solve the computation problem that had been stumping them for decades by providing the parallel computation required to build an artificial neural network. They were right, and the falling cost of GPUs enabled companies to buy them in massive quantities and use them to build machine learning platforms. This has greatly accelerated what we can build with highly parallelized neural networks and the amount of data we can process. Speaking of data…
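The reason GPUs fit neural networks so well is that the core computation of a network layer, applying a matrix of weights to a vector of inputs, consists of many independent multiply-accumulate operations. This sketch uses plain Python with made-up numbers, standing in for what a GPU does across thousands of cores at once:

```python
def layer(weights, inputs):
    """Apply one layer of a network. Every output element is computed
    independently of the others, so a GPU can assign each one to a
    different core and compute them all simultaneously."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

# Illustrative numbers: 3 output neurons, each looking at 2 inputs
W = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]
print(layer(W, [2.0, 4.0]))  # → [2.0, 3.0, 4.0]
```

A CPU would churn through those output elements one by one; a GPU computes them all at once, which is exactly the shape of work graphics rendering demands too.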

2. Big Data

Big Data this, Big Data that…

I know everyone has heard all the hype about big data. What does it really mean, though? Why is it here now? Where was it before? How big is “big”?

Well, the truth of the matter is that when we talk about big data, we’re really saying that we’re capturing, processing, and generating more data every year, and that growth is exponential.

The reason this is a big deal is that training an artificial brain to learn for itself requires a MASSIVE amount of data. The visual data alone that a baby takes in and processes each year is more than entire data centers held in the ’80s. Even that isn’t enough to train machines, though, because we don’t want to wait a year for a machine to learn elementary vision! To train computers in artificial vision, we need more data than some people absorb in a lifetime. That has finally become possible because storing and recording data is now cheap, fast, everywhere, and generated by everything.

A single smartphone today holds more data than most giant computer systems did in the ’80s. Both data and the memory to store it have grown to epic proportions, with no indication of slowing down anytime soon. This data is crucial for building smart machines, because it takes many instances of a problem to infer a probabilistically correct solution. Big data is the knowledge base from which these computers learn. All the knowledge to which an AI has access is the result of us collecting and feeding the AI more and more data and letting the machine learn from the underlying patterns in that data.

The power of AI emerges when computers start recognizing patterns never noticed by human practitioners. The machine comes to understand the data and recognize patterns in it the same way our neurons come to recognize patterns in problems we have seen before. The advantage the machine has over us is electronic signaling through circuitry, which is much faster than the biological chemical signaling across the synapses in our brains. Without big data, our machines would have nothing from which to learn. The larger the data set, the smarter the AI will become and the quicker the machine will learn!

3. Better/Deep Algorithms

As I alluded to before, researchers invented artificial neural nets in the 1950s, but the problem was that even with sufficient computing power, they still didn’t have efficient algorithms to process these nets. There were simply too many combinatorial relationships between a million (or a hundred million) neurons. Recently that has all changed. Breakthroughs in the underlying algorithms have led to a new type of artificial network: the layered network.

For example, take the relatively simple task of recognizing that a face is a face. When a group of bits in a neural net is found to trigger a pattern—the image of an eye, for instance—that result (“It’s an eye!”) is moved up to another level in the neural net for further parsing. The next level might group two eyes together and pass that meaningful chunk on to another level of hierarchical structure that associates it with the pattern of a nose. It can take many millions of these nodes (each one producing a calculation feeding others around it), stacked up many levels high, to recognize a human face. In 2006, Geoff Hinton, then at the University of Toronto, made a key tweak to this method, which he dubbed “deep learning.” He was able to mathematically optimize results from each layer so that the learning accumulated faster as it proceeded up the stack of layers. Deep learning algorithms accelerated enormously a few years later when they were ported to GPUs. The code of deep learning alone is insufficient to generate complex logical thinking, but it is an essential component of all current AIs, including IBM’s Watson, Google’s DeepMind and search engine, and Facebook’s algorithms.
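The layering described above, where low-level patterns feed higher-level ones, can be sketched by chaining simple neuron-like units. This toy stack is purely illustrative: the two layers and their weights are made up, whereas a real deep network has millions of weights learned from data:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def forward(layers, inputs):
    """Push an input up the stack: each layer's output patterns
    become the next layer's input patterns."""
    signal = inputs
    for weights in layers:
        signal = [sigmoid(sum(w * x for w, x in zip(row, signal)))
                  for row in weights]
    return signal

# Toy two-layer stack: low-level detectors feed one high-level "face?" unit
layers = [
    [[2.0, -1.0], [-1.0, 2.0]],   # layer 1: two low-level pattern detectors
    [[1.5, 1.5]],                 # layer 2: combines both detected patterns
]
print(forward(layers, [1.0, 1.0]))  # a single high-level confidence score
```

Hinton’s insight was a way to train the weights in such stacks efficiently, layer by layer, so that deeper stacks became practical rather than computationally hopeless.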

This perfect storm of cheap parallel computation, bigger data, and deeper algorithms generated the 60-years-in-the-making overnight success of AI. And this convergence suggests that as long as these technological trends continue—and there’s no reason to think they won’t—AI will keep improving.

Thanks for reading, and if you liked this please subscribe to my blog at JasonRoell.com or follow me on LinkedIn where I post about technology topics that I think are interesting for the general programmer or even technology enthusiast to know.

Also, I would like to thank Kevin Kelly for inspiring this post. I highly recommend picking up his book “The Inevitable,” in which he discusses these and many more processes in much more detail.


The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future, Kevin Kelly, 2016

How to Create a Mind, Ray Kurzweil, 2012