Interest in artificial intelligence is reaching new heights. 2016 was a record year for AI startups and funding, and 2017 will certainly surpass it, if it hasn’t already. According to IDC, spending on cognitive systems and AI will rise more than 750% by 2020. Both interest and investment in AI span the full spectrum of the business and technology landscape, from the smallest startups to the largest corporations, from smartphone apps to public health safety systems. The biggest names in technology are all investing heavily in AI, baking it into their business models and using it increasingly in their offerings: virtual assistants (e.g. Siri), computer vision, speech recognition, language translation and dozens of other applications. But what is the actual IT behind AI? In short: artificial neural networks (ANN). Here we take a look at what they are, how they work, and how they relate to the biological neural networks that inspired them.
Defining an Artificial Neural Network
The term artificial neural network is used either to refer to a mathematical model or to an actual program that mimics the essential computational features found in the neural networks of the brain.
Although biological neurons are extremely complicated cells, their essential computational nature in terms of inputs and outputs is relatively straightforward. Each neuron has multiple dendrites and a single axon. The neuron receives its inputs from its dendrites and transmits its output through its axon. Both inputs and outputs take the form of electrical impulses. The neuron sums up its inputs, and if the total electrical impulse strength exceeds the neuron’s firing threshold, the neuron fires off a new impulse along its single axon. The axon, in turn, distributes the signal along its branching synapses which collectively reach thousands of neighboring neurons.
Biological vs Artificial Neurons
There are a few basic similarities between neurons and transistors. Both serve as the basic unit of information processing in their respective domains; both have inputs and outputs; and both connect with their neighbors. The differences, however, are drastic. A transistor is a simple switching element, typically wired to only a handful of neighboring components. A neuron, by contrast, is a highly complex organic structure connected to roughly 10,000 other neurons. Naturally, this rich web of connections gives neurons an enormous advantage over transistors when it comes to cognitive feats that require thousands of parallel connections. For decades, engineers and developers have envisioned ways to capitalize on this advantage by making computers and applications operate more like brains, and their ideas have finally made their way into the mainstream. Although transistors themselves will not look like neurons anytime soon, some of the AI software they run can now mimic basic neural processing, and it’s only getting more sophisticated.
Modeling the Neuron
The perceptron, or single-layer neural network, is the simplest model of neural computation, and is the ideal starting point to build upon. You can think of a perceptron as a single neuron. However, rather than having dendrites, the perceptron simply has inputs: x1, x2, x3,…,xN. Moreover, rather than having an axon, the perceptron simply has a single output: y = f(x).
Each of the perceptron’s inputs (x1, x2, x3,…,xN) has a weight (w1,w2,w3,…,wN). A weight between 0 and 1 weakens its input, a weight greater than 1 amplifies it, and a negative weight inhibits it. In a slightly more complex, but widely-adopted, model of the perceptron, there is also an extra input fixed at 1, with weight b, which is called the bias. The bias shifts the perceptron’s firing threshold and, like the other weights, is adjusted when training the perceptron.
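As a minimal sketch of the model just described, here is a perceptron in plain Python. The AND-gate weights below are hand-picked for illustration, not learned:

```python
def perceptron(x, w, b):
    """Weighted sum of inputs plus bias, passed through a step activation."""
    total = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if total > 0 else 0  # fire only if the threshold is exceeded

# Toy example: weights chosen by hand so the perceptron acts as an AND gate
w = [1.0, 1.0]
b = -1.5  # the bias shifts the firing threshold

print(perceptron([1, 1], w, b))  # 1: both inputs active, total = 0.5 > 0
print(perceptron([1, 0], w, b))  # 0: total = -0.5, threshold not reached
```

In practice these weights would be found by training rather than set by hand, as described in the backpropagation section below.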
The activation function
Also called a transfer function, the activation function determines the value of the perceptron’s output. The simplest activation function is a step function. It mimics the biological neuron firing upon reaching its firing threshold by outputting a 1 if the total input exceeds a given threshold quantity, and outputting a 0 otherwise. However, for a more realistic result, one needs a smooth, differentiable activation function. One of the most commonly used is the sigmoid function:
f(x)= 1 / (1+e^-x)
There are many variations on this basic formula in common use, but all sigmoid functions trace some form of S-curve when plotted on a graph. When the input is zero, the output is 0.5, the midpoint of the curve. As the input becomes more positive, the output initially rises steeply, but eventually levels off at a maximum value of 1, represented by a horizontal asymptote; as the input becomes more negative, the output approaches 0. This maximum output value reflects the maximum electrical impulse strength that a biological neuron can generate.
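Both activation functions are easy to sketch in Python. The values in the comments simply illustrate the S-curve behavior described above:

```python
import math

def step(x, threshold=0.0):
    """Step activation: fire (1) above the threshold, stay silent (0) otherwise."""
    return 1 if x > threshold else 0

def sigmoid(x):
    """Sigmoid activation: f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))    # 0.5: the midpoint of the S-curve
print(sigmoid(10))   # ~0.99995: approaching the horizontal asymptote at 1
print(sigmoid(-10))  # ~0.00005: approaching the asymptote at 0
```

Unlike the step function, the sigmoid changes gradually, which is what makes the gradient-based training described below possible.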
Adding Hidden Layers
In more complex, realistic neural models, there are at least three layers of units: an input layer, an output layer, and one or more hidden layers. The input layer receives the raw data from the external world that the system is trying to interpret, understand, perceive, learn, remember, recognize, or translate. The output layer, by contrast, transmits the network’s final, processed response. The hidden layers that reside between the input and the output layers, however, are the key to the machine learning that drives the most advanced artificial neural networks.
Most modeling assumes that the respective layers are fully connected. In a fully connected neural network, all the units in one layer are connected to all the units in their neighboring layers.
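A fully connected layer can be sketched in a few lines. The layer sizes (2 inputs, 3 hidden units, 1 output) and the random initial weights below are arbitrary choices for illustration:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: every unit receives every input."""
    return [sigmoid(sum(w * x for w, x in zip(unit_w, inputs)) + b)
            for unit_w, b in zip(weights, biases)]

# 2 inputs -> 3 hidden units -> 1 output unit, with random (untrained) weights
random.seed(0)
w_hidden = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
b_hidden = [0.0, 0.0, 0.0]
w_out    = [[random.uniform(-1, 1) for _ in range(3)]]
b_out    = [0.0]

hidden = dense_layer([0.5, -0.2], w_hidden, b_hidden)  # feed inputs forward
output = dense_layer(hidden, w_out, b_out)             # hidden layer to output
print(output)  # a single value between 0 and 1
```

Running the layers in sequence like this is the feed-forward pass; training then adjusts `w_hidden` and `w_out` as described next.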
You can think of backpropagation as the process that allows a neural network to learn. During backpropagation, the network is in a continual cycle of training, adjusting, and fine-tuning itself until its output gets closer to the intended output. Backpropagation optimizes by comparing the intended output to the actual output using a loss function. The result is an error value, or cost, which backpropagation uses to re-calibrate the network’s weights between neurons (to find the features and inputs most relevant to the desired output), usually with the help of the well-known gradient descent optimization algorithm. If there are hidden layers, the algorithm re-calibrates the weights of all the hidden connections as well. After each round of re-calibration, the system runs again. As error rates get smaller, each round of re-calibration becomes more refined. This process may need to repeat thousands of times before the output of the network closely matches the intended output. At that point, one can say that the network is fully trained.
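As a minimal illustration of this training loop, the sketch below uses gradient descent on a squared-error loss to teach a single sigmoid unit a logical OR gate. The gate, learning rate, and epoch count are illustrative choices; a full backpropagation network would also propagate the error back through hidden-layer weights via the chain rule:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Training data for a logical OR gate: (inputs, intended output)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

w = [0.0, 0.0]   # weights, re-calibrated on every round
b = 0.0          # bias
lr = 0.5         # learning rate: step size for gradient descent

for epoch in range(5000):                       # thousands of rounds
    for x, target in data:
        y = sigmoid(w[0]*x[0] + w[1]*x[1] + b)  # forward pass
        error = y - target                      # actual vs intended output
        # Gradient of the squared-error loss w.r.t. the pre-activation,
        # using the sigmoid derivative y * (1 - y) (chain rule)
        grad = error * y * (1 - y)
        w[0] -= lr * grad * x[0]                # re-calibrate each weight
        w[1] -= lr * grad * x[1]
        b    -= lr * grad

# After training, the outputs closely match the intended OR values
for x, target in data:
    y = sigmoid(w[0]*x[0] + w[1]*x[1] + b)
    print(x, round(y, 2), target)
```

The error shrinks with every round of re-calibration, exactly as described above, until the unit's outputs sit close to 0 or 1 as intended.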
Thinking It Through
With AI investment and development reaching new heights, this is an exciting time for AI enthusiasts and aspiring developers. However, it’s important to first take a good look at the IT behind AI: artificial neural networks (ANN). These computational models mimic the essential computational features found in biological neural networks. Neurons become perceptrons or simply units; dendrites become inputs; axons become outputs; electrical impulse strengths become connection weights; the neuron’s firing strength function becomes the unit’s activation function; and layers of neurons become layers of fully-connected units. Putting it all together, you can run your fully-trained feed-forward network as-is, or you can train and optimize your backpropagation network to reach a desired value. Soon you’ll be well on your way to your first image recognizer, natural language processor, or whatever new AI app you dream up.
Thanks for reading, and if you liked this please share this post or subscribe to my blog at JasonRoell.com or follow me on LinkedIn where I post about technology topics that I think are interesting for the general programmer or even technology enthusiast to know.
Have a great day and keep on learning!