The Curious Programmer

Software, Gadgets, Books, and All Things Geek

🔮 Unveiling the Hidden Magic: How a URL Transforms into a Webpage 🧙 — March 17, 2023

Hello, dear readers! Today I’m going to explain one of the most basic and fascinating concepts of the web world: what happens when you navigate to a URL in a browser and hit “Enter”. You probably do this every day without thinking much about it, but behind the scenes there is a lot of magic going on. Let’s dive into it!

What is a URL?

First of all, let’s clarify what a URL is. URL stands for Uniform Resource Locator and it is basically an address that tells your browser where to find the information you want on the Internet. A URL has different parts that have different meanings. For example:

https://example.com/page1

In this URL, the first part https tells your browser which protocol to use for communication. A protocol is a set of rules that define how data is exchanged over the network. There are different protocols for different purposes, such as http, https, ftp, etc. In this case, https means that the communication will be secure and encrypted.

The second part example.com is called the domain name and it identifies the server that hosts the website you want to visit. A server is a computer that stores web files and responds to requests from browsers. Each server is reachable at a numeric IP address. An IPv4 address consists of four numbers separated by dots, such as 203.0.113.0 (IPv6 addresses are longer and written in hexadecimal). However, these numbers are hard to remember and type, so we use domain names instead.

The third part /page1 is called the path and it specifies which page or resource on the website you want to access. A website can have multiple pages or resources such as images, videos, scripts, etc., each with its own path.
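
To make the three parts concrete, here is a minimal sketch that splits the example URL using Python's standard urllib.parse module. The function name split_url and the dictionary keys are my own choices for illustration, not anything official.

```python
from urllib.parse import urlsplit

def split_url(url: str) -> dict:
    """Break a URL into the three parts discussed above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # e.g. "https"
        "domain": parts.hostname,   # e.g. "example.com"
        "path": parts.path,         # e.g. "/page1"
    }

print(split_url("https://example.com/page1"))
```

Real URLs can also carry a port, a query string, and a fragment, which urlsplit exposes as well; they are left out here to match the example above.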

What happens when you hit “Enter”?

Now that we know what a URL is made of, let’s see what happens when you hit “Enter” after typing it in your browser.

Step 1: DNS lookup

The first thing your browser does is to look up the IP address of the domain name using a service called DNS (Domain Name System). DNS is like a phone book for the Internet that maps domain names to IP addresses. Your browser first checks its own cache and the operating system’s cache. If the answer isn’t there, a DNS resolver (often one provided by your Internet Service Provider) is asked for the IP address of example.com. The resolver responds with something like 203.0.113.0.
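
As a rough sketch of this step, a program can ask the operating system's resolver for a host's addresses with socket.getaddrinfo from Python's standard library. The helper name resolve and the default port 443 are assumptions for illustration; a real browser has its own caching and resolver logic on top of this.

```python
import socket

def resolve(hostname: str, port: int = 443) -> list:
    """Ask the operating system's resolver for the addresses of a host."""
    results = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as text.
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

Calling resolve("example.com") returns whatever addresses your resolver hands back, which may include both IPv4 and IPv6 entries.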

Step 2: TCP connection

The next thing your browser does is to establish a TCP (Transmission Control Protocol) connection with the server at 203.0.113.0. TCP is another protocol that ensures reliable and ordered delivery of data over the network. Your browser initiates a three-way handshake with the server:

  • Your browser sends a SYN (synchronize) packet to the server asking for permission to start communication.
  • The server replies with a SYN-ACK (synchronize-acknowledge) packet granting permission.
  • Your browser sends an ACK (acknowledge) packet back confirming receipt.

This way, both your browser and the server agree on the initial sequence numbers they will use to keep the data in order. The port numbers that identify the connection travel along in every packet.
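
You never send SYN or ACK packets yourself; the operating system does it when a program asks for a connection. The sketch below, which assumes a throwaway listener on your own machine rather than a real web server, shows that a single connect call is all the application sees of the handshake. The function name loopback_handshake is hypothetical.

```python
import socket

def loopback_handshake() -> bool:
    """Open a TCP connection to a throwaway listener on this machine."""
    # A listening socket on 127.0.0.1; port 0 lets the OS pick a free port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        host, port = listener.getsockname()

        # create_connection returns once the kernel has completed the
        # SYN / SYN-ACK / ACK exchange from the client's side.
        with socket.create_connection((host, port), timeout=5) as client:
            server_side, client_addr = listener.accept()
            with server_side:
                # Both ends now agree on who is talking to whom.
                return client_addr == client.getsockname()
```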

Step 3: HTTPS handshake

If your URL starts with https, then your browser also performs an HTTPS (Hypertext Transfer Protocol Secure) handshake, strictly speaking a TLS handshake, with the server before sending any data. HTTPS adds another layer of security on top of TCP by encrypting all data using TLS (Transport Layer Security), the successor to the now-deprecated SSL (Secure Sockets Layer) protocol.

Your browser initiates an HTTPS handshake with these steps:

  • Your browser sends a ClientHello message to the server indicating its supported SSL/TLS versions and cipher suites (encryption algorithms).
  • The server replies with a ServerHello message choosing one SSL/TLS version and cipher suite from those offered by your browser.
  • The server also sends its digital certificate signed by a trusted Certificate Authority (CA) proving its identity.
  • Your browser verifies the certificate against its list of trusted CAs and checks if it matches with example.com.
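  • If everything checks out, the two sides establish a shared secret. In the older RSA key exchange (TLS 1.2 and earlier), your browser generates a random secret and sends it to the server encrypted with the server’s public key, and the server decrypts it with its private key. TLS 1.3 instead has both sides derive the secret from an ephemeral Diffie-Hellman exchange.
  • Both sides derive the symmetric encryption keys for the session from that shared secret.
  • Each side sends an encrypted Finished message, which the other side decrypts and verifies to confirm that the handshake completed and was not tampered with.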

This way both your browser and the server agree on an encryption key for secure communication.
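
In code, the handshake is usually a single call into a TLS library. Here is a hedged sketch using Python's standard ssl module; the function names make_tls_context and negotiated_parameters are mine, and the choice to refuse anything older than TLS 1.2 is an assumption about what a reasonable client does, not something a particular browser is known to configure this way.

```python
import socket
import ssl

def make_tls_context() -> ssl.SSLContext:
    """Client-side settings: verify the certificate chain and the hostname."""
    context = ssl.create_default_context()            # loads the system's trusted CAs
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy SSL/TLS versions
    return context

def negotiated_parameters(host: str, port: int = 443) -> tuple:
    """Perform TCP connect + TLS handshake and report what was agreed on."""
    context = make_tls_context()
    with socket.create_connection((host, port), timeout=5) as tcp:
        # wrap_socket runs the handshake; server_hostname is checked
        # against the name in the server's certificate.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

Calling negotiated_parameters("example.com") needs network access and returns the protocol version and cipher suite that the two sides settled on; the exact values depend on the server and on your local TLS library.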

Step 4: HTTP request

Now that your browser has established both TCP and HTTPS connections, it’s time to send the actual HTTP (Hypertext Transfer Protocol) request to the server. The request contains the following information:

  • The HTTP method (usually GET for retrieving data or POST for submitting data)
  • The path of the resource you want to access (/page1)
  • The HTTP version (usually HTTP/1.1 or HTTP/2)
  • Additional headers that provide more information about your browser, the type of content it accepts, cookies, etc.

Here’s an example of an HTTP GET request:

GET /page1 HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
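
The request above is just text with strict line endings. As a small sketch (the helper name build_get_request is made up for this post), this is roughly how such a message could be assembled by hand in Python:

```python
from typing import Optional

def build_get_request(host: str, path: str, extra_headers: Optional[dict] = None) -> bytes:
    """Serialize an HTTP/1.1 GET request the way it travels on the wire."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends with CRLF, and an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_get_request("example.com", "/page1", {"Accept": "text/html"})
print(request.decode("ascii"))
```

In practice you would let an HTTP library do this for you; the point is only that nothing more mysterious than formatted text is being sent.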

Step 5: Server processing

Upon receiving the HTTP request, the server processes it and generates an appropriate response. This may involve querying databases, executing server-side scripts, or fetching static files, depending on the requested resource. Once the server has prepared the response, it sends it back to your browser over the established TCP and HTTPS connections.

Step 6: HTTP response

The server’s response is also an HTTP message with the following information:

  • The HTTP version (e.g., HTTP/1.1 or HTTP/2)
  • The status code indicating the result of the request (e.g., 200 OK for success, 404 Not Found for a missing resource)
  • Additional headers providing more information about the server, the content type, the content length, etc.
  • The actual content (HTML, images, videos, etc.) of the requested resource

Here’s an example of an HTTP 200 OK response:

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Content-Length: 12345

<!DOCTYPE html>
<html>
<head>
<title>Page 1</title>
...
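
Reading a response is the mirror image of building a request. The following is a simplified sketch, assuming a well-formed HTTP/1.x response with a reason phrase in the status line; the name parse_response_head is hypothetical, and a real client handles many more edge cases (chunked bodies, repeated headers, and so on).

```python
def parse_response_head(raw: bytes) -> tuple:
    """Split a raw HTTP/1.x response into status code, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return int(code), headers, body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=UTF-8\r\n"
       b"\r\n"
       b"<!DOCTYPE html>")
status, headers, body = parse_response_head(raw)
print(status, headers["content-type"])
```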

Step 7: Rendering the page

Now that your browser has received the response, it starts parsing the HTML content and rendering the page on your screen. This involves several sub-steps:

  1. The browser builds the DOM (Document Object Model), a tree-like structure representing the HTML elements and their hierarchy.
  2. The browser retrieves and applies CSS (Cascading Style Sheets) rules to style the DOM elements.
  3. The browser executes JavaScript code (if any) that may manipulate the DOM, fetch additional resources, or provide interactivity.
  4. The browser calculates the layout and position of each DOM element based on the CSS rules and the available screen space.
  5. The browser paints the final representation of the page on your screen, including images, videos, and other media elements.
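
The first sub-step, turning HTML text into a tree, can be imitated in a few lines. This is a toy sketch using Python's standard html.parser module; the class name OutlineBuilder and the short list of void elements are my own simplifications, and a browser's real parser is far more forgiving and complete.

```python
from html.parser import HTMLParser

class OutlineBuilder(HTMLParser):
    """Record each element with its nesting depth, a crude stand-in for a DOM tree."""
    VOID = {"br", "img", "meta", "link", "hr", "input"}  # elements with no closing tag

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        if tag not in self.VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID:
            self.depth -= 1

builder = OutlineBuilder()
builder.feed("<html><head><title>Page 1</title></head><body><p>Hi</p></body></html>")
for depth, tag in builder.outline:
    print("  " * depth + tag)
```

The indented printout shows the same parent and child relationships that the DOM captures: title sits inside head, and p sits inside body.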

Step 8: Closing the connection

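Once the page is fully loaded, your browser may eventually close the TCP and HTTPS connections to the server. In practice it usually doesn’t do so right away: HTTP/1.1 keeps connections alive by default, and HTTP/2 multiplexes many requests and responses over a single connection, so the browser tends to hold the connection open for a while in case you request more from the same server.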

And that’s it! You’ve now seen the intricate dance of all the underlying events that occur every time you request more cat memes from a site…

Well, sort of 😬

In actuality, this is still a relatively high level view of all of the technologies, events, and protocols that make the internet possible. We didn’t cover a lot of interesting details that allow for each of the topics explained above to even be possible. But this is a blog post… not a book!

If you are interested in learning all of those fascinating details in a fun and engaging “comic book” style type of writing, then check out this book on How the Internet Really Works, which I highly recommend:

How the Internet Really Works

It’s amazing how much is going on behind the scenes, and understanding these details can help you appreciate the marvel of modern web technologies. So the next time you visit a website, remember the intricate ballet of protocols, connections, and data transfers that make it all possible.

12 Most Influential Books Every Software Engineer Needs to Read — March 16, 2015

This is a question that I get a lot, especially from co-workers or friends who are just beginning their journey as software craftsmen.

What book should I read to become a better developer? Do I need to read books?

I think it’s a great question, and it is one that I asked many of my mentors as I was becoming a software engineer. The problem was that many people suggested different books on different topics. All the books they suggested were great in their own right, but no one was able to give me a list that would be the ESSENTIAL books, the MUST READS, that any engineer with hopes of being great should most certainly read.

Well, I’ve learned a lot from my mentors and realized that I still had a lot to learn from the many different books that were suggested to me. I decided to develop a routine of reading one book a month in my professional field (software engineering). Over the years, I’ve aggregated a list that I believe to be MUST READS for anyone who wants to be a top tier developer.

Now let me state the obvious – just reading all of these books on the list will not make you a great developer. That will come with years of experience and applying the principles in these books into real practices and developing your problem-solving skills in the real world.

However, reading these books will help you avoid the major pitfalls and mistakes that many developers make early on in their careers. I wish that someone had told me about these books when I was just starting out, but I was lucky enough to have found and read them over the years. You might have read some of these books in college for your computer science or engineering classes. Maybe at the time, you didn’t think they were important, but I can say first hand that I’ve used and applied many principles from each and every one of these books.

Let me also point out that this is not an exhaustive list. Many great books come out every year. These are just the ones that have had the biggest impact on me and my career. Also, these are mostly language agnostic, and can be applied using any of the many programming languages. (I will do another post with the best books targeted at certain technology platforms and stacks.)

Well, let’s get to it then! (drum roll, please)

THE LIST

(All of these are essential, but I’ve arranged them as a countdown, ending with the book that had the biggest impact on me. I have also provided a link (click on the book cover) to where you can purchase each book on Amazon if interested. Read the reviews and decide for yourself!)

12. Working Effectively with Legacy Code

I love this book because almost every software developer, at some point in their career, has to support and work with a legacy system. In this book, Michael Feathers offers start-to-finish strategies for working more effectively with large, untested legacy code bases. This book draws on material Michael created for his renowned Object Mentor seminars: techniques Michael has used in mentoring to help hundreds of developers, technical managers, and testers bring their legacy systems under control.

11. The Mythical Man-Month

This book is a classic, and it has been revised and corrected since its first publication. The amazing thing is how relevant the book still is to software product development. If you are involved in software, this book is a must-read. The most valuable part of the book, I believe, is the “plan to throw one away” prototype chapter. While the goal is always to make a bigger, better, faster whatever, it is almost an axiom that you WILL build something that has to be discarded and reworked. This happens every time, I can tell you from first-hand experience. Therefore, it is vital to plan to throw one away so you can migrate your users to whatever will follow. If you dream that the first product is THE ONE, you risk abandoning them on a product that will inevitably evolve. Planning to throw one away also helps you meet schedule goals by setting reasonable milestones that can be attained.

10. Design Patterns

If you are planning to be an architect or designer of a system, you will most likely be required to read this book. Hailed as one of the greatest software development books ever written, this book goes into great detail on the many different design patterns that have been developed over the years to help software engineers avoid and handle common problems that the industry faces. Following the strategies in this book will allow you to build higher quality, flexible, and maintainable software. This book also goes by the name “Gang of Four” in software groups because of its famous four authors that put this book together.

9. Programming Pearls (2nd Edition)

This book is slightly different from the other books on the list. I would say this book helps a person “think like a programmer”. Programming Pearls is a compendium of 15 columns previously published in Communications of the ACM. The columns cover a broad range of topics related to programming: from requirements gathering to performance tuning. The focus is primarily on coding techniques and algorithms.

Each column has been reorganized as a chapter. Chapters usually start with the presentation of a practical problem. Then various solutions are presented and are used as lessons to be learned. The writing style is clear and fun.

Programming Pearls is not a usual book teaching new programming concepts. Although it contains good and sometimes quite novel ideas, the aim of the book is not to teach something new but to help you become a better problem solver.

8. CODE: The Hidden Language of Computer Hardware and Software

This book cleared up a lot of the “Magic” that goes into creating and developing complex systems. There are so many abstractions these days that the low-level details are sometimes unknown to the developer. Though you may not find yourself using this book 24/7 in practice, I believe it is a good idea to have an understanding of what you are building on top of and how the whole orchestration works. It may come in handy when you need to open up that “Black Box” and deep dive into the software or hardware to fix a pesky bug. “CODE: The Hidden Language of Computer Hardware and Software” by Charles Petzold deals with a number of programming concepts, starting from number systems (decimal, octal, binary) and working up to high-level languages. The book explains packet based communication protocols and TCP. Many chapters are about hardware concepts, and five chapters are devoted to software and teach about the operating system, floating point arithmetic, and GUIs.

7. The Art of Computer Programming

This is another classic. This was written by the famous computer scientist Professor Donald Knuth and is highly praised by many of the top programmers in the industry. Even Bill Gates is quoted as saying:

“If you think you’re a really good programmer… read [Knuth’s] Art of Computer Programming… You should definitely send me a resume if you can read the whole thing.”

The book begins with basic programming concepts and techniques, then focuses more particularly on information structures–the representation of information inside a computer, the structural relationships between data elements and how to deal with them efficiently. Elementary applications are given to simulation, numerical methods, symbolic computing, software, and system design.

6. Refactoring

“Refactoring” by Martin Fowler is about improving the design of existing code. It is the process of changing a software system in such a way that it does not alter the external behavior of the code, yet improves its internal structure. With refactoring, you can even take a bad design and rework it into a good one. This book offers a thorough discussion of the principles of refactoring, including where to spot opportunities for refactoring, and how to set up the required tests. There is also a catalog of more than 40 proven refactorings with details as to when and why to use the refactoring, step by step instructions for implementing it, and an example illustrating how it works. The book is written using Java as its principal language, but the ideas apply to any OO language.

5. Clean Code

“Clean Code,” written by Robert C. Martin, is divided into three parts. The first describes the principles, patterns, and practices of writing clean code. The second part consists of several case studies of increasing complexity. Each case study is an exercise in cleaning up code—of transforming a code base that has some problems into one that is sound and efficient. The third part is the payoff: a single chapter containing a list of heuristics and “smells” gathered while creating the case studies. The result is a knowledge base that describes the way we think when we write, read, and clean code.

4. Introduction to Algorithms

This has to be the single best book for understanding and using algorithms (which you will be doing a lot of in software development). Some books on algorithms are rigorous but incomplete; others cover masses of material but lack rigor. Introduction to Algorithms uniquely combines rigor and comprehensiveness. The book covers a broad range of algorithms in depth, yet makes their design and analysis accessible to all levels of readers. Each chapter is relatively self-contained and can be used as a unit of study. The algorithms are described in English and in a pseudocode designed to be readable by anyone who has done a little programming. The explanations have been kept elementary without sacrificing depth of coverage or mathematical rigor. The first edition became a widely used text in universities worldwide as well as the standard reference for professionals. The second edition featured new chapters on the role of algorithms, probabilistic analysis and randomized algorithms, and linear programming.

3. Structure and Interpretation of Computer Programs

With an analytical and rigorous approach to problem solving and programming techniques, this book is oriented toward engineering. Structure and Interpretation of Computer Programs emphasizes the central role played by different approaches to dealing with time in computational models. Its unique approach makes it appropriate for an introduction to computer science courses, as well as programming languages and program design. The book further explains the four best-known paradigms of programming languages – imperative, object-oriented, logic based and applicative programming.

2. Pragmatic Programmer

This was one of the first programming books I read. I had a friend recommend it to me in my first professional job. I’m glad he did. Though the book was written in 1999 (I believe), the concepts are the basis of how we go about developing a complex system in a practical manner. Programmers are craftspeople trained to use a certain set of tools (editors, object managers, version trackers) to generate a certain kind of product (programs) that will operate in some environment (operating systems on hardware assemblies). Like any other craft, computer programming has spawned a body of wisdom, most of which isn’t taught at universities or in certification classes. Most programmers arrive at the so-called tricks of the trade over time, through independent experimentation. In The Pragmatic Programmer, Andrew Hunt and David Thomas codify many of the truths they’ve discovered during their respective careers as designers of software and writers of code.

Some of the authors’ nuggets of pragmatism are concrete, and the path to their implementation is clear. They advise readers to learn one text editor, for example, and use it for everything. They also recommend the use of version-tracking software for even the smallest projects and promote the merits of learning regular expression syntax and a text-manipulation language. Other (perhaps more valuable) advice in the book is more light-hearted. In the debugging section, it is noted that “if you see hoof prints think horses, not zebras.” That is, suspect everything, but start looking for problems in the most prominent places. There are recommendations for making estimates of time and expense, and for integrating testing into the development process. You’ll want a copy of The Pragmatic Programmer for two reasons: it displays your own accumulated wisdom more clearly than you ever bothered to state it, and it introduces you to methods of work that you may not yet have considered.

1. Code Complete 2

And this is it! The number one book (IMHO) to read if you are going to be a great software engineer. Widely considered one of the best practical guides to programming, Steve McConnell’s original CODE COMPLETE has been helping developers write better software for more than a decade. Now this classic book has been fully updated and revised with leading-edge practices—and hundreds of new code samples—illustrating the art and science of software construction. Capturing the body of knowledge available from research, academia, and everyday commercial practice, McConnell synthesizes the most effective techniques and must-know principles into clear, pragmatic guidance. No matter what your experience level, development environment, or project size, this book will inform and stimulate your thinking—and help you build the highest quality code.

Discover the timeless techniques and strategies that help you:

  • Design for minimum complexity and maximum creativity
  • Reap the benefits of collaborative development
  • Apply defensive programming techniques to reduce and flush out errors
  • Exploit opportunities to refactor—or evolve—code, and do it safely
  • Use construction practices that are right-weight for your project
  • Debug problems quickly and effectively
  • Resolve critical construction issues early and correctly
  • Build quality into the beginning, middle, and end of your project

Well, that’s it for now!

Let me know in the comments if you have read any of these or have any other must-reads for software developers!

If you have enjoyed this post, the biggest compliment you could give would be to share this with someone that you think would enjoy it!

Additionally, if you never want to miss a post, subscribe to this blog by clicking the follow button in the bottom right corner! Thanks for reading, have a great day, and never stop learning!

Quantum Computing and AI Tie the Knot — April 13, 2018

In 2018, quantum technicians and daring developers are using quantum algorithms to transform the field of artificial neural network optimization: the bee’s knees of machine learning and AI. So we can say with some confidence that thanks to quantum algorithms, the futures of quantum computing and artificial intelligence are hopelessly entangled. So let’s take a deep dive into the quantum algorithms that are making waves in the digital age. I’ll be paying special attention to quantum annealing (rhymes with feeling), a unique animal that seems to thrive in an AI-rich area where classical algorithms often struggle or altogether fail: training artificial neural networks.

Trouble training your neural net? Join the club…

Rather amazingly, you can train artificial neural nets such as RNNs and CNNs to get wise and not make the same mistake twice. It’s this power to follow Esther Dyson’s advice that makes neural nets the intelligence engine that drives machine learning and AI. That said, training neural networks is a notoriously tricky task. But this hasn’t stopped researchers and coders from working furiously over the last few years to find new ways to reduce training errors with bleeding-edge optimization algorithms. The first stab at the error-reduction problem is best known as hill climbing. Let’s run through it.

Hill climbing

Optimization algorithms that belong to the hill climbing club always check for the gradient (more or less the steepness of a graphed function’s slope) before making their next move. But this runs a real risk of missing out on the real action going on in the graph’s landscape. Two enemies hill climbers often find themselves facing are the plateau problem and the local minima problem. In short, these problems are the hiker’s equivalent to getting lost in a mirage-riddled desert, or getting stuck in a small muddy valley. But let’s dig deeper…

The plateau problem

When an optimization process enters a plateau, it means it’s getting roughly the same output (y) for every input (x). Because the slope of the function is at or near zero for long flat stretches, an optimization algorithm can run out of time before it finds the edge. And like a shimmering desert mirage, the long stretches of flat function can create the illusion that you’ve reached an optimal state (in this case the global minimum) when you’re nowhere near it.

The local minima problem

A local minimum is a relatively small valley in the graph of a function whose deepest and most important valley lies elsewhere. You can think of the optimization process (when it’s searching for the lowest value in a function) as a beach ball: it will roll downhill and eventually stop at the lowest point in the immediate landscape, even if there’s a much deeper valley on the other side of a nearby hill. That’s the problem.
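
Here is a small sketch of the beach ball in action. The landscape f is a made-up function of my own with a shallow valley and a deep one, and the step size and iteration count are arbitrary assumptions; the point is only that a slope-following search ends up in whichever valley is nearest to where it starts.

```python
def f(x: float) -> float:
    """A hypothetical bumpy landscape: shallow valley near x = 1.13, deep one near x = -1.30."""
    return x**4 - 3 * x**2 + x

def slope(x: float) -> float:
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def descend(x: float, step: float = 0.01, iterations: int = 1000) -> float:
    """Follow the slope downhill from x until the iterations run out."""
    for _ in range(iterations):
        x -= step * slope(x)
    return x

stuck = descend(2.0)    # starts on the right, settles in the shallow valley
lucky = descend(-2.0)   # starts on the left, settles in the deep valley
print(f"start  2.0 -> x = {stuck:.2f}, f(x) = {f(stuck):.2f}")
print(f"start -2.0 -> x = {lucky:.2f}, f(x) = {f(lucky):.2f}")
```

Nothing in the slope at the bottom of the shallow valley tells the search that a better answer exists on the other side of the hill.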

The most exciting solution

There are a number of alternatives to conventional hill climbing that can help you get out of the dreaded valleys posed by the local minima problem and the desert mirages posed by the plateau problem. But for the purposes at hand, let’s just focus on the most exciting solution: simulated annealing. This is a wild breed of optimization animal that is tackling valleys and plateaus in a computationally clever way that’s well worth at least a couple paragraphs of pondering…

The hottest and coolest classical optimization algorithm around

To cut to the chase, simulated annealing steals from physics to tie time and temperature together in a single elegant algorithm. Yes, you read that right: an algorithm with a temperature parameter. When you run a simulated annealing algorithm, it begins with a completely random, frenetic series of selections from the entire landscape of the function at hand. This is the hot phase of the process. But as the temperature parameter drops with time, the random selections cover an ever-narrower range of the landscape. Finally, we enter the cool phase of the process as the algorithm begins to home in on (with a little luck) the deepest valley or the highest peak, where the holy grail of optimization lies: the global minimum or maximum.

Although the code found in simulated annealing algorithms generally contains some heavy math, the underlying connection between time and temperature is quite easy to grasp. Just picture something hot moving wildly throughout an unknown landscape, hitting everything in sight and reporting back a very rough picture of the lay of the land. Then you can think of progressively colder things moving ever-more slowly and cautiously through an ever-narrower region of the landscape, documenting the details as they creep further down into the deepest valley or up onto the highest peak… Okay, if you’re still not sure what on earth I’m talking about, here’s an excellent animation that should do the trick.
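
If you would rather read code than watch an animation, here is a minimal sketch of the idea. It uses the common textbook formulation (random nearby moves, with uphill moves accepted less and less often as the temperature falls), which is one way to realize the hot-to-cool behavior described above rather than the only one. The landscape f, the starting temperature, and the cooling rate are all assumptions picked for illustration.

```python
import math
import random

def f(x: float) -> float:
    """Hypothetical landscape: shallow valley near x = 1.13, deep one near x = -1.30."""
    return x**4 - 3 * x**2 + x

def simulated_annealing(start: float, seed: int = 0, iterations: int = 5000) -> float:
    rng = random.Random(seed)       # seeded so runs are repeatable
    x = best = start
    temperature = 5.0               # hot: almost any move is accepted
    for _ in range(iterations):
        candidate = x + rng.gauss(0.0, 1.0)   # random nearby move
        delta = f(candidate) - f(x)
        # Always accept downhill moves; accept uphill ones with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x = candidate
        if f(x) < f(best):
            best = x                # remember the lowest point seen so far
        temperature = max(temperature * 0.999, 1e-3)   # cooling schedule
    return best

best = simulated_annealing(2.0)
print(f"best x = {best:.2f}, f(x) = {f(best):.2f}")
```

Starting from x = 2.0, a plain downhill search on this landscape would settle in the shallow valley; the early hot phase is what gives the annealer a chance to hop over the hill.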

Quantum annealing (oh, what a feeling)

Simulated annealing can often get you out of a pinch when the other alternatives to conventional hill climbing come up short. But it’s an extremely specialized approach, and it suffers from at least one chilling drawback: you have to run the algorithm for an infinite amount of time to smoothly reach absolute zero and thus guarantee that you reach the true global minimum or maximum in the energy landscape. Since you probably don’t have an eternity to spare, you will never really know if your optimization solution is caught in yet another trap.

Enter quantum annealing. First, it’s important to keep in mind that quantum annealing algorithms in their basic form are remarkably similar to simulated annealing algorithms. Why? Because quantum tunneling strength plays the same role in quantum annealing as temperature does in simulated annealing. As time passes, the quantum tunneling strength in the quantum annealer drops dramatically, just as the temperature in the simulated annealer drops dramatically. It’s also easy to visualize the similarity between tunneling-strength and temperature. As time passes and quantum tunneling strength decreases, the system gets cozier and cozier with each progressively deeper valley in the energy landscape, and less and less inclined to tunnel its way out. Eventually, it gives up tunneling altogether when it finds itself (ideally) at the bottom of the deepest and coziest valley in the energy landscape (AKA the global minimum).

Not your grandmother’s quantum computer

The first difference you’re bound to notice between relatively conventional quantum computers and quantum annealing computers is the number of qubits they use. While the state-of-the-art in conventional quantum computers is pushing a few dozen qubits in 2018, the leading quantum annealer has more than 2000 qubits. Of course, the trade-off is that quantum annealers are not universal but specialized quantum computers that technically tackle only optimization problems and sampling problems. Because solving optimization problems is considered one of the key paths to the AI promised land, I’m going to focus on it from here on out.

A dizzying state of disarray

Before we apply the quantum annealing algorithm to the pool of qubits in our quantum annealer, they’re a mess: a maximally cloudy and unconnected configuration. This means we start out knowing nothing about the quantum system, which may be in any of 2ⁿ different states (where n is the number of qubits). For a quantum annealer with 2000 qubits, that’s a crazy number of possible states. If you have any doubts about that, try plugging 2²⁰⁰⁰ into your favorite calculator for a second opinion.
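
If your favorite calculator gives up, Python won't, because its integers have arbitrary precision. A tiny sketch (the function name count_states is just for illustration):

```python
def count_states(n_qubits: int) -> int:
    """Number of distinct 0/1 configurations of n qubits."""
    return 2 ** n_qubits          # Python integers have arbitrary precision

states = count_states(2000)
print(len(str(states)))           # how many decimal digits the number has
```

The result has 603 decimal digits, far more than the roughly 80 digits often quoted for the number of atoms in the observable universe.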

The quantum wishing well

Individual qubits always start out in an initial state of cloudy superposition that places them at the minimum possible energy. Physicists like to visualize this lowest-energy state as the bottom of a quantum potential well that looks sort of like a big letter U.

U

0/1

Then quantum annealing comes along and forces the state of superposition into two halves, two states, two bottoms of the well: 0 and 1. The result looks more like a big letter W:

W

0 1

The next step for the quantum annealer is to start loading the dice to favor the house in the quantum probability game.

Biases

With the help of an applied magnetic field, the quantum annealer nudges each qubit into being heavily biased toward 0 or 1: favoring either the first or the second dip in the W above.

Couplings

While quantum annealers are loading the dice (that is, individual qubits) with biases via magnetic fields, they are also busy tying together pairs of dice with theoretical threads via couplers. Specifically, a coupler can do one of two things. It can guarantee that a pair of qubits are always in the same state: either both 0 or both 1. Or it can ensure that two neighboring qubits are always in the opposite state: 0 and 1, or 1 and 0. The quantum coupler uses (surprise, surprise) quantum entanglement to tie qubits together and create the couplings.

Sculpting an energy landscape

As an aspiring developer working with a quantum annealer, it’s your job to essentially load all the quantum dice by coding a collection of biases and couplings that define the optimization problem you want your trusty annealer to solve. Another way to look at it is that you are sculpting, or at least generating, a sophisticated energy landscape of peaks and valleys that represent all possible outcomes in your optimization problem. Then you are setting the quantum annealer loose to search and ferret out the very bottom of that energy landscape’s deepest valley, which corresponds to the optimal solution. If you’re consistently successful, then your quantum annealing prowess may help power a new generation of machine learning and AI for posterity.
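To make the idea of an energy landscape a little more concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the three qubits, the bias and coupling values, and the simple energy formula (each bias times its qubit, plus each coupling times the product of its pair). Brute-force search stands in for the annealer, and real annealers are programmed through their own tools, not like this.

```python
from itertools import product

# Hypothetical toy problem: three qubits written as spins, -1 (for 0) and +1 (for 1).
biases = {0: 1.0, 1: -0.5, 2: 0.0}        # positive bias favors -1, negative favors +1
couplings = {(0, 1): -1.0, (1, 2): 1.0}   # negative: prefer same state; positive: prefer opposite

def energy(spins):
    """Height of the energy landscape at one configuration of spins."""
    e = sum(biases[i] * s for i, s in enumerate(spins))
    e += sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())
    return e

# Brute force stands in for the annealer: visit every configuration, keep the lowest.
best = min(product([-1, 1], repeat=3), key=energy)
print(best, energy(best))
```

With three qubits there are only 8 configurations to check. With 2000 there are 2²⁰⁰⁰, which is exactly why nobody wants to do this by brute force.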

Quantum computing and AI news

On August 31st, 2017, the Universities Space Research Association (USRA) announced that in partnership with NASA and Google it had upgraded the quantum annealing computer at the Quantum Artificial Intelligence Lab (Quantum AI Lab) to a D-Wave 2000Q. With nearly twice as many qubits as its predecessor and a new knack for “adiabatic quantum computing,” the latest D-Wave is going after bigger fish in the optimization-problem pond. The USRA team has even got their eye on using quantum algorithms and the D-Wave to tackle “challenging computational problems involved in NASA missions.” Partner Google, on the other hand, has their eye on AI:

“We are particularly interested in applying quantum computing to artificial intelligence and machine learning.”

But it’s not just Google and NASA that have access to the Quantum AI Lab. Believe it or not, you may too. If you’re a qualified candidate, you might just get some quality time with the latest D-Wave to try out your genius idea. In the Lab’s own words, “the call is open.”

If you liked this article, I would be super excited if you could share it with your curious friends. Anyway, thanks again for reading, and have a great day!

Demystifying Quantum Gates — One Qubit At A Time — February 27, 2018

Demystifying Quantum Gates — One Qubit At A Time

(I’ve written an introduction to quantum computing found here. If you are brand new to the field, it will be a better place to start.)

If you want to get into quantum computing, there’s no way around it: you will have to master the cloudy concept of the quantum gate. Like everything in quantum computing, not to mention quantum mechanics, quantum gates are shrouded in an unfamiliar fog of jargon and matrix mathematics that reflects the quantum mystery. My goal in this post is to peel off a few layers of that mystery. But I’ll save you the suspense: no one can get rid of it completely. At least, not in 2018. All we can do today is reveal the striking similarities and alarming differences between classical gates and quantum gates, and explore the implications for the near and far future of computing.

Classical vs quantum gates: comparing the incomparable?

Striking similarities

If nothing else, classical logic gates and quantum logic gates are both logic gates. So let’s start there. A logic gate, whether classical or quantum, is any physical structure or system that takes a set of binary inputs (whether 0s and 1s, apples and oranges, spin-up electrons and spin-down electrons, you name it) and spits out a single binary output: a 1, an orange, a spin-up electron, or even one of two states of superposition. What governs the output is a Boolean function. That sounds fancy and foreboding, but trust me, it’s not. You can think of a Boolean function as nothing more than a rule for how to respond to Yes/No questions. It’s as simple as that. The gates are then combined into circuits, and the circuits into CPUs or other computational components. This is true whether we’re talking about Babbage’s Difference Engine, ENIAC, retired chess champion Deep Blue, or the latest room-filling, bone-chilling, headline-making quantum computer.

Alarming differences

Classical gates operate on classical bits, while quantum gates operate on quantum bits (qubits). This means that quantum gates can leverage two key aspects of quantum mechanics that are entirely out of reach for classical gates: superposition and entanglement. These are the two concepts that you’ll hear about most often in the context of quantum computing, and here’s why. But there’s a lesser known concept that’s perhaps equally important: reversibility. Simply put, quantum gates are reversible. You’ll learn a lot about reversibility as you go further into quantum computing, so it’s worth really digging into it. For now, you can think of it this way — all quantum gates come with an undo button, while many classical gates don’t, at least not yet. This means that, at least in principle, quantum gates never lose information. Qubits that are entangled on their way into the quantum gate remain entangled on the way out, keeping their information safely sealed throughout the transition. Many of the classical gates found in conventional computers, on the other hand, do lose information, and therefore can’t retrace their steps. Interestingly enough, that information is not ultimately lost to the universe, but rather seeps out into your room or your lap as the heat in your classical computer.
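Here is a small Python sketch of what “losing information” means for a classical gate. The function names are my own, and this only models the logic, not the physics or the heat:

```python
from itertools import product

def classical_not(a):
    """Flip a single bit."""
    return 1 - a

def classical_and(a, b):
    """Return 1 only when both input bits are 1."""
    return a & b

# NOT is reversible: applying it twice returns the original bit.
not_is_reversible = all(classical_not(classical_not(a)) == a for a in (0, 1))

# AND is not: three different input pairs all produce 0,
# so the output alone cannot tell us which input we started from.
inputs_giving_zero = [(a, b) for a, b in product((0, 1), repeat=2)
                      if classical_and(a, b) == 0]
print(not_is_reversible, inputs_giving_zero)
```

Once an AND gate hands you a 0, there is no undo button that can tell you which of those three inputs it came from.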

V is for vector

We can’t talk about quantum gates without talking about matrices, and we can’t talk about matrices without talking about vectors. So let’s get on with it. In the language of quantum mechanics and computing, vectors are depicted in an admittedly pretty weird package called a ket, which comes from the second half of the word bracket. And they look the part. Here’s a ket vector: |u>, where u represents the values in the vector. For starters, we’ll use two kets, |0> and |1>, which will stand in for qubits in the form of electrons in the spin-up (|0>) and spin-down (|1>) states. These vectors can span any number of numbers, so to speak. But in the case of a binary state such as a spin up/down electron qubit, they have only two. So instead of looking like towering column vectors, they just look like numbers stacked two-high. Here’s what |0> looks like:

/ 1 \

\ 0 /

Now, what gates/matrices do is transform these states, these vectors, these kets, these columns of numbers, into brand new ones. For example, a gate can transform an up-state (|0>) into a down state (|1>), like magic:

/ 1 \ → / 0 \

\ 0 / \ 1 /

M is for matrix

This transformation of one vector into another takes place through the barely understood magic of matrix multiplication, which is completely different than the kind of multiplication we all learned in pre-quantum school. However, once you get the hang of this kind of math, it’s extremely rewarding, because you can apply it again and again to countless otherwise incomprehensible equations that leave the uninitiated stupefied. If you need some more motivation, just remember that it was through the language of matrix mathematics that Heisenberg unlocked the secrets of the all-encompassing uncertainty principle.

All the same, if you’re not familiar with this jet-fuel of a mathematical tool, your eyes will glaze over if I start filling this post with big square arrays of numbers at this point. And we can’t let that happen. So let’s wait a few more paragraphs for the matrix math and notation. Suffice it to say, for now, that we generally use a matrix to stand in for a quantum gate. The size and outright fear-factor of the matrix will depend on the number of qubits it’s operating on. If there’s just one qubit to transform, the matrix will be nice and simple, just a 2 x 2 array with four elements. But the size of the matrix balloons with two, three or more qubits. This is because a decidedly exponential equation that’s well worth memorizing drives the size of the matrix (and thus the sophistication of the quantum gate):

2^n x 2^n = the total number of matrix elements

Here, n is the number of qubits the quantum gate is operating on. As you can see, this number goes through the roof as the number of qubits (n) increases. With one qubit, it’s 4. With two, it’s 16. With three, it’s 64. With four, it’s… hopeless. So for now, I’m sticking to one qubit, and it’s got Pauli written all over it.
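If you want to see how fast this gets out of hand, here is the formula as a tiny Python function (the function name is just my own choice):

```python
def gate_matrix_elements(n_qubits):
    """Number of entries in the 2^n x 2^n matrix for an n-qubit gate."""
    side = 2 ** n_qubits
    return side * side

# Print the matrix size for gates acting on one to four qubits.
for n in range(1, 5):
    print(n, gate_matrix_elements(n))
```

Try raising the upper limit of the range: by ten qubits the matrix already has more than a million elements.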

The Pauli gates

The Pauli gates are named after Wolfgang Pauli, who not only has a cool name, but has managed to immortalize himself in two of the best-known principles of modern physics: the celebrated Pauli exclusion principle and the dreaded Pauli effect.

The Pauli gates are based on the better-known Pauli matrices (aka Pauli spin matrices) which are incredibly useful for calculating changes to the spin of a single electron. Since electron spin is the favored property to use for a qubit in today’s quantum gates, Pauli matrices and gates are right up our alley. In any event, there’s essentially one Pauli gate/matrix for each axis in space (X, Y and Z).

So you can picture each one of them wielding the power to change the direction of an electron’s spin along their corresponding axis in 3D space. Of course, like everything else in the quantum world, there’s a catch: this is not our ordinary 3D space, because it includes an imaginary dimension. But let’s let that slide for now, shall we?

Mercifully, the Pauli gates are just about the simplest quantum gates you’re ever going to meet. (At least the X and Z-gates are. The Y is a little weird.) So even if you’ve never seen a matrix in your life, Pauli makes them manageable. His gates act on one, and only one, qubit at a time. This translates to simple, 2 x 2 matrices with only 4 elements apiece.

The Pauli X-gate

The Pauli X-gate is a dream come true for those that fear matrix math. No imaginary numbers. No minus signs. And a simple operation: negation. This is only natural, because the Pauli X-gate corresponds to a classical NOT gate. For this reason, the X-gate is often called the quantum NOT gate as well.

In an actual real-world setting, the X-gate generally turns the spin-up state |0> of an electron into a spin-down state |1> and vice-versa.

|0>   -->   |1>   OR   |1> --> |0>

A capital “X” often stands in for the Pauli X-gate or matrix itself. Here’s what X looks like:

/ 0 1 \

\ 1 0 /

In terms of proper notation, applying a quantum gate to a qubit is a matter of multiplying a ket vector by a matrix. In this case, we are multiplying the spin-up ket vector |0> by the Pauli X-gate or matrix X. Here’s what X|0> looks like:

/ 0 1 \ /1\

\ 1 0 / \0/

Note that you always place the matrix to the left of the ket. As you may have heard, matrix multiplication, unlike ordinary multiplication, does not commute, which goes against everything we were taught in school. It’s as if 2 x 4 was not always equal to 4 x 2. But that’s how matrix multiplication works, and once you get the hang of it, you’ll see why. Meanwhile, keeping the all-important ordering of elements in mind, the complete notation for applying the quantum NOT-gate to our qubit (in this case the spin-up state of an electron), looks like this:

X|0> = / 0 1 \ /1\ = /0\ = |1>

\ 1 0 / \0/ \1/

Applied to a spin-down vector, the complete notation looks like this:

X|1> = / 0 1 \ /0\ = /1\ = |0>

\ 1 0 / \1/ \0/

Despite all the foreign notation, in both of these cases what’s actually happening here is that a qubit in the form of a single electron is passing through a quantum gate and coming out the other side with its spin flipped completely over.
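If you’d rather let a computer do the multiplying, here is a minimal sketch in plain Python. The helper apply_gate and the names KET_0 and KET_1 are my own inventions for illustration; each row of the matrix is multiplied against the ket and summed, which is all that matrix multiplication amounts to in the one-qubit case.

```python
def apply_gate(matrix, ket):
    """Multiply a 2 x 2 matrix by a two-element column vector (a ket)."""
    return [
        matrix[0][0] * ket[0] + matrix[0][1] * ket[1],
        matrix[1][0] * ket[0] + matrix[1][1] * ket[1],
    ]

KET_0 = [1, 0]  # spin-up
KET_1 = [0, 1]  # spin-down
X = [[0, 1],
     [1, 0]]

print(apply_gate(X, KET_0))  # X|0>
print(apply_gate(X, KET_1))  # X|1>

# Applying X twice undoes the flip: the "undo button" in action.
print(apply_gate(X, apply_gate(X, KET_0)))
```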

The Pauli Y and Z-gates

I’ll spare you the math with these two. But you should at least know about them in passing.

Of the three Pauli gates, the Pauli Y-gate is the fancy one. It looks a lot like the X-gate, but with an i (yep, the insane square root of -1) in place of the regular 1, and a negative sign in the upper right. Here’s what Y looks like:

/ 0 -i \

\ i 0 /

The Pauli Z-gate is far easier to follow. It looks kind of like a mirror image of the X-gate above, but with a negative sign thrown into the mix. Here’s what Z looks like:

/ 1 0 \

\ 0 -1 /

The Y-gate and the Z-gate also change the spin of our qubit electron. But I’d probably need to delve into the esoteric mysteries of the Bloch sphere to really explain how, and I’ve got another gate to go through at the moment…
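I promised to spare you the math, so feel free to skip this, but if you’re curious, Python’s built-in complex numbers (written with a j instead of an i) make it easy to poke at the Y and Z matrices yourself. As before, apply_gate is just a hypothetical helper of my own for multiplying a 2 x 2 matrix by a ket.

```python
def apply_gate(matrix, ket):
    """Multiply a 2 x 2 matrix by a two-element column vector (a ket)."""
    return [
        matrix[0][0] * ket[0] + matrix[0][1] * ket[1],
        matrix[1][0] * ket[0] + matrix[1][1] * ket[1],
    ]

KET_0 = [1, 0]  # spin-up
KET_1 = [0, 1]  # spin-down

Y = [[0, -1j],
     [1j, 0]]
Z = [[1, 0],
     [0, -1]]

y_on_0 = apply_gate(Y, KET_0)  # i times |1>: flipped, with a factor of i
y_on_1 = apply_gate(Y, KET_1)  # -i times |0>
z_on_0 = apply_gate(Z, KET_0)  # |0> is left alone
z_on_1 = apply_gate(Z, KET_1)  # |1> picks up a minus sign
print(y_on_0, y_on_1, z_on_0, z_on_1)
```

So the Y-gate flips the qubit like X does but tacks on a factor of i or -i, while the Z-gate leaves |0> alone and puts a minus sign on |1>.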

The Hadamard gate

While the Pauli gates are a lot like classic logic gates in some respects, the Hadamard gate, or H-gate, is a bona fide quantum beast. It shows up everywhere in quantum computing, and for good reason. The Hadamard gate has the characteristically quantum capacity to transform a definite quantum state, such as spin-up, into a murky one, such as a superposition of both spin-up and spin-down at the same time.

Once you send a spin-up or spin-down electron through an H-gate, it will become like a penny standing on its end, with precisely 50/50 odds that it will end up heads (spin-up) or tails (spin-down) when toppled and measured. This H-gate is extremely useful for performing the first computation in any quantum program because it transforms pre-set, or initialized, qubits back into their natural fluid state in order to leverage their full quantum powers.
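I haven’t shown the Hadamard matrix in this post, but the standard textbook form is a 2 x 2 matrix whose entries are all 1/√2, with a minus sign on the bottom-right one. Here is a small sketch that uses it to check the 50/50 claim. The helper apply_gate is my own, and the probabilities come from squaring the size of each number in the resulting ket.

```python
import math

def apply_gate(matrix, ket):
    """Multiply a 2 x 2 matrix by a two-element column vector (a ket)."""
    return [
        matrix[0][0] * ket[0] + matrix[0][1] * ket[1],
        matrix[1][0] * ket[0] + matrix[1][1] * ket[1],
    ]

s = 1 / math.sqrt(2)
H = [[s, s],
     [s, -s]]

superposed = apply_gate(H, [1, 0])                 # H|0>
probabilities = [abs(a) ** 2 for a in superposed]  # odds of measuring up / down
print(probabilities)                               # both very close to 0.5

# Like every quantum gate, H can be undone; in fact it undoes itself.
back = apply_gate(H, superposed)
print(back)                                        # very close to [1, 0]
```

The results are only “very close” because of ordinary floating-point rounding, not because of anything quantum.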

Other quantum gates

There are a number of other quantum gates you’re bound to run into. Many of them operate on several qubits at a time, leading to 4×4 or even 8×8 matrices with complex-numbered elements. These are pretty hairy if you don’t already have some serious matrix skills under your belt. So I’ll spare you the details.

The main gates that you will want to be familiar with are the ones we covered, shown in the graph below:

You should know that other gates exist so here’s a quick list of some of the most widely used other quantum gates, just so you can get a feel for the jargon:

  • Toffoli gate
  • Fredkin gate
  • Deutsch gate
  • Swap gate (and swap-gate square root)
  • NOT-gate square root
  • Controlled-NOT gate (C-NOT) and other controlled gates

There are many more. But don’t let the numbers fool you. Just as you can perform any classical computation with a combination of NOT + OR = NOR gates or NOT + AND = NAND gates, you can reduce the list of quantum gates to a simple set of universal quantum gates. But we’ll save that deed for another day.
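To see what “universal” means on the classical side, here is a quick Python sketch that builds NOT, AND and OR out of nothing but NAND (that is, NOT applied to the output of AND). The function names are my own.

```python
def nand(a, b):
    """The only primitive gate we allow ourselves: NOT applied to AND."""
    return 1 - (a & b)

def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

# Print the full truth table: inputs, then AND, then OR.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_(a, b), or_(a, b))
```

Universal sets of quantum gates play the same role, except that the building blocks are matrices rather than truth tables.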

Future gazing through the quantum gateway

As a recent Quanta Magazine article points out, the quantum computers of 2018 aren’t quite ready for prime time. Before they can step into the ring with classical computers with billions of times as many logic gates, they will need to face a few of their own demons. The most deadly is probably the demon of decoherence. Right now, quantum decoherence will destroy your quantum computation in just “a few microseconds.” However, the faster your quantum gates perform their operations, the more likely your quantum algorithm will beat the demon of decoherence to the finish line, and the longer the race will last. Alongside speed, another important factor is the sheer number of operations performed by quantum gates to complete a calculation. This is known as a computation’s depth. So another current quest is to deepen the quantum playing field. By this logic, as the rapidly evolving quantum computer gets faster, its calculations deeper, and the countdown-to-decoherence longer, the classical computer will eventually find itself facing a formidable challenger, if not successor, in the (quite possibly) not-too-distant future.

If you liked this article, I would be super excited if you hit the like button 🙂 or share it with your curious friends. You can subscribe to this profile and get all my articles sent to you as soon as I write them by clicking the subscribe button! (how awesome?!)

Anyway, thanks again for reading, and have a great day!

The Need, Promise, and Reality of Quantum Computing — February 1, 2018

The Need, Promise, and Reality of Quantum Computing

Despite giving us the most spectacular wave of technological innovation in human history, there are certain computational problems that the digital revolution still can’t seem to solve. Some of these problems could be holding back key scientific breakthroughs, and even the global economy. Although conventional computers have been doubling in power and processing speed nearly every two years for decades, they still don’t seem to be getting any closer to solving these persistent problems. Want to know why? Ask any computer scientist, and they’ll probably give you the same answer: today’s digital, conventional computers are built on a classical, and very limited, model of computing. In the long run, to efficiently solve the world’s most persistent computing problems, we’re going to have to turn to an entirely new and more capable animal: the quantum computer.

Ultimately, the difference between a classical computer and a quantum computer is not like the difference between an old car and a new one. Rather, it’s like the difference between a horse and a hawk: while one can run, the other can fly. Classical computers and quantum computers are indeed that different. Here we take a good look at where the key difference lies, and take a deep dive into what makes quantum computers unique. However, what you won’t find here is a final explanation for how quantum computers ultimately work their magic. Because no one really knows.

The hard limits of classical computing

Moore’s law, Shmore’s Law

For several decades now, the sheer speed and computational power of conventional computers has been doubling every two years (and by some accounts just eighteen months). This is known as Moore’s law. Although the breakneck pace of progress may have finally begun to slow slightly, it’s still more or less true that the room-filling supercomputer of today is the budget laptop of tomorrow. So at this rate, it seems reasonable to assume that there is no computational task that a conventional computer couldn’t eventually tackle in the foreseeable future. Nonetheless, unless we’re talking trillions of years (and then some), that’s simply not a safe assumption when it comes to certain stubborn tasks.

The conventional computer’s Achilles heel

The fact is that a computational task such as quickly finding the prime factors for very large integers is probably out of reach for even the fastest conventional computers of the future. The reason is that, with the classical methods we know of, the work needed to find the prime factors of a number grows explosively, close to exponentially, as the number gets longer. What’s exponential growth? Well, let’s dive into it, because this is a very important piece for understanding why quantum computers have so much potential and why classical computers fall short.

Quick introduction to exponential growth

Some things grow at a consistent rate, and some things grow faster as the number of “things” you have also grows. When growth becomes more rapid (not constant) in relation to the growing total number, then it is exponential.

Exponential growth is extremely powerful. One of the most important features of exponential growth is that, while it starts off slowly, it can result in enormous quantities fairly quickly — often in a way that is shocking.

This definition can be a bit hard to get your head around without an example, so let’s dive into a quick story.

There is a legend in which a wise man, who was promised an award by a king, asks the ruler to reward him by placing one grain of rice on the first square of a chessboard, two grains on the second square, four grains on the third and so forth. Every square was to have double the number of grains as the previous square. The king granted his request but soon realized that the rice required to fill the chessboard was more than existed in the entire kingdom and would cost him all of his assets.

Exponential Growth of Rice

The number of grains on any square reflects the following rule, or formula:

N = 2^(k-1)

In this formula, k is the number of the square and N is the number of grains of rice on that square.

  • If k = 1 (the first square), then N = 2⁰, which equals 1.
  • If k = 5 (the fifth square), then N = 2⁴, which equals 16.

This is exponential growth because the exponent, or power, increases as we go from square to square.
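If you want to check the king’s arithmetic, here is a short Python sketch of the rule that the two examples above follow: the exponent is always one less than the square’s number. The function name is my own.

```python
def grains_on_square(k):
    """Grains on square k (counting from 1): 2 raised to the power k - 1."""
    return 2 ** (k - 1)

# Grains on the first, fifth and last squares of the board.
print(grains_on_square(1), grains_on_square(5), grains_on_square(64))

# Total for the whole 64-square chessboard.
total = sum(grains_on_square(k) for k in range(1, 65))
print(total)
```

The total works out to 2⁶⁴ - 1, which is more than 18 quintillion grains. No wonder the king ran out of rice.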

To conceptualize this further, I’ve included a graph of what exponential growth looks like in relation to the input quantity of an exponential function.

As you can see, the function starts off relatively slowly, but with large enough inputs it soon shoots up to numbers that no classical computer could keep up with.

Real exponential functions have real consequences

Okay, enough storytelling. Let’s move on to real-world exponential problems like the one we were talking about earlier. Prime Factorization.

Take the number 51. See how long it takes you to find the two unique prime numbers that you can multiply together to generate it. If you’re familiar with these kinds of problems, it probably only took you a few seconds to find that 3 and 17, both primes, generate 51. As it turns out, this seemingly simple process lies at the heart of the digital economy and is the basis for our most secure types of encryption. The reason we use this technique in encryption is that as the numbers used in prime factorization get larger and larger, it becomes increasingly difficult for conventional computers to factor them. Once you reach a certain number of digits, you find that it would take even the fastest conventional computer months, years, centuries, millennia, or even countless eons to factor it.
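Here is the simplest way to hand that job to a classical computer: trial division, sketched in Python. This is only an illustration of the brute-force idea (the function name is my own), and serious factoring programs use far cleverer methods, but even those eventually hit the same kind of wall.

```python
def prime_factors(n):
    """Factor n by trial division: try every candidate divisor in turn."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:   # d divides n, so record it and divide it out
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # whatever is left over is itself prime
        factors.append(n)
    return factors

print(prime_factors(51))
```

For 51 the loop finishes almost instantly. For a number with hundreds of digits, the count of candidate divisors to try is itself a number with hundreds of digits.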

With this idea in mind, even if computers continue to double in processing power every two years for the foreseeable future (and don’t bet on it), they will always struggle with prime factorization. Other equally stubborn problems at the heart of modern science and mathematics include certain molecular modeling and mathematical optimization problems which promise to crash any supercomputer that dares to come anywhere near them.

Below is a great illustration from IBM Research that shows the most complex molecule (F cluster) that we can simulate on the world’s most powerful supercomputer. As you can see (in the bottom left of the image), the molecule is not very complex at all, and if we want to model more complex molecules to discover better drug treatments and understand our biology, then we will need a different approach!

Molecular Simulation Problem. Source: IBM Research

Enter the quantum computer

Conventional computers are strictly digital and rely purely on classical computing principles and properties. Quantum computers, on the other hand, are strictly quantum. Accordingly, they rely on quantum principles and properties — most importantly superposition and entanglement — that make all the difference in their almost miraculous capacity to solve seemingly insurmountable problems.

Superposition

To make sense out of the notion of superposition, let’s consider the simplest possible system: a two-state system. An ordinary, classical two-state system is like an On/Off switch that is always in one state (On) or another (Off). Yet a two-state quantum system is something else entirely. Of course, whenever you measure its state, you will find that it is indeed either on or off, just like a classical system. But between measurements, a quantum system can be in a superposition of both on and off states at the same time, no matter how counter-intuitive, and even supernatural, this may seem to us.

Superposition. Source: IBM Research

Generally speaking, physicists maintain that it’s meaningless to talk about a quantum system’s state, such as its spin, prior to measurement. Some even argue that the very act of measuring a quantum system causes it to collapse from a murky state of uncertainty to the value (On or Off, Up or Down) that you measure. Although probably impossible to visualize, there’s no escaping the fact that this mysterious phenomenon is not only real but gives rise to a new dimension of problem-solving power that paves the way for the quantum computer. Keep the idea of superposition in mind. We will come back to how this is used in quantum computing in a bit.

How superposition is even possible is beyond the scope of this article, but trust that it has been proven to be true. If you want to understand what gives rise to superposition then you are going to first need to understand the idea of Wave/Particle Duality.

Entanglement

Okay, on to the next property of quantum mechanics which we need to leverage to create a quantum computer.

It is known that once two quantum systems interact with one another, they become hopelessly entangled partners. From then on, the state of one system will give you precise information about the state of the other system, no matter how far the two are from one another. Seriously, the two systems can be light years apart and still give you precise and instantaneous information about each other. Let’s illustrate this with a concrete example as this caused even Einstein to puzzle about how this could be possible. (Einstein famously referred to this phenomenon as “Spooky action at a distance”)

Quantum Entanglement. Source: IBM Research

Suppose you have two electrons, A and B. Once you have them interact in just the right way, their spins will automatically get entangled. From then on, if A’s spin is Up, B’s spin will be Down, like two kids on a seesaw, except that this holds true even if you take A and B to opposite ends of the Earth (or the galaxy, for that matter). Despite the thousands of miles (or light years) between them, it’s been proven that if you measure A to have spin Up, you will know instantly that B’s spin is Down. But wait: we’ve already learned that these systems don’t have precise values for states such as spin, but rather exist in a murky superposition, prior to measurement. So does our measuring A actually cause B to instantaneously collapse to the opposite value, even when the two are light years apart? If so, then we have yet another problem on our hands, because Einstein taught us that no causal influence, such as a light signal, between two systems can travel faster than the speed of light. So what gives? All told, we honestly don’t know. All we know is that quantum entanglement is real and that you can leverage it to work wonders.
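The seesaw behavior can at least be mimicked in a few lines of Python. Take this with a big grain of salt: it’s a toy model of my own that only reproduces the measurement statistics for one fixed measurement direction, and it says nothing about the genuinely spooky part. The pair is described by one number per joint outcome, and the chance of each outcome is the square of the size of its number.

```python
import random

# Hypothetical toy model: one amplitude per joint outcome (A, B) of an
# anti-correlated pair. "Both up" and "both down" never occur.
s = 2 ** -0.5
amplitudes = {("up", "down"): s, ("down", "up"): -s}

def measure_pair(rng):
    """Sample one joint measurement with probability |amplitude| squared."""
    outcomes = list(amplitudes)
    weights = [abs(amplitudes[o]) ** 2 for o in outcomes]
    return rng.choices(outcomes, weights=weights)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [measure_pair(rng) for _ in range(1000)]

# Each individual result is random, but A and B always disagree.
print(all(a != b for a, b in samples))
```

Neither electron’s result can be predicted in advance, yet once you know one, you know the other.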

The qubit

The qubit plays the same role in quantum computing as the bit does in classical computing: it’s the fundamental unit of information. However, compared to a qubit, a bit is downright boring. Although both bits and qubits generate one of two states (a 0 or a 1) as the outcome of a computation, a qubit can simultaneously be in both 0 and 1 states prior to that outcome. If this sounds like quantum superposition, it is. Qubits are quantum systems par excellence.

Just as conventional computers are built bit by bit with transistors that are either On or Off, quantum computers are built qubit by qubit with electrons in spin-states that are either Up or Down (once measured, of course). And just as transistors in On/Off states are strung together to form the logic gates that perform classical computations in digital computers, electrons in Up/Down spin-states are strung together to form the quantum gates that perform quantum calculations in quantum computers. Yet stringing together individual electrons (while preserving their spin states) is far, far easier said than done.

Quantum Algorithms. Source: IBM Research

Where are we today?

While Intel is busy pumping out conventional chips with billions of transistors apiece, the world’s leading experimental computer scientists are still struggling to build a quantum computer “chip” with more than a handful of qubits. Just to give you a sense of how early we are in the history of quantum computing, it was a big deal when IBM recently unveiled the largest quantum computer in the world with an astonishing… wait for it… 50 qubits. Nonetheless, it’s a start, and if anything like Moore’s law applies to quantum computers, we should get into the hundreds in a few years, and the thousands in a few more. A billion? I wouldn’t hold your breath, but then again, you don’t need a billion qubits to outperform the daylights out of a conventional computer in some key categories, such as prime factorization, molecular modeling and a slew of optimization problems that no conventional computer can touch today.

The quantum computers of 2018

All the same, as of right now, nearly every quantum computer is a multi-million dollar borderline mad-scientist project that looks the part. You generally find them in R&D departments at large IT companies like IBM, or in the experimental physics wing of large research universities, like MIT. They have to be super-cooled to a hair above absolute zero (that’s colder than intergalactic space), and experimenters need to use microwaves of a precise frequency to communicate with each qubit in the computer individually. Needless to say, that doesn’t scale. But neither did the vacuum tubes of the earliest conventional computers, so let’s not judge this first generation too harshly.

Roadblocks awaiting breakthroughs

The primary reason that quantum computers haven’t gone mainstream yet is that the best minds and inventors in the world are still struggling with high error rates and low numbers of qubits. As we address these two problems together, we will rapidly increase what IBM calls each computer’s “quantum volume,” a way of visualizing the sheer quantity of useful calculations a quantum computer can perform.

Quantum Volume. Source: IBM Research

In short, for quantum computing to take off and quantum-powered Macbooks to start flying off the shelves, we need far more qubits and far fewer mistakes. That’s going to take time, but at least we know what we’re aiming for, and what we’re up against.

Myths vs explanations

Although we know that quantum computers can easily do things that no conventional computer can dream of doing, we don’t really know how they do it. If this sounds surprising, given that the first-generation of quantum computers already exists, keep in mind the word quantum. We’ve been using quantum mechanics to solve problems for a century now, and we still don’t really know how it works. Quantum computing, as a member of the quantum family, is in the same boat. Michael Nielsen (who basically wrote the book on the subject), has argued convincingly that any explanation of quantum computing is destined to miss the mark. After all, according to Nielsen, if there were a straightforward explanation for how a quantum computer works (that is, something you could visualize), then it could be simulated on a conventional computer. But if it could be simulated on a conventional computer, then it couldn’t be an accurate model of a quantum computer, because a quantum computer by definition does what no conventional computer can do.

According to Nielsen, the most popular myth that pretends to explain quantum computation is called quantum parallelism. Because you’re going to hear the quantum parallelism story a lot, let’s look at it for a moment. The basic idea behind quantum parallelism is that quantum computers, unlike their conventional counterparts, explore the full spectrum of possible computational outcomes/solutions simultaneously (i.e. in a single operation), while digital computers must plod along, looking at each solution in sequence. According to Nielsen, this part of the quantum-parallelism story is roughly right. However, he sharply criticizes the rest of the story, which goes on to say that after surveying the full spectrum of solutions, quantum computers pick out the best one. Now that, according to Nielsen, is a myth. The truth, he insists, is that what quantum computers, like all quantum systems, are really doing behind the scenes is entirely out of our reach. We see the input, and the output, and what happens in between is sealed in mystery.

If you liked this article, I would be super excited if you share it with your curious friends. I’ve got much more like it coming, and if you want to be notified whenever I post a new article you can just subscribe to this blog and have the articles sent to you as soon as I write them! (how awesome?!)

Anyway, thanks again for reading, and have a great day!

Why AlphaGo is a bigger game changer for Artificial Intelligence than many realize — October 9, 2017

Why AlphaGo is a bigger game changer for Artificial Intelligence than many realize

What’s all this fuss about the AI AlphaGo’s recent victory against the masters?

While it’s seemed like AI had hit a dead-end as much as a decade ago, if you’re like many of us sci-fi enthusiasts and have always wanted an AI best friend, the recent victory of AlphaGo has brought us much closer than you may have thought was possible.

AI is Finally Moving Forward

We’re not surprised if you haven’t been following the recent developments in AI all that closely because, for the most part, it’s seemed like nothing exciting has happened for quite a long time. Sci-fi dreams about computer-powered best friends aside, AI for the general public has come to mean reasonably responsive and well-programmed computer assistance rather than independent thinking machines. Concepts like ‘smart’ chatbots somehow seem to pull us further from the Star Trek or Heinleinian dream of fully sentient and intuitive computers, while many products and services that claim to integrate AI seem to be nothing more than a fast way to analyze large amounts of data. In fact, the last time most of us heard something hopeful about AI was when Deep Blue beat the world Chess champion, but what ever came of that AI? Surely it hasn’t used that incredible logical power to take over the world or begin making friends, so why do we even care?

Not All AIs are Equal

The answer lies in the fact that there are many forms of Artificial Intelligence and most of them are limited by the tasks they were made to perform. That’s what makes AlphaGo so special, because while it was designed, named, and trained to play Go against the masters, its potential functionalities go well beyond the realm of board games, unlike most of its AI contemporaries.

While practical applications for specifically built AI are growing, the tradition of training your AI programming skills on classic strategy games has existed since the 1950s when a computer was programmed to play and was able to win a game of tic-tac-toe. Since then a large variety of games and custom built AIs have been tested against each other to the great entertainment of experts in the field and curious nerds like us who care about that sort of thing. The real difference is not what they’re programmed for but how they are programmed to start with and, in fact, this is also what most profoundly distinguishes AlphaGo from its older-generation relative, the Chess champion Deep Blue.

Chess is a Closed Game

You may not know this, but there is a standard way to program an AI to play a board game, known as the Search Tree, in which the computer analyzes all the pieces and spaces in a game and determines which move during its turn is most likely to result in victory. However, for games with a limited total number of moves and responses, you don’t even have to spend too much time on programming good judgment; all you need is a complete understanding of the game. That said, consider how long people have been playing, analyzing, and writing down their analysis for chess.

Countless arrangements of the limited and highly specialized pieces on the board have been replicated and studied in-depth. Do you know what they found? There is a finite number of possible piece arrangements on the board and each one of these finite arrangements has a finite number of moves that can be made and each of these moves can then be judged as a good or bad idea. In other words, you could in principle contain every possible chess move and the best move for each board arrangement in a single database (in practice, the number of arrangements is far too large to store in full). That’s right, the quick and dirty way to make a chess “AI” wouldn’t even require any thought, simply a database containing a complete knowledge of the game. Therefore AIs were always destined to master chess, because in principle they can simply store everything there is to know and reference it at will.

So how did Deep Blue win back in the 90s? You can breathe easy knowing that the famous AI did not use the database method but instead relied on a parallel system designed to run a complex tree search. At each point in the game it would analyze the board and run an assessment on the possible moves it detected and which could move it closer toward a win. Defeating the world chess champion was a huge victory for Deep Blue by more than just capturing a king piece. It indicated that the AI’s board assessment program could be faster and smarter than a human strategy expert, but it was not what most of us sci-fi enthusiasts would think of as the beginning of independent computer thought. The only thing Deep Blue can do is play chess and because chess is a finite game, Deep Blue never needs to get smarter.
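To make the tree-search idea a little more concrete, here is a minimal sketch in Python. It is not Deep Blue's algorithm or its board assessment; it uses a hypothetical toy game (two players alternately take 1 to 3 stones from a pile, and whoever takes the last stone wins) because the whole tree is small enough to search completely. The function names and the game itself are assumptions made up for illustration.

```python
from functools import lru_cache

MAX_TAKE = 3  # assumption: a player may take 1, 2, or 3 stones per turn


@lru_cache(maxsize=None)
def can_win(stones):
    """Return True if the player to move can force a win from this position."""
    # With no stones left, the previous player took the last stone and won.
    if stones == 0:
        return False
    # Search every legal move; a move is good if it leaves the opponent
    # in a position from which they cannot force a win.
    return any(not can_win(stones - take)
               for take in range(1, min(MAX_TAKE, stones) + 1))


def best_move(stones):
    """Pick a move that leaves the opponent in a losing position, if one exists."""
    for take in range(1, min(MAX_TAKE, stones) + 1):
        if not can_win(stones - take):
            return take
    return 1  # every move loses against perfect play, so just take one stone


# For a game this small, the search results double as a complete "database":
# one entry per position, each marked as winning or losing for the player to move.
table = {n: can_win(n) for n in range(0, 13)}
print(table)
print(best_move(10))  # take 2, leaving 8 stones
```

For a game this tiny the search and the database are the same thing. The interesting question for chess and Go is what to do when the tree is far too big to finish searching.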

Go is Not a Closed Game

People have been trying to define Go for thousands of years. With computer analysis in hand, they have tried to discover if it is a finite game, like chess, and it simply cannot be done. With a near-infinite number of pieces available to each player and the complexities of the game itself, there are too many possibilities, board arrangements, and good or bad placement choices for any reasonable purpose-built program to handle. While you can make a program that plays Go, until AlphaGo, computer opponents only ever reached an intermediate level of capability, and trying to fill a database with all the possible board arrangements and possible moves might well set your servers on fire.

AlphaGo Learns to Play

It is for this reason that many people, Go masters included, were certain that a computer could never learn to beat the human champions of the game, and for this reason that DeepMind decided to try. Why has AlphaGo succeeded where other AIs were judged to not even have a chance? The difference was that DeepMind decided to try something new in the world of games vs AIs: machine learning and neural networks instead of custom-built search trees. AlphaGo doesn’t just judge the board, it learns from its mistakes. Much as a Go expert plays from early childhood onward, AlphaGo was run through thousands of games against itself. It learned from every one of those games how to be a better player and improve its strategy, and it never got bored, frustrated, or tired during practice.

AlphaGo Teaches the Masters

Two years ago, DeepMind felt that AlphaGo was ready to start playing against expert human opponents and invited the European Go champion, Mr. Fan Hui, to a closed-door five-game test. To their surprise and delight, it won every single game and became the first computer program to defeat a professional Go player. They then set it against the legendary winner of 18 world titles, Mr. Lee Sedol, in Seoul, where it won 4–1 and earned a 9-dan professional ranking, the highest certification available. If this wasn’t awesome enough, during these games AlphaGo dazzled the audience and its opponent with creative winning moves, one of which effectively overturned hundreds of years of cumulative Go wisdom.

Deep Blue Was Columbus Discovering America And AlphaGo Is The Moon Landing

Any computer scientist or programmer will admit that Deep Blue achieved something incredible when it beat Kasparov. But the amazing feat was in the computational power that Deep Blue had. It did not learn to play chess. It was programmed to search through thousands of chess games and evaluate the best move it had. Once Deep Blue had won the game and proven its strength, it was packed away and it has not been seen since. Everyone knew that its only purpose was to play chess and its programming could not be applied to much of anything else. AlphaGo, on the other hand, took the idea of computational power and added something like human reasoning or intuition. This combination makes it incredibly applicable to countless purposes.

Computer Scientists Versus Chess Masters

Another unique aspect of how AlphaGo was created versus how Deep Blue was created is who the experts relied on. With Deep Blue, the computer scientists heavily relied on Chess experts, professionals, and masters to help the program have as many chess games programmed into it as possible. And the thing is, even after Deep Blue had strutted its stuff, it did not change much for the world of Chess. Chess players did not learn anything from it. With AlphaGo, however, the computer scientists simply used lots and lots of games from a myriad of players, who were all at different levels of Go knowledge and experience. And unlike when Deep Blue was unveiled, when AlphaGo was first shown to the world, Go players paid attention. They saw that AlphaGo was playing in innovative ways. It has taught them to think and play more creatively.

AlphaGo’s Intuitive Factor

It is easy to say that AlphaGo has intuition, which Deep Blue was missing. It is much more difficult to explain where that intuition comes from. To put it simply, it built on Deep Blue’s search-and-optimize idea. The DeepMind team programmed AlphaGo with 150,000 Go games that had been played by good players. It would then search through those games to base its next move on probability. To take AlphaGo to the next level, though, DeepMind used a neural network, or machine learning, so that through self-play and play against humans it could slowly make millions of tiny adjustments, allowing it to obtain something as close to intuition as possible.
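The "look at what good players did and go with the probabilities" step can be sketched roughly like this. Treat it as a deliberately tiny, hypothetical stand-in: AlphaGo used neural networks trained on game records, not an exact-position lookup like this one, and the position labels and move names below are invented purely for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: (position, move a strong player chose in that position).
recorded_games = [
    ("empty board", "corner"),
    ("empty board", "corner"),
    ("empty board", "center"),
    ("corner taken", "opposite corner"),
    ("corner taken", "side"),
    ("corner taken", "opposite corner"),
]

# Count how often each move was played in each position.
move_counts = defaultdict(Counter)
for position, move in recorded_games:
    move_counts[position][move] += 1


def move_probabilities(position):
    """Estimate how likely each move is, based only on the recorded games."""
    counts = move_counts[position]
    total = sum(counts.values())
    return {move: n / total for move, n in counts.items()}


def most_likely_move(position):
    """Return the move that was chosen most often in this position."""
    return move_counts[position].most_common(1)[0][0]


print(move_probabilities("empty board"))
print(most_likely_move("corner taken"))  # opposite corner
```

A lookup like this only works for positions it has already seen, which is exactly why a network that generalizes from patterns matters so much for a game as large as Go.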

And it is this intuition factor of recognizing good patterns and learning them that will have a much deeper impact on artificial intelligence. In the world of art, this type of artificial intelligence will expose a neural network to a specific artistic style, it will then show the network an image, and the network will replicate that image in the artistic style it was shown. In the world of language, the same neural networks are being used to recognize natural language. In the world of games, these networks are employed to improve video game experiences. And the list of future possibilities for expanding the impact of neural networks, machine learning, and artificial intelligence to provide the ability of intuition to computers is growing by the day: think healthcare, smartphone assistants, and robotics. In fact, the UK’s National Health Service has already signed a deal with DeepMind.

It Was Not Supposed To Be This Easy

Go is a game that has been around for 3000 years. It is widely accepted as the most challenging strategy game that exists. Individuals, especially in countries like South Korea and China, are sent to private school specifically to learn how to play the game at an expert level. It takes years of playing for several hours every day to master the game. In other words, even though it has simple rules, it is not a simple game to excel at. And due to its complexity, and how long it had taken computer scientists to create a machine that could win at Chess, experts estimated that a machine that could effectively play Go was still about 10 years away.

Surprise! DeepMind managed to create a machine that could master the game, without being programmed with explicit rules and without being taught by a professional Go player. AlphaGo mainly played against itself and learned from this self-play. At its core, it learned like a human learns, by looking at the board, evaluating the options, making moves, and learning from mistakes; it just did it a lot faster than any human can.

This is extremely exciting because, at its core, what it means is that computer scientists have had all the tools they needed to do this for years. Neural networks have been known about and discussed since the middle of the last century. All it really took was simply getting creative with them, applying them in new ways. AlphaGo beating the world’s best Go player proves that AI has the potential to do anything. It can learn anything and understand anything, and from that learning and understanding it can accomplish what humans can accomplish in a much shorter period of time.

You’re probably wondering what this all means. The good news is that we’re much closer to the dream of an AI best friend than most of us would have dared to imagine a few years ago. Let it sink in for a moment: AlphaGo can learn the most complex intuition- and creativity-based logic game known to man, and it didn’t do so through a finite database or search trees alone. It learned from practice and experience, just like we do, and the ability to create amazing new solutions to ancient puzzles suggests a realm of digital creativity never before fathomed.

AlphaGo is not like other game-playing AIs that have come before it. It is the future of intelligent and intuitive machines, one that we plan to turn toward more than just board games. From practical applications to that friend you’ve been hoping for, AlphaGo is sure to be the first of a new generation of self-learning intuitive AIs that go above and beyond the limited calculating capacities of its older siblings and contemporaries. If you love AI like I do, keep your eyes open for new practical applications for very real artificial intelligence popping up in places you may not have even imagined. The AI winter is over.

Scala vs Kotlin: Practical Considerations for the Pragmatic Programmer — September 14, 2017

Scala vs Kotlin: Practical Considerations for the Pragmatic Programmer

Java isn’t just a language; it’s an ecosystem. You can write code for the JVM without writing any Java. This gives you the option of using a more modern language. Some of the shortcomings of Java are obvious. It makes you write a lot of boilerplate code. It supports functional programming only as an afterthought; the lambda feature is a kludge. The NullPointerException is every Java programmer’s bane.

In 2004, a group led by Martin Odersky released a new language for the JVM, called Scala (“scalable language”). It added features such as objects for everything, functions as assignable data, type inference, and pattern matching. It compiles to Java bytecode and can be mixed with Java code.

Another language aimed at the same goals is Kotlin, released by JetBrains in 2012. It built on people’s experience with Scala. A common complaint with Scala is slow compilation time, and Kotlin offers compile speeds comparable to Java. It’s recently gotten a big boost from Google, which has declared it a first-class language for Android development.

If some features of Java constantly annoy you, you’ll find things to like in both languages. If you’re annoyed enough to make the jump, which way should you go? Should you choose the maturity of Scala or the freshness of Kotlin? There are benefits to each.

Solving problems different ways

Kotlin and Scala, like Java, are statically typed. Whatever type a variable starts out as, it will keep it for its whole life. But both of them save you some of the effort of declaring every variable. You can implicitly declare a type with an initializer. In either language you can write

var count = 1

That makes count an integer. Notice that no semicolon is required. The difference between the languages is that Scala goes much further in allowing implicit conversions. If you use x.transmogrify(), and x belongs to a class which doesn’t have a transmogrify function, that isn’t necessarily an error. You can create an implicit class which has a transmogrify method, and the compiler will figure out, without making you do any casting, whether it can step in to do the job.

Kotlin’s creators found this a little too free-wheeling. Instead, Kotlin lets you define extension functions on a class, adding custom functionality. You can do this even on standard data types. (Remember, everything is an object, so every data type is a class. Boxing of simple data types is no longer needed.)

Null values are a huge headache in Java. Scala helps to relieve this in a couple of ways. First, variables must be initialized. You can initialize them to the default value (var a: Int = _), which is null for reference types, but at least it makes you aware you’re doing it. Second, the Option class helps in guaranteeing null-safety in parameters and returned values. It’s one of the more complicated features of the language to understand, but it gives you a lot of control.

Kotlin gets right to the point. By default, it doesn’t allow variables to have the value null. You can declare a variable to be nullable if you really need to, by putting a question mark after the type. If you’ve worked with Swift, this approach will sound familiar. If you use nullable values, the compiler does extensive checking to make sure you aren’t putting them at risk of a NullPointerException and will give you a compile-time error if you are.

Java makes you use a regular class or an enum if you just want to package some data together in an object. Scala and Kotlin offer some better options. Scala gives you the case class, which is a specialized class for data objects. It automatically defines accessor functions (why doesn’t Java just do that?). Instances are compared by structure rather than reference. A copy function is automatically provided to do a shallow copy.

Kotlin’s data class does pretty much the same thing. The main difference is that Scala has a powerful pattern matching feature which Kotlin didn’t pick up. A match statement is like a bionically enhanced case statement. Patterns can check not only literal values but types, lists, and ranges. Scala can do matching on all kinds of objects, but the feature is especially powerful with case classes.

Scala provides strong support for XML. You can put XML directly into Scala code and assign it to an XML object. This creates complications, since a < operator that isn’t followed by a space may be read as the start of an XML expression. Kotlin uses the more traditional approach of classes to handle XML objects.

Type classes are a feature of Scala that doesn’t have an equivalent in Kotlin. A type class defines a set of operations which member classes must support. This isn’t like subclassing in Java; a type class can be added to types that already exist. It lets the developer create new kinds of polymorphism with existing types. Extension functions in Kotlin aren’t the same thing, but they let you add common ground to different types, so they address some of the same needs.

The feel of the language

OK, there are differences between the languages, but they aim at more or less the same thing. You can learn either one. Are there bigger, more philosophical reasons for choosing one or the other?

To some people, the difference is that Scala is more aimed at exploring new ideas, and Kotlin is more focused on getting results. Kotlin’s emphasis on fast compilation and its removal of some of Scala’s more esoteric features reflect this. Scala just lets you do lots of things. The operator name ?:+ appears to be legal, and maybe there’s a reason you’d want to use it. Kotlin is more restrictive. Some would say saner.

If you love functional programming, Scala has more of its features than Kotlin. Type classes are a functional programming feature. As another example, Scala supports currying and partial application, which are ways to break down functions that take multiple arguments. This provides additional flexibility in using argument lists. Kotlin provides ways to do the same things, but they might not be as mathematically elegant.

People who have learned Scala thoroughly love it. It takes more effort, but it lets developers do things they can’t do in Kotlin. Kotlin adherents often find that much flexibility more confusing than useful.

Practical considerations

Sometimes the realities of what you’re trying to do are the main factor. You need to pick the language that will let you do the job, even if you don’t like it quite as much. If you’re going to do Android development, Kotlin is the practical choice. Android doesn’t use Oracle’s JVM, so you can’t use any old JVM-compatible language. Kotlin has the tools for compiling, debugging, and running software on Android. It’s built into Android Studio, starting with version 3.0.

Outside Android, Kotlin’s options are more limited. Are you committed to Eclipse for your IDE? You can use it to work with both languages, up to a point. The Scala IDE for Eclipse is more mature than the Kotlin plugin, which is a bit painful to set up. Some users have reported trouble getting the latter to work. The situation for NetBeans is similar. With Kotlin’s growing popularity, the situation may be more equal in a year or two. If you like working from the command line, the IDE situation isn’t an issue, and Kotlin has all the necessary tools.

Kotlin is still maturing, but many Java people find adopting it is an easier transition than Scala is. The one that works best for your needs will depend on your personal style and your practical aims. Look at both carefully before making a decision.

If you enjoyed this article please share it! This is the biggest compliment you can give a writer! Thank you!

The Simply Deep, Yet Convoluted World of Supervised vs Unsupervised Learning — September 6, 2017

The Simply Deep, Yet Convoluted World of Supervised vs Unsupervised Learning

Artificial intelligence (AI) is a lot like life’s relationships. Sometimes what you put into it is pretty straightforward, leading to the output or outcome that you wanted. Other times, let’s just say, the process gets a bit more convoluted and sometimes the outcome isn’t exactly what you envisioned. In other words, you may input the same into both relationships, but different paths lead you to different results. Nevertheless, both are learning processes. In the AI world, this is called supervised and unsupervised deep learning, and like most relationships, the shortest distance between what you input and what you get as output isn’t always the proverbial straight line.

What is Deep Learning?

Before we delve into what supervised and unsupervised deep learning is, you should know that deep learning evolved from a process called machine learning. Machine learning employs an algorithm, or set of rules, that creates output without specific programming. Think about how social networking mines data from your posts. For instance, you go out to eat with your friends at your favorite sushi place and share facts online about your experience: what you loved, what you found distasteful, photos, whether you would return. Once you input these into your social network, an algorithm picks up tidbits about your input to extract patterns about what you like, don’t like, even what you look like based upon your pictures. The algorithm may discover that you are around 23 years old, eat out at this particular type of restaurant twice a month with your friends and like California rolls over eel sushi. It then sends you ads based upon that data. Machine learning iteratively gleans information about input despite not being told how to do so or where to look for that information.

Deep learning kicks it up a notch. It takes your input and either categorizes it using labeled examples (supervised) or clusters unlabeled information, attempting to categorize it so that it makes sense (unsupervised), before taking that input and creating some sort of viable output. It’s a layered architecture making sense of data that can be quite abstract from one layer to another. That’s how deep learning emulates the multi-faceted complexity of the human brain, with its neural pathways processing copious amounts of information that doesn’t make sense until it does (or not).

Supervised Deep Learning: The KISS Pathway That Leads To The Expected

What happens when your supervisor’s hanging over your shoulder at work? Like most, it drives you batty, so you tend to take the path of least resistance to find the most non-challenging way to get your job done quickly and still meet the expectations of your supervisor, right? Let’s say that particular supervisor trained you to process credit applications. Said supervisor knows what’s in those applications and that the expected outcome of any application is approval or not approval. You learned from your training set how to function in the best way to get to the desired outcome, i.e. the results that your supervisor needs. Supervised deep learning is like that. We humans tend to process in a specific hierarchy: we take in life’s input and based on our experiences (training), we organize that input so that our prior knowledge can make sense of it, process it to some expected outcome. Supervised deep learning belongs to that Keep-It-Simple-Stupid (KISS) pathway, that literal path of least resistance leading to some fulfilled expectation.

Supervised deep learning is well suited for decision-making: take our credit application example, for instance. The bank takes your application and runs it against its categories of risk before taking action for or against approving you. Here’s the procedural gist:

Application is input from customer

The bank inputs data from application into the algorithm

The algorithm notices from past applications that data follows certain pathways (modeling)

For example: marital status (single, married, divorced, widowed), where each option has a yes or no answer

The algorithm takes that application data, the yes or no answers, as determined by the bank and follows its flowchart (pathway rules)

Data flows through that pathway as the algorithm decides which of the primary categories of approved and not approved the data belongs in

The expected decision of approved or not approved is rendered

The customer is approved and is a happy camper or is not approved and wonders how to fix his credit score (had to throw that in).

Supervised deep learning is more than your typical lights on, lights off binary function. The algorithm classifies criteria into the bank’s expectation of risk, processing that risk into one of two decisions. This method of classification is known as binary classification (two choices) or multi-class classification (more than two choices).
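The approve-or-not flow above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it uses a single artificial neuron (a perceptron) rather than a deep network, and the two features (income and prior default) and the training data are hypothetical, not anything a real bank uses. What it does show is the supervised part: every training example comes with the answer the "supervisor" expects.

```python
# Hypothetical training set: (income in $10,000s, has a prior default) -> approved?
applications = [
    ((6, 0), 1), ((8, 0), 1), ((5, 0), 1),
    ((2, 1), 0), ((3, 1), 0), ((1, 0), 0), ((6, 1), 0),
]


def predict(weights, bias, features):
    """Return 1 (approved) if the weighted score is positive, else 0 (not approved)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(examples, max_epochs=1000):
    """Perceptron rule: nudge the weights whenever a labeled example is misclassified."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in examples:
            error = label - predict(weights, bias, features)  # -1, 0, or +1
            if error != 0:
                weights = [w + error * x for w, x in zip(weights, features)]
                bias += error
                mistakes += 1
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias


weights, bias = train(applications)
print(predict(weights, bias, (9, 0)))  # a new applicant the model has never seen
```

With two possible outputs this is binary classification. A multi-class version would need one score per category instead of a single yes or no.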

Unsupervised Deep Learning: An Exploratory Journey To Figuring Out the Unknown

If supervised deep learning is a path to expected output, unsupervised deep learning takes that same input and attempts to make sense of it before producing some output. Let’s take a trip to the art museum with your best friend as an example. You both become captivated with a painting of a rose. One of you sees it rather literally, the other sees it figuratively. To you, a rose is just a rose and you want to move on to the Van Gogh exhibit. To your friend that rose is yellow when it should be red and your friend cannot figure out why the painting denotes friendship and not love. There’s no Van Gogh until there’s ready to go, and that’s not happening until your friend muses about that rose and why her current relationship is hanging on that museum wall.

Unsupervised deep learning has no target, no expectation from the input. It relies on exploring layers of possibilities to get to some conclusion. While you can move on to the Van Gogh exhibit, your friend struggles to figure out how to classify all the many pathways friendship and love can take someone from convolution to happy life and how one can learn from their mistakes.
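For a feel of what "no target" means in practice, here is a small sketch of clustering, one of the classic unsupervised techniques. It is plain k-means on one-dimensional numbers, not a deep network, and the data (monthly restaurant visits) and the starting centroids are hypothetical. Notice that nobody tells the algorithm which group any value belongs to.

```python
def k_means_1d(values, centroids, max_rounds=100):
    """Group numbers into clusters around the nearest centroid (no labels needed)."""
    for _ in range(max_rounds):
        # Step 1: assign every value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Step 2: move each centroid to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:  # nothing moved, so the grouping is stable
            break
        centroids = updated
    return centroids, clusters


# Hypothetical data: monthly restaurant visits for six people, with no labels attached.
visits = [1, 2, 3, 10, 11, 12]
centroids, clusters = k_means_1d(visits, centroids=[1, 12])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1, 2, 3], [10, 11, 12]]
```

The algorithm finds the "occasional diner" and "regular" groups on its own. Deciding what those groups mean is still up to whoever is looking at the painting.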

Decision Time: If You Knew Then What You Know Now

Humans are subjectively sentient creatures with decision-making processes that cater more to the unexpected (unsupervised deep learning) than to the expected (supervised deep learning). Computers don’t have the human factor. They don’t have experiences. They just have data sets, functions, and “thinking” based on layers of pooling information together in either ordered or non-ordered ways.

As neural nets and AI become more complex, so do the deep learning algorithms. You can choose among supervised, unsupervised or a combo-pack of deep learning to tackle anything from credit approvals to the complexities of mind-boggling, robotic data sets. Remember the social networking example? When you uploaded images, something called a Convolutional Neural Network (CNN) picked out traits before it came to the conclusion that you are around 23, pooling together relevant data: restaurant, friends laughing, friends frowning, facial recognition, background recognition. Combine and categorize those subgroups and your image spoke volumes about who you are and how you live. Imagine what they’d unpack from what you say on an uploaded video (Recurrent NN)? Yet, sometimes life has to unfold unsupervised by knowns, reconstructing (autoencoding) the data-driven universe while self-organizing maps translate often nebulous data patterns into two dimensions (think topographic maps) that allow you to further muse as to why that rose by any other name is just backpropagation (wink).

Understanding Recurrent Neural Networks: The Preferred Neural Network for Time-Series Data — June 26, 2017

Understanding Recurrent Neural Networks: The Preferred Neural Network for Time-Series Data

Artificial intelligence has been in the background for decades, kicking up dust in the distance, but never quite arriving. Well that era is over. In 2017, AI has broken through the dust cloud and arrived in a big way. But why? What’s the big deal all of a sudden? And what do recurrent neural networks have to do with it? Well, a lot, actually. Thanks to an ingenious form of short-term memory that is unheard of in conventional neural networks, today’s recurrent neural networks (RNNs) have been proving themselves as powerful predictive engines. When it comes to certain sequential machine learning tasks, such as speech recognition, RNNs are reaching levels of predictive accuracy, time and time again, that no other algorithm can match. However, the first generation of RNNs, back in the day, were not so hot. They suffered from a serious setback in their error-tweaking process that held up their progress for decades. Finally, a major breakthrough came in the late 90s that led to a new generation of far more accurate RNNs. Building on that breakthrough for nearly twenty years, developers refined and perfected their new RNNs until all-star apps such as Google Voice Search and Apple’s Siri started snatching them up to power key processes. Now recurrent networks are showing up everywhere, and are helping to ignite the AI renaissance that’s unfolding right now.

Neural Networks That Cling to the Past

Most artificial neural networks, such as feedforward neural networks, have no memory of the input they received just one moment ago. For example, if you provide a feedforward neural network with the sequence of letters “WISDOM,” when it gets to “D,” it has already forgotten that it just read “S.” That’s a big problem. No matter how hard you train it, it will always struggle to guess the most likely next character: “O.” This makes it a rather crappy candidate for certain tasks, such as speech recognition, that greatly benefit from the capacity to predict what’s coming next. Recurrent networks, on the other hand, do remember what they’ve just encountered, and at a remarkably sophisticated level.

Let’s take the example of the input “WISDOM” again and apply it to a recurrent network. The unit, or artificial neuron, of the RNN, upon receiving the “D” also takes as its input the character it received one moment ago, the “S.” In other words, it adds the immediate past to the present. This gives it the advantage of a limited short-term memory that, along with its training, provides enough context for guessing what the next character is most likely to be: “O.”
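Here is a minimal sketch of that "add the immediate past to the present" step, using a single recurrent unit in plain Python. The weights are hypothetical hand-picked numbers (a real network would learn them), and the letter encoding is a crude stand-in for whatever a real system would use. The point is only that the same "D" produces a different output depending on what came before it.

```python
import math

# Hypothetical hand-picked weights; a real network would learn these in training.
W_INPUT = 0.5    # weight on the present input
W_MEMORY = 0.8   # weight on the unit's own output from one moment ago
BIAS = 0.0


def encode(letter):
    """Map 'A'..'Z' to a number between 0 and 1 (a crude stand-in for real encodings)."""
    return (ord(letter) - ord("A") + 1) / 26


def run_rnn(text):
    """Feed letters one at a time, carrying the hidden state forward."""
    hidden = 0.0  # no memory before the first letter
    for letter in text:
        hidden = math.tanh(W_INPUT * encode(letter) + W_MEMORY * hidden + BIAS)
    return hidden


def run_feedforward(letter):
    """The same unit with the memory connection removed: only the present input counts."""
    return math.tanh(W_INPUT * encode(letter) + BIAS)


print(run_rnn("D"), run_rnn("WISD"))         # different: the second "D" arrives with context
print(run_feedforward("D") == run_rnn("D"))  # True: with no history they agree
```

The feedforward version gives the same answer for "D" no matter what preceded it, which is exactly the forgetfulness described above.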

Tweaking and Re-tweaking

If you like to get into the weeds, this is where you get excited. Otherwise, get ready for a rough patch. But hang in there, it’s worth it. Like all artificial neural networks, the units of an RNN assign a matrix of weights to their multiple inputs, then apply a function to those weighted inputs to determine a single output. However, recurrent networks apply weights not only to their present inputs, but also to their inputs from a moment ago. Then they adjust the weights assigned to their present and past inputs through a process that involves two key concepts that you’ll definitely want to know if you really want to get into AI: gradient descent and backpropagation through time (BPTT).

Gradient Descent

One of the most famous algorithms in machine learning is known as gradient descent. Its primary virtue is its remarkable capacity to sidestep the dreaded “curse of dimensionality.” This issue plagues systems, such as neural networks, with far too many variables to make a brute-force calculation of their optimal values possible. Gradient descent sidesteps the curse by never searching the whole space: it repeatedly nudges every weight a small step in the direction that reduces the error, homing in on a local low-point, or local minimum, of the multi-dimensional error or cost function. This gives the system the tweaked value to assign to each weight in the network, bringing accuracy back in line.
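Here is a tiny sketch of gradient descent on a one-weight toy problem. The error function E(w) = (w - 3)^2, the learning rate and the step count are assumptions picked so the behavior is easy to follow; a real network does the same thing across millions of weights at once.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, i.e. downhill on the error surface."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

def error_gradient(w):
    """Gradient (slope) of the toy error function E(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

# Start far from the minimum and let the slope guide the weight downhill.
w_final = gradient_descent(error_gradient, w0=0.0)
```

The weight ends up very close to 3, the low-point of the toy error function, without ever trying every possible value.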

Backpropagation Through Time

The RNN trains its units by adjusting their weights following a slight modification of a feedback process known as backpropagation. Okay, this is a weird concept. But if you’re into AI, you’ll learn to love it. The process of backpropagation works its way back, layer by layer, from the network’s final output, tweaking the weights of each unit, or artificial neuron, according to the unit’s calculated portion of the total output error. Got it? If so, get ready for one more layer of complexity. Recurrent neural networks use a heavier version of this process known as backpropagation through time (BPTT). This version extends the tweaking process to include the weight of the T-1 input values responsible for each unit’s memory of the prior moment.

Yikes: The Vanishing Gradient Problem

Despite enjoying some initial success with the help of gradient descent and BPTT, many artificial neural networks, including the first generation of RNNs, eventually ran out of gas. Technically, they suffered a serious setback known as the vanishing gradient problem. Although the details fall way outside the scope of this sweeping overview, the basic idea is pretty straightforward. First, let’s look at the notion of a gradient. Like its simpler relative, the derivative, you can think of a gradient as a slope. In the context of training a deep neural network, the larger the gradient, the steeper the slope, the more quickly the system can roll downhill to the finish line and complete its training. But this is where developers ran into trouble: their slopes were too flat for fast training. This was particularly problematic in the first layers of their deep networks, which are the most critical when it comes to proper tweaking of memory units. Here the gradient values got so small, and their corresponding slopes so flat, that one could describe them as “vanishing,” thus the vanishing gradient problem. As the gradients got smaller and smaller, and thus flatter and flatter, the training times grew unbearably long. It was an error-correction nightmare without end.
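A rough way to see why the slopes flatten: backpropagation multiplies one factor per layer (or per time step), and with a sigmoid activation each factor is at most 0.25. The sketch below is a deliberately simplified assumption, with every layer contributing the same factor and a weight of 1, but it shows how quickly the product shrinks.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Slope of the sigmoid; its largest possible value is 0.25, at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def gradient_scale(num_layers, z=0.0, weight=1.0):
    """Product of the per-layer factors that the chain rule multiplies together."""
    return (weight * sigmoid_derivative(z)) ** num_layers

# How much of the error signal survives after passing back through n layers.
scales = {n: gradient_scale(n) for n in (1, 5, 10, 20)}
```

Even in this best case for the sigmoid, the signal reaching the earliest of 20 layers is roughly a trillionth of what the last layer sees.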

The Big Breakthrough: Long Short-Term Memory

Finally, in the late 90s, a major breakthrough solved the vanishing gradient problem and gave a second wind to recurrent network development. At the center of this new approach were units of long short-term memory (LSTM).

As weird as that sounds, the long and short of it is that LSTM made a world of difference in the field of AI. These new units, or artificial neurons, like the standard short-term memory units of RNNs, remember their inputs from a moment ago. However, unlike standard RNN units, LSTMs can hang on to their memories, which have read/write properties akin to memory registers in a conventional computer. Yet LSTMs have analog, rather than digital, memory, making their functions differentiable. In other words, their curves are continuous and you can find the steepness of their slopes. So they are a good fit for the partial derivatives involved in backpropagation and gradient descent.

Altogether, LSTMs can not only tweak their weights, but retain, delete, transform and otherwise control the inflow and outflow of their stored data according to the quirks of their training. Most importantly, LSTMs can cling to important error information for long enough to keep gradients relatively steep and thus training periods relatively short. This largely overcomes the vanishing gradient problem and greatly improves the accuracy of today’s LSTM-based recurrent networks. Thanks to this remarkable improvement in the RNN architecture, Google, Apple and many other leading companies, not to mention startups, are now using RNNs to power applications at the center of their businesses. In short, RNNs are suddenly a big deal.
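The "retain, delete, transform" behavior can be sketched as a single LSTM step in NumPy. This follows the commonly taught variant with a forget gate, which is an assumption on my part rather than a claim about any particular production system; the names, sizes and random weights are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev, x_t] to four stacked gate pre-activations."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:n])            # forget gate: how much old memory to keep
    i = sigmoid(z[n:2 * n])        # input gate: how much new information to write
    o = sigmoid(z[2 * n:3 * n])    # output gate: how much memory to reveal
    g = np.tanh(z[3 * n:4 * n])    # candidate values that could be written
    c_t = f * c_prev + i * g       # updated memory cell (the "register")
    h_t = o * np.tanh(c_t)         # output passed on to the next step
    return h_t, c_t

hidden, inputs = 3, 2  # illustrative sizes
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + inputs))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, inputs)):  # five time steps of made-up input
    h, c = lstm_step(x_t, h, c, W, b)
```

Because every gate is a smooth value between 0 and 1 rather than a hard on/off switch, the whole step stays differentiable, which is what lets backpropagation tune it.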

What to Remember about RNNs

Let’s recap the highlights of these amazing memory machines. Recurrent neural networks, or RNNs, can remember their former inputs, which gives them a big edge over other artificial neural networks when it comes to sequential, context-sensitive tasks such as speech recognition. However, the first generation of RNNs hit the wall when it came to their capacity to correct for errors through the all-important twin processes of backpropagation and gradient descent. Known as the dreaded vanishing gradient problem, this stumbling block virtually halted progress in the field until 1997, when a major breakthrough introduced a vastly improved LSTM-based architecture to the field. The new approach, which effectively turned each unit in a recurrent network into an analog computer, greatly increased accuracy and helped lead to the renaissance in AI we’re seeing all around us today.

If you have enjoyed this post, the biggest compliment you could give would be to share this with someone that you think would enjoy it!

If you would like to see more articles like this, click the subscribe button and never miss a post. Have a great day and never stop learning!

12 Most Influential Books Every Software Engineer Needs to Read — June 21, 2017
How do Computers See? — June 19, 2017

How do Computers See?

(This is part 3 in a series of posts on artificial intelligence and deep learning/neural networks. You can check out part 1 and part 2 if you haven’t yet read them and are new to AI)

There was a time when artificial intelligence lived only in our most creative imaginations. Yet, isn’t that where technology is born? In our imaginative minds? Though it is tempting to simply jump right into the technological advances that are the driving forces behind AI, we must first take a trip back in time and gander at how far we have come since Samuel Butler wrote in his 1872 novel Erewhon,

“There is no security against the ultimate development of mechanical consciousness, in the fact of machines possessing little consciousness now. A jellyfish has not much consciousness. Reflect upon the extraordinary advance which machines have made during the last few hundred years, and note how slowly the animal and vegetable kingdoms are advancing. The more highly organized machines are creatures not so much of yesterday, as of the last five minutes, so to speak, in comparison with past time.”

From Karel Čapek’s 1920 play R.U.R., which depicted a race of robot slaves who rose up and revolted against their human masters, to the more recent Star Trek character named Data, humans have always imagined the day machines would become intelligent.

Today, not only is AI a reality, but it is changing the very way we live and work. From the AI in autonomous vehicles, which allows them to locate each other, to Google’s AI Voice Assistant, we are surrounded by artificial intelligence, often without realizing it. The question most ask is, “How does it all work?”

I could not answer that in one article. I will, however, try to cover a small subset of AI today that has given computers an ability most humans take for granted, but would greatly miss if it were taken away…the power of sight!

The Problem

Why has recognizing an image been so hard for computers and so easy for humans? The answer boils down to the algorithms used for both. Algorithms? Wait, our brains don’t have algorithms, do they??

I, and many others, do believe our brains have algorithms…a set of laws (physics) that are followed, which allow our brain to take data from our senses and transform it into something our consciousness can classify and understand.

Computer algorithms for vision have been nowhere near as sophisticated as our biological algorithms. That is, until now.

Artificial Neural Networks Applied to Vision

(If you haven’t been introduced to neural networks yet, please check out this post first to get a quick introduction to the amazing world of ANNs)

Artificial neural networks (ANNs) have been around for a while now, but recently a particular type of ANN has broken records in computer vision competitions and changed what we thought was possible in this problem space. We call this type of ANN a convolutional neural network.

Convolutional Neural Networks

Convolutional neural networks, also known as ConvNets or CNNs, are among the most effective computational models for performing certain tasks, such as pattern recognition. Yet, despite their importance to aspiring developers, many struggle with understanding just what CNNs are and how they work. To penetrate the mystery, we will work with the common application of CNNs to computer vision, which begins with a matrix of pixels. Then we’ll go layer by layer, and operation by operation, through the CNN’s deep structure, finally arriving at its output: the identification of a cloud, cat, tree, or whatever the CNN’s best guess is about what it’s witnessing.

High-Level Architecture of a CNN

Figure: high-level CNN architecture (source: ResearchGate.com)

Here you can see the conceptual architecture of a typical (simple) CNN. To come up with a reasonable interpretation of what it’s witnessing, a CNN performs four essential operations, each corresponding to a type of layer found in its network.

These four essential operations (illustrated above) in a CNN are:

  1. The convolution layer
  2. The ReLU activation function
  3. The pooling/subsampling layer
  4. The fully connected ANN (classification layer)

The input is passed through each of these layers and will be classified in the output. Now let’s dig a little bit deeper into how each of these layers works.

The Input: A Matrix of Pixels

To keep things simple, we’ll only concern ourselves with the most common task CNNs perform: pattern or image recognition. Technically, a computer doesn’t see an image, but a matrix of pixels, each of which has three components: red, green and blue. Therefore, a 1,000-pixel image for us is 3,000 numbers for a computer: one value, or intensity, for each color component of each pixel. The result is a matrix of 3,000 precise intensities, which the computer must somehow interpret as one or more objects.
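Here is what that looks like as an actual array, using NumPy. The 25 x 40 size is an assumption chosen only because it gives exactly 1,000 pixels, and the pixel values are random stand-ins for a real photo.

```python
import numpy as np

height, width = 25, 40  # 25 * 40 = 1,000 pixels (illustrative size)

# A random stand-in for a photo: one 0-255 intensity per color per pixel.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(height, width, 3), dtype=np.uint8)

num_pixels = height * width  # what a person would count
num_values = image.size      # what the computer actually stores

# Each color channel is its own 25 x 40 matrix of intensities.
red, green, blue = image[..., 0], image[..., 1], image[..., 2]
```

So num_pixels is 1,000 while num_values is 3,000, and the CNN’s job is to turn those 3,000 numbers into a guess like “cat.”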

The Convolution Layer

The first key point to remember about the convolutional layer is that all of its units, or artificial neurons, are looking at distinct, but slightly overlapping, areas of the pixel matrix. Teachers and introductory texts often use the metaphor of parallel flashlight beams to help explain this idea. Suppose you have a parallel arrangement of flashlights with each of the narrow beams fixated on a different area of a large image, such as a billboard. The disk of light created by each beam on the billboard will overlap slightly with the disks immediately adjacent to it. The overall result is a grid of slightly overlapping disks of light.

Figure: feature map (source: i.stack.imgur.com/GvsBA.jpg)

The second point to remember about the convolution layer is that those units, or flashlights if you prefer, are all looking for the same pattern in their respective areas of the image. Collectively, the set of pattern-searching units in a convolutional layer is called a filter. The method the filter uses to search for a pattern is convolution.

The complete process of convolution involves some rather heavy mathematics. However, we can still understand it from a conceptual point of view, while only touching on the math in passing. To begin, every unit in a convolutional layer shares the same set of weights that it uses to recognize a specific pattern. This set of weights is generally pictured as a small, square matrix of values. The small matrix interacts with the larger pixel matrix that makes up the original image. For example, if the small matrix, technically called a convolution kernel, is a 3 x 3 matrix of weights, then it will cover a 3 x 3 array of pixels in the image. Naturally, there is a one-to-one relationship, in terms of size, between the 3 x 3 convolution kernel and the 3 x 3 section of the image it covers. With this in mind, you can easily multiply the weights in the kernel with the counterpart pixel-values in the section of the image at hand. The sum of those products, technically called the dot product, generates a single pixel value that the system assigns to that section of the new, filtered version of the image. This filtered image, known as the feature map, then serves as the input for the next layer in the ConvNet described below.
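The multiply-and-sum step is short enough to write out by hand. This sketch slides a 3 x 3 kernel over a tiny grayscale image with no padding and a stride of 1; the image and the vertical-edge kernel are made up for illustration. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image; each output is the sum of elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r:r + kh, c:c + kw]          # the section under the "flashlight"
            feature_map[r, c] = np.sum(patch * kernel)  # the dot product
    return feature_map

# Toy image: dark on the left, bright on the right.
image = np.array([
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
], dtype=float)

# Shared weights that respond to a dark-to-bright vertical edge.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

feature_map = convolve2d_valid(image, kernel)
```

The feature map lights up (value 27) only where the kernel straddles the edge and stays at 0 over the flat regions, which is the sense in which a filter "looks for" one pattern everywhere.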

It’s important to note at this point that units in a convolutional layer of a ConvNet, unlike units in a layer of a fully-connected network, are not connected to every unit in the adjacent layer. Rather, a unit in a convolutional layer is only connected to the small set of input units it is focused on. Here, the flashlight analogy is again useful. You can think of a unit in a convolutional layer as a flashlight that ignores everything outside its own beam. The flashlight is only connected to the section of the image that it lights up.

The ReLU Activation Function

The rectified linear unit, or ReLU, performs the rectification operation on the feature map, which is the output of the convolution layer: it replaces every negative value with zero and leaves positive values unchanged. This introduces non-linearity into the CNN, which training through the feedback process known as back-propagation can then exploit. Non-linearity matters because, without it, a stack of layers could only model linear relationships, and most real-world problems are inherently nonlinear.

Figure: the ReLU family of activation functions (source: datasciencecentral.com)

Above you can see three different implementations of a ReLU activation function (the most basic being just the ReLU). Different variants suit different problems, and choosing among them is one of the ways practitioners tune a network.
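In code, the basic ReLU is a one-liner. The sketch below also includes a leaky variant as one example of the family; which variants the figure showed is not something I can recover here, so treat leaky_relu and its 0.01 slope as an assumption.

```python
import numpy as np

def relu(x):
    """Basic ReLU: negative values become zero, positive values pass through."""
    return np.maximum(0.0, x)

def leaky_relu(x, negative_slope=0.01):
    """Leaky variant: negative values are scaled down instead of zeroed."""
    return np.where(x > 0, x, negative_slope * x)

# A tiny made-up feature map with some negative responses.
feature_map = np.array([[-2.0, 0.0],
                        [1.5, -0.5]])

rectified = relu(feature_map)
leaky = leaky_relu(feature_map)
```

The bend at zero is the whole trick: it is simple, cheap to compute, and enough to stop the network from collapsing into one big linear function.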

The Pooling Layer

The more intricate the patterns the CNN searches for, the more convolution and ReLU layers are necessary. However, as layer after layer is progressively added, the computational complexity quickly becomes unwieldy.

Figure: pooling (source: wiki.tum.de)

Another layer, called the pooling or subsampling layer, is now needed to keep the computational complexity from getting out of control. The pooling layer’s essential operation is to shrink each feature map, for example by keeping only the largest value in each small window, so that the CNN concentrates on the most relevant information for the purposes at hand.
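Max pooling is one common way to do this subsampling, and it is the one sketched here as an assumption; the 2 x 2 window and the sample values are made up for illustration.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size window."""
    h, w = feature_map.shape
    h_out, w_out = h // size, w // size
    trimmed = feature_map[:h_out * size, :w_out * size]  # drop any ragged edge
    blocks = trimmed.reshape(h_out, size, w_out, size)   # group values into windows
    return blocks.max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
], dtype=float)

pooled = max_pool2d(feature_map)  # 4 x 4 in, 2 x 2 out
```

The 16 values shrink to 4 (here 4, 2, 2 and 8), so every later layer has a quarter as much data to process while the strongest responses survive.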

The Classification Layer

Finally, the CNN requires one or more layers to classify the output of all previous layers into categories, such as cloud, cat, or tree.

The most obvious characteristic that distinguishes a classification layer from other layers in a CNN is that a classification layer is fully-connected. This means that it resembles a classic neural network (which we discussed in part 2), with the units in each layer connected to all of the units in their adjacent layers. Accordingly, classification layers often go by the name fully-connected layers, or FCs.
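As a rough sketch of a classification layer, the code below flattens some feature maps, connects every value to every category, and turns the resulting scores into probabilities with softmax. The softmax step, the random untrained weights and the helper names are my assumptions for illustration; only the cloud/cat/tree labels come from the article.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def classify(feature_maps, weights, biases, labels):
    """Fully connected layer: every flattened feature feeds every category."""
    features = feature_maps.ravel()
    scores = weights @ features + biases
    probabilities = softmax(scores)
    return labels[int(np.argmax(probabilities))], probabilities

labels = ["cloud", "cat", "tree"]

# Stand-ins for pooled feature maps and for weights a real network would learn.
rng = np.random.default_rng(0)
feature_maps = rng.random((2, 2, 2))
weights = rng.normal(size=(len(labels), feature_maps.size))
biases = np.zeros(len(labels))

label, probabilities = classify(feature_maps, weights, biases, labels)
```

With untrained weights the winning label is meaningless, but the shape of the computation is the same one a trained CNN uses for its final guess.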

Depth and Complexity

Most CNNs are deep neural networks, meaning their architecture is quite complex, with dozens of layers. You might have, for example, a series of four alternating convolution and ReLU layers, followed by a pooling layer. Then this entire series of layers might, in turn, repeat several times before introducing a final series of fully-connected layers to classify the output.

Unraveling the Mystery of CNNs

Convolutional neural networks are deep, complex computational models that are ideal for performing certain tasks, such as image recognition.

Figure: car example (source: computervisionblog.com)

To understand how a CNN recognizes a pattern in an image, it’s valuable to go step by step through its operations and layers, beginning with its input: a matrix of pixel values. The first layer is the convolution layer, which uses the convolution operation to multiply a specific set of weights, the convolution kernel, by various sections of the image in order to filter for a particular pattern. The next layer is the ReLU layer, which introduces nonlinearity into the system so that the CNN can learn patterns a purely linear model would miss. There may be a series of several alternations between convolution and ReLU layers before we reach the next layer, the pooling layer, which restricts the output to the most relevant patterns. The entire series of convolution, ReLU and pooling layers may, in turn, repeat several times before we reach the final classification layer. These are fully-connected layers that classify the CNN’s output into likely categories, such as cloud, cat, tree, etc.

Figure: architecture (source: mdpi.com)

This is just a high-level look at how a typical CNN is architected. There may be many variations that experts will use in practice to tune their network for their particular use cases. This is where the expertise comes into play. You may need to “tune” your network if the initial training does not produce results as accurate as you had hoped. This process is called “Hyperparameter Tuning” and I will have to write another whole article just covering that. For now, familiarize yourself with the basics of ANNs and CNNs and come back soon to read about hyperparameter tuning!

As always, thanks so much for reading! Please tell me what you think or would like me to write about next in the comments. I’m open to criticism as well!

If you take the time to “like” or “share” the article, that would mean a lot to me. I write for free on my own time because I enjoy talking about technology and the more people that read my articles, the more individuals I get to geek out with!

Thanks and have a great day!