The world of programming is always changing. And changing FAST at that. We are constantly finding better ways to do what it is that we do. That is a great thing. Iteration is a very powerful concept.
However, there are a few ideas and constructs in the computer science world that remain constant. Data structures and their applications are some of those things.
You would think then, that of all things, this would be something that every programmer or software engineer would understand then right? Well, you would be wrong.
I can tell you that out of college, I sure as heck didn’t understand when and why to use one data structure over the other, and what I’ve found out is, neither do many of the programmers now a days that learn programming by doing (code bootcamps, online courses that get your hands dirty, building software out of their basement).
To be honest, I remember thinking that they really weren’t that important. I thought that they were only needed in special cases and maybe for code that was writing for public frameworks or libraries.
Boy, was I wrong.
Understanding how to efficiently use data structures can easily separate a good developer from a bad one. However, really getting a firm grasp on them can be difficult for some people that have a harder time grasping such abstract concepts. Just try to read the defacto book on the subject cover to cover (“Introduction To Algorithms” – side note: I know it says “Algorithms” but it really covers how data structures are built to lend themselves to certain algorithms).
In that book, and many others like it, you will find many mathematical proofs and ideas that seem very theoretical and the abstract nature of it all really makes it difficult to understand the practical use cases when actually developing software.
So what is a programmer to do if he didn’t graduate with a degree in mathematics?
When many developers first realize how important data structures are (after trying to write a system that processes millions of records in seconds) they are often presented with books or articles that were written for people with computer science degrees from Stanford.
I want this article to bridge the gap and explain data structures in a more practical way. What I want people to take away from this post is an understanding of why we have different data structures, what they are, and when to use a particular one.
This is going to be a simple introduction, so I will cover the data structures that you will use 95% of the time and leave the other 5% for you to discover on your own.
Let’s get to it then!
First, we need to define what exactly is a data structure. Well, a bunch of smart people have thrown around a lot of complex sounding definitions, but the simplest and really the most accurate way to describe a data structure is to say that a data structure is a particular way of organizing data in a computer so that it can be used efficiently. That is all it is. It is just a way to organize data, much like they way humans organize their bookshelves. You want to organize them in a way that makes them easy to get to what you want.
For example, continuing with the bookshelf analogy, if I wanted to be able to quickly pick out all of the books that I own (let’s say hundreds) that start with the letter ‘T’ or ‘B’ or any other letter, then I would want to organize these books in a way that makes that tasks quick and easy to perform. In this example, it would mean organizing the books in alphabetical order. Simple enough.
However, if the way I was using the bookshelf was different (say I wanted to find all the books that pertained to the subject of physics) then quickly we can see that this organization of books will be problematic and cause me to be very inefficient in finding the books that I want.
The solution here would be to organize the books differently based on how we are going to retrieve them in the most common scenario. In the second scenario, we might have decided to organize the shelves according to the topic. The same goes for data structures and how you are going to typically interact with them.
So let’s start talking about the different ways we could organize our data…AKA the types of common data structures. To kick off this fun topic, we will start with one of my favorite data structures known as…
The Linked List: The building block of more complex data structures
Linked lists are among the simplest and most common data structures. They are also a great place to start because the linked list is a data structure that many other data structures use internally in their own structure.
Understanding linked lists will help your understanding of “pointers” (a value – usually written in hexadecimal – that represents the memory address of another value or start of a sequence of values) and how memory is stored and accessed by the system.
To make things easier, I’ve included a visual representation of what a linked list looks like conceptually.
Let’s walk through the picture above and then we will go through why storing data in this way might be a good idea in some cases and a bad idea in others.
First, you can see where the image says “START”. This is what is known as the ‘head’ of the linked list. It is just the starting point that says where the first ‘node’ is located in memory (address 1000 in this case). The ‘1000’ is a pointer to the location of the next node.
I know what you are thinking. “What the hell is a node??”. At least that is what I was thinking when I first read about linked lists. Well, to put it simply, a ‘node’ is just an object that has two fields. One field is the ‘Information’ field (where you would store some data that you are concerned about) and the other is a Pointer field. All the pointer field holds is the address location of the next node location. That’s it! It’s actually so ridiculously simple that many people over think it.
As you can see in the picture above, after we go to memory address 1000 we encounter a node. In this node, there are two fields we can see. In our case the first field that holds the ‘information’ is storing the character ‘A’. The second field (the Pointer field) is storing the location in memory to the next node (memory location 2000).
Next, we follow the pointer to memory location 2000. Once we arrive there, we encounter our second node. As you can see, this second node has ‘B’ as its information and ‘3000’ as its Pointer value. Nothing new here!
Finally, we follow the Pointer value to the memory location (3000) and come to yet another node. This node is almost the same as the previous two. It has a value of ‘C’ for its information field, but as you can see, it has nothing (or null) for its Pointer field. This just means that we have come to the end of our list. There are no more pointers to follow! This node (being the last node) is known as the ‘Tail Node’.
So what makes a Linked List valuable as a data structure and when and why might we choose to use one? Let’s cover that next.
The structure of a Linked List allows it to have many properties that make it different than other collection structures (such as an Array which I will cover in a later post). Because of its use of pointers instead of contiguous memory blocks Linked List are great for the following:
- When you need constant-time insertions/deletions from the list (such as in real-time computing where time predictability is absolutely critical)
- When you don’t know how many items will be in the list. With arrays, you may need to re-declare and copy memory if the array grows too big (once again, I will go into more detail about arrays in a later post.)
- When you don’t need random access to any elements
- When you want to be able to insert items in the middle of the list (such as a priority queue – another data structure I will cover down the road)
However, there are times when using a Linked List would be very inefficient and you would be better off with another collection data structure like an Array.
Don’t use a Linked List in these scenarios:
- When you need indexed/random access to elements
- When you know the number of elements in the array ahead of time so that you can allocate the correct amount of memory for the array
- When you need speed when iterating through all the elements in sequence. You can use pointer math on the array to access each element, whereas you need to lookup the node based on the pointer for each element in linked list, which may result in page faults which may result in performance hits.
- When memory is a concern. Filled arrays take up less memory than linked lists. Each element in the array is just the data. Each linked list node requires the data as well as one (or more) pointers to the other elements in the linked list.
As you can see, depending on what you are trying to do, Linked List may be a great data structure or a very inefficient one. This all goes back to understanding the pros and cons of each data structure and choosing the right ‘tool’ for the job.
Hopefully, this was a quick and simple introduction to why data structures are important to learn and shed some light on when and why Linked List are an important starting point for data structures.
Next, up I will be covering Arrays, Stacks, and Queues. Stay tuned!
If you can think of any better ways of explaining Linked Lists or why data structures are important to understand, leave them in the comments!
If you liked this post, please share it with others! That is the biggest compliment I could receive. Also, please subscribe to my blog (jasonroell.com) if you are a technology enthusiast! Have a great day and learn on!