GUIDE
Have you ever encountered the term SQL during your data analysis journey? If you’re a marketer, you might assume it’s a high-level concept used exclusively by data scientists. Well, it’s time to dispel that myth!
Becoming data-oriented is a mark of successful marketers, and SQL is your primary tool to efficiently fetch data from databases.
Is your organization storing its data in databases? Then, mastering SQL is inevitable. Don't fret! You're at the right place to embark on this exciting journey.
Why Should You Learn SQL?
SQL, often pronounced as “sequel”, stands for Structured Query Language. It becomes handy when organizations have massive amounts of data they need to manipulate. The incredible thing about SQL is its universality - any employee of a company that stores data in a relational database can use it (and it's likely yours does too!).
For instance, if you work for a tech startup and you need to retrieve user engagement data, SQL can help you. If you are part of a web development team for an e-commerce site and want to analyze purchase data, SQL can guide you. These are just a few examples among the vast applications of SQL.
Consider this scenario: you have a hefty dataset that you open in Excel, only for your computer to slow down or crash? With SQL, you can selectively access portions of your data, meaning you won't need to download everything into a CSV, manipulate it, and risk overloading Excel. Essentially, SQL takes on the data analysis tasks that you might be accustomed to doing in Excel.
Mastering Basic SQL Queries
Before we dive in, ensure you have a database management application that allows you to retrieve data from your database. MySQL or Sequel Pro are some choices.
Download the suitable software and consult your IT team on how to connect to your database. The option depends on the back end of your product, so get advice from your product team on which one to choose.
Understanding Your Database StructureFirst, it's crucial to familiarize yourself with your database and its structure. If your data is spread across multiple databases, you'll need to focus on the specific location that houses the data you need.For instance, imagine you're dealing with multiple databases containing data about US residents. Use the command “SHOW DATABASES;”. The result might show you have different databases for various regions, including one for the Midwest.Each database has multiple tables holding the data you need. Let's assume we want to find out the data contained in one of the databases. We can use the command “SHOW TABLES in Midwest;” and find tables for each state in the Midwest region: people_Illinois, people_Indiana, people_Michigan, people_Ohio, and so on.You also need to understand what fields are present in the tables. Fields refer to specific chunks of data you can extract from your database. For instance, if you want to fetch someone's address, the field name might not be just "address". It could be broken down into address_city, address_state, address_zip. To figure this out, use the command “Describe people_Illinois;”. This command will display a list of all data that you can extract using SQL.
Here’s a brief recap of the database structure using the Midwest example:
* Our database is: Midwest.
* The tables within that database include: people_Illinois, people_Indiana, people_Michigan, people_Ohio, etc.
* The fields within the people_Illinois table might be: address_city, address_state, address_zip, hair_color, age, first_name, last_name.
Now, let's compose some basic SQL queries to extract data from our Midwest database.
Understanding SQL Queries
Let's pose a different question as a case study: Who are the individuals from Illinois, aged between 18 and 25, with red hair, listed in alphabetical order by last name?
To answer this question, we'll construct an SQL query:
SELECT
This clause is used to select the columns that will be displayed in the output. In this case, we're interested in individuals' first and last names.
FROM
The FROM clause specifies the table from which we want to extract data. For instance, we want to retrieve data from the "people_Illinois" table.
WHERE
The WHERE clause is used to filter data based on certain conditions. For example, we want to find people who have red hair.
BETWEEN
The BETWEEN keyword is used in SQL to filter the result within a specific range. In this context, it can be used to filter individuals between certain ages, for example, aged 18 and 25.
AND
The AND operator allows us to add multiple conditions in our WHERE clause. In this case, we want people with red hair who are between the ages of 18 and 25.
OR
The OR operator is used in our WHERE clause when we want to fetch records that match either condition. It can be used to fetch records of individuals who have red hair or are between the ages of 18 and 25.
NOT
The NOT operator is used to exclude records where the condition is true. It can be used to fetch all records where the hair color is not red.
ORDER BY
The ORDER BY clause helps us to sort our data in ascending or descending order. Here, we're sorting people by their last names.
GROUP BY
This clause is used to group rows that have the same values in specified columns into aggregated data. In our case, let's group the result by hair_color.
LIMIT
If you want to limit the number of records returned by the query, you can use the LIMIT clause. Here, we're limiting our result to 100 records.
Please note, the description and use cases for the GROUP BY clause are for grouping results that have the same value for specific columns, usually with an aggregate function (like COUNT, AVG, etc.). It's not typically used for eliminating duplicates. The LIMIT clause, on the other hand, is a good practice when running exploratory queries on large datasets to ensure the output doesn't overwhelm your resources.
INSERT INTO
In addition to retrieving information from a relational database, SQL can also be used to modify the contents of a database. Of course, you’ll need permissions to make changes to your company’s data. But, in case you’re ever in charge of managing the contents of a database, we’ll share some queries you should know.
First is the INSERT INTO statement, which is for putting new values into your database. If we want to add a new person to the Massachusetts table, we can do so by first providing the name of the table we want to modify, and the fields within the table we want to add to. Next, we write VALUES with each respective value we want to add.
Here’s what that query could look like:
Alternatively, if you are adding a value to every field in the table, you don’t need to specify fields. The values will be added to columns in the order that they are listed in the query.
If you only want to add values to specific fields, you must specify these fields. Say we only want to insert a record with first_name, last_name, and address_state — we can use the following query:
The next command is UPDATE, which is used if you want to replace existing values in your database with different values. For example, if someone is recorded in the database as having red hair when they actually have brown hair, we can update this record with UPDATE and WHERE statements:
Or, if there’s an inconsistency in your table where some values for “address_state” appear as “Massachusetts” and others appear as “MA”. To change all instances of “MA” to “Massachusetts”, we can use the following query to update multiple records at once:
Be cautious when using UPDATE. If you don’t specify which records to change with a WHERE statement, all values in the table will be altered.
DELETE is another SQL statement that removes records from your table. Like with UPDATE, it's important to include a WHERE statement, so you don’t accidentally delete the entire table. If we discover that several records in our people_massachusetts table actually live in Maine, we can delete these entries quickly by targeting the address_state field, like so:
Bonus: Advanced SQL Tips
Now that you’ve learned the basic SQL commands for managing a database, let's discuss some other tricks that can be useful for your queries.
* (asterisk)
When you add an asterisk character to your SQL query, it instructs the query to include all columns of data in the results. If your table has many columns, instead of typing all column names in the SELECT statement, you can simply replace them with an asterisk.
% (percent symbol)
The percent symbol is a wildcard character, used to represent one or more characters in a database value. For instance, if you want to find every person in your table whose zip code begins with “02”, you can use the LIKE operator with the percent symbol.
LAST 30 DAYS
When you frequently need to find data from the last 30 days, instead of manually specifying the date range, you can use SQL to automatically span the last 30 days, regardless of the current date.
COUNT
COUNT is used to tally the number of times a certain condition is met in a field. For instance, you can use COUNT to calculate the number of times different hair colors appear in your data.
AVG, SUM, MIN, MAX
These are some basic calculation functions you can use in SQL. AVG calculates the average, SUM calculates the total, MIN and MAX return the smallest and largest values respectively.
JOIN
JOIN is used when you need to combine information from two different tables in one SQL query. This is similar to using the VLOOKUP formula in Excel.
CASE
The CASE statement allows you to return different results based on different conditions. If no conditions are met, the query will return NULL.
Congratulations! Now you're equipped with the basic SQL queries to start managing your database. With these basics, you'll be able to handle more complex queries. Happy querying!
7 Days Free Trial
Sign up for a free trial today and see how AI2sql can help you!