SQL has been my professional bread and butter for years. Over time, every analyst develops their own style, and I'm no exception. Every style works as long as it fits your analytical culture.
These aren't things they teach you in school, but they're invaluable in real-world scenarios. As someone who frequently interviews candidates, I can tell you that these practices clearly show you use SQL every day. They set you apart and demonstrate your expertise.
Here are some of the SQL habits I've developed over time that make the most sense to me:
One Field Per Line
Always put each field on a new line. This improves readability and allows for easy commenting.
Instead of this:
select id, title, content, author_id, published_date, category, tags from blog_posts where status = 'published';
Do this:
select
id,
title,
content,
author_id,
published_date,
category,
tags
-- views_count, -- Commented out for now, may be needed later
-- last_updated_date
from blog_posts
where status = 'published'
This approach not only makes your code more readable but also makes it easier to work with when using Language Models (LLMs) for code generation or analysis.
Use CTEs, Avoid Subqueries
Common Table Expressions (CTEs) are your friends. They make your code more readable and often more efficient than subqueries.
Instead of this:
select title, author_name
from blog_posts
where author_id in (
select author_id
from blog_authors
where country = 'Canada'
)
Do this:
with canadian_authors as (
select author_id
from blog_authors
where country = 'Canada'
)
select
blog_posts.title,
blog_authors.author_name
from blog_posts
join canadian_authors
on blog_posts.author_id = canadian_authors.author_id
join blog_authors
on blog_posts.author_id = blog_authors.author_id
Use Descriptive CTE Names
When naming your CTEs, be descriptive. This makes your code self-documenting and easier to understand.
Instead of this:
with a as (
select * from blog_views where view_date >= '2023-01-01'
),
b as (
select * from blog_posts where category = 'Technology'
)
select a.post_id, b.title, count(*) as view_count
from a join b on a.post_id = b.id
group by a.post_id, b.title
Do this:
with recent_views as (
select * from blog_views where view_date >= '2023-01-01'
),
tech_posts as (
select * from blog_posts where category = 'Technology'
)
select
recent_views.post_id,
tech_posts.title,
count(*) as view_count
from recent_views
join tech_posts
on recent_views.post_id = tech_posts.id
group by recent_views.post_id, tech_posts.title
Use Full Table Names Instead of Aliases
While aliases can be useful, using full table names can make your queries more immediately understandable, especially in complex joins.
Of course, when the table names are too long, you can always abbreviate them.
Instead of this:
select u.username, bp.title, c.comment_text
from users u
join blog_posts bp on u.user_id = bp.author_id
join comments c on bp.id = c.post_id
Do this:
select
users.username,
blog_posts.title,
comments.comment_text
from users
join blog_posts
on users.user_id = blog_posts.author_id
join comments
on blog_posts.id = comments.post_id
Use Lowercase for SQL Keywords
While some developers prefer uppercase SQL keywords, using lowercase can improve readability and reduce visual noise. It's a personal preference, but consistency is key.
Instead of this:
SELECT author_id, COUNT(*) AS post_count, AVG(word_count) AS avg_word_count
FROM blog_posts
WHERE published_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY author_id
HAVING COUNT(*) > 10
Do this:
select
author_id,
count(*) as post_count,
avg(word_count) as avg_word_count
from blog_posts
where published_date between '2023-01-01' and '2023-12-31'
group by author_id
having count(*) > 10
Use the Last CTE as Your Final Query
As a personal habit, you can use the last CTE as your final query. This approach helps in wrapping up your thoughts and makes the query structure clear.
with author_stats as (
select
author_id,
count(*) as post_count,
avg(word_count) as avg_word_count
from blog_posts
where published_date >= '2023-01-01'
group by author_id
),
prolific_authors as (
select *
from author_stats
where post_count > 50
),
final_result as (
select
users.username,
prolific_authors.post_count,
prolific_authors.avg_word_count,
count(distinct comments.id) as total_comments
from users
join prolific_authors
on users.user_id = prolific_authors.author_id
left join blog_posts
on users.user_id = blog_posts.author_id
left join comments
on blog_posts.id = comments.post_id
group by users.username, prolific_authors.post_count, prolific_authors.avg_word_count
)
select * from final_result
In this example, final_result
is both a CTE and the query we're actually running, making it clear where our analysis ends.
Conclusion
By following these best practices, you'll create SQL code that's not only more efficient but also easier to read and maintain.
Always do what works for you. Happy querying!