Solving a LinkedIn Data Science SQL Interview Question
https://www.youtube.com/watch?v=d2920ysu2hQ&ab_channel=JayFeng
01:37
peepoThink (1)
- we could use
GROUP BYto see how many posting share the sameuid - we could then
SELECT WHEREcount == 1
(2) - use
DATE_TRUNCto group by year or - use WHERE to split each into 180 day intervals from
now()
02:13
visualize the output and work backwards, make an output table, split into intermediate tables to process
05:55
WITH user_job AS (
SELECT u_id, j_id, COUNT(DISTINCT date_posted) AS num_posted
FROM jobs
GROUP BY u_id, j_id
)
08:06
SELECT
SUM(CASE WHEN avg_posted > 1 THEN 1 END) AS posted_multiple
, SUM(CASE WHEN avg_posted = 1 THEN 1 END) AS posted_once
FROM (
SELECT
user_id, AVG(num_posted) AS avg_posted
FROM user_job
GROUP BY user_id
) AS t
10:41
HAVING won't work because it will filter values
Post Mortem
WITH - named functionCOUNT(DISTINCT field_name) - get count of distinct values in fieldGROUP_BY creates sub tables on different keysCASE WHEN - ifEND - end the ifELSE - elseAVG() - get average of field
Use intermediate tables to get to the final table