When sum(num_logins) over (partition by user_id order by date rows between 14 preceding and current row) >= 1 then TRUE The case statement classifies the user as active or not just as before. It first partitions the table by email, then evaluates over a set of rows - in this case we’re looking at a specific date range. In line 5, the query sums num_logins for the user’s previous 14 days. If you have millions or billions of rows (lucky you), the self join will take a long time to compute. With window functions, we can explicitly aggregate only over rows that we care about with just a single pass through the data. The self join works great, but modern databases have a more efficient way to get the same results. In Wagon, we can create a graph of the number of 7 day active users over time. This result can be aggregated further, filtered for specific dates, used to find inactive users, and joined with other data. This query generates a table that tells us which users are seven-day-active over time. I.date = (o.date :: date) - integer '7' AND The case statement identifies the user as active on that day if she had any logins in the prior week. In line 5, the query sums num_logins for those dates. The join condition requires emails to match and for the date to be within the last 7 days. How meta is that? For each row, we ask how many logins that user had in the last week. Approach 1: Self JoinĪ self join is when you join a table with itself. To use this data, you can create a temporary table, use a common table expression, or include it as a subselect. fill in the minimum date in your datasetĬurrent_date::timestamp, '24 hours') as dateįull outer join (select distinct(user_id) as user_id from userActivityTable) u on 1 = 1įull outer join userActivityTable c on u.user_id = c.user_id and c.date = d.date If your data isn’t already organized like this, you can generate a table with a row per day, per user, with the following query (this is Postgres syntax, for other databases, modify the generate_series function to generate a table of dates). In either approach, it’s helpful to have a table of logins per user for each day, even if the user didn’t login ( null in this example). Like calculating running totals, there are two approaches: self join or window function. How many users were active in the last week and month? date Writing this query in SQL gives you more control. How engaged are your users? How frequently do they visit your website or app? Analytics services like Google Analytics and MixPanel calculate basic counts of daily, weekly, and monthly active users, but it’s difficult to customize or join these results with other data.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |