TransWikia.com

Big Query Standard SQL Filtering by Comparing Multiple Dates for One Event to Minimum Dates of Other Events

Stack Overflow Asked by Cale Kanack on December 16, 2021

I have created a query where I have device identifiers with the MIN(first_login_datetime), MIN(first_purchase_datetime), and a list of push_datetimes (multiple pushes may have been received per device. See example table below for a single device.

I am attempting to filter out devices that received ANY push between the first_login_datetime and first_purchase_datetime. The device in question (see device_a in table below) has received pushes both between the first_login_datetime and the first_purchase_datetime, but also after the first_purchase_datetime. Because of this, I am not able to add a filter condition for "WHERE push_datetime NOT BETWEEN first_login_datetime AND first_purchase_datetime", as this removes the instances of the push_datetime that are between the two, but the device_a remains in my cohort since there were pushes received after the first_purchase_datetime, even though the device did still have a push between the two datetimes. I need to be able to completely remove this device (and any device that had a push between the two datetimes) from the query, any suggestions?

    device_id    first_login_datetime first_purchase_datetime push_datetime
1   device_a    2020-06-27T00:26:10 2020-07-02T19:35:05 2020-06-28T20:00:14.187559
2   device_a    2020-06-27T00:26:10 2020-07-02T19:35:05 2020-07-01T20:07:13.020859
3   device_a    2020-06-27T00:26:10 2020-07-02T19:35:05 2020-07-01T23:52:33.943492
4   device_a    2020-06-27T00:26:10 2020-07-02T19:35:05 2020-07-03T16:10:09.884610
5   device_a    2020-06-27T00:26:10 2020-07-02T19:35:05 2020-07-03T19:46:51.505925

2 Answers

I think a window function is the simplest (and fastest) solution:

select t.* except (bad_cnt)
from (select t.*,
             countif(push_datetime between first_login_datetime and first_purchase_datetime) over (partition by device_id) as bad_cnt
      from t
     ) t
where bad_cnt > 0

Answered by Gordon Linoff on December 16, 2021

You need to get the distinct devices that you want to remove, then left join against your original table.

with my_table as (select * from `project.dataset.table`),
devices_to_remove as (
  select distinct device_id
  from my_table
  where push_datetime between first_login_datetime and first_purchase_datetime
)
select my_table.* 
from my_table 
left join devices_to_remove using(device_id)
where devices_to_remove.device_id is null

This will remove all records from any devices that meet your elimination criteria.

Answered by rtenha on December 16, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP