How to get distinct values in GROUP_CONCAT with Google BigQuery

I'm trying to get distinct values when using GROUP_CONCAT in BigQuery.

I'll recreate the situation using a simpler, static example:

EDIT: I changed the example to better represent my real situation: 2 columns with GROUP_CONCAT, each of which should contain only distinct values:

SELECT 
  category, 
  GROUP_CONCAT(id) as ids, 
  GROUP_CONCAT(product) as products
FROM 
 (SELECT "a" as category, "1" as id, "car" as product),
 (SELECT "a" as category, "2" as id, "car" as product),
 (SELECT "a" as category, "3" as id, "car" as product),
 (SELECT "b" as category, "4" as id, "car" as product),
 (SELECT "b" as category, "5" as id, "car" as product),
 (SELECT "b" as category, "6" as id, "bike" as product),
 (SELECT "a" as category, "1" as id, "truck" as product)
GROUP BY 
  category

This example returns:

Row  category  ids      products
1    a         1,2,3,1  car,car,car,truck
2    b         4,5,6    car,car,bike

I would like to remove the duplicated values, so that it returns:

Row  category  ids    products
1    a         1,2,3  car,truck
2    b         4,5,6  car,bike

In MySQL, GROUP_CONCAT has a DISTINCT option, but in BigQuery it does not.

Any ideas?

2 Answers

Here is a solution that uses the UNIQUE scoped aggregation function to remove duplicates. Note that in order to use it, we first need to build a REPEATED field using the NEST aggregation:

SELECT 
  GROUP_CONCAT(UNIQUE(ids)) WITHIN RECORD,
  GROUP_CONCAT(UNIQUE(products)) WITHIN RECORD 
FROM (
SELECT 
  category, 
  NEST(id) as ids, 
  NEST(product) as products
FROM 
 (SELECT "a" as category, "1" as id, "car" as product),
 (SELECT "a" as category, "2" as id, "car" as product),
 (SELECT "a" as category, "3" as id, "car" as product),
 (SELECT "b" as category, "4" as id, "car" as product),
 (SELECT "b" as category, "5" as id, "car" as product),
 (SELECT "b" as category, "6" as id, "bike" as product),
 (SELECT "a" as category, "1" as id, "truck" as product)
GROUP BY 
  category
)
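
For comparison, the same "build a repeated value, then deduplicate it" idea can be sketched in BigQuery standard SQL, assuming the query does not have to stay in legacy SQL. This is only an illustration: the CTE name `items` is hypothetical, and the sample rows are modeled on the question's data (with id 6 for the last row of category b, to match the expected output). ARRAY_AGG(DISTINCT ...) plays the role of NEST plus UNIQUE, and ARRAY_TO_STRING does the concatenation.

```sql
#standardSQL
-- Hypothetical inline sample data, modeled on the rows in the question.
WITH items AS (
  SELECT 'a' AS category, '1' AS id, 'car' AS product UNION ALL
  SELECT 'a', '2', 'car' UNION ALL
  SELECT 'a', '3', 'car' UNION ALL
  SELECT 'b', '4', 'car' UNION ALL
  SELECT 'b', '5', 'car' UNION ALL
  SELECT 'b', '6', 'bike' UNION ALL
  SELECT 'a', '1', 'truck'
)
SELECT
  category,
  -- Collect the distinct values into an array, then join them with commas.
  ARRAY_TO_STRING(ARRAY_AGG(DISTINCT id ORDER BY id), ',') AS ids,
  ARRAY_TO_STRING(ARRAY_AGG(DISTINCT product ORDER BY product), ',') AS products
FROM items
GROUP BY category
ORDER BY category
```

The ORDER BY inside each aggregate only makes the order of the concatenated values deterministic (alphabetical here); it can be dropped if the order does not matter.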

Removing the duplicates before applying GROUP_CONCAT gives the desired result:

    SELECT 
      category, 
      GROUP_CONCAT(id) as ids
    FROM (  
    SELECT category, id
    FROM 
     (SELECT "a" as category, "1" as id),
     (SELECT "a" as category, "2" as id),
     (SELECT "a" as category, "3" as id),
     (SELECT "b" as category, "4" as id),
     (SELECT "b" as category, "5" as id),
     (SELECT "b" as category, "6" as id),
     (SELECT "a" as category, "1" as id)
    GROUP BY 
      category, id
    )
    GROUP BY 
      category
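
The query above deduplicates a single column. If the query can be run as BigQuery standard SQL instead of legacy SQL (an assumption, since the question uses legacy syntax), deduplication and concatenation can be combined per column with STRING_AGG(DISTINCT ...), which is the closest counterpart to MySQL's GROUP_CONCAT(DISTINCT ...). A minimal sketch covering both columns, with a hypothetical CTE `items` holding rows modeled on the question's data:

```sql
#standardSQL
-- Hypothetical inline sample data, modeled on the rows in the question.
WITH items AS (
  SELECT 'a' AS category, '1' AS id, 'car' AS product UNION ALL
  SELECT 'a', '2', 'car' UNION ALL
  SELECT 'a', '3', 'car' UNION ALL
  SELECT 'b', '4', 'car' UNION ALL
  SELECT 'b', '5', 'car' UNION ALL
  SELECT 'b', '6', 'bike' UNION ALL
  SELECT 'a', '1', 'truck'
)
SELECT
  category,
  -- DISTINCT drops repeated values within each group before concatenation.
  STRING_AGG(DISTINCT id, ',') AS ids,
  STRING_AGG(DISTINCT product, ',') AS products
FROM items
GROUP BY category
```

Each aggregate is deduplicated independently, so no extra subquery or join is needed when more than one column has to be distinct.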