How to get distinct values in GROUP_CONCAT with Google BigQuery
I'm trying to get distinct values when using GROUP_CONCAT in BigQuery.
I'll recreate the situation with a simpler, static example.
EDIT: I changed the example to better reflect my real situation: two columns with GROUP_CONCAT, each of which needs to be distinct:
SELECT
  category,
  GROUP_CONCAT(id) as ids,
  GROUP_CONCAT(product) as products
FROM
  (SELECT "a" as category, "1" as id, "car" as product),
  (SELECT "a" as category, "2" as id, "car" as product),
  (SELECT "a" as category, "3" as id, "car" as product),
  (SELECT "b" as category, "4" as id, "car" as product),
  (SELECT "b" as category, "5" as id, "car" as product),
  (SELECT "b" as category, "6" as id, "bike" as product),
  (SELECT "a" as category, "1" as id, "truck" as product)
GROUP BY
  category
This example returns:
Row  category  ids      products
1    a         1,2,3,1  car,car,car,truck
2    b         4,5,6    car,car,bike
I would like to remove the duplicated values so that the result is:
Row  category  ids    products
1    a         1,2,3  car,truck
2    b         4,5,6  car,bike
In MySQL, GROUP_CONCAT has a DISTINCT option, but in BigQuery it does not.
Any ideas?
2 answers
Here is a solution that uses the UNIQUE scoped aggregation function to remove duplicates. Note that in order to use it, we first need to build a REPEATED field with the NEST aggregation:
SELECT
  category,
  -- UNIQUE drops duplicates inside each repeated field of the record
  GROUP_CONCAT(UNIQUE(ids)) WITHIN RECORD as ids,
  GROUP_CONCAT(UNIQUE(products)) WITHIN RECORD as products
FROM (
  SELECT
    category,
    -- NEST turns the grouped values into REPEATED fields
    NEST(id) as ids,
    NEST(product) as products
  FROM
    (SELECT "a" as category, "1" as id, "car" as product),
    (SELECT "a" as category, "2" as id, "car" as product),
    (SELECT "a" as category, "3" as id, "car" as product),
    (SELECT "b" as category, "4" as id, "car" as product),
    (SELECT "b" as category, "5" as id, "car" as product),
    (SELECT "b" as category, "6" as id, "bike" as product),
    (SELECT "a" as category, "1" as id, "truck" as product)
  GROUP BY
    category
)
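For readers who can run the query in BigQuery standard SQL instead of legacy SQL, the same result can be sketched with STRING_AGG, which accepts DISTINCT directly. This is an alternative to the UNIQUE/NEST approach above, not part of it. The CTE name items is hypothetical, and the sample rows mirror the question, with the "bike" row taken as id "6" to match the output shown there.

```sql
#standardSQL
-- Sketch only: "items" is a hypothetical CTE holding the sample rows.
WITH items AS (
  SELECT 'a' AS category, '1' AS id, 'car' AS product UNION ALL
  SELECT 'a', '2', 'car' UNION ALL
  SELECT 'a', '3', 'car' UNION ALL
  SELECT 'b', '4', 'car' UNION ALL
  SELECT 'b', '5', 'car' UNION ALL
  SELECT 'b', '6', 'bike' UNION ALL
  SELECT 'a', '1', 'truck'
)
SELECT
  category,
  -- DISTINCT removes duplicates; the default delimiter is a comma.
  -- With DISTINCT, the ORDER BY key must be the aggregated expression itself.
  STRING_AGG(DISTINCT id ORDER BY id) AS ids,
  STRING_AGG(DISTINCT product ORDER BY product) AS products
FROM items
GROUP BY category
ORDER BY category
```

The ORDER BY inside STRING_AGG only makes the output deterministic. It sorts the values, so the products may come out in a different order than in the question's desired result.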
Removing the duplicates before applying GROUP_CONCAT gives the desired result:
SELECT
  category,
  GROUP_CONCAT(id) as ids
FROM (
  -- Grouping by both columns leaves one row per distinct (category, id) pair
  SELECT category, id
  FROM
    (SELECT "a" as category, "1" as id),
    (SELECT "a" as category, "2" as id),
    (SELECT "a" as category, "3" as id),
    (SELECT "b" as category, "4" as id),
    (SELECT "b" as category, "5" as id),
    (SELECT "b" as category, "6" as id),
    (SELECT "a" as category, "1" as id)
  GROUP BY
    category, id
)
GROUP BY
  category
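The query above deduplicates a single column. As a rough sketch of how the same "deduplicate first, then concatenate" idea might extend to the two columns from the question, each column can be deduplicated in its own subquery and the results joined on category. This is written in BigQuery standard SQL rather than legacy SQL, and the names items, distinct_ids and distinct_products are hypothetical.

```sql
#standardSQL
-- Sketch only: all CTE names below are illustrative assumptions.
WITH items AS (
  SELECT 'a' AS category, '1' AS id, 'car' AS product UNION ALL
  SELECT 'a', '2', 'car' UNION ALL
  SELECT 'a', '3', 'car' UNION ALL
  SELECT 'b', '4', 'car' UNION ALL
  SELECT 'b', '5', 'car' UNION ALL
  SELECT 'b', '6', 'bike' UNION ALL
  SELECT 'a', '1', 'truck'
),
distinct_ids AS (
  -- Deduplicate (category, id) pairs first, then concatenate.
  SELECT category, STRING_AGG(id ORDER BY id) AS ids
  FROM (SELECT DISTINCT category, id FROM items)
  GROUP BY category
),
distinct_products AS (
  -- Same idea for the product column.
  SELECT category, STRING_AGG(product ORDER BY product) AS products
  FROM (SELECT DISTINCT category, product FROM items)
  GROUP BY category
)
SELECT category, ids, products
FROM distinct_ids
JOIN distinct_products USING (category)
ORDER BY category
```

Each column needs its own subquery because deduplicating on (category, id, product) together would still leave repeated products whenever the ids differ.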