BigQuery - Generate Unique Values
insights Stats
Data analytics with Google Cloud Platform (GCP). For BigQuery SQL mentioned in the articles, they are all using standard SQL dialect unless specified differently.
In SQL Server or other database systems, IDENTITY, GUID or other autoincrement features are presenting to support generating a unique value for each row. In BigQuery, the support is very limited. We can use UUID or numbering functions to implement similar ones.
GENERATE_UUID
This function returns a random UUID as STRING. It is similar as function NEWID in SQL Server. The returned string is in the form xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.
Example:
SELECT GENERATE_UUID() AS ID;
Sample output (in JSON format):
[ { "ID": "d51e9e92-561e-4746-9b41-ed6c6df22b7d" } ]
ROW_NUMBER function
We can also use numbering functions like ROW_NUMBER to generate a sequential unique numbers when saving data to BigQuery tables.
Example:
SELECT *, ROW_NUMBER() OVER (ORDER BY CustomerID) AS ROW_NUM FROM `test.dim_customer`;
Sample output (in JSON format):
[ { "CustomerID": "77878d7f-4e4f-4633-b7c9-796229e06cd3", "CustomerNumber": "002", "Name": "Jason", "IsCurrent": true, "StartDate": "2021-02-01", "EndDate": "2021-03-12", "ROW_NUM": "1" }, { "CustomerID": "cb68b654-8f29-4302-81a0-8929154a241c", "CustomerNumber": "003", "Name": "Celia", "IsCurrent": true, "StartDate": "2021-03-13", "EndDate": "9999-12-31", "ROW_NUM": "2" }, { "CustomerID": "e42a8b6e-e558-43fe-b5c2-378e681652bf", "CustomerNumber": "001", "Name": "Raymond", "IsCurrent": false, "StartDate": "2021-01-01", "EndDate": "2021-03-12", "ROW_NUM": "3" }, { "CustomerID": "e42a8b6e-e558-43fe-b5c2-378e681652bf", "CustomerNumber": "001", "Name": "Ray", "IsCurrent": true, "StartDate": "2021-03-13", "EndDate": "9999-12-31", "ROW_NUM": "4" } ]
References
UUID functions in Standard SQL | BigQuery | Google Cloud
Numbering functions in Standard SQL | BigQuery | Google Cloud