This article provides example of reading data from Google BigQuery as pandas DataFrame.
Refer to Pandas - Save DataFrame to BigQuery to understand the prerequisites to setup credential file and install pandas-gbq package. The permissions required for read from BigQuery is different from loading data into BigQuery; so please setup your service account permission accordingly.
The source table used in this tutorial looks like the screenshot below:
About to_gbq function
In this tutorial, we directly use read_gbq function. Refer to pandas.read_gbq — pandas 1.2.3 documentation (pydata.org) for more details.
The signature of the function looks like the following:
pandas.read_gbq(query, project_id=None, index_col=None, col_order=None, reauth=False, auth_local_webserver=False, dialect=None, location=None, configuration=None, credentials=None, use_bqstorage_api=None, max_results=None, progress_bar_type=None)
Create a python script file
We start to create a python script file named pd-from-bq.py with the following content:
import pandas as pd from google.oauth2.service_account import Credentials # Define source table in BQ source_table = "YOUR_DATA_SET.pandas" project_id = "YOUR_PROJECT_ID" credential_file = "PATH_TO_YOUR_SERVICE_ACCOUNT_CREDENTIAL_FILE.json" credential = Credentials.from_service_account_file(credential_file) # Location for BQ job, it needs to match with destination table location job_location = "australia-southeast1" # Save Pandas dataframe to BQ df = pd.read_gbq("select * from "+source_table, project_id=project_id, location=job_location, credentials=credential) print(df)
Please change the variables accordingly to reflect your environment and BigQuery setup.
Run the script
Run the script using the following command:
The output looks like the following:
DATE TYPE SALES 0 2020-01-01 TypeA 1000 1 2020-01-01 TypeB 200 2 2020-01-01 TypeC 300 3 2020-02-01 TypeA 700 4 2020-02-01 TypeB 400 5 2020-02-01 TypeC 500 6 2020-03-01 TypeA 300 7 2020-03-01 TypeB 900 8 2020-03-01 TypeC 100