Raymond Raymond

Python: Read Data from BigQuery

event 2021-03-17 visibility 2,052 comment 0 insights toc
more_vert
insights Stats

This article provides example of reading data from Google BigQuery as pandas DataFrame.

Prerequisites

Refer to Pandas - Save DataFrame to BigQuery to understand the prerequisites to setup credential file and install pandas-gbq package. The permissions required for read from BigQuery is different from loading data into BigQuery; so please setup your service account permission accordingly. 

The source table used in this tutorial looks like the screenshot below:

2021031794639-image.png

About to_gbq function

In this tutorial, we directly use read_gbq function. Refer to pandas.read_gbq — pandas 1.2.3 documentation (pydata.org) for more details. 

The signature of the function looks like the following:

pandas.read_gbq(query, project_id=None, index_col=None, col_order=None, reauth=False, auth_local_webserver=False, dialect=None, location=None, configuration=None, credentials=None, use_bqstorage_api=None, max_results=None, progress_bar_type=None)

Create a python script file

We start to create a python script file named pd-from-bq.py with the following content:

import pandas as pd
from google.oauth2.service_account import Credentials

# Define source table in BQ
source_table = "YOUR_DATA_SET.pandas"
project_id = "YOUR_PROJECT_ID"
credential_file = "PATH_TO_YOUR_SERVICE_ACCOUNT_CREDENTIAL_FILE.json"
credential = Credentials.from_service_account_file(credential_file)
# Location for BQ job, it needs to match with destination table location
job_location = "australia-southeast1"

# Save Pandas dataframe to BQ
df = pd.read_gbq("select * from "+source_table, project_id=project_id,
                 location=job_location, credentials=credential)
print(df)

Please change the variables accordingly to reflect your environment and BigQuery setup.

Run the script

Run the script using the following command:

python ./pd-from-bq.py

The output looks like the following:

         DATE   TYPE  SALES
0  2020-01-01  TypeA   1000
1  2020-01-01  TypeB    200
2  2020-01-01  TypeC    300
3  2020-02-01  TypeA    700
4  2020-02-01  TypeB    400
5  2020-02-01  TypeC    500
6  2020-03-01  TypeA    300
7  2020-03-01  TypeB    900
8  2020-03-01  TypeC    100
infoDue to pyarrow version issues in my Cloud Shell environment, I had to use Python 2.7 to run the script. Alternatively, you can also install an older version of pyarrow. For example: pip3 install pyarrow==1.0.0
More from Kontext
comment Comments
No comments yet.

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts