access_time 1 month ago languageEnglish
more_vert

Python: Read Data from BigQuery

visibility 15 comment 0

This article provides example of reading data from Google BigQuery as pandas DataFrame.

Prerequisites

Refer to Pandas - Save DataFrame to BigQuery to understand the prerequisites to setup credential file and install pandas-gbq package. The permissions required for read from BigQuery is different from loading data into BigQuery; so please setup your service account permission accordingly. 

The source table used in this tutorial looks like the screenshot below:


About to_gbq function

In this tutorial, we directly use read_gbq function. Refer to pandas.read_gbq — pandas 1.2.3 documentation (pydata.org) for more details. 

The signature of the function looks like the following:

pandas.read_gbq(query, project_id=None, index_col=None, col_order=None, reauth=False, auth_local_webserver=False, dialect=None, location=None, configuration=None, credentials=None, use_bqstorage_api=None, max_results=None, progress_bar_type=None)

Create a python script file

We start to create a python script file named pd-from-bq.py with the following content:

import pandas as pd
from google.oauth2.service_account import Credentials

# Define source table in BQ
source_table = "YOUR_DATA_SET.pandas"
project_id = "YOUR_PROJECT_ID"
credential_file = "PATH_TO_YOUR_SERVICE_ACCOUNT_CREDENTIAL_FILE.json"
credential = Credentials.from_service_account_file(credential_file)
# Location for BQ job, it needs to match with destination table location
job_location = "australia-southeast1"

# Save Pandas dataframe to BQ
df = pd.read_gbq("select * from "+source_table, project_id=project_id,
                 location=job_location, credentials=credential)
print(df)

Please change the variables accordingly to reflect your environment and BigQuery setup.

Run the script

Run the script using the following command:

python ./pd-from-bq.py

The output looks like the following:

         DATE   TYPE  SALES
0  2020-01-01  TypeA   1000
1  2020-01-01  TypeB    200
2  2020-01-01  TypeC    300
3  2020-02-01  TypeA    700
4  2020-02-01  TypeB    400
5  2020-02-01  TypeC    500
6  2020-03-01  TypeA    300
7  2020-03-01  TypeB    900
8  2020-03-01  TypeC    100
infoDue to pyarrow version issues in my Cloud Shell environment, I had to use Python 2.7 to run the script. Alternatively, you can also install an older version of pyarrow. For example: pip3 install pyarrow==1.0.0
info Last modified by Raymond 1 month ago copyright This page is subject to Site terms.
Like this article?
Share on

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts

Follow Kontext

Get our latest updates on LinkedIn or Twitter.

Want to contribute on Kontext to help others?

Learn more

More from Kontext

visibility 11
thumb_up 0
access_time 2 months ago