BigQuery

In my previous post, I started looking into Google Cloud Platform’s Big Data offerings and shared my notes on Pub/Sub. In this post, I want to continue with the Big Data theme and explore BigQuery.

What is BigQuery?

BigQuery Projects

These are the main concepts in a BigQuery project:

  • Project: The top-level construct that every GCP resource belongs to; datasets live inside a project.
  • Dataset: This is a grouping of tables with access control.
  • Table: This is where your data resides and what you query against using SQL.
  • Jobs: Actions (loading, exporting, querying or copying data) that BigQuery runs on your behalf.
  • Access control lists (ACLs): Manage access to projects and datasets. A table inherits its ACL from its dataset.
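To make the hierarchy concrete, here is a minimal Python sketch that models it as plain data classes. It is a toy model, not the BigQuery API: the class names, the `effective_acl` helper and the example role strings are hypothetical, and it captures only the dataset-level ACL inheritance described above. The `project.dataset.table` string matches the fully qualified table name used in BigQuery's standard SQL.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    # Top-level container; datasets live inside a project.
    project_id: str


@dataclass
class Dataset:
    # A named group of tables; access control is attached at this level.
    project: Project
    dataset_id: str
    acl: dict = field(default_factory=dict)  # hypothetical: {"alice@example.com": "READER"}


@dataclass
class Table:
    # Holds the data you query; in this toy model it has no ACL of its own.
    dataset: Dataset
    table_id: str

    @property
    def full_name(self) -> str:
        # Fully qualified name in standard SQL form: project.dataset.table
        return f"{self.dataset.project.project_id}.{self.dataset.dataset_id}.{self.table_id}"

    def effective_acl(self) -> dict:
        # A table inherits its access control from the dataset that contains it.
        return dict(self.dataset.acl)


project = Project("my-project")
sales = Dataset(project, "sales", acl={"alice@example.com": "READER"})
orders = Table(sales, "orders")
print(orders.full_name)        # my-project.sales.orders
print(orders.effective_acl())  # {'alice@example.com': 'READER'}
```

Because `effective_acl` reads the dataset's ACL each time, granting access on the dataset immediately applies to every table in it, which is the point of managing permissions at the dataset level.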

Loading Data

  • Either load data directly into BigQuery or set it up as a federated (external) data source.
  • When loading into BigQuery, you can bulk load data or stream individual records.
  • Google Cloud sources you can load from:
    • Cloud Storage.
    • Cloud Datastore.
    • Cloud Dataflow.
    • App Engine log files.
    • Cloud Storage access/storage logs.
    • Cloud Audit Logs.
  • Three source formats: CSV, newline-delimited JSON and Cloud Datastore backup files.
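Newline-delimited JSON simply means one complete JSON object per line, with no enclosing array. Below is a minimal sketch using only Python's standard library to produce (and re-parse) that format from a list of records; the helper names and the example fields are hypothetical. With the official `google-cloud-bigquery` client, such a file would then be loaded with `Client.load_table_from_file` (or `Client.load_table_from_uri` for a file in Cloud Storage) using `LoadJobConfig(source_format=SourceFormat.NEWLINE_DELIMITED_JSON)`; that step is not shown here.

```python
import json


def to_ndjson(records):
    """Serialize records as newline-delimited JSON: one object per line."""
    # Compact separators; note there is no surrounding [ ... ] array.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def from_ndjson(text):
    """Parse newline-delimited JSON back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical order records.
records = [
    {"order_id": 1, "customer": "alice", "amount": 19.99},
    {"order_id": 2, "customer": "bob", "amount": 5.0},
]
payload = to_ndjson(records)
print(payload, end="")
# {"order_id":1,"customer":"alice","amount":19.99}
# {"order_id":2,"customer":"bob","amount":5.0}
```

Keeping one record per line is what lets a loader split a large file and process the pieces in parallel, which a single top-level JSON array would not allow.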

Exporting Data

Data can be exported from BigQuery in two ways:

  1. Files: Export to Cloud Storage files of up to 1 GB each; larger exports can be split across multiple files.
  2. Dataflow: Use Google Cloud Dataflow to read data from BigQuery.
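When a table exceeds the per-file limit, the export is split by putting a single `*` wildcard in the destination URI (the URI passed to `Client.extract_table` in the `google-cloud-bigquery` client); BigQuery replaces it with a zero-padded 12-digit sequence number starting at 0. A small Python sketch of that arithmetic, assuming "1 GB" means 2**30 bytes; the bucket path and helper names are hypothetical, and BigQuery itself decides how many files it writes, so `min_export_files` is only a lower bound.

```python
GIB = 1024 ** 3                 # assumption: the 1 GB per-file limit is 2**30 bytes
MAX_BYTES_PER_FILE = 1 * GIB


def min_export_files(table_bytes: int) -> int:
    """Smallest number of files a table of this size must be split into."""
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return max(1, -(-table_bytes // MAX_BYTES_PER_FILE))


def shard_uris(wildcard_uri: str, count: int) -> list[str]:
    """Expand a single '*' into zero-padded 12-digit sequence numbers, starting at 0."""
    if wildcard_uri.count("*") != 1:
        raise ValueError("destination URI must contain exactly one '*'")
    return [wildcard_uri.replace("*", f"{i:012d}") for i in range(count)]


size = int(2.5 * GIB)           # a hypothetical 2.5 GiB table
n = min_export_files(size)
print(n)                        # 3
for uri in shard_uris("gs://my-bucket/orders-*.csv", n):
    print(uri)
# gs://my-bucket/orders-000000000000.csv
# gs://my-bucket/orders-000000000001.csv
# gs://my-bucket/orders-000000000002.csv
```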

Querying

  • Queries are written in BigQuery SQL dialect.
  • Synchronous and asynchronous query methods.
  • Results are saved in either temporary or persistent tables.
  • Queries can be interactive (executed as soon as possible) or batch (queued and executed when idle resources are available).
  • Query results are cached by default, but caching can be disabled.
  • Supports user-defined functions (UDFs) written in JavaScript (essentially the Map step of MapReduce).
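Caching and query priority can be pictured with a toy model: identical query text returns the stored result unless the caller opts out, and batch queries wait in a queue until capacity frees up. This is a hypothetical illustration (the class and method names are invented), and BigQuery's real cache has extra conditions not modelled here, such as the referenced tables being unchanged. In the official `google-cloud-bigquery` client, the corresponding switches are `QueryJobConfig.use_query_cache` and `QueryJobConfig.priority` (`QueryPriority.INTERACTIVE` or `QueryPriority.BATCH`).

```python
from collections import deque


class ToyQueryService:
    """Hypothetical stand-in for a query engine, illustrating caching and priority."""

    def __init__(self, run_fn):
        self._run = run_fn            # function that actually "executes" SQL text
        self._cache = {}              # query text -> cached result
        self._batch_queue = deque()   # batch queries waiting for idle capacity
        self.executions = 0           # how many times the engine really ran a query

    def query(self, sql, use_cache=True, batch=False):
        if batch:
            # Batch priority: queue the query; it runs later in run_pending().
            self._batch_queue.append((sql, use_cache))
            return None
        # Interactive priority: run immediately.
        return self._execute(sql, use_cache)

    def run_pending(self):
        # Simulate idle capacity becoming available: drain the batch queue in order.
        results = []
        while self._batch_queue:
            sql, use_cache = self._batch_queue.popleft()
            results.append(self._execute(sql, use_cache))
        return results

    def _execute(self, sql, use_cache):
        if use_cache and sql in self._cache:
            return self._cache[sql]   # cache hit: no work done
        self.executions += 1
        result = self._run(sql)
        if use_cache:
            # Toy simplification: opting out bypasses the cache entirely.
            self._cache[sql] = result
        return result


svc = ToyQueryService(lambda sql: f"result of {sql!r}")
svc.query("SELECT 1")
svc.query("SELECT 1")                    # served from the cache
svc.query("SELECT 1", use_cache=False)   # caching disabled: runs again
print(svc.executions)                    # 2
svc.query("SELECT 2", batch=True)        # queued, returns None
print(svc.run_pending())                 # ["result of 'SELECT 2'"]
```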

Pricing and Quotas

  • BigQuery charges for data storage, streaming inserts and queries.
  • Loading and exporting data are free of charge.
  • Queries are charged by the number of bytes processed; the first 1 TB per month is free.
  • There are limits (quotas) on incoming requests.
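The query charge is simple arithmetic on bytes processed. Below is a Python sketch of a monthly estimate that applies the 1 TB free tier from the list above, assuming BigQuery's convention that 1 TB means 2**40 bytes. The per-TB price is left as a parameter because it is not given here and changes over time; the 4.0 below is a placeholder, not a real price. To get the byte count for a real query without running it, the `google-cloud-bigquery` client supports a dry run (`QueryJobConfig(dry_run=True)`), after which the job's `total_bytes_processed` reports the estimate.

```python
TIB = 2 ** 40                    # assumption: billing "TB" is 2**40 bytes
FREE_BYTES_PER_MONTH = 1 * TIB   # first 1 TB of query data each month is free


def billable_tb(bytes_processed_this_month: int) -> float:
    """TB billed after subtracting the monthly free tier (never negative)."""
    return max(0, bytes_processed_this_month - FREE_BYTES_PER_MONTH) / TIB


def monthly_query_cost(query_bytes, price_per_tb: float) -> float:
    """Estimate the month's query charge from per-query byte counts."""
    return billable_tb(sum(query_bytes)) * price_per_tb


# Hypothetical month: three queries scanning 0.5, 1.0 and 1.5 TB.
month = [TIB // 2, TIB, TIB + TIB // 2]
print(billable_tb(sum(month)))                       # 2.0
print(monthly_query_cost(month, price_per_tb=4.0))   # 8.0 (placeholder price)
```

Since the charge depends on bytes scanned rather than rows returned, selecting only the columns you need is the most direct way to keep this number down.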

Resources