
ValueError: A BigQuery Table or a Query Must Be Specified with beam.io.gcp.bigquery.ReadFromBigQuery

I'm trying to pass a BigQuery table name as a value provider for an Apache Beam pipeline template. According to their documentation and this StackOverflow answer, it's possible to pass it as a value provider.

Solution 1:

The table argument must be passed by name to ReadFromBigQuery.

BigQuerySource (deprecated) accepts a table as the first argument so you can pass one in by position (docs). But ReadFromBigQuery expects the gcs_location as the first argument (docs). So if you are porting code from using BigQuerySource to using ReadFromBigQuery and you weren't explicitly passing the table in by name, it will fail with the error you received.

Here are two working examples and one that does not work:

import apache_beam as beam

project_id = 'my_project'
dataset_id = 'my_dataset'
table_id = 'my_table'

if __name__ == "__main__":
    args = [
        '--temp_location=gs://my_temp_bucket',
    ]

    # This works:
    with beam.Pipeline(argv=args) as pipeline:
        query_results = (
            pipeline
            | 'Read from BigQuery'
            >> beam.io.ReadFromBigQuery(table=f"{project_id}:{dataset_id}.{table_id}")
        )

    # So does this:
    with beam.Pipeline(argv=args) as pipeline:
        query_results = (
            pipeline
            | 'Read from BigQuery'
            >> beam.io.ReadFromBigQuery(table=f"{dataset_id}.{table_id}", project=project_id)
        )

    # But this doesn't work, because the table argument is not passed in by name.
    # The f"{project_id}:{dataset_id}.{table_id}" string is interpreted as the
    # gcs_location.
    with beam.Pipeline(argv=args) as pipeline:
        query_results = (
            pipeline
            | 'Read from BigQuery'
            >> beam.io.ReadFromBigQuery(f"{project_id}:{dataset_id}.{table_id}")
        )
