ValueError: A BigQuery Table or a Query Must Be Specified with beam.io.gcp.bigquery.ReadFromBigQuery
I'm trying to pass a BigQuery table name as a value provider for an Apache Beam pipeline template. According to their documentation and this StackOverflow answer, it's possible to p…
Solution 1:
The table argument must be passed by name to ReadFromBigQuery.

BigQuerySource (deprecated) accepts a table as the first argument, so you can pass one in by position (docs). But ReadFromBigQuery expects the gcs_location as the first argument (docs). So if you are porting code from BigQuerySource to ReadFromBigQuery and you weren't explicitly passing the table in by name, it will fail with the error you received.
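The positional mix-up can be illustrated with plain-Python stand-ins. The parameter order below is a simplification of the two transforms (the real Beam classes take many more arguments); it only mimics the point that the first positional slot differs between them:

```python
# Stand-ins with a simplified, assumed parameter order; NOT the real
# apache_beam classes.
def bigquery_source(table=None, dataset=None, project=None):
    # BigQuerySource: the table spec is the first parameter,
    # so a positional argument lands where you expect.
    return {"table": table}

def read_from_bigquery(gcs_location=None, table=None, query=None):
    # ReadFromBigQuery: gcs_location comes first, so a positional
    # table spec is silently bound to gcs_location instead.
    if table is None and query is None:
        raise ValueError("A BigQuery table or a query must be specified")
    return {"gcs_location": gcs_location, "table": table}

# Positional call that worked with BigQuerySource:
assert bigquery_source("proj:ds.tbl")["table"] == "proj:ds.tbl"

# The same positional call, ported verbatim, reproduces the error:
try:
    read_from_bigquery("proj:ds.tbl")
except ValueError as e:
    print(e)  # the table spec was bound to gcs_location, so no table was seen
```

This is why the fix is simply `table=...` rather than a positional argument.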
Here are two working examples and one that does not work:
import apache_beam as beam

project_id = 'my_project'
dataset_id = 'my_dataset'
table_id = 'my_table'

if __name__ == "__main__":
    args = [
        '--temp_location=gs://my_temp_bucket',
    ]

    # This works:
    with beam.Pipeline(argv=args) as pipeline:
        query_results = (
            pipeline
            | 'Read from BigQuery'
            >> beam.io.ReadFromBigQuery(table=f"{project_id}:{dataset_id}.{table_id}")
        )

    # So does this:
    with beam.Pipeline(argv=args) as pipeline:
        query_results = (
            pipeline
            | 'Read from BigQuery'
            >> beam.io.ReadFromBigQuery(table=f"{dataset_id}.{table_id}", project=project_id)
        )

    # But this doesn't work because the table argument is not passed in by name.
    # The f"{project_id}:{dataset_id}.{table_id}" string is interpreted as the gcs_location.
    with beam.Pipeline(argv=args) as pipeline:
        query_results = (
            pipeline
            | 'Read from BigQuery'
            >> beam.io.ReadFromBigQuery(f"{project_id}:{dataset_id}.{table_id}")
        )
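As a side note, the two working table-spec formats above (`project:dataset.table`, or `dataset.table` with the project passed separately) can be sanity-checked before building the pipeline. Here is a small hypothetical helper, not part of apache_beam, that splits a spec into its parts:

```python
import re

def parse_table_spec(spec, project=None):
    """Hypothetical helper (not a Beam API): split a BigQuery table spec
    into (project, dataset, table). Accepts "project:dataset.table" or
    "dataset.table" with the project supplied via the keyword argument."""
    match = re.fullmatch(r"(?:([\w-]+)[:.])?([\w$]+)\.([\w$]+)", spec)
    if not match:
        raise ValueError(f"Invalid BigQuery table spec: {spec!r}")
    spec_project, dataset, table = match.groups()
    if spec_project and project and spec_project != project:
        raise ValueError("Conflicting project in spec and argument")
    return (spec_project or project, dataset, table)

print(parse_table_spec("my_project:my_dataset.my_table"))
# -> ('my_project', 'my_dataset', 'my_table')
print(parse_table_spec("my_dataset.my_table", project="my_project"))
# -> ('my_project', 'my_dataset', 'my_table')
```

Validating the spec up front gives a clearer error than letting a malformed string reach the pipeline.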