Connect to Google Cloud SQL instance from GCP Dataflow (ml metadata)?

Asked by ntakouris on Stack Overflow, December 3, 2021

I’m trying to run a very simple example pipeline with TensorFlow Extended (TFX), using ML Metadata backed by a Cloud SQL MySQL instance:

metadata_connection_config=metadata.mysql_metadata_connection_config(
    host="10.124.128.3", database="ml_metadata", port=3306,
    username='root', password='****')

But the pipeline fails. The Beam pipeline runs on Dataflow in the us-central1 region (no network or subnetwork parameter set), and the Cloud SQL instance is in the same region. I also tried authenticating via the private IP. I get this error:

RuntimeError: Failed to establish connection to Metadata storage with error: mysql_real_connect failed: errno: 2002, error: Can't connect to MySQL server on '10.124.128.3' (36) [while running 'Run[CsvExampleGen]']
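If I read errno 2002 correctly, the worker never establishes a TCP connection at all, so this looks like a networking problem rather than an authentication one: a Cloud SQL private IP is only reachable from inside (or peered with) the VPC it belongs to. My assumption is that attaching the Dataflow workers to that VPC would mean adding options like the following, where 'default' is just a placeholder for the real network and subnetwork names:

# Hypothetical extra Dataflow options; 'default' stands in for the
# VPC network/subnet that the Cloud SQL private IP is peered with.
VPC_PIPELINE_ARGS = [
    '--network=default',
    '--subnetwork=regions/us-central1/subnetworks/default',
    '--no_use_public_ips',  # workers then use private IPs only
]

I haven't confirmed that this is the intended approach, though.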

And I don’t really know what else to try. I can’t use SSL certificates, which is what Cloud SQL supports for direct connections, because the TFX metadata API only exposes these parameters:

https://www.tensorflow.org/tfx/api_docs/python/tfx/orchestration/metadata
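As far as I can tell from those docs, the helper only fills in an MLMD ConnectionConfig proto with plain connection parameters; the explicit equivalent (with no SSL fields anywhere, as far as I can see) would be something like:

from ml_metadata.proto import metadata_store_pb2

# What metadata.mysql_metadata_connection_config(...) builds under the hood:
# only host/port/database/user/password are populated.
connection_config = metadata_store_pb2.ConnectionConfig()
connection_config.mysql.host = '10.124.128.3'
connection_config.mysql.port = 3306
connection_config.mysql.database = 'ml_metadata'
connection_config.mysql.user = 'root'
connection_config.mysql.password = '****'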

The pipeline is being run with the following:

from tfx.components import CsvExampleGen, SchemaGen, StatisticsGen
from tfx.orchestration import metadata, pipeline
from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner
from tfx.proto import example_gen_pb2
from tfx.utils.dsl_utils import external_input

DATAFLOW_BEAM_PIPELINE_ARGS = [
    '--project=ml-experiments-z',
    '--runner=DataflowRunner',
    '--temp_location=gs://taxi_dataset/tmp',
    '--staging_location=gs://taxi_dataset/staging',
    '--region=us-central1',
    '--max_num_workers=1',  # Python options are snake_case; 'maxNumWorkers' is the Java spelling
    '--experiments=shuffle_mode=service',
    '--job_name=schemagen',  # likewise '--job_name', not '--job-name'
]

def create_pipeline():
    # Single 'train' split over the CSV; no eval split for this example.
    no_eval_config = example_gen_pb2.Input(splits=[
        example_gen_pb2.Input.Split(name='train', pattern='taxi_pipeline.csv'),
    ])
    example_gen = CsvExampleGen(input=external_input(
        'gs://taxi_dataset/'), input_config=no_eval_config)
    statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
    schema_gen = SchemaGen(statistics=statistics_gen.outputs['statistics'])

    return pipeline.Pipeline(
        pipeline_name='ml-experiments-taxi-schema-gen',
        pipeline_root='gs://taxi_dataset',
        components=[example_gen, statistics_gen, schema_gen],
        beam_pipeline_args=DATAFLOW_BEAM_PIPELINE_ARGS,
        metadata_connection_config=metadata.mysql_metadata_connection_config(
            host="10.124.128.3", port=3306, database="ml_metadata",
            username='root', password='<removed for security>')
    )


if __name__ == '__main__':
    BeamDagRunner().run(create_pipeline())
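To isolate whether this is a network-routing problem or something specific to TFX/MLMD, my next idea is a bare MySQL connection test from the same network context (for example, from a GCE VM in us-central1). A minimal sketch, assuming PyMySQL is installed:

import pymysql

# Same parameters as the pipeline; if this also fails with errno 2002,
# the Cloud SQL IP simply isn't reachable, and ML Metadata isn't the issue.
conn = pymysql.connect(
    host='10.124.128.3',
    port=3306,
    user='root',
    password='****',
    database='ml_metadata',
    connect_timeout=10,
)
print(conn.get_server_info())
conn.close()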

The ML Metadata configuration is optional, and the pipeline already runs successfully without it.
