TransWikia.com

In GCP Dataproc, what is the maximum number of worker nodes we can use in a cluster?

Stack Overflow Asked by Fransisca Sibarani on November 18, 2021

I am about to train a 5 million rows of data containing 7 categorical variables (string), but soon will train a 31 million rows of data.
I am wondering what the maximum number of worker nodes we can use in a cluster, because even if I type something like: 2,000,000, it doesn’t show any indication of an error.

Another question would be, what would be the best way to determine how many worker nodes needed?

Thank you in advance!

One Answer

Max cluster size

Dataproc does not limit number of the nodes in the cluster, but other software can have limitations. For example, it's known that there are YARN cluster deployments that have 10k nodes, so going above that may not work for Spark on YARN that Dataproc runs.

Also, you need to take into account GCE limitations like different quotas (CPU, RAM, Disk, external IPs, etc) and QPS limits and make sure that you have enough of these for such a large cluster.

I think that 1k nodes is a reasonable size to start from for a large Dataproc cluster if you need it, and you can upscale it further to add more nodes as necessary after cluster creation.

Cluster size estimation

You should determine how many nodes you need based on your workload and VM size that you want to use. For your use case it seems that you need to find a guide on how to estimate cluster size for ML training.

Or alternatively you can just do a binary search until you satisfied with a training time. For example, you can start from 500 8-core nodes cluster and if training time is too long then increase cluster size to 600-750 nodes and see if training time decreases as you expect - you can repeat this until you satisfied with training time or until it does not scale/improve anymore.

Answered by Igor Dvorzhak on November 18, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP