
Tutorial example NetTrain fails miserably with a GPU

Mathematica: Asked by kirma on January 28, 2021

I have been experimenting with GPU NetTrain on AWS now that Mathematica 12.2 supports remote batch jobs (I don't have an Nvidia GPU to try these things out otherwise). In particular, I'm puzzled by the example in the Mathematica documentation tutorial Sequence Learning and NLP with Neural Networks, in the Language Modeling section (the teacherForcingNet variant). After evaluating the prerequisites, this example does training like this:

result = NetTrain[teacherForcingNet, <|"Input" -> Keys[trainingData]|>,
  All, BatchSize -> 64, MaxTrainingRounds -> 5,
  TargetDevice -> "CPU", ValidationSet -> Scaled[0.1]]

A CPU-based run, even when executed on AWS, results in a network with around 40% loss.

I have minimally modified the NetTrain step to perform the training on a GPU and to run it on AWS:

job = RemoteBatchSubmit[env, 
  NetTrain[teacherForcingNet, <|"Input" -> Keys[trainingData]|>, All, 
   BatchSize -> 64, MaxTrainingRounds -> 5, TargetDevice -> "GPU", 
   ValidationSet -> Scaled[0.1]], 
  TimeConstraint -> Quantity[30, "Minutes"], 
  RemoteProviderSettings -> <|"GPUCount" -> 1|>]
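
Here env is the remote batch submission environment created beforehand. A minimal sketch of how it might be set up, assuming the "AWSBatch" provider's JobQueue/JobDefinition/IOBucket settings; the resource identifiers below are placeholders for whatever the AWS account actually provides:

env = RemoteBatchSubmissionEnvironment["AWSBatch", <|
   "JobQueue" -> "arn:aws:batch:us-east-1:123456789012:job-queue/WolframJobQueue",            (* placeholder ARN *)
   "JobDefinition" -> "arn:aws:batch:us-east-1:123456789012:job-definition/WolframJobDefinition", (* placeholder ARN *)
   "IOBucket" -> "wolfram-batch-io-bucket"                                                     (* placeholder S3 bucket *)
   |>]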

When the training job completes, the resulting training object is available as job["EvaluationResult"] (and progress can actually be observed at runtime through job["JobLog"]). The problem is that while CPU-based training results in an error rate of around 41%, the GPU-based run gets stuck at about 82% (effectively without learning anything).
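
Concretely, the result is pulled back from the job object roughly like this; the last property name is my assumption, so inspect results["Properties"] to confirm what the results object actually offers:

results = job["EvaluationResult"];      (* NetTrainResultsObject returned by NetTrain[..., All] *)
job["JobLog"]                           (* remote log; shows training progress while the job runs *)
results["TrainedNet"]                   (* the trained network *)
results["ValidationMeasurementsLists"]  (* assumed property name for per-round validation measurements *)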

What gives? Is this common behaviour for some networks (LeNet on the MNIST dataset works just fine on the GPU, for instance), a bug that needs fixing, and/or is a workaround available? Neither Method nor WorkingPrecision changes make any difference in the results.
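
For completeness, the variants tried had this shape, only swapping in different option values; the particular Method and WorkingPrecision settings below are merely examples of what was varied, not a recommended configuration:

RemoteBatchSubmit[env,
  NetTrain[teacherForcingNet, <|"Input" -> Keys[trainingData]|>, All,
   BatchSize -> 64, MaxTrainingRounds -> 5, TargetDevice -> "GPU",
   ValidationSet -> Scaled[0.1],
   Method -> "ADAM",              (* example optimizer choice *)
   WorkingPrecision -> "Real64"], (* example precision; the default is "Real32" *)
  TimeConstraint -> Quantity[30, "Minutes"],
  RemoteProviderSettings -> <|"GPUCount" -> 1|>]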
