
"An exception was thrown from a UDF" but there's no UDF

Asked by Be Chiller Too on Stack Overflow, December 1, 2020

I’m switching from Spark 2.4 to Spark 3. I’ve solved most of the bugs I encountered, but I don’t know where this one is coming from.

I have a dataframe df that I’m trying to write to a table in an Azure SQL Server. Here is the line provoking the error, followed by the stack trace:

--> 105       df.write.jdbc(url=JDBCURL, table=table, mode=mode)
    106       if verbose:
    107         print("Yay it worked!")

/databricks/spark/python/pyspark/sql/readwriter.py in jdbc(self, url, table, mode, properties)
   1080         for k in properties:
   1081             jprop.setProperty(k, properties[k])
-> 1082         self.mode(mode)._jwrite.jdbc(url, table, jprop)
   1083 
   1084 

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1303         answer = self.gateway_client.send_command(command)
   1304         return_value = get_return_value(
-> 1305             answer, self.gateway_client, self.target_id, self.name)
   1306 
   1307         for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    131                 # Hide where the exception came from that shows a non-Pythonic
    132                 # JVM exception message.
--> 133                 raise_from(converted)
    134             else:
    135                 raise

/databricks/spark/python/pyspark/sql/utils.py in raise_from(e)

PythonException: An exception was thrown from a UDF: 'KeyError: None'. Full traceback below:
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 654, in main
    process()
  File "/databricks/spark/python/pyspark/worker.py", line 646, in process
    serializer.dump_stream(out_iter, outfile)
  File "/databricks/spark/python/pyspark/serializers.py", line 231, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/databricks/spark/python/pyspark/serializers.py", line 145, in dump_stream
    for obj in iterator:
  File "/databricks/spark/python/pyspark/serializers.py", line 220, in _batched
    for item in iterator:
  File "/databricks/spark/python/pyspark/worker.py", line 467, in mapper
    result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
  File "/databricks/spark/python/pyspark/worker.py", line 467, in <genexpr>
    result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
  File "/databricks/spark/python/pyspark/worker.py", line 91, in <lambda>
    return lambda *a: f(*a)
  File "/databricks/spark/python/pyspark/util.py", line 109, in wrapper
    return f(*args, **kwargs)
KeyError: None

Well, there is no UDF as far as I’m concerned!
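For completeness, here is a minimal, self-contained sketch of what I’m doing. The connection details, table name and toy data below are placeholders standing in for my real JDBCURL, table and mode; the only real API call is df.write.jdbc:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy dataframe with a nullable column, mirroring my real data.
df = spark.createDataFrame(
    [(1, "a"), (2, None)],
    ["id", "value"],
)

# Placeholder connection string, not my real one.
JDBCURL = (
    "jdbc:sqlserver://myserver.database.windows.net:1433;"
    "database=mydb;user=myuser;password=mypassword"
)

df.write.jdbc(
    url=JDBCURL,
    table="dbo.my_table",
    mode="append",
    properties={"driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"},
)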

Some of the columns in my dataframe contain nulls, but that is not (should not be!) a problem, since my table accepts nulls in those columns.
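In case the nulls are the culprit, one hypothesis I can check (just a guess on my part) is whether any column was inferred as NullType, i.e. is entirely null, since the JDBC writer has no SQL type to map that to. A quick sanity-check sketch, where the cast to string is only an illustrative workaround:

from pyspark.sql import functions as F
from pyspark.sql.types import NullType

# Columns whose type was inferred as NullType (all-null columns).
null_cols = [f.name for f in df.schema.fields if isinstance(f.dataType, NullType)]
print(null_cols)

# Illustrative workaround: cast such columns to an explicit type
# (string here is just an example) before writing.
for c in null_cols:
    df = df.withColumn(c, F.col(c).cast("string"))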

My code in Spark 2.4 could send this kind of dataframe to my SQL server, but now that I’ve switched to Spark 3, this line fails.

I am using Databricks with runtime 7.3 and Python 3.7.
