Database Administrators Asked on October 28, 2021
We have two separate SQL Servers. One hosts a data warehouse (DWH); the other hosts a sales information database.
On the DWH server there is an ETL job that collects information from the sales server. The job runs daily after midnight, and the DWH pulls the data from the sales database via a linked server.
Most of the time the ETL job runs without any problems, but sometimes it fails with a query timeout. We have found a specific pattern: the failure happens every 11th day. So on the 11th day the ETL job fails to collect the information.
The following error occurs:
SQLNCLI11 for linked server “my linked server” returned message “Query timeout expired.”
Note: The job usually fails 10 minutes after it starts.
We have searched everywhere and could not find out what causes this issue. We also know that the amount of data is almost the same every time, and there is no scheduled job that runs every 11 days or anything similar.
The remote query timeout on the linked server is set to 0.
Our next step will be to turn off the antivirus program on the sales server to check whether it causes the problem.
Does anyone have any clue or idea where I can search further to find the problem?
The old SQL Server RAISERROR-hack does the trick for me.
RAISERROR(N'', 0, 1) WITH NOWAIT
You can use it as long as your remote code can call it more often than once per default timeout of 600 seconds (10 minutes).
It simply forces the output buffer to flush, and this is enough to get around the remote query timeout limitation.
In detail: I have a stored procedure on a SQL Server instance, and that instance is set up as a linked server.
This procedure calls the following, especially inside loops:
PRINT CONCAT('Buffer spamming to prevent "Query timeout expired"; at ', CONVERT(VARCHAR(12), GETDATE(), 114));
RAISERROR(N'', 0, 1) WITH NOWAIT;
Then I invoke this procedure through the Linked Server feature on another SQL Server instance.
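As a rough sketch of the pattern described above, the procedure and its remote invocation might look like this. The names `usp_LongRunningWork`, `MyLinkedServer` and `SalesDb` are hypothetical, the `WAITFOR` only stands in for real work, and the remote call assumes the linked server has RPC Out enabled.

```sql
-- On the remote (linked) instance. Hypothetical procedure name.
CREATE PROCEDURE dbo.usp_LongRunningWork
AS
BEGIN
    DECLARE @batch int = 1;
    DECLARE @batchCount int = 50;  -- hypothetical number of work units

    WHILE @batch <= @batchCount
    BEGIN
        -- Placeholder for one unit of real work. Each unit should
        -- finish well inside the remote query timeout.
        WAITFOR DELAY '00:00:30';

        -- Severity 0 is informational only. WITH NOWAIT sends the
        -- message to the caller immediately instead of buffering it.
        RAISERROR(N'Batch %d done', 0, 1, @batch) WITH NOWAIT;

        SET @batch += 1;
    END
END;
GO

-- On the calling instance (assumes RPC Out is enabled for the linked server).
EXEC [MyLinkedServer].[SalesDb].[dbo].[usp_LongRunningWork];
```

Whether the periodic messages keep the call alive is the behaviour reported in this answer, so it is worth verifying on your own version before relying on it.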
Warning! This has not been tested on SQL Server 2019.
Basically we have two weird options in this situation.
The first is to remember to configure remote query timeout
EXECUTE sp_configure 'remote query timeout', <your_value_seconds>;
RECONFIGURE;
on every new DB instance.
The other is to remember to use the RAISERROR hack.
Data volumes change, and the people who know about this may change jobs. Sooner or later you will run into this problem either way.
Sadly, as far as I know, we can't set the remote query timeout at the connection or session level.
Given the current circumstances, and for robustness, I would implement something like a queue in SQL Server. I mean that I would request some work and pass the parameters through the Linked Server feature, and then periodically check for the results.
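One way the queue idea could be sketched is with a request table on the remote instance that the caller writes to and polls. Everything here is an assumption for illustration: the table `dbo.WorkRequest`, its columns, the names `MyLinkedServer` and `SalesDb`, and the existence of some worker on the remote side (for example a SQL Server Agent job) that picks up pending rows and updates their status.

```sql
-- On the remote instance: hypothetical request table.
CREATE TABLE dbo.WorkRequest
(
    RequestId   int IDENTITY(1,1) PRIMARY KEY,
    Parameters  nvarchar(4000) NOT NULL,
    Status      varchar(20)    NOT NULL DEFAULT 'Pending',  -- Pending / Running / Done / Failed
    RequestedAt datetime2      NOT NULL DEFAULT SYSUTCDATETIME(),
    FinishedAt  datetime2      NULL
);
GO

-- On the calling instance: enqueue a request. This is a short remote
-- statement, so it should return long before any timeout applies.
INSERT INTO [MyLinkedServer].[SalesDb].[dbo].[WorkRequest] (Parameters)
VALUES (N'load_date=2021-10-28');

-- Later, poll for finished requests. Each poll is again a short remote query.
SELECT RequestId, Status, FinishedAt
FROM [MyLinkedServer].[SalesDb].[dbo].[WorkRequest]
WHERE Status IN ('Done', 'Failed');
```

The point of the design is that no single remote call has to stay open for the duration of the work, so the remote query timeout stops mattering for the long-running part.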
On the other, equally weird, hand, I have procedures that do not use the RAISERROR hack at all, yet they run reliably for hours.
I did my best to work out what is going on there.
What I can say is that they almost never use SET NOCOUNT ON, their loops do not hang for long, and their sub-procedure calls take no longer than 10 minutes.
Answered by it3xl on October 28, 2021
This doesn't answer your question with regards to why the query is slow on the 11th day, but hopefully it helps clarify why it fails after 10 minutes.
The remote timeout on the linked server is set to 0.
Intuitively, this might seem like there is no limit.
What it actually does is use the sp_configure default for remote query timeout, which is 600 seconds (10 minutes).
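To see which values are actually in effect, a query along these lines can be run on the instance where the linked server is defined (the DWH server in the question). This is a minimal sketch using the `sys.configurations` and `sys.servers` catalog views.

```sql
-- Instance-wide default for remote queries, in seconds (600 unless changed).
SELECT name, value_in_use
FROM sys.configurations
WHERE name = N'remote query timeout (s)';

-- Per-linked-server setting; 0 means "fall back to the instance-wide default".
SELECT name, query_timeout
FROM sys.servers
WHERE is_linked = 1;
```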
Setting the query timeout on the linked server to a higher value, perhaps 1200 seconds (20 minutes), will likely allow your job to complete. And hopefully job completion will provide some insight into why it's taking so much longer on this specific day.
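A minimal sketch of raising the timeout for a single linked server, assuming it is named `MyLinkedServer` (a placeholder, use the name shown in `sys.servers`). The 1200 seconds is just the example value from above.

```sql
-- Run on the instance where the linked server is defined.
-- Sets the query timeout for this linked server only, in seconds.
EXEC sp_serveroption
    @server   = N'MyLinkedServer',  -- placeholder linked server name
    @optname  = N'query timeout',
    @optvalue = N'1200';
```

This leaves the instance-wide remote query timeout untouched, so other linked servers keep their current behaviour.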
I think the setting is a little confusing, as discussed in another question here on the site: Linked server connections to Multi-subnet failover cluster
Answered by Josh Darnell on October 28, 2021