Database Administrators Asked on January 2, 2022
The situation: I am using Service Broker to build a data-push mechanism. The scenario I am considering is as follows.
Scripts for Service Broker installation in Broker database:
-- installation
use master
go

if exists ( select * from sys.databases where name = 'Broker' )
begin
    alter database [Broker] set restricted_user with rollback immediate;
    drop database [Broker];
end
go

create database [Broker]
go

alter database [Broker] set enable_broker with rollback immediate;
alter database [Broker] set read_committed_snapshot on;
alter database [Broker] set allow_snapshot_isolation on;
alter database [Broker] set recovery full;
go

use [Broker]
go

create message type datachanges_messagetype validation = none;
go

create contract datachanges_contract ( datachanges_messagetype sent by initiator );
go

create queue dbo.datachanges_initiatorqueue
    with status = on, retention = off, poison_message_handling ( status = on )
    on [default];
go

create queue dbo.datachanges_targetqueue
    with status = on, retention = off, poison_message_handling ( status = on )
    on [default];
go

create service datachanges_initiatorservice on queue datachanges_initiatorqueue ( datachanges_contract );
go

create service datachanges_targetservice on queue datachanges_targetqueue ( datachanges_contract );
go

-- conversation bookkeeping table
create table dbo.[SessionConversationsSPID]
(
    spid int not null,
    handle uniqueidentifier not null,
    primary key ( spid ),
    unique ( handle )
)
go

-- SP used to send data from triggers
create procedure dbo.trackChanges_send
    @json nvarchar(max)
as
begin
    set nocount on;

    if ( @json is null or @json = '' )
    begin
        raiserror( 'DWH Service Broker: An attempt to send an empty message occurred', 16, 1 );
        return;
    end

    declare @handle uniqueidentifier = null,
            @counter int = 1,
            @error int;

    begin transaction;

    while ( 1 = 1 )
    begin
        select @handle = handle
        from dbo.[SessionConversationsSPID]
        where spid = @@SPID;

        if @handle is null
        begin
            begin dialog conversation @handle
                from service datachanges_initiatorservice
                to service 'datachanges_targetservice'
                on contract datachanges_contract
                with encryption = off;

            insert into dbo.[SessionConversationsSPID] ( spid, handle )
            values ( @@SPID, @handle );
        end;

        send on conversation @handle message type datachanges_messagetype ( @json );
        set @error = @@error;

        if @error = 0 break;

        set @counter += 1;
        if @counter > 5
        begin
            raiserror( N'DWH Service Broker: Failed to SEND on a conversation more than 5 times. Error: %i.', 16, 2, @error );
            break;
        end

        -- drop the broken conversation and retry with a fresh one
        delete from dbo.[SessionConversationsSPID] where handle = @handle;
        set @handle = null;
    end

    commit;
end
go

-- Dialog pre-creation to mitigate the hot-spot problem on the sys.sysdesend table.
-- Described here: https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008/dd576261
declare @i int, @spid int, @handle uniqueidentifier;
select @i = 0, @spid = 50;

while ( @i < 150 * 3000 ) -- 450000 dialogs
begin
    set @i = @i + 1;

    begin dialog @handle
        from service datachanges_initiatorservice
        to service 'datachanges_targetservice'
        on contract datachanges_contract
        with encryption = off;

    if ( ( @i % 150 ) = 0 )
    begin
        set @spid += 1;
        insert into dbo.SessionConversationsSPID ( spid, handle )
        values ( @spid, @handle );
    end
end
Typical trigger code in a user database:
create trigger [<SCHEMA>].[<TABLE>_TR_I] on [<SCHEMA>].[<TABLE>]
with execute as caller
after insert
as
begin
    set xact_abort off;
    set nocount on;

    declare @rc int = ( select count(*) from inserted );
    if ( @rc = 0 ) return;

    begin try
        declare @db_name sysname = db_name();
        declare @json nvarchar(max);

        set @json =
        (
            select
                getutcdate()             as get_date,
                'I'                      as tr_operation,
                current_transaction_id() as cur_tran_id,
                '<TABLE>'                as table_name,
                @@servername             as server_name,
                @db_name                 as db_name,
                '<SCHEMA>'               as tenant_schemaname,
                *
            from inserted
            for json auto, include_null_values
        );

        exec dbo.trackChanges_send @json = @json;
    end try
    begin catch
        declare @error_message nvarchar(max);
        set @error_message =
              '[' + isnull( cast( error_number() as nvarchar(max) ), '' ) + '] '
            + isnull( cast( error_severity() as nvarchar(max) ), '' )
            + ' State: '   + isnull( cast( error_state() as nvarchar(max) ), '' )
            + ' Trigger: [<SCHEMA>].[<TABLE>_TR_I]'
            + ' Line: '    + isnull( cast( error_line() as nvarchar(max) ), '' )
            + ' Msg: '     + isnull( cast( error_message() as nvarchar(max) ), '' );

        raiserror( 'DWH Service Broker: An error occurred while sending data changes. Error: %s', 0, 0, @error_message ) with log;
        return;
    end catch
end
go
Some observations first, and then my questions.
Using dbcc page I can see that the contended page belongs to sys.queue_messages_597577167, which is the internal table backing dbo.datachanges_targetqueue. The total number of waiting sessions at that moment was ~450, so this may be a bottleneck.
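(For reference, on SQL Server 2019 and later the same page-to-object mapping can be done without the undocumented DBCC PAGE, via the documented sys.dm_db_page_info function. The database, file, and page ids below are hypothetical placeholders for the values taken from the wait resource.)

```sql
-- Sketch: map a PAGELATCH wait resource to its owning object.
-- 5, 1, 12345 stand in for database_id, file_id, and page_id from the wait resource.
select object_name( p.object_id, p.database_id ) as object_name,
       p.index_id,
       p.page_type_desc
from sys.dm_db_page_info( 5, 1, 12345, 'LIMITED' ) as p;
```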
I also see that during these episodes the triggers run for a long time (more than 10 seconds, when usually it is less than 1 second).
It happens at random times, so I don't see any pattern here.
The following loop, which I execute periodically to drain the queue, can also end up blocked by the trigger activity, and I do not understand why:

declare @i int = 0;

while ( 1 = 1 )
begin
    declare @mb varbinary(max);

    receive top ( 1000 ) @mb = message_body
    from dbo.datachanges_targetqueue;

    set @i = @@rowcount;
    if @i = 0 break;
end
Is it OK to use one queue and ~800,000 triggers? 🙂 I mean, are there thresholds I need to consider?
What are the advantages and disadvantages of "my" approach (one database is both sender and target) versus "every database is a sender, one database is the target"?
Is it ok to use one queue and ~800000 triggers? :) I mean maybe there are some thresholds I need to consider.
No. Not really. You must ensure that your triggers are always short-running, or your throughput will suffer.
Having 800,000 triggers write to a single queue is not going to be a good idea. A queue is backed by a regular table, and at some scale page hotspots will become your bottleneck. And:
Messages sent to services in the same instance of the Database Engine are put directly into the queues associated with these services.
If your target service is on a remote SQL Server instance, the messages are written and committed to each sending database's transmission queue. But for target queues on the same instance, messages go directly to the target queue.
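You can confirm this for yourself: for a same-instance target, sys.transmission_queue stays empty while messages accumulate in the target queue. A quick check, using the queue names from the scripts above:

```sql
-- For a same-instance target, sent messages should appear here...
select count(*) as target_queue_depth
from dbo.datachanges_targetqueue with (nolock);

-- ...and not here. Rows in sys.transmission_queue for a local service
-- usually indicate a delivery problem (check transmission_status).
select conversation_handle, to_service_name, transmission_status
from sys.transmission_queue;
```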
I think the bottom line is that writing directly to the target queue is not the right solution here. Imagine an empty target queue at the time of peak transaction throughput: that queue's backing table simply doesn't have enough pages to spread out the page latching across the large number of concurrent writers this scenario needs.
And if all your tables are in the same database, the transmission queue could itself become the bottleneck. But it has a different structure from normal queues: a single clustered index:
select i.name index_name, i.type, c.name, c.column_id, t.name type_name, c.max_length, ic.key_ordinal
from sys.indexes i
join sys.index_columns ic on ic.object_id = i.object_id
join sys.columns c on c.object_id = ic.object_id and c.column_id = ic.column_id
join sys.types t on t.system_type_id = c.system_type_id and t.user_type_id = c.user_type_id
where c.object_id = object_id('sys.sysxmitqueue')
index_name  type  name        column_id  type_name         max_length  key_ordinal
----------  ----  ----------  ---------  ----------------  ----------  -----------
clst        1     dlgid       1          uniqueidentifier  16          1
clst        1     finitiator  2          bit               1           2
clst        1     msgseqnum   8          bigint            8           3
So you won't have hot page contention on the transmission queue, and you'll have as many insertion points as you have dialog conversations (dlgid).
A normal queue has two indexes, a clustered index on
( status, conversation_group_id, priority, conversation_handle, queuing_order )
and a non-clustered index on
( status, priority, queuing_order, conversation_group_id, conversation_handle, service_id )
which you can see with this query
select q.name queue_name, i.name index_name, i.index_id, ic.index_id, i.type,
       c.name column_name, c.column_id, t.name type_name, c.max_length, ic.key_ordinal
from sys.service_queues q
join sys.internal_tables it on it.parent_object_id = q.object_id
join sys.indexes i on i.object_id = it.object_id
join sys.index_columns ic on ic.object_id = i.object_id and ic.index_id = i.index_id
join sys.columns c on c.object_id = ic.object_id and c.column_id = ic.column_id
join sys.types t on t.system_type_id = c.system_type_id and t.user_type_id = c.user_type_id
order by q.object_id, i.index_id, ic.key_ordinal
So you might be better off moving the target service to a remote SQL Server instance. This would offload the writing and reading of the target queues, and might reduce the bottleneck. Your triggers would then only have to put messages on the transmission queue, which is what you thought was happening in the first place.
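A minimal sketch of the plumbing a remote target needs: a Service Broker endpoint on each instance, plus a route in the sender database pointing at the remote instance. The host name, port, and authentication choices below are illustrative placeholders, not part of the original setup.

```sql
-- On each instance: a Service Broker endpoint (port and auth are illustrative).
create endpoint BrokerEndpoint
    state = started
    as tcp ( listener_port = 4022 )
    for service_broker ( authentication = windows, encryption = required );

-- In the sender database: route messages for the target service to the
-- remote instance. 'RemoteHost' is a placeholder for your environment.
create route datachanges_route
    with service_name = 'datachanges_targetservice',
         address = 'TCP://RemoteHost:4022';
```

You would also need to grant CONNECT on the endpoints and, for cross-instance dialogs, decide between certificate-based dialog security and ENCRYPTION = OFF as in the existing scripts.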
You can watch the routing and transmission queue usage with an Extended Events session like:
CREATE EVENT SESSION [ServiceBrokerRouting] ON SERVER
ADD EVENT sqlserver.broker_dialog_transmission_body_dequeue,
ADD EVENT sqlserver.broker_dialog_transmission_queue_enqueue,
ADD EVENT sqlserver.broker_forwarded_message_sent,
ADD EVENT sqlserver.broker_message_classify,
ADD EVENT sqlserver.broker_remote_message_acknowledgement
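As written the session has no target, so nothing is retained. A sketch adding a ring_buffer target and starting the session:

```sql
-- Give the captured events somewhere to land, then start the session.
alter event session [ServiceBrokerRouting] on server
    add target package0.ring_buffer;

alter event session [ServiceBrokerRouting] on server
    state = start;
```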
Also, in both your current design and the remote-service option, you can see from the index structures how reusing the right number of dialog conversations can optimize the solution. Too few and you get locking and page contention; too many and you get the overhead of creating and managing them, and you can't batch messages. It looks like you've already read Reusing Conversations and are using a conversation-per-session pattern, which Remus recommends for this kind of workload. It would be interesting to see which index the page latch contention is on, and whether it's a leaf or non-leaf page. But in any case, queue tables with concurrent SEND and RECEIVE don't usually have enough pages to spread out page latch contention.
So the design alternative is to have the triggers drop changes on N intermediate queues, and then have activation procedures on those queues forward the messages to the single destination queue. You may still have waits on the destination queue, but they won't happen during your triggers. In the intermediate-to-final activation procedure you can also batch up sends, manage conversations, and keep the number of dialog conversations small (N), so the receiver can actually fetch 1000 messages per call. A single call to RECEIVE can only fetch messages from a single conversation, so if you have thousands of interleaved conversations you will only ever fetch single rows.
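A rough sketch of such a forwarding activation procedure. The intermediate queue dbo.datachanges_intermediatequeue and the one-row table dbo.ForwardConversation (holding a single long-lived outbound conversation handle) are made-up names for illustration, not objects from the scripts above:

```sql
create procedure dbo.forward_to_target
as
begin
    set nocount on;

    -- Single long-lived outbound conversation, created once at setup time.
    declare @fwd_handle uniqueidentifier =
        ( select top (1) handle from dbo.ForwardConversation );

    declare @msgs table ( message_body varbinary(max), message_type_name sysname );
    declare @body varbinary(max);

    while ( 1 = 1 )
    begin
        delete from @msgs;

        begin transaction;

        -- Drain a batch from the intermediate queue...
        receive top ( 1000 ) message_body, message_type_name
        from dbo.datachanges_intermediatequeue
        into @msgs;

        if ( select count(*) from @msgs ) = 0
        begin
            commit;
            break;
        end

        -- ...and forward each data message on the single outbound conversation,
        -- so the final reader sees few conversations and can batch its RECEIVEs.
        declare c cursor local fast_forward for
            select message_body from @msgs
            where message_type_name = 'datachanges_messagetype';
        open c;
        fetch next from c into @body;
        while @@fetch_status = 0
        begin
            send on conversation @fwd_handle
                message type datachanges_messagetype ( @body );
            fetch next from c into @body;
        end
        close c;
        deallocate c;

        commit;
    end
end
```

A production version would also handle end-dialog and error message types and poison messages; this only shows the batching shape.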
Or simply have N destination queues and have your readers read from all of them.
There's no fundamental reason why you can't get this working, but it's not going to be simple. The scale is immense, and Service Broker is complex. You should also consider third-party solutions here. Qlik (Attunity) has a log-based CDC solution for SQL Server that can harvest all the changes from the SQL Server transaction logs without triggers or queues, and there are several other solutions based on SQL Server Change Data Capture. Change Data Capture will stage all the changes for you, and you just have to consume them from your external program. Change Tracking is the most lightweight option, but it doesn't capture intermediate versions of rows: you know which rows changed and whether each change was an insert, update, or delete, but you only have the current version of the row to query. Every one of these options is going to be expensive and tricky, and will require significant testing.
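For scale, the Change Tracking alternative mentioned above looks roughly like this. The database, table, and column names (YourDb, dbo.YourTable, an "id" primary key) are illustrative placeholders:

```sql
-- Enable change tracking at the database and table level.
alter database [YourDb] set change_tracking = on
    ( change_retention = 2 days, auto_cleanup = on );

alter table dbo.YourTable enable change_tracking;

-- Consumer side: fetch net changes since the last synced version.
declare @last_sync bigint = 0;  -- persisted by your external program between runs

select ct.sys_change_operation,  -- I / U / D
       ct.sys_change_version,
       t.*                       -- current row version only (deletes yield NULLs)
from changetable( changes dbo.YourTable, @last_sync ) as ct
left join dbo.YourTable t on t.id = ct.id;

-- Store this as the new @last_sync for the next run.
select change_tracking_current_version();
```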
Answered by David Browne - Microsoft on January 2, 2022