
Service Broker locks

Database Administrators Asked on January 2, 2022

The situation: I use Service Broker to develop a data push approach.

Now I'm considering the following scenario:

  1. Use a separate DB (called Broker) which has the services, queues and a stored procedure to send data.
  2. Use triggers on the necessary databases and tables which transform the data to JSON and execute the SP from the Broker database to send it. I have 39 databases / 264632 tables / a separate trigger on every table for I/U/D, 793896 triggers in total. Yes, I know that it's huge, but we have such a data model and I'm not able to change it.
  3. Right now I don't use an activation stored procedure, because a client will consume the data from the SB target queue.

Scripts for the Service Broker installation in the Broker database:

-- installation
use master
go
if exists ( select * from sys.databases where name = 'Broker' )
begin
    alter database [Broker] set restricted_user with rollback immediate;
    drop database [Broker];
end
go
create database [Broker]
go
alter database [Broker] set enable_broker with rollback immediate;
alter database [Broker] set read_committed_snapshot on;
alter database [Broker] set allow_snapshot_isolation on;
alter database [Broker] set recovery full;
go
use [Broker]
go
create message type datachanges_messagetype 
  validation = none;
go
create contract datachanges_contract ( datachanges_messagetype sent by initiator );
go
create queue dbo.datachanges_initiatorqueue
  with status = on
     , retention = off
    , poison_message_handling ( status = on )
  on [default];
go
create queue dbo.datachanges_targetqueue
  with status = on
     , retention = off
     , poison_message_handling ( status = on )
  on [default];
go
create service datachanges_initiatorservice 
  on queue datachanges_initiatorqueue
  ( datachanges_contract );
go
create service datachanges_targetservice 
  on queue datachanges_targetqueue
  ( datachanges_contract );
go
-- conversation additional table
create table dbo.[SessionConversationsSPID] (
    spid int not null
  , handle uniqueidentifier not null
  , primary key ( spid )
  , unique ( handle )  
)
go
-- SP which is used to send data from triggers
create procedure dbo.trackChanges_send
  @json nvarchar(max)
as 
begin
    set nocount on;

    if ( @json is null or @json = '' )
    begin
        raiserror( 'DWH Service Broker: An attempt to send empty message occurred', 16, 1);
        return;
    end

    declare @handle  uniqueidentifier = null
          , @counter int = 1
          , @error   int;

    begin transaction

    while ( 1 = 1 )
    begin

        select @handle = handle
          from dbo.[SessionConversationsSPID]
          where spid = @@SPID;

        if @handle is null
        begin

            begin dialog conversation @handle 
                from service datachanges_initiatorservice
                to   service 'datachanges_targetservice'
                  on contract datachanges_contract
                with encryption = off;

            insert into dbo.[SessionConversationsSPID] ( spid, handle )
              values ( @@SPID, @handle );

        end;

        send on conversation @handle 
          message type datachanges_messagetype( @json );

        set @error = @@error;

        if @error = 0
          break;

        set @counter += 1;

        if @counter > 5 
        begin
            raiserror( N'DWH Service Broker: Failed to SEND on a conversation after 5 attempts. Error: %i.', 16, 2, @error );
            break;
        end

        delete from dbo.[SessionConversationsSPID]
          where handle = @handle;

        set @handle = null;

    end

    commit;
        
end
go
-- And dialogs creation to mitigate hot spot problem on sys.sysdesend table.
-- Described here: https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008/dd576261
declare @i int, @spid int, @handle uniqueidentifier
select @i = 0, @spid = 50;

while (@i < 150*3000) -- 450000 dialogs
begin
       set @i = @i + 1
       begin dialog @handle 
         from service datachanges_initiatorservice
         to service 'datachanges_targetservice'
         on contract datachanges_contract
        with encryption = off;

       if ((@i % 150) = 0)
       begin
              set @spid += 1;
              insert into dbo.SessionConversationsSPID ( spid, handle ) values (@spid, @handle)
       end
       
end

Typical trigger code in a user database:

create trigger [<SCHEMA>].[<TABLE>_TR_I]
    on [<SCHEMA>].[<TABLE>]
    with execute as caller
    after insert
    as
    begin 
    
        set xact_abort off;
        set nocount on;
    
        declare @rc int = ( select count(*) from inserted );
    
        if ( @rc = 0 )
        begin
            return;
        end
    
        begin try

            declare @db_name sysname = db_name();

            declare @json nvarchar(max);
    
            set @json = (
                select getutcdate()             as get_date
                     , 'I'                      as tr_operation
                     , current_transaction_id() as cur_tran_id
                     , '<TABLE>'                as table_name
                     , @@servername             as server_name
                     , @db_name                 as db_name
                     , '<SCHEMA>'               as tenant_schemaname
                      , *
                    from inserted
                    for json auto, include_null_values
            );
    
            exec dbo.trackChanges_send
                 @json = @json;
    
        end try
        begin catch 
            
            declare @error_message nvarchar(max); 
            set @error_message = '['         + isnull( cast( error_number()   as nvarchar( max ) ), '' ) + '] '
                                             + isnull( cast( error_severity() as nvarchar( max ) ), '' )
                               + ' State: '  + isnull( cast( error_state()    as nvarchar( max ) ), '' )
                               + ' Trigger: [<SCHEMA>].[<TABLE>_TR_I]'
                               + ' Line: '   + isnull( cast( error_line()     as nvarchar( max ) ), '' )
                               + ' Msg: '    + isnull( cast( error_message()  as nvarchar( max ) ), '' );
    
            raiserror( 'DWH Service Broker: An error occurred while sending data changes. Error: %s', 0, 0, @error_message ) with log;
    
            return;
        end catch
    end
    go

So, my questions are:

  1. Sometimes I see long PAGELATCH_EX / PAGELATCH_SH waits during trigger execution. The problem is that the latch waits are on the target queue table. I don't understand why there is this kind of hot spot on the target queue dbo.datachanges_targetqueue when I send data to Service Broker. I would understand if the waits were related to the send-side system tables or the transmission queue.
    I see the target queue in the resource_description column:
    (screenshot of the waiting tasks not included)

Using dbcc page I can see that this page belongs to sys.queue_messages_597577167, which is the internal table backing dbo.datachanges_targetqueue. The total number of waiting sessions at that moment was ~450, so it can be a bottleneck.
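For reference, the lookup was along these lines (the db_id / file_id / page_id values below are placeholders, not the real ones from the waits):

-- placeholder values for the db_id : file_id : page_id taken from resource_description
dbcc page ( 10, 1, 123456, 0 ) with tableresults;
-- the "Metadata: ObjectId" row in the page header identifies the owning object:
-- select object_name( <ObjectId>, 10 );  -- returns sys.queue_messages_597577167 here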

Meanwhile, the xEvent session which tracks trigger execution shows
(screenshot of the xEvent session output not included)

that at that time the triggers ran for quite a long time (more than 10 seconds, when usually it's less than 1 second).
It happens at random times, so I don't see any pattern here.

  2. My second problem is also related to locking, and it also happens at random times. My fake script to read data from the target queue (external client emulation) is
declare @i int = 0;
while ( 1 = 1 )
begin

    declare @mb varbinary( max );
    receive top ( 1000 ) @mb = message_body from dbo.datachanges_targetqueue
    
    set @i = @@rowcount;
    
    if @i = 0
      break;

end

which is executed periodically, can also be blocked because of the triggers' activity, and I do not understand why.
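To see who is blocking the reader when this happens, I run a quick check along these lines (nothing special, just the standard DMVs):

select r.session_id, r.blocking_session_id, r.wait_type, r.wait_resource, t.text as current_statement
from sys.dm_exec_requests as r
cross apply sys.dm_exec_sql_text( r.sql_handle ) as t
where r.blocking_session_id <> 0;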

  3. Is it OK to use one queue and ~800000 triggers? 🙂 I mean, maybe there are some thresholds I need to consider.

  4. What are the advantages/disadvantages of using "my" approach (one DB is both the sender and the target) versus "every DB is a sender and one DB is the target"?


One Answer

Is it OK to use one queue and ~800000 triggers? :) I mean, maybe there are some thresholds I need to consider.

No. Not really. You must ensure that your triggers are always short-running or your throughput will suffer.

Having 800000 triggers write to a single queue is not going to be a good idea. A queue is backed by a regular table, and at some scale page hotspots are going to be your bottleneck. And:

Messages sent to services in the same instance of the Database Engine are put directly into the queues associated with these services.

SEND

If your target service is on a remote SQL Server instance, then the messages will be written and committed to each sending database's transmission queue. But for target services on the same instance, messages go directly to the target queue.
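You can check this on your own instance: with a local target service, sys.transmission_queue should stay essentially empty, and any rows that do show up carry the reason in transmission_status. A quick sanity query:

select conversation_handle, to_service_name, enqueue_time, transmission_status
from sys.transmission_queue;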

I think the bottom line is that writing directly to the target queue is not the right solution here. Imagine having an empty target queue at the time of peak transaction throughput. That queue's backing table simply doesn't have enough pages to spread out the page latching across the large number of concurrent writers this scenario needs.

And if all your tables are in the same database, then the transmission queue could become the bottleneck. But the transmission queue has a different structure than normal queues. The transmission queue has a single clustered index:

select i.name index_name, i.type, c.name, c.column_id, t.name type_name, c.max_length, ic.key_ordinal
from 
sys.indexes i 
join sys.index_columns ic
  on ic.object_id = i.object_id
 and ic.index_id  = i.index_id
join sys.columns c
  on c.object_id = ic.object_id
 and c.column_id = ic.column_id
join sys.types t
  on t.system_type_id = c.system_type_id
 and t.user_type_id   = c.user_type_id
where c.object_id = object_id('sys.sysxmitqueue')
order by ic.key_ordinal

outputs:

index_name  type name         column_id   type_name             max_length key_ordinal
----------- ---- ------------ ----------- --------------------- ---------- -----------
clst        1    dlgid        1           uniqueidentifier      16         1
clst        1    finitiator   2           bit                   1          2
clst        1    msgseqnum    8           bigint                8          3

So you won't have hot page contention on the transmission queue, and you'll have as many insertion points as you have dialog conversations (dlgid).

A normal queue has two indexes, a clustered index on

( status, conversation_group_id, priority, conversation_handle, queuing_order )

and a non-clustered index on

( status, priority, queuing_order, conversation_group_id, conversation_handle, service_id )

which you can see with this query

select q.name queue_name, i.name index_name, i.index_id, ic.index_id, i.type, c.name column_name, c.column_id, t.name type_name, c.max_length, ic.key_ordinal
from 
SYS.SERVICE_QUEUES q
join sys.internal_tables it 
    ON it.parent_object_id = q.object_id
join sys.indexes i
    on i.object_id = it.object_id
join sys.index_columns ic
  on ic.object_id = i.object_id
  and ic.index_id = i.index_id
join sys.columns c
  on c.object_id = ic.object_id
 and c.column_id = ic.column_id
join sys.types t
  on t.system_type_id = c.system_type_id
  and t.user_type_id =c.user_type_id
order by q.object_id, i.index_id, ic.key_ordinal

So you might be better off moving the target service to a remote SQL instance. This would offload the writing and reading of the target queue to the remote instance and might give you less of a bottleneck. Your triggers would only have to put the message on the transmission queue, which is what you thought was happening in the first place.
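If you go that route, the routing piece would look roughly like this (the host name and port are hypothetical, and a real setup also needs Service Broker endpoints and security configured on both instances, plus a return route on the remote side):

use [Broker];
go
create route datachanges_target_route
  with service_name = 'datachanges_targetservice'
     , address      = 'TCP://remote-sql-host.example.com:4022';  -- hypothetical remote endpoint
go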

You can watch the routing and transmission queue usage with an Extended Events session like:

CREATE EVENT SESSION [ServiceBrokerRouting] ON SERVER 
ADD EVENT sqlserver.broker_dialog_transmission_body_dequeue,
ADD EVENT sqlserver.broker_dialog_transmission_queue_enqueue,
ADD EVENT sqlserver.broker_forwarded_message_sent,
ADD EVENT sqlserver.broker_message_classify,
ADD EVENT sqlserver.broker_remote_message_acknowledgement

Also, in both your current design and the remote service option, you can see from the index structures how reusing the right number of dialog conversations can optimize the solution. Too few and you have locking and page contention issues. Too many and you have the overhead of creating and managing them, and you can't do message batching. It looks like you've already read Reusing Conversations, and are using a conversation-per-session pattern, which is what Remus recommends for this scenario. It would be interesting to see which index the page latch contention is on, and whether it's a leaf or non-leaf page. But in any case, queue tables with concurrent SEND and RECEIVE don't usually have enough pages to spread out page latch contention.
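If you're on SQL Server 2019 or later, a sketch for answering that while the waits are happening (it assumes sys.fn_PageResCracker and sys.dm_db_page_info are available):

select r.session_id, r.wait_type,
       object_name( pi.object_id, pi.database_id ) as object_name,
       pi.index_id, pi.page_type_desc,
       pi.page_level          -- 0 = leaf page, > 0 = non-leaf
from sys.dm_exec_requests as r
cross apply sys.fn_PageResCracker( r.page_resource ) as prc
cross apply sys.dm_db_page_info( prc.db_id, prc.file_id, prc.page_id, 'DETAILED' ) as pi
where r.wait_type like N'PAGELATCH%';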

So the design alternative is to have the triggers drop changes on N intermediate queues, and then have activation procs on those forward the messages to the single destination queue. You may still have waits on the destination queue, but they won't be during your triggers. Also in your intermediate-to-final queue activation procedure you can batch up sends and manage conversations and have many fewer dialog conversations (N), so the receiver can actually fetch 1000 messages per call. A single call to RECEIVE can only fetch messages from a single conversation. So if you have thousands of conversations interleaved, you'll always only fetch single rows.
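A rough sketch of such an intermediate-to-final forwarder, with hypothetical names (dbo.datachanges_intermediate_01 for one of the N intermediate queues, dbo.ForwarderConversations for a table holding the pre-created outbound conversations), not code from the question:

create procedure dbo.forward_intermediate_01
as
begin
    set nocount on;

    declare @out_handle uniqueidentifier
          , @body       varbinary(max);

    declare @batch table
    (
        queuing_order     bigint,
        message_type_name sysname,
        message_body      varbinary(max)
    );

    -- one long-lived outbound conversation per intermediate queue,
    -- created up front and kept in a small lookup table (hypothetical)
    select @out_handle = handle
      from dbo.ForwarderConversations
      where queue_name = N'datachanges_intermediate_01';

    while ( 1 = 1 )
    begin
        delete @batch;

        begin transaction;

        -- drain a batch of messages from one conversation group
        waitfor
        (
            receive top ( 100 )
                    queuing_order, message_type_name, message_body
              from dbo.datachanges_intermediate_01
              into @batch
        ), timeout 1000;

        if not exists ( select 1 from @batch )
        begin
            commit;
            break;
        end

        -- forward the payloads on the single reused outbound conversation
        -- (a real proc would also handle EndDialog / Error message types)
        declare forward_cur cursor local fast_forward for
            select message_body
              from @batch
              where message_type_name = N'datachanges_messagetype'
              order by queuing_order;

        open forward_cur;
        fetch next from forward_cur into @body;
        while @@fetch_status = 0
        begin
            send on conversation @out_handle
                message type datachanges_messagetype ( @body );
            fetch next from forward_cur into @body;
        end
        close forward_cur;
        deallocate forward_cur;

        commit;
    end
end

The key point is that only N outbound conversations ever reach the destination queue, so the external reader's RECEIVE TOP (1000) can actually return large batches.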

Or simply have N destination queues and have your readers read from all of them.

There's no fundamental reason why you can't get this working, but it's not going to be simple. The scale is immense, and Service Broker is complex. You should also consider 3rd party solutions here. Qlik (Attunity) has a log-based CDC solution for SQL Server that can harvest all the changes from the SQL Server transaction logs without triggers or queues. And there are several other solutions based on SQL Server Change Data Capture. Change Data Capture will stage all the changes for you, and you just have to consume them from your external program. Change Tracking is the most lightweight solution, but doesn't capture the intermediate versions of rows. So you know which rows changed and whether the change was an insert, update, or delete, but you only have the current version of the row to query. But every one of these options is going to be expensive, tricky, and require significant testing.
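For comparison, the Change Data Capture route is just a couple of calls per database and table (the database and table names here are hypothetical); the staged changes are then read by your external program from the generated cdc.* change tables or the cdc.fn_cdc_get_all_changes_* functions:

use [SomeUserDb];   -- hypothetical database name
go
exec sys.sp_cdc_enable_db;          -- enable CDC at the database level
go
exec sys.sp_cdc_enable_table        -- stage all changes for one table; repeat per table
     @source_schema        = N'dbo',
     @source_name          = N'SomeTable',   -- hypothetical table name
     @role_name            = null,
     @supports_net_changes = 0;
go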

Answered by David Browne - Microsoft on January 2, 2022
