Which is a better SQL strategy?

aggle-rithm · Feb 12, 2014

I have a reporting system that depends on triggers in the production database for a bunch of different tables that, when changed, drop a small change record in a table. This table is monitored by another process that looks up the records that have changed and saves them to the reporting database.

As the system grew bigger, I started getting deadlocks, so I divided the change table into a number of different tables for different categories of records. To avoid changing any of the triggers I had so meticulously created in the production database, I put a INSTEAD OF INSERT trigger on the change table which redirected the records to other tables.

Another strategy would have been to create a stored procedure that performs the same function as the trigger, and call that instead of inserting into the table. However, that would involve changing all the triggers.

Which is the better strategy? The re-direct trigger seems semantically clumsy to me, but I don't see the point of changing it to a stored procedure unless there is a net benefit to doing so.

Any insight would be appreciated.

(As I write this it occurs to me that the re-direct trigger may not actually gain me anything. If the insert command locks the table anyway while the redirect is occurring, I haven't solved the problem. Calling a stored procedure would only lock the table I'm re-directing to....I think.)

aggle-rithm · Feb 12, 2014

An example of how it works:

A record is updated in the customer orders table. The update trigger is called which creates a record with the unique record ID, table number, and change type (update). As far as this trigger knows, it's inserting the record into the "changes" table, but since there is an INSTEAD OF INSERT trigger on that table, the record is instead saved to the "orders" table in the reporting database. Within a minute, a separate process grabs the table name and row id, reads the record from the production database, and updates the corresponding record in the reporting database. The production database and reporting database are now synchronized.

Octavo · Feb 13, 2014

aggle-rithm said:
An example of how it works:

A record is updated in the customer orders table. The update trigger is called which creates a record with the unique record ID, table number, and change type (update). As far as this trigger knows, it's inserting the record into the "changes" table, but since there is an INSTEAD OF INSERT trigger on that table, the record is instead saved to the "orders" table in the reporting database. Within a minute, a separate process grabs the table name and row id, reads the record from the production database, and updates the corresponding record in the reporting database. The production database and reporting database are now synchronized.

I'm sorry - as this is a completely useless suggestion, but why rely on triggers to record DB changes, when the controlling application could be recording the audit information instead?

Another useless suggestion --> Why not set up the prd DB to be replicated and then run your triggers on the replicated DB to log the change records. That way prd is not impacted by the creation of the change records.

You would then most likely ditch your reporting database in favour of using the replicated one.

ETA for Clarification

PRD: no triggers, no change tables. Replicates to RPT
RPT: triggers, change tables

IANASQLG

(I am not a SQL guru)

Aepervius · Feb 13, 2014

Just a stupid question, but did you try anotehr venue, like stack overflow or a similar forum for SQL ? That is a rather pointed expert question for a generic forum.

Octavo · Feb 13, 2014

Aepervius said:
Just a stupid question, but did you try anotehr venue, like stack overflow or a similar forum for SQL ? That is a rather pointed expert question for a generic forum.

Because it is well known across teh intarwebs that JREF has the answers to everything

Also, maybe he has and didn't get particularly helpful responses?

GlennB · Feb 13, 2014

aggle-rithm said:
An example of how it works:

A record is updated in the customer orders table. The update trigger is called which creates a record with the unique record ID, table number, and change type (update). As far as this trigger knows, it's inserting the record into the "changes" table, but since there is an INSTEAD OF INSERT trigger on that table, the record is instead saved to the "orders" table in the reporting database. Within a minute, a separate process grabs the table name and row id, reads the record from the production database, and updates the corresponding record in the reporting database. The production database and reporting database are now synchronized.

It's a long time since I worked with sql, and things have undoubtedly changed too, so forgive me if this is a stupid question -

Why the 2-stage insert/update process into the 'reporting' version of ORDERS? Why not just slap the updated 'production' record in there from the outset, plus the 'update' change type?

Dessi · Feb 13, 2014

aggle-rithm said:
I have a reporting system that depends on triggers in the production database for a bunch of different tables that, when changed, drop a small change record in a table. This table is monitored by another process that looks up the records that have changed and saves them to the reporting database.

As the system grew bigger, I started getting deadlocks, so I divided the change table into a number of different tables for different categories of records. To avoid changing any of the triggers I had so meticulously created in the production database, I put a INSTEAD OF INSERT trigger on the change table which redirected the records to other tables.

I would not attempt a fix on the deadlock issue without first tracing and identifying why the deadlock occurred in the first place. If you are using SQL Server, you can enable traceflags 1204 and 1222 to detect deadlocks and print out a diagnostic output of the event.

Its hard to know what strategy is correct without understanding the deadlock occurred in the first place. It may be something simple, like a TABLOCK/TABLOCKX hint on a query where it isn't needed, or a complex trigger graph, but it won't be possible to debug without trace data of the deadlock.

For more context, can you post the trigger code which inserts the change record into your report tables?

aggle-rithm · Feb 13, 2014

Octavo said:
I'm sorry - as this is a completely useless suggestion, but why rely on triggers to record DB changes, when the controlling application could be recording the audit information instead?

Good question. I'm hampered by the fact that I'm working with a third-party ERP system. That system uses the same database for production and reporting, and I have created a separate reporting system.

Another useless suggestion --> Why not set up the prd DB to be replicated and then run your triggers on the replicated DB to log the change records. That way prd is not impacted by the creation of the change records.

You would then most likely ditch your reporting database in favour of using the replicated one.

That's been suggested to me as well, but the reporting system does a number of other things that make this not such a great option.

aggle-rithm · Feb 13, 2014

Dessi said:
I would not attempt a fix on the deadlock issue without first tracing and identifying why the deadlock occurred in the first place. If you are using SQL Server, you can enable traceflags 1204 and 1222 to detect deadlocks and print out a diagnostic output of the event.

Yep...I've done that. Got some useful information.

After the changes I've made, the deadlock situation has improved quite a bit. The deadlocks I see now are demonstrably NOT MY FAULT. (an awesome feeling!)

aggle-rithm · Feb 13, 2014

GlennB said:
It's a long time since I worked with sql, and things have undoubtedly changed too, so forgive me if this is a stupid question -

Why the 2-stage insert/update process into the 'reporting' version of ORDERS? Why not just slap the updated 'production' record in there from the outset, plus the 'update' change type?

I just wanted the trigger to be as minimal as possible. I'm operating under the assumption (possibly incorrect) that saving a record of three or four columns is less resource-intense than one of fifty columns.

The beauty of the second step is that it is asynchronous, so it doesn't add to the duration of the transaction.

GlennB · Feb 13, 2014

aggle-rithm said:
I just wanted the trigger to be as minimal as possible. I'm operating under the assumption (possibly incorrect) that saving a record of three or four columns is less resource-intense than one of fifty columns.

You mean in terms of the performance of the production d/b which is working harder under everyday demand than the reporting d/b? Maybe, though we seem to be looking at a single write in the case I suggest - as we already have the data cached - vs. a write+read+write (write to the reporting CHANGES table, then read from the original production d/b and then write that stuff into reporting) in the 2-stage scenario.

Usual caveats apply. Long time since I worked in this field. I quite miss it, in a funny way - used to really enjoy exercising my brain over this kind of stuff. Beats the hell out of arguing with CTers, though Scrabble can be fun

Octavo · Feb 14, 2014

GlennB said:
Usual caveats apply. Long time since I worked in this field. I quite miss it, in a funny way - used to really enjoy exercising my brain over this kind of stuff. Beats the hell out of arguing with CTers, though Scrabble can be fun

I'm sure after a few months, you'd be just dying to get into a good semantics war with a poe - grass always greener and all that

quixotecoyote · Feb 14, 2014

Sacrifice long-term development and go for an early blitz.

aggle-rithm · Feb 14, 2014

GlennB said:
Usual caveats apply. Long time since I worked in this field. I quite miss it, in a funny way - used to really enjoy exercising my brain over this kind of stuff.

A lot of people are already considering it obsolete technology. No-SQL is all the rage these days, but I find it much simpler to crunch data using the tried-and-true method.

Which is a better SQL strategy?

aggle-rithm

Ardent Formulist

aggle-rithm

Ardent Formulist

Octavo

Illuminator

Aepervius

Non credunt, semper verificare

Octavo

Illuminator

GlennB

Loggerheaded, earth-vexing fustilarian

Dessi

Species Traitor

aggle-rithm

Ardent Formulist

aggle-rithm

Ardent Formulist

aggle-rithm

Ardent Formulist

GlennB

Loggerheaded, earth-vexing fustilarian

Octavo

Illuminator

quixotecoyote

Howling to glory I go

aggle-rithm

Ardent Formulist