Monday, July 10, 2017

Microsoft 70-487: Implement transactions

Exam Objectives

Manage transactions by using the API from System.Transactions namespace; implement distributed transactions; specify transaction isolation level 



Quick Overview of Training Materials


My post: Query data using ADO.NET
My code samples on GitHub

Transaction Basics


My old database textbook describes a transaction as "any one execution of a user program".  If we think of a "user program" in this context as one or more database operations (selects, inserts, deletes, stored proc executions, what have you), then we get a bit closer to the colloquial sense of "transaction" as an indivisible collection of such operations.  Two of the big problems that transactions address in the database space are concurrency control and crash recovery.

Multiple transactions can run concurrently on a database system, and like operation reordering that occurs at the CPU level, the operations of these transactions may be reordered by the DBMS for optimization purposes.  And just like concurrency in programs, these concurrent operations can produce unintended consequences if we aren't careful.

Because computer systems can crash for a multitude of reasons, it is important that database systems are resilient to these transient failures.  If a transaction is interrupted, it is important that a partial result is not recorded in the system, potentially violating constraints or just generally leaving the data in a bad state.

Transactions provide a number of guarantees to address these concerns. Like all good computer concepts, these are grouped into a handy-dandy acronym, ACID:
  • Atomicity: All of the operations in a transaction must succeed for the transaction to be committed, otherwise the transaction must be rolled back and any partially finished operations undone.  Since operations in a transaction are usually interdependent, partial completion could leave the data in an invalid state.
  • Consistency: Transactions should transition the data from one valid state to another valid state.  This property is only partially enforceable at the datastore level, through mechanisms such as triggers, constraints, etc.  The MSDN ACID article, as well as the DB textbook, both put the responsibility for ensuring consistency on the application developer, while the Wikipedia page for ACID states that the consistency guarantee basically only applies to the rules defined on the database, and not to application-specific validation requirements.
  • Isolation: Concurrent transactions cannot affect each other's state.  Maintaining isolation is important to avoid the effects of read phenomena:
    • Dirty reads - when transaction A reads data that is being modified by transaction B before B commits changes.
    • Non-repeatable reads - when the same value is read more than once during the same transaction and the value changes because it was updated by another transaction.
    • Phantom reads - similar to a non-repeatable read over a set of results.  When two identical queries produce a different set of rows over the course of a single transaction, because another transaction has inserted rows that meet the query criteria.
  • Durability:  Any transaction that is committed will remain committed, even if the system crashes.  
Within a single resource, transactions can be committed by writing them to a non-volatile storage medium (i.e. the "transaction log").  In transactions spanning multiple resources, a distributed transaction can use a protocol, such as the two-phase commit protocol, to coordinate commitment.  In the two-phase model, all resources "vote" on whether a transaction can be committed or not.  If the ayes have it (a unanimous "yes" vote), then the transaction is committed, otherwise it is rolled back.
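To make the two-phase vote concrete, here is a toy sketch in C# (the IParticipant interface is made up for illustration, and this is not how DTC actually implements the protocol; it needs using System.Collections.Generic and System.Linq):


interface IParticipant
{
    bool Prepare();   // phase 1 vote: can this work be committed?
    void Commit();    // phase 2: make the work permanent
    void Rollback();  // phase 2: undo the work
}

static void RunTwoPhaseCommit(List<IParticipant> participants)
{
    // Phase 1 (prepare): collect a vote from every participant
    bool allPrepared = participants.All(p => p.Prepare());

    // Phase 2 (commit/abort): a single "no" vote aborts everything
    if (allPrepared)
        participants.ForEach(p => p.Commit());
    else
        participants.ForEach(p => p.Rollback());
}
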

The PluralSight course and the Introducing System.Transactions whitepaper go into great depth regarding the transaction infrastructure in the Windows environment (even beyond .NET).  Three concepts important to making transactions work are resource managers, the transaction manager, and the transaction protocol.  Resource managers, which can be either durable or volatile (in-memory), manage the databases, message queues, and other resources that transactions will ultimately interact with.

These resource managers work in coordination with the transaction manager, which for our purposes may be one of two systems: the Lightweight Transaction Manager (LTM) introduced in .NET 2.0 (for basic, simple transactions), or the Distributed Transaction Coordinator (DTC).  The LTM is basically a pass-through mechanism, delegating transaction management to the underlying implementation present in the durable resource manager.  If more than one durable resource is included in the transaction, or a resource doesn't support the LTM, then the transaction is promoted (or escalated) to use the DTC.  There may be times when this is unintended and undesirable, so it is useful to understand what can trigger promotion and how to avoid it if need be.  There is also a kernel transaction manager that is used to work with the file system or the registry.
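As a quick way to see whether promotion has actually happened at runtime, this little sketch (assuming it runs inside an ambient transaction, e.g. within a TransactionScope) checks the DistributedIdentifier, which stays Guid.Empty while the LTM is in charge:


var info = Transaction.Current.TransactionInformation;

// Guid.Empty means the LTM still owns the transaction; a non-empty
// value means it has been promoted (escalated) to the DTC
if (info.DistributedIdentifier == Guid.Empty)
    Console.WriteLine("Still a lightweight (LTM) transaction");
else
    Console.WriteLine("Promoted to DTC: " + info.DistributedIdentifier);
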

Finally, different resource environments will require different transaction protocols.  I thought the PluralSight course also did a good job covering the different protocols in use.  In particular, the LTM uses its own lightweight protocol, and the DTC can use either OleTx or WS-AtomicTransaction (WSAT).  The lightweight protocol only works with a single durable resource, and only within the same application domain.  OleTx works across machines and domain boundaries, but is limited to the Windows platform, and apparently has trouble with firewalls.  WSAT has similar capabilities to OleTx, with the addition of being cross platform.

Modern versions of Windows come with the DTC, though (as I discovered later) the service may need to be started manually before distributed transactions will work.

The above properties describe atomic, or short-lived, transactions.  There is another class of transaction, mentioned briefly in the PluralSight course, called long-running transactions.  The MSDN article on the subject is actually tied to BizTalk, which facilitates these kinds of transactions, but it still outlines the general concept.  Long-running transactions might have one or more manual processes involved (such as a management sign-off) that preclude finishing the transaction in the millisecond-scale timespans typically involved in an atomic transaction.  Because of this, the isolation and atomicity qualities of atomic transactions don't apply.

Without atomicity and isolation, it is necessary for long-running transactions to deal with certain issues in a novel way.  These transactions have the concept of a compensation, which is essentially a rollback or undo feature.  Because it is not possible to lock resources for the entire duration of a long-running transaction, concurrent transactions may experience some of the read phenomena described above under "isolation".  It makes more sense to think about state for these transactions in the context of eventual consistency.
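As a rough illustration of compensation (the order-processing methods here are entirely hypothetical), each completed step registers an "undo" action, and a failure later in the process runs those actions in reverse order instead of relying on a database rollback:


var compensations = new Stack<Action>();
try
{
    ReserveInventory(orderId);
    compensations.Push(() => ReleaseInventory(orderId));

    ChargeCustomer(orderId);
    compensations.Push(() => RefundCustomer(orderId));

    AwaitManagerSignOff(orderId);   // may take days; no locks are held
}
catch
{
    // "roll back" by compensating the completed steps, newest first
    while (compensations.Count > 0)
        compensations.Pop()();
}
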



Using System.Transactions


There are several ways to handle transactions using the facilities of System.Transactions.  The first is to manually create and use a Transaction, calling Commit() or Rollback() depending on whether the code succeeds or fails.  The second, and arguably much easier, solution is to use a transaction "context" provided by the TransactionScope class.  This is almost always shown inside a using() block; committing requires calling .Complete() on the scope object, whereas failing to do so will cause the transaction to roll back.


With manual transactions


There is a short tutorial on MSDN titled "Implementing an Explicit Transaction using CommittableTransaction".  It came in handy when I couldn't figure out why my transaction wasn't showing up in my SQL Server Profiler trace (I didn't make the call to EnlistTransaction).  This was after I figured out that you have to actually call Open() if you want anything interesting at all to happen haha.  Once the kinks were worked out, it's pretty straightforward to use the CommittableTransaction class to create a transaction:


static void Main(string[] args)
{
    string connectionString = ConfigurationManager
        .ConnectionStrings["Default"].ConnectionString;
    using (var conn = new SqlConnection(connectionString))
    {
        // the connection must be opened before it can enlist
        conn.Open();
        var tx = new CommittableTransaction();
        conn.EnlistTransaction(tx);
        try
        {
            SqlCommand cmd = new SqlCommand(
                    @"SELECT COUNT(*) FROM Person.Person WHERE PersonType = 'EM'", conn);

            var count = (int)cmd.ExecuteScalar();

            // commit only if everything succeeded
            tx.Commit();
        }
        catch (Exception ex)
        {
            Debug.WriteLine(ex.Message);
            tx.Rollback();
        }
    }
}


The EnlistTransaction() method on the connection should not be confused with the EnlistDistributedTransaction() method, which is part of the EnterpriseServices version of transactions, and is a whole other ball of wax (old COM+ stuff).  I played around with it a little but I think there are installs that would need to happen... and I'm not going there.


With TransactionScope


Similar to the CommittableTransaction tutorial referenced above, there is a corresponding tutorial on TransactionScope called "Implementing an Implicit Transaction using Transaction Scope".  While really all I did was copy the code from above and swap out the manual transaction bits for a using() block with a transaction scope, I did make one other change that made things a little more interesting.  The tutorial includes adding another nested connection and query to force a promotion to a DTC transaction, which I thought would be interesting to see in SQL Server Profiler.  My first attempt at running the code failed because I hadn't started the DTC service (derp), but when I corrected that (thank you Stack Overflow), the results were interesting.


static void Main(string[] args)
{
    using (TransactionScope scope = new TransactionScope())
    {
        string connStandard = 
            ConfigurationManager.ConnectionStrings["Default"].ConnectionString;
        string connExpress = 
            ConfigurationManager.ConnectionStrings["Express"].ConnectionString;
        using (var conn = new SqlConnection(connStandard))
        {
            conn.Open();

            try
            {
                SqlCommand cmd = new SqlCommand(
                   @"SELECT COUNT(*) FROM Person.Person WHERE PersonType = 'EM'", conn);

                var count = (int)cmd.ExecuteScalar();

                // opening a second connection to a different database
                // forces promotion to a distributed (DTC) transaction
                using (var conn2 = new SqlConnection(connExpress))
                {
                    conn2.Open();
                    SqlCommand cmd2 = new SqlCommand(
                        @"SELECT COUNT(*) FROM Person.Person WHERE PersonType = 'EM'", conn2);

                    var count2 = (int)cmd2.ExecuteScalar();
                }

                scope.Complete();
            }
            catch (Exception ex)
            {
                Debug.WriteLine(ex.Message);
            }
        }
    }
}


When the connection is opened to the second database (this one running in SQL Express), the transaction is promoted to a distributed transaction involving DTC.  Notice that the distributed transaction ID ("TextData" in both windows) is the same in both traces.  While not visible, each database actually has its own unique value in the "TransactionID" column, which I found a little surprising.



Implementing Distributed Transactions


In classic fashion, the treatment of the subject of distributed transactions in the Exam Ref is woeful.  It's one page of mansplaining a use case scenario that is too vague to have any value.  A much better resource for learning the mechanisms involved in a distributed transaction is the PluralSight course Understanding .NET and WCF Transactions.  The NPTEL lectures cover the two-phase commit protocol in some depth, which is helpful for understanding how distributed transactions work in practice.

There are several actors involved in transactions, and the number and type of these actors determine how transactions are managed and whether they are treated as distributed.  The transaction manager coordinates transactions on the local machine as well as between machines.  The resource manager controls some resource involved in a transaction (such as a database or message queue).  The transaction protocol controls how nodes involved in the transaction communicate and ultimately commit or abort.

Simple transactions that involve only one durable resource (like a database) may start out using the lightweight protocol (as long as that resource supports it).  If the resource does not support it (like MSMQ or most non-Microsoft DBMSs), then the transaction will automatically be promoted to a distributed transaction managed by MSDTC, though the protocol may still vary depending on the resource managers involved.
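For contrast with the promotion demo above, here is a sketch of the usual way to avoid escalation (assuming a connectionString variable): keep all the work on a single open connection inside the scope, so the LTM never has a reason to hand things off to the DTC.


using (var scope = new TransactionScope())
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();   // enlists in the ambient transaction, stays on the LTM

    // multiple commands on the same connection do not trigger promotion
    var cmd1 = new SqlCommand(
        @"SELECT COUNT(*) FROM Person.Person WHERE PersonType = 'EM'", conn);
    var count1 = (int)cmd1.ExecuteScalar();

    var cmd2 = new SqlCommand(
        @"SELECT COUNT(*) FROM Person.Person WHERE PersonType = 'EM'", conn);
    var count2 = (int)cmd2.ExecuteScalar();

    scope.Complete();   // both commands commit as one local transaction
}
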

We can see this promotion process in the debugger.  Initially the transaction has no distributed transaction id, and its state is "Active".  After being promoted to a distributed transaction (with the addition of another resource), the distributed id is populated and the state of the transaction is now "delegated" (represented by the internal class TransactionStateDelegated):

Before promotion: [debugger screenshot: no distributed id, state "Active"]

After promotion: [debugger screenshot: distributed id populated, state "Delegated"]

Also notice that the transaction is an OleTx transaction.  Within a Windows environment, this is what we would expect.  However, for WS-* based web services, WS-AtomicTransaction (WSAT) may be the protocol in use, depending on the interoperability needs of the application. The PluralSight course does a quick little demo on it, and the MSDN article on using WS-AtomicTransaction briefly explains why you might use it.

I did also find an article on implementing your own resource manager.  It's an old article, but it walks through the process of creating a durable or volatile resource manager.  Implementing a resource manager involves implementing the IEnlistmentNotification interface.  This interface includes callback methods that enable it to participate in a two-phase commit transaction.  These methods are:


  • Commit - notifies that the transaction is being committed
  • InDoubt - notifies that the transaction is in doubt
  • Prepare - notifies that the transaction is being prepared for commitment
  • Rollback - notifies that the transaction is being rolled back or aborted

The "remarks" section on each of the MSDN pages for the above methods went a long way toward explaining exactly what these notifications mean in context.  Each method takes some form of Enlistment object, which "facilitates communication between an enlisted [resource manager] and the transaction manager..."



Set Isolation Level


The "isolation level" of a transaction controls how much interaction concurrent requests can have with the resources making up the transaction.  The MSDN article on Transaction Isolation Levels explains the different levels and how they relate to the "read phenomena" touched on above.  That article lists four isolation levels (in order from least to most isolation):

  • Read Uncommitted - Transactions are not isolated from each other. A Read Uncommitted transaction is usually read-only.  Any of the read phenomena are still possible.
  • Read Committed - read locks are held on the current row until the transaction moves off that row, write locks are held until the transaction is committed or rolled back.  This isolation level prevents dirty reads.
  • Repeatable Read - read locks are held on all rows returned, write locks on all rows inserted, updated, or deleted.  Because other transactions cannot update or delete these rows, this isolation level prevents non-repeatable reads.
  • Serializable - read and write locks are held on the entire range of rows affected.  This means that all rows meeting criteria X are locked, and inserts or updates that would result in additional rows meeting the same criteria are prevented.  This prevents phantom reads.
The MSDN article (as well as at least one other blog I've seen on the subject) presents a simple table relating the above isolation levels to the read phenomena that can occur at that isolation level:

Isolation Level    | Dirty Read | Non-repeatable Read | Phantom Read
Read Uncommitted   | possible   | possible            | possible
Read Committed     | prevented  | possible            | possible
Repeatable Read    | prevented  | prevented           | possible
Serializable       | prevented  | prevented           | prevented

Setting the transaction isolation level in code is not terribly complicated.  For System.Transactions based transactions, as well as SqlTransaction transactions (System.Data), it just involves setting a value corresponding to an IsolationLevel enumeration (one for System.Data, one for System.Transactions).  Both enumerations have the same values, just slightly different descriptions in their respective MSDN entries:

  • Chaos - pending changes from more highly isolated transactions cannot be overwritten
  • ReadCommitted - volatile data cannot be read during a transaction, but can be modified
  • ReadUncommitted - volatile data can be read and modified during a transaction
  • RepeatableRead - volatile data can be read but not modified, and new data can be added
  • Serializable - similar to RepeatableRead, but no new data can be added
  • Snapshot - a version of data is stored that one transaction can read while another is modifying the same data.
  • Unspecified - a different isolation level than the one specified is being used (results in error)


The article for the System.Transactions version is pretty sparse; however, the System.Data article includes a lengthy bit of demo code that demonstrates most of the levels (everything but Chaos and Unspecified), and how dirty reads, non-repeatable reads, and phantoms may be allowed by each level.  I made a slight tweak to point at SQLEXPRESS instead of (local), and to wait on a Console.ReadLine() at the end of execution, but otherwise I was able to run it without error.

So, I won't belabour reproducing that demo here, but I did modify the other transaction demos to change the isolation level from the default (Serializable) to ReadCommitted.  For the System.Transactions based stuff, this involves creating a TransactionOptions object, setting the isolation level, and passing it to the constructor for CommittableTransaction or TransactionScope.  For SqlTransaction, you set the value from the System.Data.IsolationLevel enum directly in the constructor.


//Create a TransactionOptions object, used to set isolation level 
//on System.Transactions transactions 
var txOptions = new TransactionOptions();
txOptions.IsolationLevel = 
    System.Transactions.IsolationLevel.ReadCommitted;

//Setting options on a new CommittableTransaction
var tx = new CommittableTransaction(txOptions);

//Setting options on a new TransactionScope
using (TransactionScope scope = 
    new TransactionScope(TransactionScopeOption.Required, txOptions))
{
    // ... transactional work, then scope.Complete() ...
}

//Creating a new SqlTransaction with Isolation Level
//conn is a SqlConnection object
SqlTransaction sqlTx = 
    conn.BeginTransaction(System.Data.IsolationLevel.ReadCommitted);


By hovering over the transaction scope object in the debugger, it's possible to drill down into the underlying transaction and see that the isolation level is set to ReadCommitted.



With the CommittableTransaction, on the other hand, there is one less level to drill down into, but otherwise it looks pretty much the same as the TransactionScope.  The SqlTransaction object is set up a little differently, but ultimately it's about the same difference.





