Pages

Sunday, June 5, 2011

How to compare the record differences of two similar tables - Part 2 of 2

Permalink: http://bit.ly/1ztV5sU



The rationale behind comparing tables versus using a CHECKSUM TABLE statement can be found in the first part of this entry.

Comparing the record differences of two similar tables can be useful when transferring records from an old database to a new one or when comparing backup tables against the original tables. Depending on specific requirements, it may be necessary to validate that the transfer was successful or to see which specific data in the records of the original and in-use tables have been updated, inserted, or deleted when compared to the backup. The query in the stored procedure below will show the differences caused by updates to current records as well as the record differences caused when new records are inserted or old ones are deleted in one table but not in the other:

DELIMITER $$
DROP PROCEDURE IF EXISTS `compareTables` $$
CREATE DEFINER=`root`@`localhost` PROCEDURE `compareTables`(
  dbTableNameA VARCHAR(133),
  dbTableNameB VARCHAR(133))
BEGIN
  DECLARE dbTableA, dbTableB VARCHAR(129);
  DECLARE dbA, tableA, dbB, tableB VARCHAR(64);
  DECLARE columnList TEXT;
  SET dbTableA = REPLACE(dbTableNameA, '`', '');
  SET dbTableB = REPLACE(dbTableNameB, '`', '');
  SET dbA = LEFT(dbTableA, LOCATE('.', dbTableA) - 1);
  IF (dbA = '')
  THEN
    SET dbA = DATABASE();
  END IF;
  SET tableA = RIGHT(dbTableA, LENGTH(dbTableA) - LOCATE('.', dbTableA));
  SET dbB = LEFT(dbTableB, LOCATE('.', dbTableB) - 1);
  IF (dbB = '')
  THEN
    SET dbB = DATABASE();
  END IF;
  SET tableB = RIGHT(dbTableB, LENGTH(dbTableB) - LOCATE('.', dbTableB));
  SET columnList = getColumnList(dbA, tableA);
  SET @compareTablesStmt = CONCAT(
  'SELECT MIN(`compareTables`) AS \'Tables Compared\', ',
     columnList, '
  FROM (
    SELECT \'', dbTableNameA, '\' AS `compareTables`, ',
     columnList, '
    FROM `', dbA, '`.`', tableA, '`
    UNION ALL
    SELECT \'', dbTableNameB, '\' AS `compareTables`, ',
     columnList, '
    FROM `', dbB, '`.`', tableB, '`)
  AS `compareTables`
  GROUP BY ', columnList, '
  HAVING COUNT(*) = 1'
  );
  PREPARE compareTablesStmt FROM @compareTablesStmt;
  EXECUTE compareTablesStmt;
  DEALLOCATE PREPARE compareTablesStmt;
END $$
DELIMITER ;

The stored procedure can be called by passing the two tables to be compared as parameters and can be in the forms: `db`.`table`, db.table, `table`, or table. Passing the names wrapped in backticks and specifying from which database the table is from is optional. If no database is specified, the SP will use the current one. The SP is dependent on a user defined function to get the column names dynamically. The getColumnList() function can be found here.

For TEMPORARY TABLES, you will have to manually input the columns of the tables compared. This cannot be done dynamically purely inside MySQL since as stated in a previous blog entry: "TEMPORARY TABLEs are invisible from the `information_schema` database (as of MySQL version 5.6)".

Here is a sample call:

 CALL compareTables(
 '`origDB`.`origTable`', 'backupTable'
 );

And a sample result set:

Tables ComparedcustomerNumbercontactLastNamecontactFirstName
`origDB`.`origTable`103fooCarine
backupTable103SchmittCarine
`origDB`.`origTable`999foobarfoobarz



The first two rows compares the records with the Primary Key or ID = 103 and shows that the data under `contactLastName` are different between the original and backup tables. The third row shows a record that is present in the original table but not in the backup table.

See part 1 of this entry for the rationale behind the SELECT statement using a UNION ALL clause used by the stored procedure.

9 comments:

Anonymous said...

This procedure is very useful. The function 'getColumnList' in the procedure works just great.

Thanks for sharing.

0v34c10ck said...

@Anonymous

Thanks. I'm glad you found this entry useful.

There are other equally useful stored procedures and functions on this blog, just use the SEARCH THIS BLOG gadget and type in the keywords you'd like to search for. You'll find tabbed results from the entire blog, anything linked from this entry, and the web.

shantanu said...

Thanks for sharing. It is very useful. But it would have been great if it returned only the primary key and the column that is different (in case if there is only column that has been changed in other cases it should return all as it is doing currently).

0v34c10ck said...

@shantanu

Yes it can be done. One way is to modify the above SP to instead CREATE a TEMPORARY TABLE AS using the dynamic SELECT statement.

Then create a dynamic SQL statement inside a flow control statement that will go through each column of the TEMPORARY TABLE to check for the EXISTence of differences and code an IF-THEN-ELSEIF construct based on the conditions you mentioned. One possible return will then be a dynamic SELECT statement with the PRIMARY KEY and the columns that are found to have differences.

You can dynamically retrieve the PRIMARY KEY using the formulas from here.

Anonymous said...

Thank you very much for the stored procedure.

Is there a way to output the results into a new table, previously created (SELECT INTO xpto) but creating an autoincrementing primary key?

Anonymous said...

It's not a terrible approach, but you could probably just use a checksum to see matches unless you're under heavy load. And sometimes...if you have a 1 TB data store or something like that...you'll have to run it a few times.

0v34c10ck said...

Thank you for your feedback. As stated in part 1:

"The CHECKSUM TABLE result sets of two similar tables only indicate if there are differences between the two tables. It does not tell you what exactly the differences are between the two. Does `tableA` have an updated value for one of its records that `tableB` does not have? Or does `tableA` have an extra row that `tableB` does not have? Depending on specific business requirements, the CHECKSUM TABLE statement may be sufficient. If you need to determine what the actual differences are, there is a way to go about doing this automatically and dynamically by creating a stored procedure."

See How to compare the record differences of two similar tables - Part 1 of 2

Anonymous said...

great post...
will be more helpful if you can set ignore columns

0v34c10ck said...

Yes, I can see why that would be helpful. If I get a chance to write the modified SP, I'll share it. Thank you.