
Showing posts with label Functions. Show all posts

Sunday, June 12, 2011

Truly Random and Complex Password Generator - Part 2 of 2

Permalink: http://bit.ly/1tMM9h2



The first part of this entry showed how a password chosen by a normal user is likely to be significantly weaker than a complex, randomly generated one.

Note: in the strictest sense, there is no such thing as an uncrackable password. A password can be uncrackable only in theory: no matter how infinitesimally small the mathematical odds are, the possibility of a correct guess is always present. Likewise, a password is uncrackable only in a technical sense: given enormous resources and time, any password can be cracked.

Here is a function for a truly random and complex password generator which is based on the formulas given in the first part of this entry:

DELIMITER $$
DROP FUNCTION IF EXISTS `randomPasswordGenerator` $$
CREATE DEFINER=`root`@`localhost` FUNCTION `randomPasswordGenerator`(
  ) RETURNS varchar(64) CHARSET utf8
BEGIN
  DECLARE charCount TINYINT(1) DEFAULT 0;
  DECLARE charDiceRoll TINYINT(2);
  DECLARE randomChar CHAR(1);
  DECLARE randomPassword CHAR(8) DEFAULT '';
  REPEAT
    SET charCount = charCount + 1;
    SET charDiceRoll = 1 + FLOOR(RAND() * 94);
    IF (charDiceRoll <= 32)
    THEN
      SET randomChar = ELT(charDiceRoll,
      '`', '~', '!', '@', '#', '$', '%', '^',
      '&', '*', '(', ')', '-', '=', '_', '+',
      '[', ']', '{', '}', '\\', '/', '|', '?',
      ';', ':', '\'', '"', ',', '.', '<', '>');
    ELSEIF (charDiceRoll >= 33)
      AND (charDiceRoll <= 68)
    THEN
      SET charDiceRoll = charDiceRoll - 33;
      SET randomChar = CONV(
        charDiceRoll,
        10, 36);
    ELSE
      SET charDiceRoll = charDiceRoll - 59;
      SET randomChar = LOWER(
        CONV(
          charDiceRoll,
          10, 36)
      );
    END IF;
    SET randomPassword = CONCAT(randomPassword, randomChar);
  UNTIL (charCount = 8)
  END REPEAT;
  RETURN randomPassword;
END $$
DELIMITER ;

This function returns an 8-character password string. Each of the 94 characters has an equal 1/94 chance of being generated at every position. Given a short period of time and a normal amount of resources, this qualifies as a password that is uncrackable in the theoretical and technical senses described above. The function can be modified to return a longer password or even a password of random length, say between 8 and 12 characters. A separate user defined function, randomRangePicker(), can be used if refactoring is desired.
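As a quick sanity check on the three dice-roll ranges used in the function, the boundary rolls of the two CONV ranges can be converted by hand. This is only a sketch; the column aliases are made up for illustration.

```sql
-- Boundary rolls of the two CONV-based ranges in randomPasswordGenerator()
SELECT
  CONV(33 - 33, 10, 36)        AS firstOfMiddleRange, -- '0'
  CONV(68 - 33, 10, 36)        AS lastOfMiddleRange,  -- 'Z'
  LOWER(CONV(69 - 59, 10, 36)) AS firstOfLastRange,   -- 'a'
  LOWER(CONV(94 - 59, 10, 36)) AS lastOfLastRange;    -- 'z'
```

Rolls 1 to 32 cover the 32 special characters, rolls 33 to 68 cover the 36 digits and uppercase letters, and rolls 69 to 94 cover the 26 lowercase letters, which adds up to 94. For a random length between 8 and 12, one possible change (an assumption, not part of the original function) is to draw the loop limit with 8 + FLOOR(RAND() * 5) and widen randomPassword to CHAR(12).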



The output can be checked with a simple SELECT statement:

SELECT randomPasswordGenerator();

See the first part of this entry or a similar random string/name generator.

Wednesday, June 8, 2011

Truly Random and Complex Password Generator - Part 1 of 2

Permalink: http://bit.ly/1pJlpHz



Skip to the 2nd part for the code snippet.

It's an important matter of security to enforce complex passwords that have a sufficient length. From personal experience, if you ask normal users to create their own passwords, the passwords will be based on a character set of 36 case-insensitive alphanumeric characters (a-z, 0-9) instead of the full set of 94 typable characters. Also, most normal users choose dictionary-based passwords with a predictable pattern: dictionary words at the beginning and numbers at the end.

Relying solely on the client-side or front-end to enforce the creation of passwords of at least 8 characters long and the use of special characters will not be practical in preventing the use of dictionary words as well as the usage of a certain pattern. Whatever the mechanism is on the client-side, the backend MySQL database should complement it.

Assigning complex passwords to users will, in effect, increase the number of possible characters from 36 to 94. By making the password randomly generated, the predictability of dictionary words and pattern matching is removed. The number of possible passwords is substantially increased. For an 8-character password string, under a reasonable time limitation, say 6 hours, and using a single modern computer, this results in a password that is uncrackable both in theory and in practice:

SELECT FORMAT(POW(36, 8), 0);
  -- Returns 2,821,109,907,456 possible combinations for the 36-character set. Note that the effective number of combinations is greatly reduced when the user limits the password to dictionary words and predictable patterns. This results in a password that can be cracked in a short period of time.

SELECT FORMAT(POW(94, 8), 0);
  -- Returns 6,095,689,385,410,816 possible combinations. Because the password is randomly generated, the number of combinations is not reduced as explained above. This results in a password that is uncrackable, in theory and in practice, within a short period of time.
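To put the two character sets side by side, the sketch below compares the 36-character set that users tend to pick from with the full 94-character set. The guess rate of one billion passwords per second is purely an assumption for illustration, not a measured figure, and the aliases are made up.

```sql
SELECT
  FORMAT(POW(94, 8) / POW(36, 8), 0)        AS timesLarger,       -- about 2,161
  ROUND(LOG2(POW(36, 8)), 1)                AS bitsFor36Chars,    -- about 41.4
  ROUND(LOG2(POW(94, 8)), 1)                AS bitsFor94Chars,    -- about 52.4
  -- Assumed rate: 1,000,000,000 guesses per second (hypothetical)
  FORMAT(POW(94, 8) / 1000000000 / 3600, 0) AS hoursToTryAllAtAssumedRate; -- about 1,693
```

Under that assumed rate, trying every 8-character combination from the 94-character set takes far longer than the 6-hour window mentioned above.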

A password generator, to be truly random, should satisfy the following:
  • The character set for the generator should include all the typable characters on any keyboard layout: 

    a-z, A-Z, 0-9,
    and ` ~ ! @ # $ % ^ & * ( ) - = _ + [ ] { } \ / | ? ; : ' " , . < >

    This results in 26 + 26 + 10 + 32 = 94 characters.
  • Each of the allowed characters should all have an equal chance of being generated.

For practical purposes, we'll set aside arguments on password complexity versus password length and assume an 8-character password string. To generate any of the 62 alphanumeric characters, we'll use a base 36 statement as the formula:

SELECT CONV(
          FLOOR(
            RAND() * 36),
      10, 36);

Using a base 36 statement gives us the most compact alphanumeric numeral system. The case sensitivity will be based on odds from a random number range in order to include the LOWER case of the alphabet.
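One way to picture the "odds from a random number range" idea is a single roll over all 62 alphanumeric characters. This is a minimal sketch, assuming a session variable named @roll (a name chosen here for illustration):

```sql
SET @roll = FLOOR(RAND() * 62);              -- 0 to 61, one value per character
SELECT IF(@roll < 36,
          CONV(@roll, 10, 36),               -- 0-35  -> 0-9, A-Z
          LOWER(CONV(@roll - 26, 10, 36))    -- 36-61 -> 10-35 -> a-z
       ) AS randomAlphanumeric;
```

Digits are produced only by the first branch, so each of the 62 characters keeps the same 1/62 chance.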

The special characters can be generated by using the ELT function as the basis for the formula like:

SELECT ELT(1 + FLOOR(RAND() * 32),
      '`', '~', '!', '@', '#', '$', '%', '^',
      '&', '*', '(', ')', '-', '=', '_', '+',
      '[', ']', '{', '}', '\\', '/', '|', '?',
      ';', ':', '\'', '"', ',', '.', '<', '>');




In the continuation of this entry is an example of a true random and complex password generator function.

Sunday, May 29, 2011

A better way to get Primary Key columns

Permalink: http://bit.ly/1o0NdpY



When an application asks MySQL for the Primary Key of a table, there are several ways to go about doing this. A fast way would be to use these statements:

DESCRIBE `dbName`.`tableName`;
-- or
SHOW INDEX FROM `dbName`.`tableName` 
WHERE `Key_name` = 'PRIMARY';

The result set would have to be parsed in order to get the column names. This is not a recommended way to get the PK columns because its usefulness is limited: the column names cannot be returned INTO a variable.

Another method often used is this SELECT statement that uses a table JOIN:

SELECT k.`COLUMN_NAME`
FROM `information_schema`.`TABLE_CONSTRAINTS` t
JOIN `information_schema`.`KEY_COLUMN_USAGE` k
USING (`CONSTRAINT_NAME`, `TABLE_SCHEMA`, `TABLE_NAME`)
WHERE t.`CONSTRAINT_TYPE` = 'PRIMARY KEY'
 AND t.`TABLE_SCHEMA` = 'dbName'
 AND t.`TABLE_NAME` = 'tableName';

Depending on the number of tables and databases, this method may be slightly slower than the first method shown above. However, this method is more flexible as it allows a return of the output INTO a variable.

A better way to get the Primary Key columns is this SELECT statement that does not use a table JOIN:

SELECT `COLUMN_NAME`
FROM `information_schema`.`COLUMNS`
WHERE (`TABLE_SCHEMA` = 'dbName')
  AND (`TABLE_NAME` = 'tableName')
  AND (`COLUMN_KEY` = 'PRI');

Limitations: This is faster than the second method that uses a table JOIN. However, only the first method using DESCRIBE or SHOW INDEX can retrieve the PK columns of a TEMPORARY TABLE. This is because TEMPORARY TABLEs are not visible in the `information_schema` database (as of MySQL version 5.6).
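The temporary table limitation can be observed with a short experiment. The sketch assumes a default database is selected and that no permanent table named `tmpDemo` (a made-up name) exists in it.

```sql
CREATE TEMPORARY TABLE `tmpDemo` (
  `id` INT NOT NULL,
  `code` CHAR(2) NOT NULL,
  PRIMARY KEY (`id`, `code`)
);

-- SHOW INDEX sees the temporary table and lists both PK columns
SHOW INDEX FROM `tmpDemo` WHERE `Key_name` = 'PRIMARY';

-- On the versions discussed here, this is expected to count 0 columns
SELECT COUNT(*) AS visibleColumns
FROM `information_schema`.`COLUMNS`
WHERE `TABLE_SCHEMA` = DATABASE()
  AND `TABLE_NAME` = 'tmpDemo';

DROP TEMPORARY TABLE `tmpDemo`;
```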

Using the above statement, a function can be written that retrieves the PK columns when given the database and table names as parameters. See the snippet below:

DELIMITER $$
DROP FUNCTION IF EXISTS `getPKColumns` $$
CREATE DEFINER=`root`@`localhost` FUNCTION `getPKColumns`(
  dbName VARCHAR(64),
  tableName VARCHAR(64)) RETURNS text CHARSET utf8
BEGIN
  DECLARE PKColumns TEXT;
  SELECT GROUP_CONCAT(`COLUMN_NAME` SEPARATOR '`, `')
  FROM `information_schema`.`COLUMNS`
  WHERE (`TABLE_SCHEMA` = dbName)
    AND (`TABLE_NAME` = tableName)
    AND (`COLUMN_KEY` = 'PRI')
    INTO PKColumns;
  SET PKColumns = CONCAT('`', PKColumns, '`');
  RETURN PKColumns;
END $$
DELIMITER ;

Here's a sample that uses a SELECT statement to pass the database and table names in order to use the function:

SELECT getPKColumns('db', 'table');

The Primary Key column names returned are wrapped in backticks in case the name of the column turns out to be a reserved word. The function will correctly wrap the backticks even for cases where the table has multi-column Primary Keys. This implementation allows the function's return to be usable in many different scenarios such as in dynamic SQL statements. The output will be like:

`primaryKeyColumn1`, `primaryKeyColumn2`



Since Primary Keys typically consist of a single column, or for lookup tables usually no more than two columns, the GROUP_CONCAT function can safely rely on the default value of group_concat_max_len, which is 1024. There is no need to increase group_concat_max_len: a column name cannot exceed 64 characters, so the output stays well under 1024 characters for typical keys and there is very little risk of the result being truncated.
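As an illustration of the dynamic SQL use mentioned above, the returned list can be spliced into a statement string and run as a prepared statement. This is a sketch only; the names `dbName` and `tableName` are placeholders, and the variables @pk and @sql are chosen here for the example.

```sql
SET @pk = getPKColumns('dbName', 'tableName');

-- Build a statement that selects and orders by the PK columns
SET @sql = CONCAT(
  'SELECT ', @pk,
  ' FROM `dbName`.`tableName`',
  ' ORDER BY ', @pk,
  ' LIMIT 10');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

If the table has no Primary Key, the function returns NULL and the concatenated statement is NULL as well, so a real caller would want to check @pk first.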

Wednesday, May 25, 2011

Safe DML Options

Permalink: http://bit.ly/VNLTQe



The Safe DML project provides automatic creation of backups and an undo functionality for MySQL. These two abilities do not rely on the command line shell and can simply be executed by queries.

$DML() Options

Inside the stored procedure, $DML(), you can find the following options that can be set:

-- Switches logging on/off
DECLARE logging BOOLEAN DEFAULT FALSE;
-- Clears the logs per call
DECLARE clearLogs BOOLEAN DEFAULT TRUE;
-- Set to FALSE to backup only the current db in use
DECLARE backupAllDB BOOLEAN DEFAULT TRUE;
-- Disables filtering out of unsupported statements
DECLARE dmlFilter BOOLEAN DEFAULT TRUE;

  • The logging option enables/disables logs written by Safe DML into the `debug` table in the `$backup` database. Logging is useful for development work when adding new features or updating Safe DML. Disable for production use to improve performance.
  • Additionally, you can set logs to clear for each time $DML() is used. Enable this option only when needed and only during development work since keeping this BOOLEAN set to TRUE can cause the `debug` table to consume too much disk space over time especially when used in a production environment.
  • $DML() can be used for just the current database by setting this option to FALSE. This is also useful for keeping disk space consumption of backup tables low if only the current database needs an UNDO functionality.
  • Statements not supported by Safe DML can optionally be filtered out before they are passed on. Only INSERT, UPDATE, DELETE, and REPLACE statements are currently supported. Since SELECT and SHOW statements do not modify data, you can pass these statements into $DML() if so desired, but they will not be recorded in the `history` table inside the `$backup` database.

Other statements that are allowed to be used inside transactions will be supported in a future version. Options on the size restriction on the maximum number of `history` table records and flushing schedule of table backups are under development.

See the change logs for updates on new features.

$UNDO() Options

For the $UNDO() stored procedure, you can also set its options for logging and clearLogs. To invoke an undo command, you can either use the UNDO keyword by calling the $DML() stored procedure or call the $UNDO() stored procedure directly.

To UNDO the last statement:

CALL $DML("UNDO");
-- or
CALL $DML("UNDO LAST STATEMENT");
-- or
CALL $UNDO('');
-- or
CALL $UNDO('LAST STATEMENT');

To UNDO all changes:

CALL $DML("UNDO ALL");
-- or
CALL $UNDO('ALL');

To UNDO multiple statements or to a specific point in time, first determine the commands you want to revert by parsing the `history` table for the id of the final command to be reverted:

SELECT * FROM `$backup`.`history`;
-- take note of the dmlStringId

Then call the desired stored procedures, where n is the dmlStringId of the final command to be reverted.

CALL $DML("UNDO n");
-- or
CALL $UNDO(n);



For example, there are 3 records in the `history` table with dmlStringIds 1, 2, and 3. The following call will have the effect of undoing the 3rd command in the record, then the 2nd command, in sequence, while keeping the 1st command with dmlStringId = 1, untouched:

CALL $DML("UNDO 2");
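Put together, a session matching this example might look like the sketch below. It assumes Safe DML is installed, that the `history` table starts out empty so the ids come out as 1, 2, and 3, and that `dbName`.`tableName` with its columns is a placeholder table.

```sql
-- Three changes recorded by Safe DML (assumed to get dmlStringIds 1, 2, 3)
CALL $DML("UPDATE `dbName`.`tableName` SET `columnName` = 'first'  WHERE `primaryKey` = 100");
CALL $DML("UPDATE `dbName`.`tableName` SET `columnName` = 'second' WHERE `primaryKey` = 100");
CALL $DML("UPDATE `dbName`.`tableName` SET `columnName` = 'third'  WHERE `primaryKey` = 100");

-- Confirm the ids before undoing anything
SELECT * FROM `$backup`.`history`;

-- Reverts command 3, then command 2; command 1 stays applied
CALL $DML("UNDO 2");
```

After the last call, the row would be expected to hold the value written by the first command.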

To be continued.

Monday, May 23, 2011

Safe DML

Permalink: http://bit.ly/1vSmnGm



There is no native undo ability inside MySQL. The common workarounds are creating backup dumps and enabling binary logging, using transactions, and requiring the WHERE clause in Data Manipulation Language (DML) commands through the safe updates option. These methods have drawbacks:
  1. Creating backups via mysqldump and using binary logging to revert to a point in time will have the same effect as an undo functionality. However, these are executed via the command line shell. Since these tools are not executed inside MySQL, this method may not be convenient and presents limitations on when it can be used.
  2. Transactions allow you to "undo" as long as you have not committed your data manipulation changes. Imagine if you discover data manipulation changes that you wish to undo after the last transaction commit. It is impossible.
  3. Using the safe updates option when running MySQL is meant to be a way to prevent undesirable changes by requiring the WHERE clause. Requiring the WHERE clause does not provide an undo ability.
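The second and third drawbacks can be seen in a few statements. The sketch assumes `dbName`.`tableName` is a placeholder for a transactional (InnoDB) table.

```sql
START TRANSACTION;
UPDATE `dbName`.`tableName` SET `columnName` = 'updateValue' WHERE `primaryKey` = 100;
ROLLBACK;   -- works: the change was not committed yet

START TRANSACTION;
UPDATE `dbName`.`tableName` SET `columnName` = 'updateValue' WHERE `primaryKey` = 100;
COMMIT;
ROLLBACK;   -- has no effect: the change is already permanent

SET sql_safe_updates = 1;
-- Rejected with an error because there is no WHERE clause that uses a key,
-- but this only prevents a mistake; it cannot undo one
UPDATE `dbName`.`tableName` SET `columnName` = 'updateValue';
```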

Without a feasible undo ability, mistakes can lead to frustrations and several hours of work lost to having to manually re-enter previous data.

This small project aims to provide the missing undo ability entirely inside MySQL, allowing the user to undo previous DML changes independent of the session. To fully automate the undo functionality, the DML command needs to be passed through a stored procedure. The example below shows a data manipulation statement enclosed in double quotation marks so that single quotes can be used by the statement itself. The statement is passed as a parameter of the $DML() SP:

CALL $DML(
  "UPDATE `dbName`.`tableName`
  SET `columnName` = 'updateValue'
  WHERE `primaryKey` = 100"
);

The Safe DML stored procedure, $DML(), automates the creation of backups by copying table records into a created database named `$backup`. The undo functionality can be invoked by calling the $UNDO() stored procedure:

CALL $UNDO('');
-- or
CALL $DML('UNDO');

This reverts the last DML command. There are options to undo all previous DML commands, with $UNDO('ALL'), or just up to a specific number of changes. UNDO options can be found in this blog entry: http://mysql-0v34c10ck.blogspot.com/2011/05/safe-dml-options.html.

How does it work? $DML() will accept any data manipulation statement no matter how complex. It will gracefully handle commands that are in the wrong syntax or that cause errors. In such cases, no changes will take effect and the appropriate MySQL error message will be thrown to indicate what the problem with the statement is.

Before executing the DML statement, $DML() will create snapshots of all tables from all databases. There is an option to create backup copies for only the current database and this is discussed in Safe DML Options. The creation of a snapshot only takes effect when $DML() is called for the first time when the `$backup` database does not exist yet.

Note: The backup copies are considered "snapshots" since these are data copies only. The primary key, indexes, and foreign key constraints of the tables are not backed up. The snapshots are stored in the `$backup` database and if any DDL statements are executed, for example, a new column was added in a table, it is recommended that the `$backup` database be dropped so that it can be appropriately re-created. This flush process may become automated in a future version of this project.

Inside the created `$backup` database, there will be 3 tables that are used by Safe DML: `history`, `tableref`, and `debug`. The `history` table contains records of the DML commands executed using $DML(). The `tableref` table contains records about each table snapshot. The third table, `debug`, contains logs written by Safe DML and is useful for development work when adding new features or updating Safe DML.

$DML() stored procedure does not manipulate any table from any database other than its own `$backup` database. This allows the user to create TRIGGERS and CONSTRAINTS in their tables since Safe DML does not rely on those.



The project, Safe DML, is currently in development. Only INSERT, UPDATE, DELETE, and REPLACE statements are currently supported. This will expand in a future version. 

To be continued.

Saturday, May 7, 2011

True Random Database and Table Name Generator - Part 2 of 2

Permalink: http://bit.ly/QuBLVB



Read part 1 for the rationale behind the code.

As discussed in the first part of this blog entry, we'll be utilizing a statement that uses base 36 to generate the random name. We will be adding the $ and _ characters using the ELT function. Here is a true random database and table name generator:

DELIMITER $$
DROP FUNCTION IF EXISTS `randomNameGenerator` $$
CREATE DEFINER=`root`@`localhost` FUNCTION `randomNameGenerator`(
 ) RETURNS varchar(64) CHARSET utf8
BEGIN
 DECLARE numberOfChars, charDiceRoll TINYINT(2);
 DECLARE charCount TINYINT DEFAULT 0;
 DECLARE randomChar CHAR(1);
 DECLARE randomName VARCHAR(64) DEFAULT '';
 SET numberOfChars = randomRangePicker(1, 64);
 REPEAT
  SET charCount = charCount + 1;
  SET charDiceRoll = randomRangePicker(1, 38);
  IF (charDiceRoll <= 2)
  THEN
   SET randomChar = ELT(charDiceRoll, '$', '_');
  ELSE
   SET charDiceRoll = charDiceRoll - 3;
   SET randomChar = LOWER(
    CONV(
     charDiceRoll,
     10, 36)
   );
  END IF;
  SET randomName = CONCAT(randomName, randomChar);
 UNTIL (charCount = numberOfChars)
 END REPEAT;
 RETURN randomName;
END $$
DELIMITER ;

The total is 38 characters: the 36 case-insensitive alphanumeric characters a-z and 0-9, plus $ and _. For code portability between Windows and Linux, the random name generated is all in lowercase characters. Each character has a 1/38 chance of being generated.

Note that we utilized the random number range picker function mentioned in a previous blog entry. Here's an explanation of the code:
  • The length of the name is random. Anywhere from 1 to 64 characters long can be generated. The maximum number of characters is based on the value of numberOfChars.
  • A character will be generated based on the result of a dice roll from 1 to 38. The 1 to 38 range represents each of the 38 characters.
  • The name will be created random character by random character by the CONCAT function until the length is equal to numberOfChars.

Here's a version of the above function that doesn't depend on the random number range picker function:

DELIMITER $$
DROP FUNCTION IF EXISTS `randomNameGenerator` $$
CREATE DEFINER=`root`@`localhost` FUNCTION `randomNameGenerator`() RETURNS varchar(64) CHARSET utf8
BEGIN
 DECLARE numberOfChars, charDiceRoll TINYINT(2);
 DECLARE charCount TINYINT DEFAULT 0;
 DECLARE randomChar CHAR(1);
 DECLARE randomName VARCHAR(64) DEFAULT '';
 SET numberOfChars = 1 + FLOOR(RAND() * 64);
 REPEAT
  SET charCount = charCount + 1;
  SET charDiceRoll = 1 + FLOOR(RAND() * 38);
  IF (charDiceRoll <= 2)
  THEN
   SET randomChar = ELT(charDiceRoll, '$', '_');
  ELSE
   SET charDiceRoll = charDiceRoll - 3;
   SET randomChar = LOWER(
     CONV(
       charDiceRoll,
       10, 36)
   );
  END IF;
  SET randomName = CONCAT(randomName, randomChar);
 UNTIL (charCount = numberOfChars)
 END REPEAT;
 RETURN randomName;
END $$
DELIMITER ;

You can check out the function's output by:

SELECT randomNameGenerator();



This function can be modified to become a random string generator of fixed length or even to become a random password generator. Additional characters can be added to the ELT function, and the statement that uses base 36 can be converted to randomly generate uppercase and lowercase characters. If numbers are not desired to be in the random name, simply modify the range for the base 36 statement so that it will not generate 0-9.
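For instance, the letters-only variation could be sketched by shifting the base 36 range so that it starts at 10 instead of 0. The alias is made up for illustration.

```sql
-- 10 + (0 to 25) = 10 to 35, which base 36 renders as A-Z; LOWER gives a-z
SELECT LOWER(CONV(10 + FLOOR(RAND() * 26), 10, 36)) AS randomLetter;
```

For a fixed length, the random pick for numberOfChars in the function would simply be replaced by a constant, for example SET numberOfChars = 16.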

Back to part 1.



Updated: 10/19/2012

Thursday, May 5, 2011

True Random Database and Table Name Generator - Part 1 of 2

Permalink: http://bit.ly/UZY7xT



Skip to part 2 to go straight to the code snippet.

See also a similar generator: Truly Random and Complex Password Generator

Database names and table names have certain restrictions in MySQL:
  • The maximum name length for both is 64 characters
  • Allowed characters are a-z, A-Z, 0-9, $, and _

It is possible to create a table or database with a dot (.) in its name, however this is not recommended as it will cause some of MySQL's built-in functions to not work as expected.

Using uppercase characters in names is also not recommended. The case sensitivity of a name depends on the underlying operating system where the MySQL server is installed. For example, in Linux, the name "dbName" is different from "dbname", but both are the same in Windows. For consistency and to keep the database portable between the two, as well as to future-proof your database in case you suddenly need to port from Linux to Windows, using all lowercase names is recommended.

A database or table random name generator, to be truly random, should satisfy the following:
  • The length of the name should be random, between 1 to 64 characters
  • The character set for the generator should include all the allowed characters, except uppercase characters since these are not recommended as explained above
  • Each of the allowed characters should all have an equal chance of being generated

Using the MD5() function will not satisfy the above conditions and it won't be a true random string generator. For a simple example:
SELECT MD5(RAND());

The above example will always be exactly 32 characters long, which violates the condition that the length of the name should be random. However, the main defect here is that the function MD5() returns only hexadecimal digits, 0-9 and a-f, a range of only 16 characters.
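Both properties, the fixed length and the 16-character range, can be checked directly. A small sketch, with made-up aliases:

```sql
SELECT
  LENGTH(MD5(RAND()))                   AS hashLength,    -- 32
  MD5(RAND()) REGEXP '^[0-9a-f]{32}$'   AS onlyHexDigits; -- 1
```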

Another approach is to use the ELT function. This function can be used to generate a random character which can then be concatenated to form a random string. A simple example to create a random character:

SELECT ELT(1 + FLOOR(RAND() * 38), 
  '$', '_', 'a', 'b', 'c', 'd', 'e',
  'f', 'g', 'h', 'i', 'j', 'k', 'l',
  'm', 'n', 'o', 'p', 'q', 'r', 's',
  't', 'u', 'v', 'w', 'x', 'y', 'z', 
  0, 1, 2, 3, 4, 5, 6, 7, 8, 9
);

This statement can be used in a user defined function to create a random string with a random length ranging from 1 to 64. However, this implementation is lengthy and there is a cleaner way to do the same by using the following formula:

SELECT LOWER(
  CONV(
    FLOOR(
      RAND() * 36
    ),
  10, 36)
);

This is a shorter way to generate a-z, 0-9. The characters $ and _ can be added as will be shown in the 2nd part of this blog entry.

Here's an explanation of the formula:
  • Implicit conversion between numbers and strings automatically occurs during expression evaluation at run-time.
  • The CONV statement converts the random integer from a base 10 number to a base 36 number. Using base 36 gives the most compact case-insensitive alphanumeric numeral system. This converts the number to range from 0-9 and A-Z.
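A few fixed inputs show how the conversion maps integers to single characters. The aliases are illustrative only.

```sql
SELECT
  CONV(0, 10, 36)         AS zero,        -- '0'
  CONV(9, 10, 36)         AS nine,        -- '9'
  CONV(10, 10, 36)        AS ten,         -- 'A'
  CONV(35, 10, 36)        AS thirtyFive,  -- 'Z'
  LOWER(CONV(35, 10, 36)) AS lowered,     -- 'z'
  CONV(36, 10, 36)        AS thirtySix;   -- '10'
```

The last column shows why the random integer must stay within 0 to 35: anything larger no longer fits in one character.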



The second part of this blog entry will show an example of a user defined function that uses the statement which uses base 36 as a true random database and table name generator.



Updated: 10/15/2012

Sunday, May 1, 2011

Random Number Range Picker

Permalink: http://bit.ly/R5rizQ



A function that randomly picks an integer from a given range is useful for applications that need a dice roll, random generation of strings and numbers, or even random generation of complex passwords. Let's say you would like to randomly pick a number from 1 to 10. This gives 10 choices, not 9 as might mistakenly be assumed from 10 minus 1. Likewise, the range from 0 to 10 gives 11 possibilities, not 10. In other words:
  • The range of choices should include the value of the lower end of the range
  • It should also include the value of the higher end of the range

To generate a random number from a given lower value and a higher value, use the formula below, where minRange is the lower end of the range and maxRange is the higher end of the range:

minRange + FLOOR(RAND() * (maxRange - minRange + 1))

The formula adds a random integer to minRange. RAND() returns a decimal value from 0 up to, but not including, 1. Multiplying it by the size of the range, (maxRange - minRange + 1), and passing the product to FLOOR() yields an integer from 0 to (maxRange - minRange). The + 1 is needed so that the range includes maxRange itself. It is easily missed, which leads to erroneous results, especially when the lower end of the range is a negative number.

To illustrate how a quick analysis of a simple random range picker often goes wrong, first fill in the values minRange = 1 and maxRange = 10. This gives us:

SELECT 1 + FLOOR(RAND() * 10);

The common mistake here is to assume that a range from 1 to 10 calls for "RAND() * 9", based on the subtraction 10 - 1 = 9.

And filling the values with minRange = -10 and maxRange = 10 gives us:

SELECT -10 + FLOOR(RAND() * 21);

The common mistake here comes from a quick mental calculation of the range: 10 - (-10) = 20 suggests that the second formula should use "RAND() * 20".
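Substituting the extremes of RAND() by hand makes the off-by-one visible. In this sketch, 0 and 0.999999 stand in for the lowest and a near-highest return value of RAND(); the aliases are made up.

```sql
SELECT
  -10 + FLOOR(0 * 21)        AS lowestPick,             -- -10
  -10 + FLOOR(0.999999 * 21) AS highestPick,            -- 10
  -10 + FLOOR(0.999999 * 20) AS highestWithoutPlusOne;  -- 9
```

Without the + 1, the value 10 can never be picked.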

The given formula is used in the function below. The integers for minRange and maxRange can be positive or negative, including zero. The range of numbers returned will correctly include the values of minRange and maxRange as was explained above.

DELIMITER $$
DROP FUNCTION IF EXISTS `randomRangePicker` $$
CREATE DEFINER=`root`@`localhost` FUNCTION `randomRangePicker`(
 minRange INT,
 maxRange INT) RETURNS int(11)
BEGIN
 DECLARE pick INT;
 SET pick = minRange + FLOOR(RAND() * (maxRange - minRange + 1));
 RETURN pick;
END $$
DELIMITER ;



To use the function, simply pass the two needed parameters like:

SELECT randomRangePicker(-1, 1);

This will return either -1, 0, or 1.
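To get a feel for the distribution, the function can be called many times and the results counted. The sketch below borrows `information_schema`.`COLUMNS` purely as a convenient source of rows, assuming it holds at least 300 of them; any sufficiently large table would do.

```sql
SELECT `pick`, COUNT(*) AS timesPicked
FROM (
  -- One call to randomRangePicker() per source row
  SELECT randomRangePicker(-1, 1) AS `pick`
  FROM `information_schema`.`COLUMNS`
  LIMIT 300
) AS `samples`
GROUP BY `pick`
ORDER BY `pick`;
```

Each of the three values should show up roughly a third of the time, with the exact counts varying from run to run.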

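See the following user defined functions for samples where randomRangePicker() can be used: the True Random Database and Table Name Generator and the Truly Random and Complex Password Generator.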



Updated: 10/14/2012

Thursday, April 28, 2011

A function to get all the columns of any table from any database

Permalink: http://bit.ly/VP174V



Certain complex MySQL SELECT and subquery statements will not allow the use of the * wildcard, and you will need to fill in the entire column list of a given table. Consider the following simplified example, which works on tables that have 3 columns. The asterisk here refers to all columns, which are the 3 columns listed in the GROUP BY clause:

SELECT
 IF(
  EXISTS(
  SELECT *
  FROM (
   SELECT *
   FROM `dbName_A`.`tableName_A`
   UNION ALL
   SELECT *
   FROM `dbName_B`.`tableName_B`
  ) AS `compareTables`
  GROUP BY `column_1`, `column_2`, `column_3`
  HAVING COUNT(*) = 1),
 1, 0);

Imagine if it were dozens of columns instead of just 3. You can't simply put in the * wildcard like 'GROUP BY * '. The above example will not work without the GROUP BY clause and you'll need to type in all the column names. A solution is to create a function that returns a list of the column names for you. Here it is:

DELIMITER $$
DROP FUNCTION IF EXISTS `getColumnList` $$
CREATE DEFINER=`root`@`localhost` FUNCTION `getColumnList`(
  dbName VARCHAR(64),
  tableName VARCHAR(64)) RETURNS text CHARSET utf8
BEGIN
  DECLARE columnList TEXT;
  SET group_concat_max_len = 65533;
  SELECT GROUP_CONCAT(`COLUMN_NAME` SEPARATOR '`,`')
  FROM `information_schema`.`COLUMNS`
  WHERE (`TABLE_SCHEMA` = dbName)
    AND (`TABLE_NAME` = tableName)
  INTO columnList;
  SET columnList = CONCAT('`', columnList, '`');
  RETURN columnList;
END $$
DELIMITER ;

This function has 1 limitation: as of version 5.5 of MySQL, it is not possible to get a column list into a variable if it comes from a temporary table. This is due to the fact that temporary tables are invisible within the `information_schema` database. This will hopefully be resolved in a future version of MySQL.

To use the function, simply pass the database name and table name parameters to the function such as:

SET @columnList = getColumnList('dbName_A', 'tableName_A');

A list of the column names in the form similar to the above example will be returned:

`column_1`,`column_2`,`column_3`

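The first example can now be rewritten so that it no longer matters how many columns there are. Because a user variable placed directly in a GROUP BY clause is treated as a single string value rather than as a list of column names, the statement has to be assembled as dynamic SQL and run as a prepared statement:

SET @sql = CONCAT(
 'SELECT
   IF(
    EXISTS(
    SELECT *
    FROM (
     SELECT *
     FROM `dbName_A`.`tableName_A`
     UNION ALL
     SELECT *
     FROM `dbName_B`.`tableName_B`
    ) AS `compareTables`
    GROUP BY ', @columnList, '
    HAVING COUNT(*) = 1),
   1, 0)');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;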



This function is especially useful for dynamic MySQL statements and is robust enough to handle a wide variety of usage types. Here's a run down on the function's underlying code:
  • The TEXT variable type is used for the string return since it has the most practical character length among the choices. Smaller string types are too small at a maximum length of 255 characters, while the next larger option, MEDIUMTEXT, at 16,777,215 characters, is just too big and would be a resource hog. The TEXT data type can hold a string up to a maximum length of 65,535 characters.
  • group_concat_max_len has a default value of 1,024. If the concatenated column list is longer than 1,024 characters, it will be truncated. The maximum length for GROUP_CONCAT should match the character length of the TEXT data type in order to maximize its usefulness. The value is set at 65,533 so that, together with the 2 backticks (explained below) surrounding the string, the total adds up to 65,535 characters. This change is not permanent and only remains valid for the duration of the active session. You may increase the maximum to a much larger value, but 65 thousand characters should handle most cases.
  • Column names are surrounded in backticks in case they contain reserved words. Not enclosing column names in backticks can be problematic since the function may be used in a variety of ways and may cause unpredictable results if not done so.
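A possible variation, not part of the original function, is to state the column order explicitly and wrap each name in backticks inside the GROUP_CONCAT call itself. The sketch below queries the placeholder table `dbName_A`.`tableName_A` directly.

```sql
SELECT GROUP_CONCAT(
         CONCAT('`', `COLUMN_NAME`, '`')   -- backticks added per column
         ORDER BY `ORDINAL_POSITION`       -- same order as in the table definition
         SEPARATOR ', '
       ) AS columnList
FROM `information_schema`.`COLUMNS`
WHERE (`TABLE_SCHEMA` = 'dbName_A')
  AND (`TABLE_NAME` = 'tableName_A');
```

With the backticks added per column, the final CONCAT step that wraps the whole string is no longer needed, at the cost of two extra characters per column counting toward group_concat_max_len.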



Updated: 10/11/2012