• SQL Spreads Blog
  • Simple Data Validation in SQL

    Andy McDonald - Oct 24, 2021

    What is data validation in SQL and why is it important?  Data validation is a method for checking the accuracy and quality of data.  Information in databases is constantly being updated, deleted, queried, or moved by multiple people or processes, so ensuring that data is valid at all times is essential. In this article, we’re going to explain how to add some simple validation rules in SQL.  We’ll also look at how a product like the SQL Spreads Excel Add-In can make this process a little easier.

    What is Data Validation in SQL?

    Data validation is the method for checking the accuracy and quality of data.  It is often performed prior to adding, updating, or processing data.  Similarly, when we want to merge data from disparate sources we often talk of ‘cleansing’ the data – in other words validating it. When validating data, we can check if the data is:

    • complete (ie no blank or null values)
    • unique (ie no duplicate values)
    • consistent with what we expect (eg a decimal between a certain range)

    Some examples of how we’d use these checks include:

    • in a payroll table, we may want to set minimum and maximums for the ‘salary’ column to ensure values entered are within acceptable bounds.
    • in a customer table, when storing address details, we’d need to ensure that the zip code is numeric and the correct length.
    • in a product table, we may want to limit the values in the ‘color’ column to a defined list.

    Some of you may be saying to yourself “data validation like this should be performed in the application layer, not the database layer”.  Of course, validation in the application layer is crucial.  But there are instances where updates could be performed directly in the database.  It is also good practice to make sure that the database has some form of data validation even it also exists elsewhere.

    Although it may sound obvious, we should also point out that although we can use rules to validate data entry, we can’t necessarily ensure the correctness of a value.  For example, we may have a validation rule to limit entries in a salary column to values between 5000 and 250,000.  This still wouldn’t prevent an incorrect value from being mistakenly entered (even if it falls within the allowable range).

    There are some other constraints we can apply during validation, which are slightly more advanced.  These include checking if a link between two tables will be destroyed (ie a Foreign Key constraint), setting a default value if none is provided, and also setting a primary key constraint.

    We’re going to take a look in more detail at the three constraints we’re most likely to use to validate our data – the NOT NULL, UNIQUE, and CHECK constraints.

    Constraints in SQL

    Constraints in SQL Server are rules that limit the data that goes into our tables.  These constraints ensure the validity of the data and help maintain the integrity of the database.  Constraints can be defined when tables are created or afterward and can apply to individual or multiple columns.

    If we insert data in the column that meets the constraint rule criteria, SQL Server inserts data successfully. However, if data violates the constraint, the insert statement is aborted with an error message.

    NOT NULL Constraint

    SQL Server allows NULL values, which translate to ‘unknown value’.  There are valid use cases for using NULLS, but there are also obvious cases where we can’t accept a NULL value.  In these cases, we can define a NOT NULL constraint on a column.

    In the example below, we are creating an employees table, and specifying that all columns except ‘MiddleName’ will not accept NULLs.

    CREATE TABLE Employees
    (
    EmployeeID  INT NOT NULL,
    FirstName Varchar(100) NOT NULL,
    MiddleName Varchar(50) NULL,
    LastName Varchar(100) NOT NULL,
    Gender char(1) NOT NULL,
    Address Varchar(200) NOT NULL
    );

    If we need to update existing columns to be NOT NULL, then we can either use an ALTER TABLE statement.  Alternatively, we can use SQL Server Management Studio (SSMS) – right-click on the table and click ‘Design’ to make the changes.  

    SSMS Table Design Not Null

    UNIQUE Constraint

    We typically use the UNIQUE constraint on ID columns. In the example below, we’re creating a simple table and specifying that that ‘EmployeeID’ should be unique (and not NULL).

    CREATE TABLE Employees
    (
    EmployeeID INT NOT NULL UNIQUE,
    FirstName Varchar(100) NOT NULL,
    MiddleName Varchar(50) NULL,
    LastName Varchar(100) NOT NULL,
    Gender char(1) NOT NULL,
    Address Varchar(200) NOT NULL
    );

     

    CHECK Constraint

    A check constraint consists of a logical expression to determine what values are valid.  A simple example would be in a payroll database where we want to specify a maximum value that can be entered.  The syntax for the CHECK constraint when creating a table is shown below.

    CREATE TABLE table_name
    (
    column1 datatype [ NULL | NOT NULL ],
    column2 datatype [ NULL | NOT NULL ],
     
    ...
    CONSTRAINT constraint_name
    CHECK (column_name condition)
    );

    So, for our payroll example, we could use the following to create a table that enforces a check constraint on values entered into the Salary column:

    CREATE TABLE dbo.Payroll
    (
    EmployeeID int PRIMARY KEY,
    EmployeeType INT,
    Salary decimal(9,2),
    CONSTRAINT CK_Payroll_Salary_1 CHECK (EmployeeType = 1 and Salary > 0  and Salary < 200000.00)
    );

    If we need to add a check constraint to a column in an existing table, we can use the ALTER statement:

    ALTER TABLE dbo.Payroll
    ADD CONSTRAINT CK_Payroll_Salary_1
    CHECK (EmployeeType = 1 and Salary > 0  and Salary < 200000.00);

    And to remove a check constraint we can use the following:

    ALTER TABLE dbo.Payroll
    DROP CONSTRAINT CK_Payroll_Salary_1;

    Finally, it’s often useful to temporarily enable or disable check constraints, and we can do this as follows: To enable a check constraint:

    ALTER TABLE dbo.Payroll
    WITH CHECK CHECK CONSTRAINT CK_Payroll_Salary_1;

    To disable a check constraint:

    ALTER TABLE dbo.Payroll
    NOCHECK CONSTRAINT CK_Payroll_Salary_1;

    As you can see, Check Constraints are easy to create and flexible in terms of usage.

    How to Add Validation in SQL Spreads

    The SQL Spreads Excel Add-in makes it easy to update data in SQL Server from Excel. It also makes it easy to add SQL data validation rules from within Excel using a data post-processing script.

    The Data Post-processing script is a SQL query that will be executed as the last step in the transaction that updates the database with the changes made in Excel.  By default, SQL Spreads will always validate all entered values against the Data Types in SQL Server.  The process described below is a more advanced option that is similar to the CHECK constraint we described earlier.

    To add a Data Post-processing script in SQL Spreads, open Document Settings and click the Edit Post-Save SQL Query button.

    SQL Spreads Documents Settings

    In the Post-Save SQL Query dialog box, we can now enter our validation script.  The validation script contains the logic that we want to check and an error message that we can display to the user.

    The example below shows a SQL query that checks if any of the values entered in the Salary column of the Payroll table are invalid (ie less than or equal to 0 or greater than 200000 for Employees of type 1).

    IF (
    SELECT COUNT([Salary]) 
    FROM [demo].[dbo].[Payroll] 
    WHERE EmployeeType = 1 AND (Salary <= 0  OR Salary > 200000.00)) 
    > 0   
    RAISERROR('The Salary for Employee_Type 1 must be between 0 and 200000',16,1);

    SQL Spreads Post-Save SQL Query

    Now, if the user enters an invalid value in the Salary column when trying to save to the database, the following message will be displayed, and the update transaction will be rolled back.  

    SQL Spreads Post-Save Warning Message

    Summary

    Performing data validation in SQL is important to maintain the integrity of your database.  In this article, we’ve explained some simple ways for you to do this using constraint scripts in SQL Server Management Studio.

    We also showed you how you can add validation checks from within Excel, using the Data Post-processing feature in SQL Spreads.

    If you do a lot of updating of data from Excel to SQL Server, or if you have users who love Excel but are not used to database tools, then download the free trial of the SQL Spreads Excel Add-In.

Leave a Reply


No comments yet. Be the first!