Advance SQL Queries Overview

 

 

Throughout the semester we have been gradually processing in sequential order the steps outlined by professionals to design a relational database than can implemented in any relational database management system.  The first step is to obtain the requirements and understand the problem.  This involves writing a statement of understanding after interviewing the people involved with the currently implemented system.   The second step was to create a conceptual view of the entities and relationships extracted from the problem space that you are trying to create a relational database for.  This involves understanding the business rules that the stakeholders of the system currently utilize. The third step is to create a logical representation of the database.  This includes translating the conceptual ER diagram into a normalized relational schema.  Once a normalized relational schema is created, the data dictionary can be finalized.  The data dictionary is the metadata that is needed by the users of the database. The metadata is also referred to as the data dictionary.  Once the finalized schema and data dictionary is approved, the physical database is then implemented.  Over the past couple of weeks we have been discussing and illustrating the nature of the RDBMS in how it evaluates joins.  We have also continued to practice running queries in a real time database environment.  Up until last week, you were provided DDL script to create the database and the tables in the database.  You were also provided with the DML INSERT INTO commands to insert and populate the tables with data.  Last week you studied and practice creating our own DDL statements to create databases and tables.  You explored the structure of the CREATE TABLE command and some of its supplement modifier statements.  In the first week of the course, we examined three basic generic data types that we classified as text, numeric and Boolean. Last week, you studied a variety of sub types of these main categories that were SQL specific.   

 

Under the numeric category there was the floating point and integer data types. Under the integer data types there was the INT. Under floating point there was the double data type.  The main difference between the floating point data types is the size of the floating point number and the storage space needed to hold the value.   The double data type has a 64 bit size.  Under the text data type there is the VARCHAR data types and CHAR data types.   Keep in mind, in certain DBMS applications there is a difference in the char data type.  In the certain DBMS such as Oracle CHAR is a constraint on the number of characters that must be placed in the attribute. MySQL does not enforce this constraint.   

The DATE data type enforces that a correct date is entered into the attribute.  Different DBMS application structure the date differently.  The structure of the date attribute in MySQL is 'YYYY-MM-DD HH:MM:SS'.  You can optionally use a sub set of the date structure excluding the time 'YYYY-MM-DD'.

 

 This Weeks Highlight

 

In order to follow along with the query examples below, you must create the database and tables in the attached SQL file below.  Additionally, you must create the tables in the proper order.

WorkerBranch.sql Download WorkerBranch.sql 

 

Aggregation Functions in SQL and Additional Modifiers to the SELECT Query

 

Below are list of a few functions that return summarized values from a SELECT query.  These are called aggregate functions.

 

MIN

returns the smallest value in a given column

MAX

returns the largest value in a given column

SUM

returns the sum of the numeric values in a given column

AVG

returns the average value of a given column

COUNT

returns the total number of values in a given column

COUNT(*)

returns the number of rows in a table

 

Aggregate functions are used to compute and summarize a "returned column of numeric data" from your SELECT statement. They basically summarize the results of a particular column of selected data. We are covering these here since they are required by another topic that we will be covering this week in class. The "GROUP BY" modifier  is an additional reserve word that you can use with the select statement. Aggregated Functions are required for use with the "GROUP BY" clause.  However, these functions can be used without the "GROUP BY" clause. For example:

 

SELECT AVG(salary)

 

FROM worker;

 

This statement will return a single result, which contains the average value of everything value returned in the salary column from the Worker table.

Another example:

SELECT AVG(salary)

 

FROM worker;

 

This statement will return a single result, which contains the average value of everything value returned in the salary column from the Worker table that are female employees.

 Example:

SELECT AVG(salary)

FROM worker;

WHERE gender = 'F';

 

This statement will return the count of all employees in the table

Example:

 

SELECT Count(*)

 

FROM worker;

This particular statement is slightly different from the other aggregate functions since there isn't a column supplied to the count function. This statement will return the number of rows in the employees table.

 

 

 

 

 

The “GROUP BY” clause of the SELECT Statement

The GROUP BY clause will gather all of the rows that contain data in the specified column(s) and it will then allow aggregate functions to be performed on the one or more columns. This will produce sub totals in each category being grouped. This can best be explained by an example:

GROUP BY clause syntax:

 

SELECT column1,

SUM(column2)

 

FROM "list-of-tables"

 

GROUP BY "column-list";

Let's say you would like to retrieve a list of the highest paid salaries in each dept:

 

SELECT max(salary), deptID_FK

 

FROM worker

 

GROUP BY deptID_FK;

This statement will select the maximum salary for the people in each unique department. Basically, the salary for the person who makes the most in each department will be displayed. Their, salary and their department will be returned.

 

Let’s take a look at an example using real time data. You need to loas the customer and orders table into a database to run this example.  Let’s take a look at the items_ordered table. Let's say you want to group everything of quantity 1 together, everything of quantity 2 together, everything of quantity 3 together, etc. If you would like to determine what the largest cost item is for each grouped quantity (all quantity 1's, all quantity 2's, all quantity 3's, etc.), you would enter:

 

/*

SELECT quantity, max(price)

 

FROM items_ordered

 

GROUP BY quantity;

*/

Enter the statement in above, and take a look at the results to see if it returned what you were expecting. Verify that the maximum price in each Quantity Group is really the maximum price.

 

 

HAVING clause

The HAVING clause allows you to specify conditions on the aggregated rows for each group - in other words, which of the aggregated rows should be selected and displayed in the result set of the query. The selection will be based on the conditions you specify in the HAVING clause. The HAVING clause must follow the GROUP BY clause if you are going to use it.

HAVING clause syntax:

 

SELECT column1,

SUM(column2)

 

FROM "list-of-tables"

 

GROUP BY "column-list"

 

HAVING "condition";

 

The HAVING clause can best be described by an example. Let's say you have an Worker table containing the workers's last name, department ID, salary, and gender. If you would like to select the average salary for each WORKER in each department, you could enter:

 

SELECT deptID_FK, avg(salary)

 

FROM WORKER

 

GROUP BY deptID_FK;

But, let's say that you want to ONLY calculate & display the average if their salary is over 1000:

 

SELECT deptID_FK, avg(salary)

 

FROM WORKER

 

GROUP BY deptID_FK

 

HAVING avg(salary) > 1000;

 

 

If you wanted to include the Department Name to this projection, you would simply have to JOIN the Department Table to Query.  You would then add the columns of any one department record that matches the department code in the worker record to the individual rows of the result set rows of the query. This is when Department ID Code in Worker matches the Department ID Code in Department.  When connecting the table on the one side to the table on the many side of a one to many cardinality relationship in a multiple table query, the JOIN statement connects the two tables when the primary key attribute is equaling the foreign key attribute. When they are equal, the result set row has access to all attributes in both instances that are connecter. In the example below the worker instance has access to the connected department instance an thus can reference the department name of the connected department instance. 

 

SELECT deptID_FK, avg(salary), deptName

FROM WORKER

JOIN Dept ON dept.deptID = Worker.DeptID_FK

GROUP BY deptID_FK

HAVING avg(salary) > 1000;

 

Now if you wanted to include the Branch Table in the query, you must use the Department Table as a linking table between the Worker table and the Branch Table.  As noted and illustrated above, when Department ID Code in Worker matches the Department ID Code in Department, the resulting row is a combination of both instances that match.  Now once that match is established, another connection or join can be made.  The connected department record contains the foreign key attribute linking it to the department that the it is in.  The Department Table is on the one side of a one to many relationship with the Worker Table.  It is also on the many side of a one to many relationship with the Branch Table.  You can set another join to the Branch Table connecting the one Branch Table Record that matches the value branch ID attribute in the Department Table.  This will combine the attributes all three instances of the three tables into one combined row, giving you access to the data in the attributes of all three combined instances.  Once the join clause is added to the query, simply specify the attribute name of the value want access to.  In the case below, additionally, we want to display the branch name of the branch that the worker works for  

 

SELECT deptID_FK, avg(salary), deptName, branchName

FROM WORKER

JOIN Dept ON dept.deptID = Worker.DeptID_FK

JOIN Branch ON Branch.branchID = Dept.branchID_FK

GROUP BY deptID_FK

HAVING avg(salary) > 1000;

 

   

 

  

Review Exercises (note: yes, they are similar to the group by exercises, but these contain the HAVING clause requirements

 

ORDER BY clause

ORDER BY is an optional clause which will allow you to display the results of your query in a sorted order (either ascending order or descending order) based on the columns that you specify to order by.

 

ORDER BY clause syntax:

 

SELECT column1, SUM(column2) FROM "list-of-tables" ORDER BY "column-list" [ASC | DESC];

[ ] = optional

This statement will select the employee_id, dept, name, age, and salary from the employee_info table where the dept equals 'Sales' and will list the results in Ascending (default) order based on their Salary.

 

ASC = Ascending Order - the default:  This is from the lowest item to the highest item

 

DESC = Descending Order- this must be supplied:  This is from the highest item to the lowest item

 

For example:

SELECT employee_id, dept, name, age, salary FROM employee_info WHERE dept = 'Sales' ORDER BY salary;

If you would like to order based on multiple columns, you must separate the columns with commas. For example:

 

SELECT workerID, deptID, lname,commision , salary

 

 

FROM worker

 

WHERE deptID_FK = 'SALES'

 

ORDER BY salary, commision DESC;

 

 

The IN & BETWEEN Clause to the SELECT Statement

 

 

SELECT col1, SUM(col2)

 

FROM "list-of-tables"

 

WHERE col3 IN (list-of-values);

 

 

SELECT col1, SUM(col2)

 

FROM "list-of-tables"

 

WHERE col3 BETWEEN value1 AND value2;

 

The IN conditional operator is really a set membership test operator. That is, it is used to test whether or not a value (stated before the keyword IN) is "in" the list of values provided after the keyword IN.

 

For example:

 

 

SELECT workerID, lname, salary

FROM worker

WHERE lastname IN ('KELLY', 'BAKER', 'WARD', 'DENNIS');

 

 

This statement will select the workerID,  last name, salary from the worker table where the last name is equal to either: Hernandez, Jones, Roberts, or Ruiz. It will return the rows if it is ANY of these values.

 

The IN conditional operator can be rewritten by using compound conditions using the equals operator and combining it with OR - withthe exact same output results:

 

Example 

 

SELECT workerID, lname, salary

 

FROM worker

 

WHERE lname = 'KELLY' OR lname = 'BAKER' OR lname = 'WARD' OR lname = 'HINES';

 

 

As you can see, the IN operator is much shorter and easier to read when you are testing for more than two or three values. In the above example you will get an empty set if you run the query on our Worker Database because none of these names are last name values in the worker table. 

 

SELECT workerID, lname, salary

 FROM worker

WHERE lname = 'Hernandez' OR lname = 'Jones' OR lname = 'Roberts' OR lname = 'Ruiz';

 

 

In the example below you will receive four rows in the result set since all four workers last names are in the Worker Table

 

SELECT workerID, lname, salary

FROM worker

WHERE lname = 'KELLY' OR lname = 'BAKER' OR lname = 'DENNIS' OR lname = 'WARD';

 

 

You can also use NOT IN to exclude the rows in your list.

 

EXAMPLE 

SELECT workerID, lname, salary

FROM worker

WHERE lastname NOT IN ('KELLY', 'BAKER', 'WARD', 'DENNIS');

 

In the example above the result set will contain all instances of workers in the Worker Table whose last name is not Kelly, Baker, Ward, or Dennis

 

 

Once again we can identify the department and or Branch that the worker is assigned to by joining the Department Table or the (Department Table and Branch Table) to the query

 

SELECT workerID, lame AS 'Last Name', salary, deptName AS 'Department Name'

JOIN Department ON Worker.deptID_FK = Dept.deptID

FROM worker

WHERE lastname NOT IN ('KELLY', 'BAKER', 'WARD', 'DENNIS');

 

 

In the query below you see that one join uses the full database name and the other does not.  The way the attributes are named makes it unnecessary to use the full database name in the first join clause in the SELECT Statement. You only need to use the full database name when two attributes in two tables being used in a query have the same name.  In worker the foreign key attribute is named deptID_FK and the primary key attribute is named deptID, there is no ambiguity for the database to determine which attribute is being referenced in the query.  In the second join clause yo can clearly see that there is no confusion for the database, the full database name is not being used to reference the attributes in both tables. The only reason to use the full database name is when two or more attributes from two or more tables being used in a multiple table query have the same. 

SELECT workerID, lame AS 'Last Name', salary, deptName AS 'Department Name', branchName

JOIN Department ON Worker.deptID_FK = Dept.deptID

JOIN Branch ON branchID_FK = BranchID

FROM worker

WHERE lastname NOT IN ('KELLY', 'BAKER', 'WARD', 'DENNIS');

 

 

 

 

The BETWEEN conditional operator is used to test to see whether or not a value (stated before the keyword BETWEEN) is "between" the two values stated after the keyword BETWEEN.

 

For example:

 

 SELECT workerid,gender, lname, salary

 

FROM worker

 

WHERE salary BETWEEN 500 AND 1000;

 

This statement will select the workerID, gender, last name, and salary from the worker table where the salary is between 500 and 1000 (including 500 and 1000).  The end points 500 and 1000 may or may not be included in the result set depending on the design of the DBMS that your are using. 

 

This statement can also be rewritten without the BETWEEN operator:

 

SELECT workerid,gender, lname, salary

 

FROM worker

WHERE salary>= 500 AND salary<= 1000;

 

 

You can also use NOT BETWEEN to exclude the values between your range.

 

For Example:

 

 SELECT workerid,gender, lname, salary

 FROM worker

 WHERE salary NOT ( BETWEEN 500 AND 1000);

 

In the example above, the result set would be worker instances with salaries  below 500 and above 1000.  The end points 500 and 1000 may or may not be included in the result set depending on the design of the DBMS that your are using.