Advance SQL Queries Overview

Throughout the semester we have been gradually processing in sequential order the steps outlined by professionals to design a relational database than can implemented in any relational database management system. The first step is to obtain the requirements and understand the problem. This involves writing a statement of understanding after interviewing the people involved with the currently implemented system. The second step was to create a conceptual view of the entities and relationships extracted from the problem space that you are trying to create a relational database for. This involves understanding the business rules that the stakeholders of the system currently utilize. The third step is to create a logical representation of the database. This includes translating the conceptual ER diagram into a normalized relational schema. Once a normalized relational schema is created, the data dictionary can be finalized. The data dictionary is the metadata that is needed by the users of the database. The metadata is also referred to as the data dictionary. Once the finalized schema and data dictionary is approved, the physical database is then implemented. Over the past couple of weeks we have been discussing and illustrating the nature of the RDBMS in how it evaluates joins. We have also continued to practice running queries in a real time database environment. Up until last week, you were provided DDL script to create the database and the tables in the database. You were also provided with the DML INSERT INTO commands to insert and populate the tables with data. Last week you studied and practice creating our own DDL statements to create databases and tables. You explored the structure of the CREATE TABLE command and some of its supplement modifier statements. In the first week of the course, we examined three basic generic data types that we classified as text, numeric and Boolean. Last week, you studied a variety of sub types of these main categories that were SQL specific.

Under the numeric category there was the floating point and integer data types. Under the integer data types there was the INT. Under floating point there was the double data type. The main difference between the floating point data types is the size of the floating point number and the storage space needed to hold the value. The double data type has a 64 bit size. Under the text data type there is the VARCHAR data types and CHAR data types. Keep in mind, in certain DBMS applications there is a difference in the char data type. In the certain DBMS such as Oracle CHAR is a constraint on the number of characters that must be placed in the attribute. MySQL does not enforce this constraint.

The DATE data type enforces that a correct date is entered into the attribute. Different DBMS application structure the date differently. The structure of the date attribute in MySQL is 'YYYY-MM-DD HH:MM:SS'. You can optionally use a sub set of the date structure excluding the time 'YYYY-MM-DD'.

This Weeks Highlight

In order to follow along with the query examples below, you must create the database and tables in the attached SQL file below. Additionally, you must create the tables in the proper order.

WorkerBranch.sql Download WorkerBranch.sql

Aggregation Functions in SQL and Additional Modifiers to the SELECT Query

Below are list of a few functions that return summarized values from a SELECT query. These are called aggregate functions.

MIN	returns the smallest value in a given column
MAX	returns the largest value in a given column
SUM	returns the sum of the numeric values in a given column
AVG	returns the average value of a given column
COUNT	returns the total number of values in a given column
COUNT(*)	returns the number of rows in a table

Aggregate functions are used to compute and summarize a "returned column of numeric data" from your SELECT statement. They basically summarize the results of a particular column of selected data. We are covering these here since they are required by another topic that we will be covering this week in class. The "GROUP BY" modifier is an additional reserve word that you can use with the select statement. Aggregated Functions are required for use with the "GROUP BY" clause. However, these functions can be used without the "GROUP BY" clause. For example:

SELECT AVG(salary)

FROM worker;

This statement will return a single result, which contains the average value of everything value returned in the salary column from the Worker table.

Another example:

SELECT AVG(salary)

FROM worker;

This statement will return a single result, which contains the average value of everything value returned in the salary column from the Worker table that are female employees.

Example:

SELECT AVG(salary)

FROM worker;

WHERE gender = 'F';

This statement will return the count of all employees in the table

Example:

SELECT Count(*)

FROM worker;

This particular statement is slightly different from the other aggregate functions since there isn't a column supplied to the count function. This statement will return the number of rows in the employees table.

The “GROUP BY” clause of the SELECT Statement

The GROUP BY clause will gather all of the rows that contain data in the specified column(s) and it will then allow aggregate functions to be performed on the one or more columns. This will produce sub totals in each category being grouped. This can best be explained by an example:

GROUP BY clause syntax:

SELECT column1,

SUM(column2)

FROM "list-of-tables"

GROUP BY "column-list";

Let's say you would like to retrieve a list of the highest paid salaries in each dept:

SELECT max(salary), deptID_FK

FROM worker

GROUP BY deptID_FK;

This statement will select the maximum salary for the people in each unique department. Basically, the salary for the person who makes the most in each department will be displayed. Their, salary and their department will be returned.

Let’s take a look at an example using real time data. You need to loas the customer and orders table into a database to run this example. Let’s take a look at the items_ordered table. Let's say you want to group everything of quantity 1 together, everything of quantity 2 together, everything of quantity 3 together, etc. If you would like to determine what the largest cost item is for each grouped quantity (all quantity 1's, all quantity 2's, all quantity 3's, etc.), you would enter:

SELECT quantity, max(price)

FROM items_ordered

GROUP BY quantity;

Enter the statement in above, and take a look at the results to see if it returned what you were expecting. Verify that the maximum price in each Quantity Group is really the maximum price.

HAVING clause

The HAVING clause allows you to specify conditions on the aggregated rows for each group - in other words, which of the aggregated rows should be selected and displayed in the result set of the query. The selection will be based on the conditions you specify in the HAVING clause. The HAVING clause must follow the GROUP BY clause if you are going to use it.

HAVING clause syntax:

SELECT column1,

SUM(column2)

FROM "list-of-tables"

GROUP BY "column-list"

HAVING "condition";

The HAVING clause can best be described by an example. Let's say you have an Worker table containing the workers's last name, department ID, salary, and gender. If you would like to select the average salary for each WORKER in each department, you could enter:

SELECT deptID_FK, avg(salary)

FROM WORKER

GROUP BY deptID_FK;

But, let's say that you want to ONLY calculate & display the average if their salary is over 1000:

SELECT deptID_FK, avg(salary)

FROM WORKER

GROUP BY deptID_FK

HAVING avg(salary) > 1000;

If you wanted to include the Department Name to this projection, you would simply have to JOIN the Department Table to Query. You would then add the columns of any one department record that matches the department code in the worker record to the individual rows of the result set rows of the query. This is when Department ID Code in Worker matches the Department ID Code in Department. When connecting the table on the one side to the table on the many side of a one to many cardinality relationship in a multiple table query, the JOIN statement connects the two tables when the primary key attribute is equaling the foreign key attribute. When they are equal, the result set row has access to all attributes in both instances that are connecter. In the example below the worker instance has access to the connected department instance an thus can reference the department name of the connected department instance.

SELECT deptID_FK, avg(salary), deptName

FROM WORKER

JOIN Dept ON dept.deptID = Worker.DeptID_FK

GROUP BY deptID_FK

HAVING avg(salary) > 1000;

Now if you wanted to include the Branch Table in the query, you must use the Department Table as a linking table between the Worker table and the Branch Table. As noted and illustrated above, when Department ID Code in Worker matches the Department ID Code in Department, the resulting row is a combination of both instances that match. Now once that match is established, another connection or join can be made. The connected department record contains the foreign key attribute linking it to the department that the it is in. The Department Table is on the one side of a one to many relationship with the Worker Table. It is also on the many side of a one to many relationship with the Branch Table. You can set another join to the Branch Table connecting the one Branch Table Record that matches the value branch ID attribute in the Department Table. This will combine the attributes all three instances of the three tables into one combined row, giving you access to the data in the attributes of all three combined instances. Once the join clause is added to the query, simply specify the attribute name of the value want access to. In the case below, additionally, we want to display the branch name of the branch that the worker works for

SELECT deptID_FK, avg(salary), deptName, branchName

FROM WORKER

JOIN Dept ON dept.deptID = Worker.DeptID_FK

JOIN Branch ON Branch.branchID = Dept.branchID_FK

GROUP BY deptID_FK

HAVING avg(salary) > 1000;

Review Exercises (note: yes, they are similar to the group by exercises, but these contain the HAVING clause requirements

ORDER BY clause

ORDER BY is an optional clause which will allow you to display the results of your query in a sorted order (either ascending order or descending order) based on the columns that you specify to order by.

ORDER BY clause syntax:

SELECT column1, SUM(column2) FROM "list-of-tables" ORDER BY "column-list" [ASC | DESC];

[ ] = optional

This statement will select the employee_id, dept, name, age, and salary from the employee_info table where the dept equals 'Sales' and will list the results in Ascending (default) order based on their Salary.

ASC = Ascending Order - the default: This is from the lowest item to the highest item

DESC = Descending Order- this must be supplied: This is from the highest item to the lowest item

For example:

SELECT employee_id, dept, name, age, salary FROM employee_info WHERE dept = 'Sales' ORDER BY salary;

If you would like to order based on multiple columns, you must separate the columns with commas. For example:

SELECT workerID, deptID, lname,commision , salary

FROM worker

WHERE deptID_FK = 'SALES'

ORDER BY salary, commision DESC;

The IN & BETWEEN Clause to the SELECT Statement

SELECT col1, SUM(col2)

FROM "list-of-tables"

WHERE col3 IN (list-of-values);

SELECT col1, SUM(col2)

FROM "list-of-tables"

WHERE col3 BETWEEN value1 AND value2;

The IN conditional operator is really a set membership test operator. That is, it is used to test whether or not a value (stated before the keyword IN) is "in" the list of values provided after the keyword IN.

For example: