Aggregate functions and Group By

Aggregated Functions

An aggregated function takes a generic number of tables and returns a single element as output, usually numeric.

select count(*) from Persona

count(*)
5

Within the parentheses, I can include one or more fields, which will be counted only if they are all different from NULL.

It is generally good practice to use either the selector * or the primary key (because it cannot be equal to NULL by definition).

Group By

Suppose we want to display pairs of the type (Age, Number of people of that age).

select Eta, count(CodiceFiscale) from Persona

INCORRECT Method

Eta	count(CodiceFiscale)
22	?
35	?
55	?

A query like this would result in an error, as it combines two different data types; Age is not aggregated, while count(SocialSecurityNumber) is. The system does not know which rows of the table to consider for counting and which to exclude, meaning the system does not know which attribute to group the count by.

select Eta, count(CodiceFiscale) as NumeroPersone from Persona
group by Eta

Eta	NumeroPersone
22	3
35	1
55	1

By using the clause group by Age, the system performs count(SocialSecurityNumber) wherever it encounters multiple Age values that are the same.

Every time an aggregated function is used along with a field from a certain table, the clause group by FieldList must be used, where FieldList must be a superset of the fields present in the select, meaning that at least the fields present in the select must be included in the group by, along with possibly other fields.

select Nome, Eta, count(CodiceFiscale) as NumeroPersone from Persona
group by Nome, Eta

Nome	Eta	NumeroPersone
Lucia	22	2
Domiziano	22	1
Mario	35	1
Luciano	55	1

select Eta, count(CodiceFiscale) as NumeroPersone from Persona
group by Nome, Eta

Eta	NumeroPersone
22	2
22	1
35	1
55	1

These two queries group the same data, showing the triples (Name, Age, Number of people with the same name and age), except that in the second query the Name is not shown, as it is not present in the select.

The query does not produce an error since all fields in the group by are present in the select (Age).

--Corretto, in quanto (Eta) ha un sottoinsieme di (Nome, Eta)
select Eta, count(CodiceFiscale) as NumeroPersone from Persona
group by Nome, Eta
 
--Non Corretto, in quanto (Nome, Eta) non è un sottoinsieme di (Eta)
select Nome, Eta, count(CodiceFiscale) as NumeroPersone from Persona
group by Eta

Fundamental aggregate functions

avg(Field): returns the average value of Field
sum(Field): returns the sum of values in Field
count(Field): returns the number of rows where Field is not equal to NULL
min(Field): returns the minimum value of Field present in the table
max(Field): returns the maximum value of Field present in the table

Retrieving Rows with Maximum or Minimum Value

As we have seen, the functions max() and min() return the maximum and minimum values of a certain specified table; however, it is not clear how we can find the row that has that maximum or minimum value.

It is easy to find the maximum average salary among all jobs:

select max(StipendioMedio) from Lavoro

max(StipendioMedio)
60000

If we want to find the Job that offers the maximum average salary, we can think that the above query returns a single value, working with aggregate functions, and therefore I can insert this value within a where clause:

select * from Lavoro
where StipendioMedio = (select max(StipendioMedio) from Lavoro)

Id	Nome	StipendioMedio	EsperienzaNecessari
3	Chirurgo	60000	10

Note

I can use this method with all aggregate functions.

tags: databases SQL

Quartz 4

Explorer

Aggregate functions and Group By

Aggregated Functions

Group By

Fundamental aggregate functions

Retrieving Rows with Maximum or Minimum Value

Graph View

Table of Contents

Backlinks