Aggregated Functions

An aggregated function takes a generic number of tables and returns a single element as output, usually numeric.

select count(*) from Persona
count(*)
5

Within the parentheses, I can include one or more fields, which will be counted only if they are all different from NULL.

It is generally good practice to use either the selector * or the primary key (because it cannot be equal to NULL by definition).

Group By

Suppose we want to display pairs of the type (Age, Number of people of that age).

select Eta, count(CodiceFiscale) from Persona

INCORRECT Method

Etacount(CodiceFiscale)
22?
35?
55?

A query like this would result in an error, as it combines two different data types; Age is not aggregated, while count(SocialSecurityNumber) is. The system does not know which rows of the table to consider for counting and which to exclude, meaning the system does not know which attribute to group the count by.

select Eta, count(CodiceFiscale) as NumeroPersone from Persona
group by Eta
EtaNumeroPersone
223
351
551

By using the clause group by Age, the system performs count(SocialSecurityNumber) wherever it encounters multiple Age values that are the same.

Every time an aggregated function is used along with a field from a certain table, the clause group by FieldList must be used, where FieldList must be a superset of the fields present in the select, meaning that at least the fields present in the select must be included in the group by, along with possibly other fields.

select Nome, Eta, count(CodiceFiscale) as NumeroPersone from Persona
group by Nome, Eta
NomeEtaNumeroPersone
Lucia222
Domiziano221
Mario351
Luciano551
select Eta, count(CodiceFiscale) as NumeroPersone from Persona
group by Nome, Eta
EtaNumeroPersone
222
221
351
551

These two queries group the same data, showing the triples (Name, Age, Number of people with the same name and age), except that in the second query the Name is not shown, as it is not present in the select.

The query does not produce an error since all fields in the group by are present in the select (Age).

--Corretto, in quanto (Eta) ha un sottoinsieme di (Nome, Eta)
select Eta, count(CodiceFiscale) as NumeroPersone from Persona
group by Nome, Eta
 
--Non Corretto, in quanto (Nome, Eta) non è un sottoinsieme di (Eta)
select Nome, Eta, count(CodiceFiscale) as NumeroPersone from Persona
group by Eta

Fundamental aggregate functions

  • avg(Field): returns the average value of Field
  • sum(Field): returns the sum of values in Field
  • count(Field): returns the number of rows where Field is not equal to NULL
  • min(Field): returns the minimum value of Field present in the table
  • max(Field): returns the maximum value of Field present in the table

Retrieving Rows with Maximum or Minimum Value

As we have seen, the functions max() and min() return the maximum and minimum values of a certain specified table; however, it is not clear how we can find the row that has that maximum or minimum value.

It is easy to find the maximum average salary among all jobs:

select max(StipendioMedio) from Lavoro
max(StipendioMedio)
60000

If we want to find the Job that offers the maximum average salary, we can think that the above query returns a single value, working with aggregate functions, and therefore I can insert this value within a where clause:

select * from Lavoro
where StipendioMedio = (select max(StipendioMedio) from Lavoro)
IdNomeStipendioMedioEsperienzaNecessari
3Chirurgo6000010

Note

I can use this method with all aggregate functions.


tags: databases SQL