Aggregated Functions
An aggregated function takes a generic number of tables and returns a single element as output, usually numeric.
select count(*) from Persona| count(*) |
|---|
| 5 |
Within the parentheses, I can include one or more fields, which will be counted only if they are all different from NULL.
It is generally good practice to use either the selector * or the primary key (because it cannot be equal to NULL by definition).
Group By
Suppose we want to display pairs of the type (Age, Number of people of that age).
select Eta, count(CodiceFiscale) from PersonaINCORRECT Method
| Eta | count(CodiceFiscale) |
|---|---|
| 22 | ? |
| 35 | ? |
| 55 | ? |
A query like this would result in an error, as it combines two different data types; Age is not aggregated, while count(SocialSecurityNumber) is. The system does not know which rows of the table to consider for counting and which to exclude, meaning the system does not know which attribute to group the count by.
select Eta, count(CodiceFiscale) as NumeroPersone from Persona
group by Eta| Eta | NumeroPersone |
|---|---|
| 22 | 3 |
| 35 | 1 |
| 55 | 1 |
By using the clause group by Age, the system performs count(SocialSecurityNumber) wherever it encounters multiple Age values that are the same.
Every time an aggregated function is used along with a field from a certain table, the clause group by FieldList must be used, where FieldList must be a superset of the fields present in the select, meaning that at least the fields present in the select must be included in the group by, along with possibly other fields.
select Nome, Eta, count(CodiceFiscale) as NumeroPersone from Persona
group by Nome, Eta| Nome | Eta | NumeroPersone |
|---|---|---|
| Lucia | 22 | 2 |
| Domiziano | 22 | 1 |
| Mario | 35 | 1 |
| Luciano | 55 | 1 |
select Eta, count(CodiceFiscale) as NumeroPersone from Persona
group by Nome, Eta| Eta | NumeroPersone |
|---|---|
| 22 | 2 |
| 22 | 1 |
| 35 | 1 |
| 55 | 1 |
These two queries group the same data, showing the triples (Name, Age, Number of people with the same name and age), except that in the second query the Name is not shown, as it is not present in the select.
The query does not produce an error since all fields in the group by are present in the select (Age).
--Corretto, in quanto (Eta) ha un sottoinsieme di (Nome, Eta)
select Eta, count(CodiceFiscale) as NumeroPersone from Persona
group by Nome, Eta
--Non Corretto, in quanto (Nome, Eta) non è un sottoinsieme di (Eta)
select Nome, Eta, count(CodiceFiscale) as NumeroPersone from Persona
group by EtaFundamental aggregate functions
- avg(Field): returns the average value of Field
- sum(Field): returns the sum of values in Field
- count(Field): returns the number of rows where Field is not equal to NULL
- min(Field): returns the minimum value of Field present in the table
- max(Field): returns the maximum value of Field present in the table
Retrieving Rows with Maximum or Minimum Value
As we have seen, the functions max() and min() return the maximum and minimum values of a certain specified table; however, it is not clear how we can find the row that has that maximum or minimum value.
It is easy to find the maximum average salary among all jobs:
select max(StipendioMedio) from Lavoro| max(StipendioMedio) |
|---|
| 60000 |
If we want to find the Job that offers the maximum average salary, we can think that the above query returns a single value, working with aggregate functions, and therefore I can insert this value within a where clause:
select * from Lavoro
where StipendioMedio = (select max(StipendioMedio) from Lavoro)| Id | Nome | StipendioMedio | EsperienzaNecessari |
|---|---|---|---|
| 3 | Chirurgo | 60000 | 10 |
Note
I can use this method with all aggregate functions.