Inner Join

Let’s say we have these two tables:

  • The Person table:

    NameSurnameFiscalCodeAgeGender
    DomizianoScarcelli00122M
    MarioRossi00235M
    LuciaDel Perro00322F
    LucianoMarchesi00455M
    LuciaGasparri00522F
  • And the Job table

    FCPersonIDJobSalary
    001223000
    002140000
    003422000

And we want to find the salary of each person that has a job. We need to combine the rows where Person.FiscalCode is equal to the Job.FCPerson. We can do that with a join.

SQL

-- Implicit join
SELECT *
FROM Person p, Job j
WHERE p.FiscalCode = j.FCPerson;
 
-- Explicit join
SELECT *
FROM Person p
JOIN PersonJob j
ON p.FiscalCode = j.FCPerson;

This query returns the Cartesian product between the two tables, filtering for those rows that satisfy the join condition.

NameSurnameFiscalCodeAgeGenderFCPersonIDJobSalary
DomizianoScarcelli00122M001223000
MarioRossi00235M002140000
LuciaDel Perro00322F003422000

The condition p.FiscalCode = j.FCPerson is essential; otherwise, the system would return all possible pairs between the rows of the tables, including pairs of rows corresponding to data from two different people.

Note

The join condition connects the Foreign Key of one table to the Primary Key of the other.**

Pandas In Pandas, we can use the DataFrame.merge() or the DataFrame.join() functions to do a join.

With .merge() the result is straight-forward.

result = person.merge(job, left_on='FiscalCode', right_on='FCPerson')

With .join, we first have to set the indexes of the columns we want to join using set_index(), then perform the join, and do a reset_index in order to make the FiscalCode index a column again.

person_indexed = person.set_index('FiscalCode')
job_indexed = job.set_index('FCPerson')
result = person_indexed.join(job_indexed, how='inner').reset_index()

Left Join

todo: modifica in modo da avere pandas + tabelle aggiornate in inglese + risultati corretti

The Left Join performs the same operation as the classic Join, but takes all the rows from the first table, including those that do not have a corresponding match in the “right” table (i.e., the table that appears second in the query) according to the imposed condition.

Unlike the inner join, it only uses explicit notation.

SELECT *
FROM Person p
LEFT JOIN Job j
ON p.FiscalCode = j.FCPerson;

This query returns the Cartesian product between the Persona and PersonaLavoro tables, filtering for those rows that satisfy the condition Person.FiscalCode = Job.FCPerson, but also including those people who do not have a job, i.e., who do not have a corresponding entry in Job.FCPerson.

In other words, it takes the inner join between the tables, but then appends also the rows in the left which key do not appear in the right table.

NameSurnameFiscalCodeAgeGenderFCPersonIDJobSalary
DomizianoScarcelli00122M001223000
MarioRossi00235M002140000
LuciaDel Perro00322F003422000
LucianoMarchesi00455MNULLNULLNULL
LuciaGasparri00522FNULLNULLNULL

Pandas In Pandas, this can be achieved with .merge() using the parameter how='left':

result_left = person.merge(job, how='left', left_on='FiscalCode', right_on='FCPerson')

Right Join

Opposite to the left join, the right join performs a classic join but also returns all those rows from the right table (the second) that do not have a corresponding match in the left table (the first) according to the imposed condition.

SELECT *
FROM Job j
RIGHT JOIN Person p
ON p.FiscalCode = j.FCPerson;
FCPersonIDJobSalaryNameSurnameFiscalCodeAgeGender
001223000DomizianoScarcelli00122M
002140000MarioRossi00235M
003422000LuciaDel Perro00322F
NULLNULLNULLLucianoMarchesi00455M
NULLNULLNULLLuciaGasparri00522F

Pandas Same as the left join, we can use the merge(how="right") to perform the right join in Pandas:

result_right = person.merge(job, how='right', left_on='FiscalCode', right_on='FCPerson')

tags: databases SQL