RIGHT JOIN
is an esoteric function within the SQL language, and hardly seen in the true world, as a result of virtually each RIGHT JOIN
can simply be expressed as an equal LEFT JOIN
. The next two statements are equal:
-- Common
SELECT c.first_name, c.last_name, p.quantity
FROM buyer AS c
LEFT JOIN fee AS p ON c.customer_id = p.customer_id
-- Esoteric
SELECT c.first_name, c.last_name, p.quantity
FROM fee AS p
RIGHT JOIN buyer AS c ON c.customer_id = p.customer_id
It’s not unreasonable to anticipate these two statements to supply the identical execution plan on most RDBMS, on condition that they’re logically equal. Since we’ve grown used to studying issues from left to proper and high to backside, I don’t suppose RIGHT JOIN
will turn out to be extra common any time quickly.
There may be, nonetheless, one place within the SQL language the place RIGHT JOIN
is surprisingly ubiquitous!
The MERGE assertion
Why is it a shock? As a result of that place doesn’t use the identical syntax for becoming a member of two tables. However that’s precisely what occurs within the MERGE
assertion. Let’s take a look at the next MERGE
assertion that takes:
- A staging desk (
SOURCE
desk) to load information from - An odd desk (
TARGET
desk) to retailer information into
With a schema like this:
CREATE TABLE book_to_book_store (
book_id BIGINT NOT NULL REFERENCES guide,
identify TEXT NOT NULL REFERENCES book_store,
inventory INT NOT NULL,
PRIMARY KEY (book_id, identify)
);
CREATE TABLE book_to_book_store_staging AS
SELECT * FROM book_to_book_store
WITH NO DATA;
A question that might be typical of an ETL job:
-- The goal desk
MERGE INTO book_to_book_store AS t
-- The supply desk
USING book_to_book_store_staging AS s
-- The RIGHT JOIN predicate
ON t.book_id = s.book_id AND t.identify = s.identify
-- The actions for every row, based mostly on RIGHT JOIN matching
WHEN MATCHED THEN UPDATE SET inventory = s.inventory
WHEN NOT MATCHED THEN INSERT (book_id, identify, inventory)
VALUES (s.book_id, s.identify, s.inventory);
That is merely taking all of the rows from the BOOK_TO_BOOK_STORE_STAGING
desk, and merges them into BOOK_TO_BOOK_STORE
:
- If the row already exists (there’s a
MATCH
), then theSTOCK
is up to date - If the row doesn’t exist already (there’s no
MATCH
), then the row is inserted
However we don’t use this supply -> goal syntactic order, we first specify the goal desk BOOK_TO_BOOK_STORE
, after which we RIGHT JOIN
the BOOK_TO_BOOK_STORE_STAGING
desk to it. Give it some thought this fashion:
SELECT *
FROM book_to_book_store AS t
RIGHT JOIN book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.identify = s.identify
And, if we consider a RIGHT JOIN
not as a Venn diagram, however as a cartesian product as follows, then it may be seen simply what is finished per MATCH
or non-MATCH
:
|t.identify |t.book_id|t.inventory|s.identify |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
| | | |Faraway Land|1 |9 | <-- NOT MATCHED
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED
| | | |Paper Path |1 |1 | <-- NOT MATCHED
|Paper Path |3 |2 |Paper Path |3 |0 | <-- MATCHED
As at all times with a RIGHT JOIN
, each row from proper facet of the be part of is matched with an identical row from the left facet of the be part of, or an empty row of NULL
values, if there’s no such match. After this MERGE
, we wish the ensuing information to be up to date as follows:
|t.identify |t.book_id|t.inventory|s.identify |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED
|Faraway Land|2 |12 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |5 |Faraway Land|3 |5 | <-- MATCHED
|Faraway Land|1 |1 |Paper Path |1 |1 | <-- NOT MATCHED
|Paper Path |3 |0 |Paper Path |3 |0 | <-- MATCHED
That is how the MERGE
assertion works.
Notice, I mentioned earlier than that the JOIN
is producing a cartesian product. In contrast to with SELECT
statements, nonetheless, there’s a limitation to MERGE
the place the cartesian product should not produce any duplicate matches per TARGET
row, because the order of actions wouldn’t be outlined if there have been a number of SOURCE
rows per TARGET
row.
Deleting rows
MERGE
is extra highly effective than simply performing INSERT
and UPDATE
. It could possibly additionally DELETE
rows. Let’s assume that we wish a staging desk’s STOCK = 0
to imply that the row ought to be deleted, as an alternative of the STOCK
being set to 0
. Then we will write:
MERGE INTO book_to_book_store AS t
USING book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.identify = s.identify
WHEN MATCHED AND s.inventory = 0 THEN DELETE
WHEN MATCHED THEN UPDATE SET inventory = s.inventory
WHEN NOT MATCHED THEN INSERT (book_id, identify, inventory)
VALUES (s.book_id, s.identify, s.inventory);
Now, with the above staging information, we’ll take away the final row as an alternative of updating it:
|t.identify |t.book_id|t.inventory|s.identify |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED : INSERT
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED : UPDATE
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED : UPDATE
|Paper Path |1 |1 |Paper Path |1 |1 | <-- NOT MATCHED : INSERT
| | | |Paper Path |3 |0 | <-- MATCHED : DELETE
The RIGHT JOIN
semantics remains to be the identical, simply the motion is totally different now, relying on the further AND
clause of the WHEN MATCHED
clause.
Matching by supply
Some RDBMS assist an much more highly effective vendor particular variant of MERGE
, which ought to be added to the IEC/ISO 9075 commonplace, in my view. The BY TARGET
/ BY SOURCE
clause. Let’s take a look on the following assertion:
MERGE INTO book_to_book_store AS t
USING book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.identify = s.identify
WHEN MATCHED THEN UPDATE SET inventory = s.inventory
WHEN NOT MATCHED BY TARGET THEN INSERT (book_id, identify, inventory)
VALUES (s.book_id, s.identify, s.inventory)
WHEN NOT MATCHED BY SOURCE THEN DELETE;
Including a WHEN NOT MATCHED BY SOURCE
clause has the easy impact of turning the RIGHT JOIN
operation right into a FULL JOIN
operation. Consider it this fashion:
SELECT *
FROM book_to_book_store AS t
FULL JOIN book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.identify = s.identify
Now, the consequence may look one thing like this:
|t.identify |t.book_id|t.inventory|s.identify |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
| | | |Faraway Land|1 |9 | <-- NOT MATCHED BY TARGET
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED
| | | |Paper Path |1 |1 | <-- NOT MATCHED BY TARGET
|Paper Path |3 |2 | | | | <-- NOT MATCHED BY SOURCE
The phrases NOT MATCHED BY TARGET
and NOT MATCHED BY SOURCE
are fairly self-explanatory when visualised as above, and doubtless much less complicated to rookies than LEFT JOIN
and RIGHT JOIN
. I wouldn’t thoughts SQL syntax to be enhanced in a method that it could be attainable to determine whether or not a NULL
worth originating from an OUTER JOIN
is because of:
- The supply information containing the
NULL
worth - The row being
NOT MATCHED
by the “different facet” of theOUTER JOIN
Think about a hypothetical syntax like this:
SELECT c.first_name, c.last_name, p.quantity
FROM buyer AS c
LEFT JOIN fee AS p ON c.customer_id = p.customer_id
WHERE p IS NOT MATCHED BY JOIN -- Successfully an ANTI JOIN
Anyway…
When deleting rows, this method is rather more handy than having to depend on an interpretation of the semantics of information, comparable to STOCK = 0
that means a deletion. We now have absent rows within the SOURCE
desk (the staging) desk, which merely imply the row have to be deleted, if that’s how we wish to mannequin issues. So, after operating the above MERGE
assertion, we’ll get this consequence once more:
|t.identify |t.book_id|t.inventory|s.identify |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED BY TARGET : INSERT
|Faraway Land|2 |12 |Faraway Land|2 |12 | <-- MATCHED : UPDATE
|Faraway Land|3 |5 |Faraway Land|3 |5 | <-- MATCHED : UPDATE
|Faraway Land|1 |1 |Paper Path |1 |1 | <-- NOT MATCHED BY TARGET : INSERT
| | | | | | | <-- NOT MATCHED BY SOURCE : DELETE
No less than the next RDBMS assist the BY SOURCE
and BY TARGET
clauses:
- Databricks
- Firebird 5
- PostgreSQL 17
- SQL Server
Given how helpful that is, I’ll anticipate extra RDBMS to undertake this T-SQL syntax, quickly. jOOQ 3.20 has added assist for it, and a future model of jOOQ might emulate it by transferring the FULL JOIN
into the USING
clause.