Friday, March 14, 2025
HomeProgrammingA SQL MERGE assertion performs actions based mostly on a RIGHT JOIN

A SQL MERGE assertion performs actions based mostly on a RIGHT JOIN


RIGHT JOIN is an esoteric function within the SQL language, and hardly seen in the true world, as a result of virtually each RIGHT JOIN can simply be expressed as an equal LEFT JOIN. The next two statements are equal:

-- Common
SELECT c.first_name, c.last_name, p.quantity
FROM buyer AS c
LEFT JOIN fee AS p ON c.customer_id = p.customer_id

-- Esoteric
SELECT c.first_name, c.last_name, p.quantity
FROM fee AS p
RIGHT JOIN buyer AS c ON c.customer_id = p.customer_id

It’s not unreasonable to anticipate these two statements to supply the identical execution plan on most RDBMS, on condition that they’re logically equal. Since we’ve grown used to studying issues from left to proper and high to backside, I don’t suppose RIGHT JOIN will turn out to be extra common any time quickly.

There may be, nonetheless, one place within the SQL language the place RIGHT JOIN is surprisingly ubiquitous!

The MERGE assertion

Why is it a shock? As a result of that place doesn’t use the identical syntax for becoming a member of two tables. However that’s precisely what occurs within the MERGE assertion. Let’s take a look at the next MERGE assertion that takes:

  • A staging desk (SOURCE desk) to load information from
  • An odd desk (TARGET desk) to retailer information into

With a schema like this:

CREATE TABLE book_to_book_store (
  book_id BIGINT NOT NULL REFERENCES guide,
  identify TEXT NOT NULL REFERENCES book_store,
  inventory INT NOT NULL,

  PRIMARY KEY (book_id, identify)
);

CREATE TABLE book_to_book_store_staging AS 
SELECT * FROM book_to_book_store
WITH NO DATA;

A question that might be typical of an ETL job:

-- The goal desk
MERGE INTO book_to_book_store AS t

-- The supply desk
USING book_to_book_store_staging AS s

-- The RIGHT JOIN predicate
ON t.book_id = s.book_id AND t.identify = s.identify

-- The actions for every row, based mostly on RIGHT JOIN matching
WHEN MATCHED THEN UPDATE SET inventory = s.inventory
WHEN NOT MATCHED THEN INSERT (book_id, identify, inventory) 
VALUES (s.book_id, s.identify, s.inventory);

That is merely taking all of the rows from the BOOK_TO_BOOK_STORE_STAGING desk, and merges them into BOOK_TO_BOOK_STORE:

  • If the row already exists (there’s a MATCH), then the STOCK is up to date
  • If the row doesn’t exist already (there’s no MATCH), then the row is inserted

However we don’t use this supply -> goal syntactic order, we first specify the goal desk BOOK_TO_BOOK_STORE, after which we RIGHT JOIN the BOOK_TO_BOOK_STORE_STAGING desk to it. Give it some thought this fashion:

SELECT *
FROM book_to_book_store AS t
RIGHT JOIN book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.identify = s.identify

And, if we consider a RIGHT JOIN not as a Venn diagram, however as a cartesian product as follows, then it may be seen simply what is finished per MATCH or non-MATCH:

|t.identify      |t.book_id|t.inventory|s.identify      |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
| | | |Faraway Land|1 |9 | <-- NOT MATCHED
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED
| | | |Paper Path |1 |1 | <-- NOT MATCHED
|Paper Path |3 |2 |Paper Path |3 |0 | <-- MATCHED

As at all times with a RIGHT JOIN, each row from proper facet of the be part of is matched with an identical row from the left facet of the be part of, or an empty row of NULL values, if there’s no such match. After this MERGE, we wish the ensuing information to be up to date as follows:

|t.identify      |t.book_id|t.inventory|s.identify      |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED
|Faraway Land|2 |12 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |5 |Faraway Land|3 |5 | <-- MATCHED
|Faraway Land|1 |1 |Paper Path |1 |1 | <-- NOT MATCHED
|Paper Path |3 |0 |Paper Path |3 |0 | <-- MATCHED

That is how the MERGE assertion works.

Notice, I mentioned earlier than that the JOIN is producing a cartesian product. In contrast to with SELECT statements, nonetheless, there’s a limitation to MERGE the place the cartesian product should not produce any duplicate matches per TARGET row, because the order of actions wouldn’t be outlined if there have been a number of SOURCE rows per TARGET row.

Deleting rows

MERGE is extra highly effective than simply performing INSERT and UPDATE. It could possibly additionally DELETE rows. Let’s assume that we wish a staging desk’s STOCK = 0 to imply that the row ought to be deleted, as an alternative of the STOCK being set to 0. Then we will write:

MERGE INTO book_to_book_store AS t
USING book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.identify = s.identify
WHEN MATCHED AND s.inventory = 0 THEN DELETE
WHEN MATCHED THEN UPDATE SET inventory = s.inventory
WHEN NOT MATCHED THEN INSERT (book_id, identify, inventory) 
VALUES (s.book_id, s.identify, s.inventory);

Now, with the above staging information, we’ll take away the final row as an alternative of updating it:

|t.identify      |t.book_id|t.inventory|s.identify      |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED : INSERT
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED : UPDATE
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED : UPDATE
|Paper Path |1 |1 |Paper Path |1 |1 | <-- NOT MATCHED : INSERT
| | | |Paper Path |3 |0 | <-- MATCHED : DELETE

The RIGHT JOIN semantics remains to be the identical, simply the motion is totally different now, relying on the further AND clause of the WHEN MATCHED clause.

Matching by supply

Some RDBMS assist an much more highly effective vendor particular variant of MERGE, which ought to be added to the IEC/ISO 9075 commonplace, in my view. The BY TARGET / BY SOURCE clause. Let’s take a look on the following assertion:

MERGE INTO book_to_book_store AS t
USING book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.identify = s.identify
WHEN MATCHED THEN UPDATE SET inventory = s.inventory
WHEN NOT MATCHED BY TARGET THEN INSERT (book_id, identify, inventory) 
VALUES (s.book_id, s.identify, s.inventory)
WHEN NOT MATCHED BY SOURCE THEN DELETE;

Including a WHEN NOT MATCHED BY SOURCE clause has the easy impact of turning the RIGHT JOIN operation right into a FULL JOIN operation. Consider it this fashion:

SELECT *
FROM book_to_book_store AS t
FULL JOIN book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.identify = s.identify

Now, the consequence may look one thing like this:

|t.identify      |t.book_id|t.inventory|s.identify      |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
| | | |Faraway Land|1 |9 | <-- NOT MATCHED BY TARGET
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED
| | | |Paper Path |1 |1 | <-- NOT MATCHED BY TARGET
|Paper Path |3 |2 | | | | <-- NOT MATCHED BY SOURCE

The phrases NOT MATCHED BY TARGET and NOT MATCHED BY SOURCE are fairly self-explanatory when visualised as above, and doubtless much less complicated to rookies than LEFT JOIN and RIGHT JOIN. I wouldn’t thoughts SQL syntax to be enhanced in a method that it could be attainable to determine whether or not a NULL worth originating from an OUTER JOIN is because of:

  • The supply information containing the NULL worth
  • The row being NOT MATCHED by the “different facet” of the OUTER JOIN

Think about a hypothetical syntax like this:

SELECT c.first_name, c.last_name, p.quantity
FROM buyer AS c
LEFT JOIN fee AS p ON c.customer_id = p.customer_id
WHERE p IS NOT MATCHED BY JOIN -- Successfully an ANTI JOIN

Anyway…

When deleting rows, this method is rather more handy than having to depend on an interpretation of the semantics of information, comparable to STOCK = 0 that means a deletion. We now have absent rows within the SOURCE desk (the staging) desk, which merely imply the row have to be deleted, if that’s how we wish to mannequin issues. So, after operating the above MERGE assertion, we’ll get this consequence once more:

|t.identify      |t.book_id|t.inventory|s.identify      |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED BY TARGET : INSERT
|Faraway Land|2 |12 |Faraway Land|2 |12 | <-- MATCHED : UPDATE
|Faraway Land|3 |5 |Faraway Land|3 |5 | <-- MATCHED : UPDATE
|Faraway Land|1 |1 |Paper Path |1 |1 | <-- NOT MATCHED BY TARGET : INSERT
| | | | | | | <-- NOT MATCHED BY SOURCE : DELETE

No less than the next RDBMS assist the BY SOURCE and BY TARGET clauses:

  • Databricks
  • Firebird 5
  • PostgreSQL 17
  • SQL Server

Given how helpful that is, I’ll anticipate extra RDBMS to undertake this T-SQL syntax, quickly. jOOQ 3.20 has added assist for it, and a future model of jOOQ might emulate it by transferring the FULL JOIN into the USING clause.

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

- Advertisment -
Google search engine

Most Popular

Recent Comments