Monday, May 30, 2022
HomeData ScienceLyft Knowledge Scientist Interview Query Walkthrough | by Nathan Rosidi | Could,...

Lyft Knowledge Scientist Interview Query Walkthrough | by Nathan Rosidi | Could, 2022


A 3 step answer to an SQL query requested throughout an interview on the ride-hailing firm

Lyft Data Scientist Interview Question Walkthrough
Picture created by the writer on Canva

Practically 20 million folks use Lyft to maneuver round their cities. The corporate’s essential product is a user-friendly cell software that enables customers to seamlessly transfer from level A to level B.

Behind the consumer friendliness of the smartphone app, Lyft places quite a lot of effort into amassing and analyzing information about its customers and drivers. This helps them plan and optimize their sources to offer probably the most environment friendly inside metropolis transportation contemplating the value, time, and degree of consolation.

Interviewers at Lyft attempt to ask the appropriate questions to rent succesful information scientists. On this article, we’ll stroll you thru one of many widespread Lyft information scientist interview questions, the place candidates must calculate driver churn price primarily based on obtainable information.

Lyft is on the lookout for information scientists who might help the corporate by writing environment friendly and readable SQL queries. To do this, candidates have to totally perceive SQL ideas.

Let’s take a look at some important SQL ideas which might help you get the information scientist job at Lyft.

COUNT()

When going right into a information science interview, it’s necessary to know tips on how to use all mixture features. This contains their syntax, doable arguments, and output values. Studying “The Final Information to SQL Mixture Features” might help you higher perceive this SQL function.

To reply this specific query, a candidate wants to know the COUNT() mixture operate. You need to know that the COUNT() operate returns the variety of rows which have a price within the specified column. If there’s a NULL worth, the COUNT() operate won’t add that row to the whole.

Different helpful operations {that a} candidate would possibly have to carry out embody: getting the variety of rows with distinctive values in a selected column, getting the variety of whole data, getting the variety of data with NULL values in a selected column, giving an alias to the consequence, and so forth.

It’s additionally helpful to know tips on how to use the GROUP BY assertion to summarize the outcomes of mixture features akin to COUNT().

Within the best-case situation, you have to be accustomed to a few of the examples of how COUNT() can be utilized in follow. To reach on the remaining reply to this query, we’re going to use COUNT() to search out the variety of rows with a NULL worth within the specified column. Then we will discover the ratio between this quantity and the whole variety of rows within the desk.

One other necessary element is that the COUNT() mixture operate returns an integer worth. In our answer, we divide one output of a COUNT() operate by one other to get a decimal ratio. To get the specified consequence, we might want to convert one of many values to a float.

Changing Worth Varieties

When fixing a query, it’s necessary to know what kind of values you’re coping with. If wanted, you must be capable of convert one kind of worth into one other to carry out arithmetic operations and get correct outcomes.

A candidate ought to know tips on how to convert a date worth to a string, an integer to a float, a string to an integer, or some other mixture. One of the crucial necessary features to know is CAST(), however there’s additionally a shorthand syntax for altering the kind of worth.

On this specific query, we’re coping with the variations between integer division and float division. Discovering a solution requires an intensive understanding of the distinction between two arithmetic operations. You also needs to know after they happen and why divisions are of 1 kind or one other.

For instance, we’d have to divide the results of one COUNT() operate by one other. Each of those are integer values. If we wish to get a float worth as a solution, we’ve to carry out a float division. For that, we’ve to transform one among them to a float worth.

AS key phrase to offer alias

Figuring out the syntax and sensible purposes of the AS command might help you write simply comprehensible code. Descriptive column names also can assist others perceive obtainable information.

candidate ought to know tips on how to give alias names to desk references, subqueries, and even CTEs.

The AS command is commonly used with different SQL statements, akin to WITH and SELECT. Figuring out tips on how to mix these statements with the AS command is crucial to writing environment friendly queries.

Ideally, you must know different necessary particulars of how the AS key phrase works, for instance, the truth that an alias solely exists in the course of the question.

Lyft Data Scientist Interview Question Walkthrough
Picture created by the writer on Canva

Discovering Consumer Purchases

On this query, candidates must calculate a driver churn price. That is necessary for Lyft because it helps the corporate guarantee a gentle provide of drivers in any location.

On this query, we’re requested to search out the motive force churn price for all areas. We don’t must filter by area or time.

Finding User Purchases
Screenshot from StrataScratch

Understanding the idea of churn price is critical to give you a method to calculate it.

Hyperlink to the query: https://platform.stratascratch.com/coding/10016-churn-rate-of-lyft-drivers

Dataset for Finding User Purchases
Screenshot from StrataScratch

Earlier than arising with an method, take note of varieties of values in every of those columns.

All of the obtainable information for this query is contained within the 4 columns of lyft_drivers desk, so there isn’t a lot to research.

Data Assumptions
Screenshot from StrataScratch

Let’s check out 4 columns on this desk:

index — The values on this column are consecutive numbers assigned to every driver file. It’s secure to imagine that every one rows can have a price on this column.

start_date — This worth represents the date when the motive force first signed up with Lyft. It’s secure to imagine that every one rows can have a price on this column.

end_date — We have to test the worth on this column to search out the variety of energetic drivers. We’d like this quantity to calculate the churn price of drivers all through the years.

yearly_salary — The query doesn’t point out driver earnings, so we will ignore this column.

The truth that there isn’t any driver_id column tells us that we will assume that each file is exclusive.

Fastidiously studying the query is critical to unravel any SQL drawback. When you perceive the duty at hand, wanting on the obtainable information is all it’s worthwhile to get the reply.

This query is a first-rate instance of how previewing the information might help you discover the reply. For those who do take a look at the information, you’ll see that every row represents one driver.

All data comprise a price within the start_date column, however some have an empty end_date column. This might signify the scenario when a driver signed up for Lyft and by no means stopped (an energetic driver).

Then again, the rows which have each begin and finish values clearly signify the circumstances when drivers stopped working for Lyft. We might want to discover the variety of each energetic and inactive drivers to calculate the churn price.

When you logically join all of the dots, you can begin writing the question.

Calculating the motive force churn price means getting the share of drivers who had been as soon as driving for Lyft, however stopped for one motive or one other.

To discover a answer for any SQL problem, it’s worthwhile to take note of the phrasing of the query.

This specific query specifies that the output needs to be a ratio, which is all the time going to contain a division of 1 worth (half) by one other (whole). After wanting on the information, you’ll notice that we’ve to divide drivers who give up by the variety of all drivers.

The query doesn’t specify a time interval for counting the drivers. So there isn’t any have to filter the drivers by sign-up date or the tip date.

An necessary step for fixing this Lyft information scientist interview query is to translate an summary idea like churn price right into a method. Then you’ll be able to merely plug the values into that method to get the reply.

Earlier than writing any SQL code, it’s a good suggestion to explain broad steps to unravel the issue:

  • Get the whole variety of drivers and inactive drivers
  • Convert one of many integers to a float to get a decimal worth
  • Divide the variety of inactive drivers by the variety of all drivers

Previewing the desk might help us establish the drivers who’re now not driving for Lyft. If the end_date column for a driver file is empty, it implies that the motive force continues to be energetic. If the motive force has a selected date within the end_date column, it implies that driver is now not energetic.

Figuring out how the COUNT() mixture operate works might help you resolve this drawback in only a few strains of code. We are able to use COUNT(*), to get the variety of all drivers, and use COUNT(end_date) to get the variety of inactive drivers (those who comprise a price within the end_date column).

The one drawback is that the COUNT() mixture operate returns an integer worth, so we will probably be dividing integer by integer. The consequence will probably be 0, as an alternative of a decimal worth to signify the ratio. It’s because we’re doing an integer division, which all the time returns complete numbers in consequence.

We are able to do a float division by changing one of many values to a float. Then we will calculate a driver churn price, which is a decimal worth.

You should utilize the CAST() operate to show one of many integers right into a float, or use a shorthand double colon syntax to take action.

Operating the question will present that the column identify for our remaining output is just not descriptive. As a remaining step, we should use the AS command to offer the column a descriptive identify, akin to ‘driver_churn_rate’.

Step 1: Get the variety of inactive drivers and whole drivers

We’re going to use the COUNT(end_date) mixture operate to get the variety of inactive drivers (rows with a price within the end_date column).

We are able to additionally use COUNT(*) to get the whole variety of drivers. It’s secure to make use of the asterisk to get the variety of all data within the desk.

Clearly, we’ve to specify the desk to extract values from.

SELECT COUNT(end_date),
COUNT(*)
FROM lyft_drivers

If we run this code, we’ll see the variety of inactive drivers, vs the variety of ALL drivers:

Output for Lyft data science interview question
Screenshot from StrataScratch

Step 2: Discover the ratio

To seek out the ratio, we have to divide the variety of inactive drivers by the variety of all data. Contemplating each values are integers, first, we have to convert one among them to a float, so the results of the division will probably be a decimal worth.

We are able to use the shorthand syntax (::) to transform an integer worth to a float:

SELECT COUNT(end_date)::float/ COUNT(INDEX)
FROM lyft_drivers

If we run the question, it’ll return a solution:

Output for Lyft data science interview question
Screenshot from StrataScratch

The one drawback left is the column identify.

Step 3: Identify the column

Now, we simply want to call the column to explain what the worth represents. For that, we’re going to use the AS command in SQL.

SELECT COUNT(end_date)::float/ COUNT(INDEX) AS driver_churn_rate
FROM lyft_drivers

Now, once we run this code, the column has a way more comprehensible identify driver_churn_rate

Output for Lyft data science interview question
Screenshot from StrataScratch

If we test our remaining answer, it’s accepted.

Churn price is a crucial metric for a lot of corporations. Fixing this Lyft information scientist interview query will educate you what the idea represents and tips on how to calculate it primarily based on obtainable information.

StrataScratch has every little thing to put together you for an information scientist interview. You possibly can test in case your answer is appropriate and take a look at different folks’s options to know other ways to method the issue.

Our platform permits you to type by means of information science interview questions for every information science firm, so you’ll be able to put together for an interview together with your desired employer. It additionally contains non-technical questions and every little thing else it’s worthwhile to get a job.

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

- Advertisment -
Google search engine

Most Popular

Recent Comments