Sandwich Data Science 2: Electric Boogaloo

I told myself I wouldn’t do it again. The last time nearly broke me. And yet, just when I thought I was out, they pull me back in.

Against my better judgement, I did another sandwich data science project. Thankfully, this one was significantly simpler.

I work at Square, and their NYC office is in SoHo. While there are many reasons not to go into the office these days, one draw is that I can pick up lunch at Alidoro, a tiny Italian sandwich shop that’s nearby. The sandwiches are the quintessential European antithesis to American sandwiches; they consist of only a couple of extremely high-quality ingredients.

From these few ingredients emerge 40 different types of sandwiches, and these 40 sandwiches form an impenetrable menu.

Naively, you might think you can pick a sandwich that looks close to what you want and then customize it. Perhaps you’d like the Romeo but with some fresh mozzarella? Well then perhaps you’ll be wrong, because customization is not allowed. Did I mention that there are some Soup Nazi vibes to this place? You can only order what’s on the menu, and it took the global pandemic to finally break their will to remain cash only.

Some people like to explore new items on a menu, while I always exploit the one that I’ve been happy with. Case in point: I get the Fellini on Focaccia every time. Still, I remember what it was like to be a newcomer and encounter that impenetrable menu.

And so, this blog post is my attempt at data visualization. My goal is to visualize the menu in such a way that one can quickly scan it to find a sandwich they want. As an added bonus, I’ll close with some statistical modeling of the sandwich pricing.

Packaging and Presentation

Like many of my blog posts, I wrote this one in a Jupyter notebook. While I would like to show the full code for the blog post, I didn’t want this post to be as impenetrable as Alidoro’s menu. I made a small sandmat package to handle most of the code. The package, along with the Jupyter notebook version of this blog post, can be found on GitHub.

%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

from sandmat import scrape, sorting, viz

To start, we need to get the menu and turn it into “data”. For whatever reason, I didn’t feel like using pandas for this blog post, so everything we’ll deal with will be collections of dataclasses.

Within the sandmat package, I make two dataclasses: Ingredient and Sandwich. Most of the fields are self-explanatory, with the exception of the ingredient categories. For these, I manually classify ingredients into meat, cheese, topping, or dressing. In hindsight, I probably should’ve made this an Enum field.

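from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Ingredient:
    name: str
    category: str  # one of: meat, cheese, topping, dressing


@dataclass(frozen=True)
class Sandwich:
    name: str
    ingredients: Tuple[Ingredient, ...]  # variable-length tuple of ingredients
    price: float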
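
As a rough illustration of the Enum idea mentioned above, the category field could be typed with an enumeration instead of a plain string. This is only a sketch: the Category class is a hypothetical name and is not part of the sandmat package.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """Ingredient categories, declared in the order used for ranking."""

    MEAT = "meat"
    CHEESE = "cheese"
    TOPPING = "topping"
    DRESSING = "dressing"


@dataclass(frozen=True)
class Ingredient:
    name: str
    category: Category


# Looking up a member by value rejects typos that a plain str field would accept.
arugula = Ingredient(name="arugula", category=Category("topping"))
print(arugula.category)
```

A side benefit is that iterating over Category yields the members in declaration order, which matches the meat, cheese, topping, dressing ranking used later in the post.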

I use Beautiful Soup to scrape the menu page of the Alidoro website and grab the section of the HTML that pertains to the menu. I then do some parsing, cleaning, and categorization in order to turn the menu into a list of Sandwich objects.

URL = "https://www.alidoronyc.com/menu/menu/"

sandwiches = scrape.get_sandwiches(URL)

print(f"{len(sandwiches)} sandwiches found.")
print("Displaying the first two:\n")
for sandwich in sandwiches[:2]:
    print(sandwich)
    print()

40 sandwiches found.
Displaying the first two:

Sandwich(name='Matthew', ingredients=(Ingredient(name='prosciutto', category='meat'), Ingredient(name='fresh mozzarella', category='cheese'), Ingredient(name='dressing', category='dressing'), Ingredient(name='arugula', category='topping')), price=14.0)

Sandwich(name='Alyssa', ingredients=(Ingredient(name='smoked chicken breast', category='meat'), Ingredient(name='fresh mozzarella', category='cheese'), Ingredient(name='arugula', category='topping'), Ingredient(name='dressing', category='dressing')), price=14.0)

Ingredient Rank and File

I want to display the sandwiches in “matrix” form. Each sandwich will be a row, each ingredient will be a column, and the values of the matrix will indicate whether a sandwich has a particular ingredient. What’s left is to decide on an order for the sandwich rows and an order for the ingredient columns.
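
To make the matrix form concrete, here is a minimal sketch that builds a 0/1 matrix from a tiny, made-up menu. The menu, the column order, and the make_matrix helper are all assumptions for illustration; they are not the real Alidoro data or the sandmat implementation.

```python
from typing import List, Tuple

# Hypothetical toy menu: (sandwich name, ingredient names). Not the real menu.
menu: List[Tuple[str, Tuple[str, ...]]] = [
    ("A", ("prosciutto", "fresh mozzarella", "arugula")),
    ("B", ("tuna", "arugula")),
]
columns: List[str] = ["prosciutto", "tuna", "fresh mozzarella", "arugula"]


def make_matrix(
    menu: List[Tuple[str, Tuple[str, ...]]], columns: List[str]
) -> List[List[int]]:
    """One row per sandwich, with 1 where the sandwich has that column's ingredient."""
    return [[int(col in ingredients) for col in columns] for _, ingredients in menu]


matrix = make_matrix(menu, columns)
for (name, _), row in zip(menu, matrix):
    print(name, row)
```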

In my initial approach, I coded up a traveling salesman problem in which sandwiches were cities and the overlap in ingredients between any two sandwiches was the “distance” between sandwiches. It would’ve made for the perfect title (“Traveling Sandwich Problem”, obviously), but, contrary to the numerical solution, the result was visually suboptimal.
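
For a feel of what an overlap-based ordering involves, here is a small sketch. It is not the solver used for this post: it swaps in a greedy nearest-neighbor heuristic for a proper traveling salesman solution, and it assumes Jaccard distance as the measure of ingredient overlap. The toy menu is made up.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy menu; the ingredient sets are made up for illustration.
menu: Dict[str, FrozenSet[str]] = {
    "A": frozenset({"prosciutto", "fresh mozzarella", "arugula"}),
    "B": frozenset({"tuna", "artichokes"}),
    "C": frozenset({"prosciutto", "fresh mozzarella", "dressing"}),
    "D": frozenset({"tuna", "artichokes", "arugula"}),
}


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """1 - |a & b| / |a | b|: 0 for identical sets, 1 for disjoint ones."""
    return 1.0 - len(a & b) / len(a | b)


def nearest_neighbor_order(menu: Dict[str, FrozenSet[str]], start: str) -> List[str]:
    """Greedy tour: repeatedly hop to the closest unvisited sandwich."""
    order = [start]
    remaining = set(menu) - {start}
    while remaining:
        current = menu[order[-1]]
        # Break distance ties alphabetically so the result is deterministic.
        nxt = min(remaining, key=lambda s: (jaccard_distance(current, menu[s]), s))
        order.append(nxt)
        remaining.remove(nxt)
    return order


order = nearest_neighbor_order(menu, "A")
print(order)
```

A greedy tour like this keeps neighboring rows similar, but it says nothing about which ingredient families should come first, which may be part of why a purely numerical ordering can look worse than one driven by domain knowledge.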

Thankfully, this is a problem where we can rely on domain expertise. As a sandwich eater myself, I thought about how I typically pick a sandwich. I often look at the meats first, then the cheeses, and then everything else. Okay, let’s sort the ingredient columns by category “rank”: meat, cheese, topping, dressing. Within each category, how about using the recsys go-to of sorting in descending order of popularity? Combining category rank and popularity gives us our full ingredient column order. In SQL, we’d want to do something like

SELECT
  category
  , CASE 
    WHEN category = 'meat' THEN 1
    WHEN category = 'cheese' THEN 2
    WHEN category = 'topping' THEN 3
    WHEN category = 'dressing' THEN 4
  END AS category_rank
  , ingredient
  , COUNT(DISTINCT sandwich) AS num_sandwiches
FROM sandwich_ingredients
GROUP BY category, category_rank, ingredient
ORDER BY category_rank ASC, num_sandwiches DESC

ranked_categories = sorting.get_ranked_categories(sandwiches)
ordered_ingredients = sorting.get_ordered_ingredients(ranked_categories)
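
The internals of those two sandmat functions are not shown here, so the following is only a sketch of how the same ordering could be computed in plain Python. The toy data and the alphabetical tie-break are assumptions.

```python
from collections import Counter
from typing import List, Tuple

CATEGORY_RANK = {"meat": 1, "cheese": 2, "topping": 3, "dressing": 4}

# Hypothetical toy data: each sandwich is a tuple of (ingredient, category) pairs.
sandwiches: List[Tuple[Tuple[str, str], ...]] = [
    (("prosciutto", "meat"), ("fresh mozzarella", "cheese"), ("arugula", "topping")),
    (("tuna", "meat"), ("arugula", "topping"), ("dressing", "dressing")),
    (("prosciutto", "meat"), ("artichokes", "topping"), ("dressing", "dressing")),
]

# Count how many sandwiches contain each ingredient (set() guards against repeats).
counts = Counter(pair for sandwich in sandwiches for pair in set(sandwich))

# Sort by category rank ascending, then popularity descending, then name for ties.
ordered = sorted(counts, key=lambda p: (CATEGORY_RANK[p[1]], -counts[p], p[0]))
ordered_ingredients = [name for name, _ in ordered]
print(ordered_ingredients)
```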

For ordering our sandwich rows, let’s sort them by a special key: a tuple that contains their most popular ingredient in each category, where the tuple is in order of meat, cheese, topping, dressing.

ordered_sandwiches = sorting.get_ordered_sandwiches(sandwiches, ranked_categories)
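
One way such a key could be built is sketched below. The sort_key helper and the toy data are hypothetical, and the choice to sort a sandwich with no ingredient in a category after those that have one is an assumption, not something taken from sandmat.

```python
from typing import Dict, List, Tuple

CATEGORIES = ("meat", "cheese", "topping", "dressing")

# Hypothetical popularity-ordered ingredients per category (most popular first).
ranked: Dict[str, List[str]] = {
    "meat": ["prosciutto", "tuna"],
    "cheese": ["fresh mozzarella"],
    "topping": ["arugula", "artichokes"],
    "dressing": ["dressing"],
}

# Hypothetical toy menu: sandwich name -> ingredient names.
menu: Dict[str, Tuple[str, ...]] = {
    "A": ("tuna", "arugula"),
    "B": ("prosciutto", "artichokes", "dressing"),
    "C": ("prosciutto", "fresh mozzarella", "arugula"),
    "D": ("fresh mozzarella", "arugula", "artichokes"),
}


def sort_key(ingredients: Tuple[str, ...]) -> Tuple[int, ...]:
    """Per category, the popularity position of the sandwich's top ingredient."""
    key = []
    for category in CATEGORIES:
        positions = [i for i, ing in enumerate(ranked[category]) if ing in ingredients]
        # Assumption: a sandwich lacking a category sorts after those that have it.
        key.append(positions[0] if positions else len(ranked[category]))
    return tuple(key)


ordered_names = sorted(menu, key=lambda name: sort_key(menu[name]))
print(ordered_names)
```

Because Python compares tuples element by element, the meat position dominates, cheese breaks meat ties, and so on down the category ranking.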

Visualizing the Matrix

Finally, with our ordered ingredients and sandwiches, we can visualize the Alidoro sandwich menu as a matrix.

sandwich_mat = viz.make_sandwich_matrix(ordered_sandwiches, ordered_ingredients)

fig, ax = viz.plot_sandwiches(sandwich_mat, ordered_sandwiches, ordered_ingredients)
plt.show();

Just for prosciuttos and giggles, I decided to treat my sandwich matrix as a design matrix. I’ll fit a linear regression on the sandwich matrix with the sandwich price as the target variable. The model coefficients will thus be the price of each ingredient, and a bias term will handle the base price of the sandwich (which includes the bread). As you can see, the model is pretty well-calibrated! I guess Alidoro’s sandwich pricing is pretty consistent.

import statsmodels.api as sm
import numpy as np

y = np.array([sandwich.price for sandwich in ordered_sandwiches])
X = sandwich_mat.copy()

X = sm.add_constant(X, prepend=True)

model = sm.OLS(y, X)
res = model.fit()

res.summary(
    yname="Price ($)", xname=["Base Sandwich Price"] + list(ordered_ingredients)
)


OLS Regression Results

Dep. Variable:       Price ($)           R-squared:            0.971
Model:               OLS                 Adj. R-squared:       0.940
Method:              Least Squares       F-statistic:          31.39
Date:                Sun, 26 Sep 2021    Prob (F-statistic):   1.48e-10
Time:                10:22:02            Log-Likelihood:       9.6979
No. Observations:    40                  AIC:                  22.60
Df Residuals:        19                  BIC:                  58.07
Df Model:            20
Covariance Type:     nonrobust

                           coef   std err        t    P>|t|    [0.025    0.975]
Base Sandwich Price      8.0451     0.265   30.334    0.000     7.490     8.600
prosciutto               2.1138     0.166   12.769    0.000     1.767     2.460
sopressata               1.9554     0.152   12.875    0.000     1.638     2.273
smoked chicken breast    2.0618     0.182   11.323    0.000     1.681     2.443
tuna                     1.7025     0.171    9.940    0.000     1.344     2.061
salami                   2.1288     0.279    7.641    0.000     1.546     2.712
capicollo                2.0982     0.327    6.421    0.000     1.414     2.782
mortadella               3.0738     0.359    8.573    0.000     2.323     3.824
sardines or mackerel     2.4387     0.375    6.497    0.000     1.653     3.224
fresh mozzarella         1.3168     0.174    7.581    0.000     0.953     1.680
smoked mozzarella        1.3141     0.210    6.271    0.000     0.875     1.753
m. bel paese             1.2748     0.223    5.707    0.000     0.807     1.742
provolone cheese         1.3559     0.250    5.429    0.000     0.833     1.879
arugula                  1.2985     0.129   10.076    0.000     1.029     1.568
artichokes               1.2708     0.140    9.074    0.000     0.978     1.564
sun dried tomatoes       1.2414     0.147    8.458    0.000     0.934     1.549
sweet roasted peppers    1.1692     0.135    8.637    0.000     0.886     1.453
hot peppers              1.0734     0.183    5.850    0.000     0.689     1.458
caponata of eggplant     1.0643     0.210    5.074    0.000     0.625     1.503
dressing                 1.0242     0.172    5.963    0.000     0.665     1.384
olive paste              0.5690     0.285    1.998    0.060    -0.027     1.165

Omnibus:           14.030    Durbin-Watson:        2.450
Prob(Omnibus):      0.001    Jarque-Bera (JB):    17.010
Skew:              -1.089    Prob(JB):          0.000202
Kurtosis:           5.337    Cond. No.              15.8

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
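
To see why the coefficients can be read as per-ingredient prices, here is a small sketch on made-up, noiseless data where the true base price and ingredient prices are known in advance. It uses NumPy’s least-squares solver rather than statsmodels, and none of the numbers come from Alidoro’s menu.

```python
import numpy as np

# Made-up toy data, not Alidoro's menu: 5 sandwiches x 3 ingredients.
X = np.array(
    [
        [1, 0, 0],
        [0, 1, 0],
        [1, 1, 0],
        [0, 1, 1],
        [1, 0, 1],
    ],
    dtype=float,
)
true_base, true_prices = 8.0, np.array([2.0, 1.0, 1.5])
y = true_base + X @ true_prices  # noiseless prices, so the fit is exact

# Prepend a column of ones so the first coefficient acts as the base price.
X_design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)

base_price, ingredient_prices = coef[0], coef[1:]
print(round(base_price, 2), np.round(ingredient_prices, 2))
```

With real prices the recovery is only approximate, which is what the standard errors and confidence intervals in the summary table quantify.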

We can inspect this model visually by plotting the prices of all of the ingredients. I had no idea mortadella was the most expensive meat.

viz.plot_ingredients(ordered_ingredients, res)

And last but not least, we can compare the sandwich price to the model’s predicted price in order to get an idea of whether any sandwich’s price is wildly inconsistent. Most sandwich prices are consistent, although the Gabriella is apparently cheaper than expected at $11.00 for (only!) fresh mozzarella, dressing, and arugula. I don’t know if I’d call that cheap, but, then again, neither is SoHo.

y_pred = model.predict(res.params)

chart = viz.plot_actual_vs_pred(y, y_pred, ordered_sandwiches)
chart.properties(width=400, height=400)
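
The Gabriella’s predicted price can also be checked by hand from the regression table: it is the base price plus the coefficient of each of its three ingredients. The short sketch below just does that arithmetic; the variable names are illustrative.

```python
# Coefficients copied from the regression table in this post (in dollars).
base_price = 8.0451
coef = {
    "fresh mozzarella": 1.3168,
    "dressing": 1.0242,
    "arugula": 1.2985,
}

gabriella = ("fresh mozzarella", "dressing", "arugula")
actual_price = 11.00

# Predicted price = base price + the sum of each ingredient's coefficient.
predicted_price = base_price + sum(coef[name] for name in gabriella)
print(f"predicted ${predicted_price:.2f}, actual ${actual_price:.2f}")
```

This prints a predicted price of $11.68 against the actual $11.00, so the model expects the Gabriella to cost roughly 68 cents more than it does.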
