Rolex Yacht-Master 40 Diamond Pave Dial Men's Watch 126655-0005

126655 Pave Diamond

Displayed rates are for informational purposes only and do not reflect the actual rates you may be charged by the financial institution handling your transaction.

Euro: €51,367.99
GB Pound: £43,825.09
AU $: $81,480.72

Prestige Time, LLC accepts payment in US Dollars only. Rates do not include taxes, duties, shipping, insurance, or any other expenses associated with the purchase.

Model #: 126655 Pave Diamond Rolex Yacht-Master 40mm Mens Watch

Case Shape Round
Case Dimensions 40mm
Case Material 18kt Everose Gold
Dial Color Diamonds
Crystal Scratch Resistant Sapphire
Bezel Bi-Directional Rotating
Screw-in Crown Yes
Does not allow contact with water
30m/99ft - 50m/165ft: Allows for contact with water such as washing hands and rain
50m/165ft - 100m/330ft: Allows for light poolside swimming
100m/330ft - 200m/660ft: Allows for swimming, snorkeling and showering (do not expose to hot water)
200m/660ft - 500m/1650ft: Allows for impact water sports such as board diving and scuba diving
500m/1650ft +: Appropriate for serious deep water diving.

Learn more about Water Resistance.

Rolex is a registered trademark of Rolex USA. Prestige Time LLC is not an authorized dealer for Rolex and is in NO WAY affiliated with Rolex SA or Rolex USA. All Rolex watches sold by Prestige Time LLC are in UNWORN condition.

  • Polished 18kt Everose gold case.
  • Polished 18kt Everose gold Triplock screw-down crown.
  • Polished 18kt Everose gold bi-directional rotating bezel with a ribbed edge for a sure grip.
  • Matte black Cerachrom bezel insert with polished raised Arabic numerals and graduated indexes.
  • Pave diamond set dial.
  • Applied polished Everose gold rimmed hour markers with luminescent fill.
  • Polished 18kt Everose gold hands with luminescent fill.
  • Date window displayed at the 3 o'clock position with a Cyclops magnifier.
  • Rolex in-house Superlative Chronometer (COSC + Rolex certification after casing) caliber 3235, with Paramagnetic blue Parachrom hairspring & high-performance Paraflex shock absorbers. The movement beats at 28,800 vph, contains 31 jewels & has an approximate power reserve of 70 hours.

Alternate model # m126655-0005

Men's Rolex 126655 Pave Diamond Yacht-Master 40mm watch.


Receive a 2% discount with a bank-to-bank wire transfer on watches over $1,500.

We accept personal and cashier checks issued from US banks only. Allow 3-7 days for funds to clear. Receive a 2% discount with this payment method.

All packages shipped via Prestige Time LLC are fully insured against loss, theft and damage during transit. All packages are shipped with an "Adult Signature" requirement prior to release by the FedEx driver. Before signing, please inspect the package to ensure it has not been tampered with or damaged. View complete Shipping policies for US orders.

Domestic shipping fees (USA only):
FedEx Standard (3 business days delivery) FREE ($25.00 value)
FedEx 2nd Day (2 business days delivery) $35.00 ($60.00 value)
FedEx Next Day (next business day delivery) $50.00 ($75.00 value)
FedEx Priority Overnight delivery (delivery before 10:30 AM) $80.00 ($105.00 value)

Shipments to Alaska, PR and Hawaii: Orders shipped to Alaska, Puerto Rico and Hawaii will be charged a flat fee of $60 and are usually delivered within two to three business days from shipping.

All quoted prices and actual charges are in US Dollars.

Our preferred shipping methods are DHL and FedEx Priority International. All parcels are insured for the full value against loss, theft or damage while in transit. Before signing, please inspect the package to ensure it has not been tampered with or damaged. View complete Shipping policies for International orders.

International shipping fees:
up to $5,000 $60 $70 - $100
$5,000 - $8,000 $80 $100 - $150
$8,000 and up $100 $150+

All prices quoted are before your local taxes, duties, VAT, GST, PST or any other such charges. These fees will be assessed and charged separately when the watch clears customs in your country. These fees are the customer's responsibility.

U.S. CUSTOMERS:

  • We offer a 14 day return policy from the day of receipt.
  • Returns are subject to a 2% restocking fee (waived for an exchange of equal or close to equal value).
  • Some watches are considered a "special order" in which case the watch is a final sale or has a shortened return policy. Should the watch be considered a "special order" we will inform you prior to shipping.

INTERNATIONAL CUSTOMERS:

  • All international orders are considered a "special order", which means that the sale is final & non-returnable

Recent changes to sales tax laws for remote sellers (e-commerce businesses) require us to collect sales tax on orders shipped into the states listed below. Sales tax is calculated in the shopping cart:

Arizona Massachusetts Pennsylvania
California Michigan Tennessee
Colorado New Jersey Texas
Florida New York Virginia
Georgia North Carolina Washington
Illinois Ohio

Prestige Time specializes in the sale of authentic watches at discounted prices. We have been selling watches online since 1999! All new/unworn watches sold by Prestige Time, LLC are guaranteed to be 100% genuine, unworn, in the original box and with the original serial numbers intact. View our detailed return and warranty policies. Your satisfaction is guaranteed.

This watch is covered by the Prestige Time 5 Year Warranty from the date of sale.

  • Manufacturing defects
  • Battery replacement
  • Finishes such as (but not limited to) scratches, nicks, coatings, etc.
  • Crystal or glass damage
  • Straps such as (but not limited to) leather, fabric or rubber
  • Damage due to wear and tear
  • Damage resulting from wear under conditions exceeding the manufacturer’s water resistant rating.
  • Damage due to physical shock and/or accidental abuse.

  • Watches are our specialty - it is all we do and we do it well. The PrestigeTime.com website is dedicated to watches - we don't also sell bags, flatware and toys, etc.
  • We are available to answer the phone during business hours. Call us and you will be connected with a live, polite and knowledgeable representative.
  • Our customer service is second to none! We encourage you to Google our name and compare our customer service reviews to our competitors'.
  • Our website is easy and clear to navigate with a secure shopping cart, designed for a pleasant shopping experience.
  • Our website is full of relevant and detailed information. We update our site constantly so check back often for special offers and new products.
  • We thoroughly inspect the product prior to shipping. We ship securely and professionally.
  • We have a fair and reasonable return policy.
  • Our support and interaction does not end with a sale. We remain just as accessible post sale as we were before and during the sale! Our after-sales support goes above and beyond what you may have come to expect from an online retailer.
  • Our rate of repeat customers is a testament to the Prestige Time shopping experience. Indeed, we have become personal friends with many of our customers.
  • We respect your privacy and your inbox. At your choice, we will periodically send you newsletters and promotional emails. We will NOT barrage you with a relentless stream of daily or even weekly emails.



Rolex Yacht-Master 40mm Pave Dial Watch Ref# 126655

Brand: Rolex
Model: Yacht-Master
Reference #: 126655
Case Size and Material: 40mm / Rose Gold
Bracelet Style and Material: Oysterflex w/ Rose Gold Clasp
Dial: Factory Pave Dial with Dot Hour Markers
Bezel: Black Ceramic
Movement: Automatic Winding Movement
Power Reserve: 70 Hours
Water Resistance: 100m / 330ft
Crystal: Scratch Resistant Sapphire Crystal
Condition: Pre-Owned; Excellent Condition
Box: Rolex Box
Papers: Yes
Year or Est. Production: 2021
Internal Sku: 590-15949
Included Warranty: Happy Jewelers 2-Year Limited Warranty
Appraisal: Included in box

SHIPPING POLICY

*All jewelry and watch orders require signature upon delivery.

**We currently ship to the United States, Canada, and Australia.

Average shipping times:  Please note that our processing times still apply. 

UPS Ground: 3-6 business days (excluding weekends and holidays)

UPS Expedited- available upon request to select addresses- fees will apply

FedEx Expedited- available upon request to select addresses- fees will apply

*The purchaser is responsible for covering any expenses related to a package returned to us as a result of multiple unsuccessful delivery attempts.

**International orders may be required to pay duty or VAT. These are the responsibility of the purchaser and will be collected by the carrier before your package can be delivered. Happy Jewelers is not responsible for returned packages due to unpaid duty or VAT.

Once your piece is complete, we will email you with tracking information.

RETURN AND EXCHANGE POLICY

View our full Return and Exchange Policy: here

ENGAGEMENT RINGS

We do not offer any full refunds on engagement rings.

Any natural engagement ring returned within 30 days of purchase is subject to a 50% restocking fee. After 30 days, returned natural engagement rings will be subject to fair market value.

A Happy Jewelers lab diamond may be returned within 30 days of purchase, minus a 50% restocking fee, for store credit only, to be used towards a fine jewelry purchase. No returns on lab diamonds after 30 days.

Engagement rings may be upgraded by purchasing a new diamond that is a minimum of double the original purchase price.

FINE JEWELRY (NON-CUSTOM)

We are happy to offer a full refund or store credit for a return or exchange on non-custom jewelry pieces within 10 days of receiving your jewelry.

Jewelry must be returned in the same condition and working order, including packaging, as originally received.

All returns, after 10 days and up to 90 days, are subject to a 50% restocking fee and store credit only towards a fine jewelry purchase.

We do not allow exchanges on any altered or engraved items.

All purchases on markdown items, including Sale of the Day, Pink Tag Sale, and Pre-owned Designer Jewelry, are final sale and not eligible for return/exchange or refund.

All watch sales are final.

© 2024 Happy Jewelers

Rolex Yacht-Master 126655 Rose Gold Pave Diamond Dial

Contact Us: (212)422-2600

Reference Number 126655
Model Yacht-Master 40
Movement Automatic
Case Material Rose Gold
Bracelet Material Black Rubber
Dial Pave Diamond Set
Case Diameter 40mm
Year
Condition
Box & Papers Original box, original papers

Description

Embark on a journey of luxury and elegance with the remarkable Rolex Yacht-Master 40 Rose Gold Pave Diamond Dial 126655. This exquisite timepiece is a true testament to Rolex's mastery of watchmaking artistry and showcases the perfect balance between functionality and opulence.

The 40mm Everose gold case exudes a warm and radiant glow, elevating your style with its timeless beauty. The bezel, adorned with dazzling diamonds, adds an extra touch of brilliance and sets this watch apart from the rest. Every diamond is carefully selected and meticulously set, creating a stunning display that captivates all who behold it.

The pave diamond dial is a true masterpiece, featuring a myriad of sparkling diamonds set in a wave pattern. The contrast between the lustrous diamonds and the elegant rose gold hands and hour markers creates a mesmerizing visual effect that is sure to turn heads wherever you go.

Crafted with the utmost precision, the Rolex Yacht-Master 40 Rose Gold Pave Diamond Dial 126655 is powered by the self-winding Caliber 3235 movement, ensuring exceptional accuracy and reliability. With a power reserve of approximately 70 hours, you can trust this timepiece to keep you on time and in style.

The Oysterflex bracelet combines the durability of a metal bracelet with the flexibility and comfort of a rubber strap. Its innovative design and high-performance elastomer ensure a secure and comfortable fit on your wrist, while the Everose gold folding clasp adds a touch of refinement.

Water-resistant up to 100 meters, this Yacht-Master is ready to accompany you on your nautical adventures, reflecting the spirit of the open seas and the allure of luxury. The Rolex Yacht-Master 40 Rose Gold Pave Diamond Dial 126655 is a testament to your discerning taste and appreciation for the finest craftsmanship.

We are an independently owned and operated business with boutique locations in New York, Miami, and Beverly Hills. Feel free to stop by any of our locations. If you aren’t finding what you’re looking for, please contact us. We wholeheartedly believe that every first-time buyer will be a client for life.

Free Insured Domestic Overnight FedEx Shipping is included in the price listed. 
We ship internationally. International customers are responsible for the cost of shipping as well as any customs taxes and/or duties of the receiving country.

All of our watches are backed by a 1 Year Limited Warranty on the movement of the watch. The warranty does not cover loss, theft, or minor cosmetic damage to your watch. Any damage caused by negligence or mishandling of the timepiece, including water damage, is not covered.

Products are usually delivered in 1-2 Business days.

New York Location 19 E 62nd St. New York, NY 10065 (212) 422-2600 Open Monday - Friday 10 AM - 6 PM

Miami Location Setai Hotel Miami Beach 2001 Collins Ave Miami Beach, FL 33139 (305) 520-6958 Open Monday - Sunday 10 AM - 8 PM

Beverly Hills Location Waldorf Astoria Hotel 9850 Wilshire Blvd Beverly Hills, CA 90210 (310) 294-6305 Open Monday - Sunday 10 AM - 8 PM

IBM Capstone Data Engineering Project

This project explored several data engineering technologies, concepts and skills that I acquired while completing the IBM Data Engineering Professional Certificate. You can find all the screenshots and scripts pertaining to this project on GitHub.

Data Platform Architecture and OLTP Database

  • Designed and implemented a data platform using MySQL as an OLTP database, and another using MongoDB.

PostgreSQL Data Warehouse

Data Analytics and IBM Cognos Dashboards

  • Loaded the data into IBM Cognos Analytics and created dashboards.

ETL & Data Pipeline (Airflow, Python and Bash)

  • Automated the process of loading data from MySQL to a PostgreSQL data warehouse.
  • Used Airflow to create a pipeline that analyzes web server logs, extracts the required lines and fields, transforms and loads the data.

Big Data Analytics with PySpark

  • Used PySpark and data from a web server to analyze search terms, and loaded a pretrained sales forecasting model to forecast sales for a future year based on given sales data.

Below is a summary of some of the tasks I performed and some of the screenshots I took during the project.

In the first section of the project, I created a table in MySQL for sales data and then inserted the data from the sales_data.sql file into it. I also queried the table, performed operations on it and exported the data.
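As a rough illustration of the create, insert, query and export steps, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, and the column names and sample rows are hypothetical, since the contents of sales_data.sql are not shown in this post.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows; the real sales_data.sql contents are not shown.
ROWS = [
    (1, 101, 5, "2020-09-05"),
    (2, 102, 3, "2020-09-05"),
    (3, 101, 2, "2020-09-06"),
]


def build_sales_table(conn):
    """Create the sales_data table and load the sample rows."""
    conn.execute(
        "CREATE TABLE sales_data ("
        "sale_id INTEGER PRIMARY KEY, product_id INTEGER, "
        "quantity INTEGER, sold_on TEXT)"
    )
    conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?, ?)", ROWS)
    conn.commit()


def quantity_per_product(conn):
    """Return (product_id, total quantity) pairs, ordered by product."""
    cur = conn.execute(
        "SELECT product_id, SUM(quantity) FROM sales_data "
        "GROUP BY product_id ORDER BY product_id"
    )
    return cur.fetchall()


def export_csv(conn):
    """Export the whole table as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    cur = conn.execute("SELECT * FROM sales_data ORDER BY sale_id")
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)
    return buf.getvalue()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL server
build_sales_table(conn)
```

With a real MySQL server, the same statements would go through a MySQL driver instead, and the export would typically be done with a dump tool rather than a CSV writer.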

I performed a similar operation with another database in MongoDB. I imported a file into it, performed queries, created an index to improve query performance and exported the database.

I also designed and created a star schema for a PostgreSQL database intended to hold ecommerce data. Then I ran several queries against it, from simple select queries to grouping sets, cubes and rollups, and created a materialized view.
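To show what a rollup query returns, here is a small sketch. PostgreSQL supports `GROUP BY ROLLUP (country, category)` natively; SQLite does not, so the example below emulates the same result with `UNION ALL` in order to stay runnable with Python's standard library. The table and column names are hypothetical, and the star schema is flattened into a single table for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, flattened fact table; a real star schema would join dimensions.
conn.execute(
    "CREATE TABLE fact_sales (country TEXT, category TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [("US", "Books", 10), ("US", "Toys", 20), ("CA", "Books", 5)],
)

# Equivalent of: SELECT country, category, SUM(amount)
#                FROM fact_sales GROUP BY ROLLUP (country, category)
ROLLUP_EMULATION = """
SELECT country, category, SUM(amount) FROM fact_sales GROUP BY country, category
UNION ALL
SELECT country, NULL, SUM(amount) FROM fact_sales GROUP BY country
UNION ALL
SELECT NULL, NULL, SUM(amount) FROM fact_sales
"""

# Detail rows, per-country subtotals (category is NULL), and one grand total.
rows = conn.execute(ROLLUP_EMULATION).fetchall()
```

A cube would additionally include per-category subtotals across all countries, and grouping sets let you pick exactly which of these groupings to compute.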

I imported a dataset into IBM Cognos Dashboards and created dashboards such as a bar graph to show mobile phone sales in each quarter, a line graph to show sales for each month of 2022, and a pie chart to show sales for three product categories.

I automated the process of retrieving the latest records from a MySQL table and inserting them into a PostgreSQL data warehouse. Below are the Python functions that fetch and insert the records, along with the output I got after executing the script.
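The original functions were captured as a screenshot and are not reproduced as text here. The following is only a sketch of the same incremental-load idea, not the original script: it uses two in-memory sqlite3 databases as stand-ins for MySQL and PostgreSQL, and the table layout, column names and function names are all hypothetical.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory database holding a sales_data table with `rows`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_data ("
        "sale_id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def get_last_id(warehouse):
    """Return the highest sale_id already in the warehouse (0 if empty)."""
    row = warehouse.execute("SELECT MAX(sale_id) FROM sales_data").fetchone()
    return row[0] or 0


def get_latest_records(source, last_id):
    """Fetch source rows that are newer than the warehouse's last sale_id."""
    cur = source.execute(
        "SELECT sale_id, product_id, quantity FROM sales_data "
        "WHERE sale_id > ? ORDER BY sale_id",
        (last_id,),
    )
    return cur.fetchall()


def insert_records(warehouse, records):
    """Insert the new rows into the warehouse and return how many were added."""
    warehouse.executemany("INSERT INTO sales_data VALUES (?, ?, ?)", records)
    warehouse.commit()
    return len(records)


# The source has four rows; the warehouse has only seen the first two.
source = make_db([(1, 101, 5), (2, 102, 3), (3, 101, 2), (4, 103, 1)])
warehouse = make_db([(1, 101, 5), (2, 102, 3)])

new_rows = get_latest_records(source, get_last_id(warehouse))
loaded = insert_records(warehouse, new_rows)
```

Tracking the highest key already loaded keeps each run idempotent: running the script again immediately afterwards finds no new rows to insert.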

I used Airflow to create a data pipeline that extracts specific IP addresses from an access log file and loads them into a destination file.
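The extract, transform and load steps of such a pipeline can be sketched as plain Python functions, which could then be wrapped in Airflow tasks. This sketch makes several assumptions: the log format is assumed to start each line with an IPv4 address, "specific IP addresses" is interpreted here as filtering out an excluded address, and the sample lines and the excluded address are made up for illustration.

```python
import re
from pathlib import Path

# Matches an IPv4-looking address at the start of a log line (assumed format).
IPV4 = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")


def extract(lines):
    """Yield the leading IP address of every line that has one."""
    for line in lines:
        match = IPV4.match(line)
        if match:
            yield match.group(1)


def transform(ips, excluded):
    """Keep only the addresses that are not in the excluded set."""
    return [ip for ip in ips if ip not in excluded]


def load(ips, destination):
    """Write one address per line to the destination file."""
    Path(destination).write_text("\n".join(ips) + "\n", encoding="utf-8")


# Made-up sample lines standing in for the real access log.
SAMPLE_LOG = [
    '198.46.149.143 - - [01/Jan/2022:00:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '83.149.9.216 - - [01/Jan/2022:00:00:02 +0000] "GET /about HTTP/1.1" 200 128',
    "malformed line without an address",
]

kept = transform(extract(SAMPLE_LOG), excluded={"198.46.149.143"})
```

Calling `load(kept, "transformed_data.txt")` (a hypothetical file name) would then write the remaining addresses to disk as the final step.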

I used PySpark to load a sales prediction model, apply it to a sales data set, and predict the sales for the year 2023.


RithikaJ / M3ExploratoryDataAnalysis-lab.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<center>\n",
" <img src=\"https://gitlab.com/ibm/skills-network/courses/placeholder101/-/raw/master/labs/module%201/images/IDSNlogo.png\" width=\"300\" alt=\"cognitiveclass.ai logo\" />\n",
"</center>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# **Exploratory Data Analysis Lab**\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Estimated time needed: **30** minutes\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this module you get to work with the cleaned dataset from the previous module.\n",
"\n",
"In this assignment you will perform the task of exploratory data analysis.\n",
"You will find out the distribution of data, presence of outliers and also determine the correlation between different columns in the dataset.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Objectives\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this lab you will perform the following:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Identify the distribution of data in the dataset.\n",
"\n",
"* Identify outliers in the dataset.\n",
"\n",
"* Remove outliers from the dataset.\n",
"\n",
"* Identify correlation between features in the dataset.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hands on Lab\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Import the pandas module.\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the dataset into a dataframe.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(\"https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/LargeData/m2_survey_data.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Respondent</th>\n",
" <th>MainBranch</th>\n",
" <th>Hobbyist</th>\n",
" <th>OpenSourcer</th>\n",
" <th>OpenSource</th>\n",
" <th>Employment</th>\n",
" <th>Country</th>\n",
" <th>Student</th>\n",
" <th>EdLevel</th>\n",
" <th>UndergradMajor</th>\n",
" <th>...</th>\n",
" <th>WelcomeChange</th>\n",
" <th>SONewContent</th>\n",
" <th>Age</th>\n",
" <th>Gender</th>\n",
" <th>Trans</th>\n",
" <th>Sexuality</th>\n",
" <th>Ethnicity</th>\n",
" <th>Dependents</th>\n",
" <th>SurveyLength</th>\n",
" <th>SurveyEase</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>4</td>\n",
" <td>I am a developer by profession</td>\n",
" <td>No</td>\n",
" <td>Never</td>\n",
" <td>The quality of OSS and closed source software ...</td>\n",
" <td>Employed full-time</td>\n",
" <td>United States</td>\n",
" <td>No</td>\n",
" <td>Bachelor’s degree (BA, BS, B.Eng., etc.)</td>\n",
" <td>Computer science, computer engineering, or sof...</td>\n",
" <td>...</td>\n",
" <td>Just as welcome now as I felt last year</td>\n",
" <td>Tech articles written by other developers;Indu...</td>\n",
" <td>22.0</td>\n",
" <td>Man</td>\n",
" <td>No</td>\n",
" <td>Straight / Heterosexual</td>\n",
" <td>White or of European descent</td>\n",
" <td>No</td>\n",
" <td>Appropriate in length</td>\n",
" <td>Easy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>9</td>\n",
" <td>I am a developer by profession</td>\n",
" <td>Yes</td>\n",
" <td>Once a month or more often</td>\n",
" <td>The quality of OSS and closed source software ...</td>\n",
" <td>Employed full-time</td>\n",
" <td>New Zealand</td>\n",
" <td>No</td>\n",
" <td>Some college/university study without earning ...</td>\n",
" <td>Computer science, computer engineering, or sof...</td>\n",
" <td>...</td>\n",
" <td>Just as welcome now as I felt last year</td>\n",
" <td>NaN</td>\n",
" <td>23.0</td>\n",
" <td>Man</td>\n",
" <td>No</td>\n",
" <td>Bisexual</td>\n",
" <td>White or of European descent</td>\n",
" <td>No</td>\n",
" <td>Appropriate in length</td>\n",
" <td>Neither easy nor difficult</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>13</td>\n",
" <td>I am a developer by profession</td>\n",
" <td>Yes</td>\n",
" <td>Less than once a month but more than once per ...</td>\n",
" <td>OSS is, on average, of HIGHER quality than pro...</td>\n",
" <td>Employed full-time</td>\n",
" <td>United States</td>\n",
" <td>No</td>\n",
" <td>Master’s degree (MA, MS, M.Eng., MBA, etc.)</td>\n",
" <td>Computer science, computer engineering, or sof...</td>\n",
" <td>...</td>\n",
" <td>Somewhat more welcome now than last year</td>\n",
" <td>Tech articles written by other developers;Cour...</td>\n",
" <td>28.0</td>\n",
" <td>Man</td>\n",
" <td>No</td>\n",
" <td>Straight / Heterosexual</td>\n",
" <td>White or of European descent</td>\n",
" <td>Yes</td>\n",
" <td>Appropriate in length</td>\n",
" <td>Easy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>16</td>\n",
" <td>I am a developer by profession</td>\n",
" <td>Yes</td>\n",
" <td>Never</td>\n",
" <td>The quality of OSS and closed source software ...</td>\n",
" <td>Employed full-time</td>\n",
" <td>United Kingdom</td>\n",
" <td>No</td>\n",
" <td>Master’s degree (MA, MS, M.Eng., MBA, etc.)</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>Just as welcome now as I felt last year</td>\n",
" <td>Tech articles written by other developers;Indu...</td>\n",
" <td>26.0</td>\n",
" <td>Man</td>\n",
" <td>No</td>\n",
" <td>Straight / Heterosexual</td>\n",
" <td>White or of European descent</td>\n",
" <td>No</td>\n",
" <td>Appropriate in length</td>\n",
" <td>Neither easy nor difficult</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>17</td>\n",
" <td>I am a developer by profession</td>\n",
" <td>Yes</td>\n",
" <td>Less than once a month but more than once per ...</td>\n",
" <td>The quality of OSS and closed source software ...</td>\n",
" <td>Employed full-time</td>\n",
" <td>Australia</td>\n",
" <td>No</td>\n",
" <td>Bachelor’s degree (BA, BS, B.Eng., etc.)</td>\n",
" <td>Computer science, computer engineering, or sof...</td>\n",
" <td>...</td>\n",
" <td>Just as welcome now as I felt last year</td>\n",
" <td>Tech articles written by other developers;Indu...</td>\n",
" <td>29.0</td>\n",
" <td>Man</td>\n",
" <td>No</td>\n",
" <td>Straight / Heterosexual</td>\n",
" <td>Hispanic or Latino/Latina;Multiracial</td>\n",
" <td>No</td>\n",
" <td>Appropriate in length</td>\n",
" <td>Easy</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 85 columns</p>\n",
"</div>"
],
"text/plain": [
" Respondent MainBranch Hobbyist \\\n",
"0 4 I am a developer by profession No \n",
"1 9 I am a developer by profession Yes \n",
"2 13 I am a developer by profession Yes \n",
"3 16 I am a developer by profession Yes \n",
"4 17 I am a developer by profession Yes \n",
"\n",
" OpenSourcer \\\n",
"0 Never \n",
"1 Once a month or more often \n",
"2 Less than once a month but more than once per ... \n",
"3 Never \n",
"4 Less than once a month but more than once per ... \n",
"\n",
" OpenSource Employment \\\n",
"0 The quality of OSS and closed source software ... Employed full-time \n",
"1 The quality of OSS and closed source software ... Employed full-time \n",
"2 OSS is, on average, of HIGHER quality than pro... Employed full-time \n",
"3 The quality of OSS and closed source software ... Employed full-time \n",
"4 The quality of OSS and closed source software ... Employed full-time \n",
"\n",
" Country Student EdLevel \\\n",
"0 United States No Bachelor’s degree (BA, BS, B.Eng., etc.) \n",
"1 New Zealand No Some college/university study without earning ... \n",
"2 United States No Master’s degree (MA, MS, M.Eng., MBA, etc.) \n",
"3 United Kingdom No Master’s degree (MA, MS, M.Eng., MBA, etc.) \n",
"4 Australia No Bachelor’s degree (BA, BS, B.Eng., etc.) \n",
"\n",
" UndergradMajor ... \\\n",
"0 Computer science, computer engineering, or sof... ... \n",
"1 Computer science, computer engineering, or sof... ... \n",
"2 Computer science, computer engineering, or sof... ... \n",
"3 NaN ... \n",
"4 Computer science, computer engineering, or sof... ... \n",
"\n",
" WelcomeChange \\\n",
"0 Just as welcome now as I felt last year \n",
"1 Just as welcome now as I felt last year \n",
"2 Somewhat more welcome now than last year \n",
"3 Just as welcome now as I felt last year \n",
"4 Just as welcome now as I felt last year \n",
"\n",
" SONewContent Age Gender Trans \\\n",
"0 Tech articles written by other developers;Indu... 22.0 Man No \n",
"1 NaN 23.0 Man No \n",
"2 Tech articles written by other developers;Cour... 28.0 Man No \n",
"3 Tech articles written by other developers;Indu... 26.0 Man No \n",
"4 Tech articles written by other developers;Indu... 29.0 Man No \n",
"\n",
" Sexuality Ethnicity Dependents \\\n",
"0 Straight / Heterosexual White or of European descent No \n",
"1 Bisexual White or of European descent No \n",
"2 Straight / Heterosexual White or of European descent Yes \n",
"3 Straight / Heterosexual White or of European descent No \n",
"4 Straight / Heterosexual Hispanic or Latino/Latina;Multiracial No \n",
"\n",
" SurveyLength SurveyEase \n",
"0 Appropriate in length Easy \n",
"1 Appropriate in length Neither easy nor difficult \n",
"2 Appropriate in length Easy \n",
"3 Appropriate in length Neither easy nor difficult \n",
"4 Appropriate in length Easy \n",
"\n",
"[5 rows x 85 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Distribution\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Determine how the data is distributed\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The column `ConvertedComp` contains each respondent's salary converted to an annual USD figure using the exchange rate on 2019-02-01.\n",
"\n",
"This assumes 12 working months and 50 working weeks.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plot the distribution curve for the column `ConvertedComp`.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# your code goes here\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# distplot is deprecated in seaborn; kdeplot draws the density curve without the histogram\n",
"plot1 = sns.kdeplot(df[\"ConvertedComp\"].dropna(), color='r')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plot the histogram for the column `ConvertedComp`.\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:ylabel='Frequency'>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# your code goes here\n",
"df['ConvertedComp'].plot.hist()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is the median of the column `ConvertedComp`?\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"57745.0"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# your code goes here\n",
"df['ConvertedComp'].median()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"29.0"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"###### quiz ######\n",
"\n",
"df['Age'].median()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How many respondents identified themselves only as a **Man**?\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Man 10480\n",
"Woman 731\n",
"Non-binary, genderqueer, or gender non-conforming 63\n",
"Man;Non-binary, genderqueer, or gender non-conforming 26\n",
"Woman;Non-binary, genderqueer, or gender non-conforming 14\n",
"Woman;Man 9\n",
"Woman;Man;Non-binary, genderqueer, or gender non-conforming 2\n",
"Name: Gender, dtype: int64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# your code goes here\n",
"df['Gender'].value_counts()"
]
},
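{
"cell_type": "markdown",
"metadata": {},
"source": [
"`value_counts()` lists every combination of answers. As a minimal sketch (not part of the original lab), the single count for respondents who chose only **Man** can also be read off with a boolean mask; `only_man` is a name introduced here for illustration.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: `only_man` is a hypothetical name, not from the lab.\n",
"# Rows with a missing Gender compare as False, so they are not counted.\n",
"only_man = (df['Gender'] == 'Man').sum()\n",
"only_man"
]
},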
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Find the median `ConvertedComp` of respondents who identified themselves only as a **Woman**.\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"57708.0"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# your code goes here\n",
"df.loc[df['Gender'] == 'Woman', 'ConvertedComp'].median()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Give the five-number summary for the column `Age`.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Double click here for hint**.\n",
"\n",
"<!--\n",
"The min, Q1, median, Q3, and max of a column make up its five-number summary.\n",
"-->\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count 11111.000000\n",
"mean 30.778895\n",
"std 7.393686\n",
"min 16.000000\n",
"25% 25.000000\n",
"50% 29.000000\n",
"75% 35.000000\n",
"max 99.000000\n",
"Name: Age, dtype: float64"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# your code goes here\n",
"df['Age'].describe()"
]
},
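{
"cell_type": "markdown",
"metadata": {},
"source": [
"`describe()` also reports the count, mean, and standard deviation. As a hedged sketch (not part of the original lab), the five values alone can be pulled out with `Series.quantile`; the name `five_num` and the index labels are assumptions made for this illustration.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: `five_num` is a hypothetical name, not from the lab.\n",
"# quantile() skips missing values; 0 and 1 give the min and max.\n",
"five_num = df['Age'].quantile([0, 0.25, 0.5, 0.75, 1.0])\n",
"five_num.index = ['min', 'q1', 'median', 'q3', 'max']\n",
"five_num"
]
},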
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plot a histogram of the column `Age`.\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:ylabel='Frequency'>"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# your code goes here\n",
"df['Age'].plot.hist()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Outliers\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Finding outliers\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use a box plot to find out whether outliers exist in the column `ConvertedComp`.\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='ConvertedComp'>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# your code goes here\n",
"sns.boxplot(x=df['ConvertedComp'])"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='Age'>"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"###### quiz ######\n",
"\n",
"sns.boxplot( x=df['Age'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Find out the Inter Quartile Range for the column `ConvertedComp`.\n"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"73132.0"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# your code goes here\n",
"\n",
"#IQR = Q3-Q1\n",
"\n",
"IRQ = df['ConvertedComp'].quantile(0.75) - df['ConvertedComp'].quantile(0.25) \n",
"IRQ"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Find out the upper and lower bounds.\n"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"100000.0 26868.0\n",
"Upper: 209698.0\n",
"Lower: -82830.0\n"
]
}
],
"source": [
"# your code goes here\n",
"\n",
"#UPPER BOUND = Q3 + 1.5*IQR\n",
"#LOWER BOUND = Q1 - 1.5*IQR\n",
"\n",
"Q3 = df['ConvertedComp'].quantile(0.75)\n",
"Q1 = df['ConvertedComp'].quantile(0.25)\n",
"print (Q3, Q1)\n",
"print (\"Upper:\", Q3 + 1.5*IRQ )\n",
"print (\"Lower:\", Q1 - 1.5*IRQ )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Identify how many outliers are there in the `ConvertedComp` column.\n"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Outlier below lower bound: 0\n",
"Outlier above upper bound: 879\n"
]
}
],
"source": [
"# your code goes here\n",
"\n",
"#OUTLIERS: ANYTHING THAT DOES NOT FALL IN BETWEEN UPPER AND LOWER BOUNDS\n",
"\n",
"print ('Outlier below lower bound:', df['ConvertedComp'].lt(Q1 - 1.5*IRQ ).sum())\n",
"print ('Outlier above upper bound:', df['ConvertedComp'].gt(Q3 + 1.5*IRQ ).sum())"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"57745.0"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"###### quiz ######\n",
"\n",
"df['ConvertedComp'].median()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a new dataframe by removing the outliers from the `ConvertedComp` column.\n"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(9703, 85)"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# your code goes here\n",
"\n",
"df_new = df[df['ConvertedComp'].between(Q1 - 1.5*IRQ, Q3 + 1.5*IRQ)]\n",
"df_new.shape\n"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"52704.0"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"###### quiz ######\n",
"\n",
"df_new['ConvertedComp'].median()"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"59883.20838915799"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"###### quiz ######\n",
"\n",
"df_new['ConvertedComp'].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Correlation\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Finding correlation\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Find the correlation between `Age` and all other numerical columns.\n"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Respondent 0.004041\n",
"CompTotal 0.006970\n",
"ConvertedComp 0.105386\n",
"WorkWeekHrs 0.036518\n",
"CodeRevHrs -0.020469\n",
"Age 1.000000\n",
"Name: Age, dtype: float64"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# your code goes here\n",
"df.select_dtypes(include='number').corr()['Age']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Authors\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ramesh Sannareddy\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Other Contributors\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rav Ahuja\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Change Log\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| Date (YYYY-MM-DD) | Version | Changed By | Change Description |\n",
"| ----------------- | ------- | ----------------- | ---------------------------------- |\n",
"| 2020-10-17 | 0.1 | Ramesh Sannareddy | Created initial version of the lab |\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright © 2020 IBM Corporation. This notebook and its source code are released under the terms of the [MIT License](https://cognitiveclass.ai/mit-license?utm_medium=Exinfluencer\\&utm_source=Exinfluencer\\&utm_content=000026UJ\\&utm_term=10006555\\&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork21426264-2021-01-01\\&cm_mmc=Email_Newsletter-\\_-Developer_Ed%2BTech-\\_-WW_WW-\\_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264\\&cm_mmca1=000026UJ\\&cm_mmca2=10006555\\&cm_mmca3=M12345678\\&cvosrc=email.Newsletter.M12345678\\&cvo_campaign=000026UJ).\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

IBM

Data Engineering Capstone Project

This course is part of IBM Data Engineering Professional Certificate

Instructors: Rav Ahuja +1 more

11,888 already enrolled

(101 reviews)

Recommended experience

Advanced level

Complete all prior courses in the IBM Data Engineering Professional Certificate.

What you'll learn

Demonstrate proficiency in skills required for an entry-level data engineering role.

Design and implement various concepts and components in the data engineering lifecycle such as data repositories.

Showcase working knowledge with relational databases, NoSQL data stores, big data engines, data warehouses, and data pipelines.

Apply skills in Linux shell scripting, SQL, and Python programming languages to Data Engineering problems.

Build your Data Management expertise

  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate from IBM

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV

Share it on social media and in your performance review

There are 7 modules in this course

Showcase your skills in this Data Engineering project! In this course you will apply a variety of data engineering skills and techniques you have learned as part of the previous courses in the IBM Data Engineering Professional Certificate.

You will demonstrate your knowledge of Data Engineering by assuming the role of a Junior Data Engineer who has recently joined an organization and working through a real-world use case that requires architecting and implementing a data analytics platform. In this Capstone project you will complete numerous hands-on labs. You will create and query data repositories using relational and NoSQL databases such as MySQL and MongoDB. You’ll also design and populate a data warehouse using PostgreSQL and IBM Db2 and write queries to perform Cube and Rollup operations. You will generate reports from the data in the data warehouse and build a dashboard using Cognos Analytics. You will also show your proficiency in Extract, Transform, and Load (ETL) processes by creating data pipelines that move data between different repositories. You will perform big data analytics using Apache Spark to make predictions with the help of a machine learning model. This course is the final course in the IBM Data Engineering Professional Certificate. It is recommended that you complete all the previous courses in this Professional Certificate before starting this course.

In this module, you will design a data platform that uses MySQL as an OLTP database. You will be using MySQL to store the OLTP data.

What's included

2 videos 2 quizzes 1 app item 2 plugins

2 videos • Total 5 minutes

  • Introduction to Capstone Project • 4 minutes • Preview module
  • Assignment Overview • 1 minute

2 quizzes • Total 22 minutes

  • Checklist: OLTP Database • 10 minutes
  • Graded Quiz: OLTP Database • 12 minutes

1 app item • Total 30 minutes

  • Hands-on Lab: OLTP Database • 30 minutes

2 plugins • Total 15 minutes

  • Data Platform Architecture • 10 minutes
  • OLTP Database Requirements and Design • 5 minutes

Querying Data in NoSQL Databases

In this module, you will design a data platform that uses MongoDB as a NoSQL database. You will use MongoDB to store the e-commerce catalog data.

1 video 2 quizzes 1 app item

1 video • Total 1 minute

  • Assignment Overview: Querying Data in NoSQL Databases • 1 minute • Preview module

2 quizzes • Total 25 minutes

  • Checklist: Querying Data in NoSQL Databases • 10 minutes
  • Graded Quiz: Querying Data in NoSQL Databases • 15 minutes
  • Hands-on Lab: Querying Data in NoSQL Databases • 30 minutes

Build a Data Warehouse

In this module you will design and implement a data warehouse and you will then generate reports from the data in the data warehouse.

2 videos 1 reading 3 quizzes 3 app items 1 plugin

2 videos • Total 4 minutes

  • Assignment Overview: Data Warehouse Design & Setup • 2 minutes • Preview module
  • Assignment Overview: Data Warehouse Reporting • 1 minute

1 reading • Total 1 minute

  • Optional Lab Information • 1 minute

3 quizzes • Total 45 minutes

  • Checklist: Data Warehousing • 14 minutes
  • Checklist: Data Warehouse Reporting • 16 minutes
  • Graded Quiz: Data Warehouse & Reporting • 15 minutes

3 app items • Total 180 minutes

  • Hands-on Lab: Data Warehousing • 60 minutes
  • Hands-on Lab: Data Warehouse Reporting using PostgreSQL • 60 minutes
  • (Optional) Obtain IBM Cloud Feature Code and Activate Trial Account • 60 minutes

1 plugin • Total 30 minutes

  • (Optional) Hands-on Lab: Data Warehouse Reporting using DB2 • 30 minutes

Data Analytics

In this module, you will assume the role of a data engineer at an e-commerce company. Your company has finished setting up a data warehouse. Now you are assigned the responsibility to design a reporting dashboard that reflects the key metrics of the business.

1 video 5 readings 2 quizzes 4 plugins

  • Assignment Overview • 1 minute • Preview module

5 readings • Total 42 minutes

  • (Optional): About this optional lesson with Looker Studio • 2 minutes
  • (Optional) : Getting Started with Google Looker Studio • 10 minutes
  • (Optional): Creating Visualizations in Reports using Looker Studio • 10 minutes
  • (Optional) : Summary and Highlights • 10 minutes
  • Final Assignment Overview • 10 minutes

2 quizzes • Total 27 minutes

  • Checklist: Dashboard Creation • 12 minutes
  • Graded Quiz: Dashboard Creation • 15 minutes

4 plugins • Total 180 minutes

  • (Optional):Hands-on Lab: Getting Started with Google Looker Studio • 60 minutes
  • (Optional): Hands-on Lab: Creating and Configuring Visualizations in Reports with Google Looker Studio • 60 minutes
  • Final Assignment Part A: Dashboard Creation using IBM Cognos Analytics • 30 minutes
  • Final Assignment Part B: Dashboard Creation using Google Looker Studio • 30 minutes

ETL & Data Pipelines

In this module, you will use the given Python script to perform various ETL operations that move data from RDBMS to NoSQL, from NoSQL to RDBMS, and from both RDBMS and NoSQL to the data warehouse. You will write a pipeline that analyzes the web server log file, extracts the required lines and fields, and then transforms and loads the data.

2 videos 3 quizzes 2 app items

  • Assignment Overview: ETL • 2 minutes • Preview module
  • Assignment Overview: Data Pipelines using Apache Airflow • 1 minute

3 quizzes • Total 39 minutes

  • Checklist: ETL • 6 minutes
  • Checklist: Data Pipelines using Apache Airflow • 18 minutes
  • Graded Quiz: ETL & Data Pipelines using Apache Airflow • 15 minutes

2 app items • Total 90 minutes

  • Hands-on Lab: ETL • 60 minutes
  • Hands-on Lab: Data Pipelines using Apache Airflow • 30 minutes

Big Data Analytics with Spark

In this module, you will use the data from a webserver to analyse search terms. You will then load a pretrained sales forecasting model and predict the sales forecast for a future year.

1 video 2 quizzes 2 app items

  • Assignment Overview: Big Data Analytics with Spark • 0 minutes • Preview module

2 quizzes • Total 29 minutes

  • Checklist: Big Data Analytics with Spark • 14 minutes
  • Graded Quiz: Big Data Analytics with Spark • 15 minutes

2 app items • Total 60 minutes

  • Practice Hands On Lab: Saving and loading a SparkML model • 30 minutes
  • Hands-on Lab: SparkML Ops • 30 minutes

Final Submission and Peer Review

In this final module you will complete your submission of screenshots from the hands-on labs for your peers to review. Once you have completed your submission you will then review the submission of one of your peers and grade their submission.

2 readings 1 peer review

2 readings • Total 3 minutes

  • Congrats & Next Steps • 2 minutes
  • Thanks from the Course Team • 1 minute

1 peer review • Total 120 minutes

  • Submit your Work and Review your Peers • 120 minutes

IBM is the global leader in business transformation through an open hybrid cloud platform and AI, serving clients in more than 170 countries around the world. Today 47 of the Fortune 50 Companies rely on the IBM Cloud to run their business, and IBM Watson enterprise AI is hard at work in more than 30,000 engagements. IBM is also one of the world’s most vital corporate research organizations, with 28 consecutive years of patent leadership. Above all, guided by principles for trust and transparency and support for a more inclusive society, IBM is committed to being a responsible technology innovator and a force for good in the world. For more information about IBM visit: www.ibm.com

Learner reviews

Reviewed on Mar 18, 2023

I enjoyed having to go back and revise the other courses in the specialization. I had forgotten how interesting they were.

Reviewed on Mar 10, 2024

The Capstone was a bit of an anticlimax. I was expecting a very challenging Capstone, but found a "follow the instructions" approach which made it seem too simple. I'm not complaining ;-)

Top 20 Data Engineering Project Ideas 2024 [With Source Code]

Welcome to the world of data engineering, where the power of big data unfolds. If you're an aspiring data engineer and seeking to showcase your skills or gain hands-on experience, you've landed in the right spot. Get ready to learn the best data engineering project concepts and explore a world of exciting data engineering projects in this article.

Before working on these projects, you should be familiar with the relevant topics and technologies. Companies are constantly seeking experienced data engineers who can deliver innovative data engineering projects, so the best thing you can do as a novice is to work on some real-time data engineering projects. Working on a data engineering project will not only give you a deeper understanding of how data engineering works, but it will also improve your problem-solving skills as you encounter and fix problems within the project. Data Science certifications, online or offline, can help you establish a solid foundation for an end-to-end data engineering project.

What are Data Engineering Projects?

If you want to break into the field of data engineering but don't yet have any expertise in the field, compiling a portfolio of data engineering projects may help. From EDA and data cleansing to data modeling and visualization, the greatest data engineering projects demonstrate the whole data process from start to finish.

These projects should demonstrate data pipeline best practices. You should be able to identify potential weak spots in data pipelines and build robust solutions that withstand them. Finally, create data visualizations to display your project's results, and build a website to showcase your work, whether it's a portfolio or a personal site.

The first step in hiring data engineers is reviewing a candidate's résumé. When screening resumes, most hiring managers prioritize candidates who have actual experience working on data engineering projects.

Structure of a Data Engineering Project 

Here is the Project Folder Structure for data engineer project ideas:

  • config/ (Configuration Files)
  • data/ (Data Files)
  • docs/ (Documentation)
  • etl/ (Extract-Transform-Load)
  • pipelines/ (Data Pipeline Orchestration)
  • src/ (Source Code)
  • tests/ (Project Tests)
  • .gitignore (Version Control Exclusion)
  • environment.yml (Conda Environment)
  • README.md (Project Overview)
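
The layout above can be created with a few lines of Python. This is a minimal sketch: the folder and file names come from the list, while the helper name `scaffold_project` and the choice to create empty placeholder files are assumptions made for illustration.

```python
from pathlib import Path
from typing import List
import tempfile

# Names taken from the folder structure listed above.
FOLDERS = ["config", "data", "docs", "etl", "pipelines", "src", "tests"]
FILES = [".gitignore", "environment.yml", "README.md"]


def scaffold_project(root: Path) -> List[str]:
    """Create the layout under `root` and return the sorted entry names."""
    root.mkdir(parents=True, exist_ok=True)
    for folder in FOLDERS:
        (root / folder).mkdir(exist_ok=True)
    for name in FILES:
        # Empty placeholder files (an assumption); fill them in per project.
        (root / name).touch(exist_ok=True)
    return sorted(p.name for p in root.iterdir())


if __name__ == "__main__":
    # Use a temporary directory so running the sketch leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        print(scaffold_project(Path(tmp) / "my_data_project"))
```

Running the helper twice on the same path is safe, because every call uses `exist_ok=True`.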

List of Top Data Engineering Projects of 2024

Data engineers make unprocessed data accessible and functional for other data professionals. Multiple types of data exist within organizations, and it is the obligation of data architects to standardize them so that data analysts and scientists can use them interchangeably. If data scientists and analysts are pilots, data engineers are aircraft manufacturers. Without the latter, the former cannot accomplish its objectives. From analysts to Big Data Engineers, everyone in the field of data science has been discussing data engineering.

Here is the list of data engineering project ideas (beginner, intermediate, and advanced):

  • Stock and Twitter Data Extraction Using Python, Kafka, and Spark
  • Use Python to Scrape Real Estate Listings and Make a Dashboard
  • Use Stack Overflow Data for Analytic Purposes
  • Extracting Inflation Rates from CommonCrawl and Building a Model
  • Realtime Data Analytics
  • Yelp Review Analysis
  • Finnhub API with Kafka for Real-Time Financial Market Data Pipeline
  • Pipeline for Real-Time Data Processing in Music Applications
  • Anomaly Detection in Cloud Servers
  • Smart Cities Using Big Data
  • Tourist Behaviour Analysis
  • Image Caption Generator

When constructing a data engineering project, you should prioritize the following areas:

  • Multiple sources of data (APIs, websites, CSVs, JSON, etc.)
  • Data ingestion
  • Data storage
  • Data visualization (so that you have something to show for your efforts)
  • Using multiple tools

Top 4 Data Engineering Project Ideas: Beginner & Final Year Students

Becoming an expert data engineer necessitates familiarity with the best practices and cutting-edge technologies in your field. Participating in a data engineering project is a great way to learn the ropes of the field. That's why we're going to zero in on the data engineering initiatives that need your attention. If you are struggling with Data Engineering projects for beginners, then Data Engineer Bootcamp is for you.

Some simple beginner Data Engineer projects that might help you go forward professionally are provided below.

1. Stock and Twitter Data Extraction Using Python, Kafka, and Spark

Project Overview:  The rising and falling of GameStop's stock price and the proliferation of cryptocurrency exchanges have made stocks a topic of widespread attention.

If you share this enthusiasm for the markets, you may want to consider creating a tool like Cashtag, which was built by a Reddit developer. The goal of that project was to create a "big data pipeline for user sentiment analysis on the US stock market." In a nutshell, it uses social media data to provide real-time market sentiment predictions. The process flow for this project is shown in the following diagram:

This project's documentation will serve as a starting point from which you may draw ideas for your own work.

Source Code: Stock and Twitter Data Extraction Using Python, Kafka, and Spark

2. Use Python to Scrape Real Estate Listings and Make a Dashboard

Project Overview:  If you're looking to get your hands dirty with some cutting-edge tech and big Data Engineering projects for engineering students, consider something like sspaeti's 20-minute data engineering project. The purpose of this work is to provide a resource that can help you find the best possible home or rental.

Source: Medium

Web scraping libraries like Beautiful Soup and Scrapy are used to gather information for this project. As a data engineer, you should get experience writing Python programs that process HTML, and web scraping is an excellent way to do so. Delta Lake and Kubernetes are both trending subjects, so it's interesting to see them both addressed in this project.

Finally, a well-designed user interface is an essential part of any successful data engineering project. Superset is used for data visualization in this project, while Dagster is used to coordinate the many moving parts. The wide range of methods used in this work makes it an excellent addition to a resume.

Source: Use Python to Scrape Real Estate Listings and Make a Dashboard

3. Use Stack Overflow Data for Analytic Purposes

Project Overview:  What if you had access to all or most of the public repos on GitHub? What questions would you ask?

As part of similar research, Felipe Hoffa analysed gigabytes of data spread over many publications from Google's BigQuery data collection. The abundance of data opens numerous possibilities for research and analysis. Concepts that Felipe examined include:

  • The Case for Tabs
  • Which languages do programmers spend their weekends working on?
  • Searching for questions and comments in GitHub repos.

Since there are numerous ways to approach this task, it encourages originality in one's approach to data analysis.

2.8 million open-source projects are available for inspection.

Moreover, this project concept should highlight the fact that there are many interesting datasets already available on services like GCP and AWS. Hundreds of datasets are available from these two cloud services, so you may practise your analytical skills without having to scrape data from an API.

Source: Use Stack Overflow Data for Analytic Purposes

4. Extracting Inflation Rates from CommonCrawl and Building a Model

Project Overview:  Dr. Usama Hussain worked on another intriguing idea. He calculated the rate of inflation by tracking online price changes for products and services. Given that the United States has seen its highest inflation rate since 2008, this is a significant problem.

The author utilised petabytes of website data from the Common Crawl in their effort.
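
To make the idea of deriving inflation from price changes concrete, here is a minimal sketch. It is not Dr. Hussain's method: it assumes a simple fixed-basket comparison, and the item names, prices, and the `inflation_rate` helper are all hypothetical.

```python
def inflation_rate(old_prices, new_prices):
    """Percentage change in the total cost of a basket between two periods.

    Only items observed in both periods are compared (an assumption that
    keeps the basket fixed).
    """
    common = old_prices.keys() & new_prices.keys()
    if not common:
        raise ValueError("no items observed in both periods")
    old_total = sum(old_prices[item] for item in common)
    new_total = sum(new_prices[item] for item in common)
    return (new_total - old_total) / old_total * 100


# Hypothetical scraped prices for two periods.
last_year = {"laptop": 1000.0, "headphones": 100.0, "desk": 400.0}
this_year = {"laptop": 1050.0, "headphones": 110.0, "desk": 415.0, "lamp": 30.0}

print(round(inflation_rate(last_year, this_year), 2))  # 5.0
```

The lamp is ignored because it has no price in the earlier period. A real price index would also weight items by how much consumers spend on them.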

This is another excellent example of how to put together and present a data engineering project. One of the difficulties often mentioned is how hard it can be to demonstrate your data engineering work.

However, Dr. Hussain's project is documented in such a way that it is possible to see what work was done and the skills he possesses without having to dig into all the code.

The data flow is outlined below by Dr. Hussain.

Source Code: Extracting Inflation Rates from CommonCrawl and Building a Model

Top 4 Data Engineering Project Ideas: Intermediate Level

Knowing big data theory alone will not get you very far. You'll need to put your newfound knowledge into action, and working on big data projects is a good way to test your skills. Projects are also excellent for your resume. This section goes over some Big Data projects that you can work on to demonstrate your expertise, all of which are solid data engineering projects for a resume.

Here are some data engineering project ideas to consider and Data Engineering portfolio project examples to demonstrate practical experience with data engineering problems.

1. Realtime Data Analytics

Project Overview:  Olber, a corporation that provides taxi services, is gathering information about each and every journey. Per trip, two different devices generate additional data. The taxi metre transmits information on the length of each journey, the distance travelled, as well as the pick-up and drop-off locations. Customers' payments are processed using a smartphone application, which also provides reliable and easily accessible data about fares. In order to identify patterns among its customers, the taxi firm needs to compute, in real-time, the typical amount of cash given as a tip for each kilometre travelled in each region.

A complete end-to-end stream processing pipeline is shown here using an architectural diagram. Extracting, transforming, loading, and reporting are the four processes that make up this kind of pipeline. The pipeline in this reference design collects data from two different sources, then conducts a join operation on related records from each stream, then enriches the output, and finally produces an average. The findings are being saved for use in further analyses.
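
The join-then-average step described above can be sketched in plain Python. This is a batch illustration rather than a streaming implementation, and the field names (`ride_id`, `region`, `distance_km`, `tip`) are assumptions, since the article does not give a schema. It also assumes that "average tip per kilometre" means the mean of each trip's tip divided by its distance.

```python
from collections import defaultdict

# Hypothetical records from the two devices.
rides = [
    {"ride_id": 1, "region": "north", "distance_km": 4.0},
    {"ride_id": 2, "region": "north", "distance_km": 6.0},
    {"ride_id": 3, "region": "south", "distance_km": 5.0},
]
fares = [
    {"ride_id": 1, "tip": 2.0},
    {"ride_id": 2, "tip": 3.0},
    {"ride_id": 3, "tip": 1.0},
    {"ride_id": 4, "tip": 9.0},  # no matching ride, dropped by the join
]


def avg_tip_per_km(rides, fares):
    """Join the two sources on ride_id, then average tip/km per region."""
    rides_by_id = {r["ride_id"]: r for r in rides}
    totals = defaultdict(lambda: [0.0, 0])  # region -> [sum of tip/km, count]
    for fare in fares:
        ride = rides_by_id.get(fare["ride_id"])
        if ride is None or ride["distance_km"] <= 0:
            continue  # skip unmatched fares and zero-length trips
        totals[ride["region"]][0] += fare["tip"] / ride["distance_km"]
        totals[ride["region"]][1] += 1
    return {region: s / n for region, (s, n) in totals.items()}


print(avg_tip_per_km(rides, fares))  # {'north': 0.5, 'south': 0.2}
```

In a real streaming pipeline the join would run over a time window, since a fare record and its ride record may arrive at different moments.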

Source Code: Realtime Data Analytics

2. Yelp Review Analysis

Project Overview:  Yelp is a platform that allows people to post reviews and give a star rating to businesses that they have visited. One study found that a one-star increase led to a 5 to 9 percent gain in revenue for independently owned and operated businesses. As a consequence, the Yelp dataset has a lot of promise as a resource for gaining valuable insights. Yelp reviews written by customers are a treasure trove just waiting to be unearthed.

The primary objective of this project is to carry out in-depth analyses of restaurants across seven cuisine types, namely Korean, Japanese, Chinese, Vietnamese, Thai, French, and Italian, in order to determine what makes a good restaurant and what concerns customers, and then to make recommendations for future improvement and growth in profit. Most of the focus is on analysing customer feedback to figure out why people like or dislike a restaurant. Using big data, we can transform unstructured data, such as customer reviews, into actionable insights, which helps businesses understand how and why customers prefer their products or services and improve their operations as quickly as practically possible.

Source Code: Yelp Review Analysis

3. Finnhub API with Kafka for Real-Time Financial Market Data Pipeline

Project Overview:  The goal of this project is to construct a streaming data pipeline using the real-time financial market data API provided by Finnhub. The project's architecture is composed of five layers: the data ingestion layer, the message broker layer, the stream processing layer, the serving database layer, and the visualisation layer. The final product is a dashboard that presents the data graphically for in-depth study.

The pipeline consists of many different components, one of which is a producer that retrieves data from Finnhub's API and then transmits that data to a Kafka topic, which is part of a Kafka cluster that stores the data and processes it. Apache Spark is going to be used for stream processing. The next step is to use Cassandra for the purpose of storing the real-time financial market data that is being sent over the pipeline. Users are able to watch the market data in real-time and detect trends and patterns by using the final dashboard that was created with the help of Grafana. This dashboard shows real-time charts and graphs that are based on the data that is stored in the database.

Source Code: Finnhub API with Kafka for Real-Time Financial Market Data Pipeline

4. Pipeline for Real-Time Data Processing in Music Applications

Project Overview:  The project will stream events that are created by a fictitious music streaming service that operates similarly to Spotify. Additionally, a data pipeline that consumes real-time data will be developed. The incoming data would be analogous to an event that occurred when a person listened to music, navigated around the website, or authenticated themselves. The processing of the data would take place in real-time, and it would be saved to the data lake at regular intervals (every two minutes). The hourly batch job will then make use of this data by consuming it, applying transformations to it, and creating the tables that are needed for our dashboard so that analytics may be generated. We are going to try to conduct an analysis of indicators such as the most played songs, active users, user demographics, etc.

You can generate a sample dataset for this project using Eventsim and the Million Song Dataset. Apache Kafka and Apache Spark are the streaming technologies used to process data in real time. Spark's Structured Streaming API processes data in mini-batches, which offers low-latency processing. The processed data is uploaded to Google Cloud Storage, where it is then transformed with dbt. With dbt we can clean, convert, and aggregate the data so that it is ready for analysis. The data is then sent to BigQuery, which serves as the data warehouse, and Data Studio is used to visualise it. Apache Airflow is used for orchestration, while Docker is the tool of choice for containerization.
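
The two-minute write interval and the later batch aggregation can be illustrated without any streaming framework. The sketch below is an assumption-laden stand-in for the Kafka and Spark pieces: the event tuples, the `bucket_events` and `most_played` helpers, and the timestamps are all hypothetical.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 120  # the two-minute interval mentioned above

# Hypothetical events: (timestamp in seconds, event type, song title).
events = [
    (0, "listen", "Song A"),
    (30, "page_view", None),
    (119, "listen", "Song B"),
    (120, "listen", "Song A"),
    (250, "listen", "Song A"),
]


def bucket_events(events, window=WINDOW_SECONDS):
    """Assign each event to the start time of its fixed-size window."""
    buckets = defaultdict(list)
    for ts, kind, song in events:
        buckets[ts - ts % window].append((ts, kind, song))
    return dict(buckets)


def most_played(buckets):
    """Batch step: count listen events across all stored windows."""
    plays = Counter(
        song
        for batch in buckets.values()
        for _, kind, song in batch
        if kind == "listen"
    )
    return plays.most_common()


buckets = bucket_events(events)
print(sorted(buckets))       # [0, 120, 240]
print(most_played(buckets))  # [('Song A', 3), ('Song B', 1)]
```

Each bucket stands in for one file written to the data lake, and `most_played` plays the role of the hourly job that reads those files and builds a dashboard table.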

Source Code: Pipeline for Real-Time Data Processing in Music Applications

Top 4 Data Engineering Project Ideas - Advanced Level

Once you have worked through these, adding data engineering projects to your resume should improve your chances of being invited to an interview.

1. Anomaly Detection in Cloud Servers

Project Overview:  Anomaly detection is a valuable instrument for cloud platform administrators who wish to monitor and analyse cloud behaviour in order to increase cloud reliability. It aids cloud platform administrators in detecting unanticipated system activity in order to take preventative measures prior to a system breakdown or service failure.

This project provides a reference implementation of a Cloud Dataflow streaming pipeline that integrates with BigQuery ML and Cloud AI Platform to detect anomalies. A critical component of the implementation uses Dataflow for feature extraction and real-time outlier detection, and it has been validated on over 20TB of data.
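
To make "real-time outlier detection" concrete, here is a deliberately simple sketch that flags a value when it lies far from the mean of the preceding window. It is a rolling z-score in plain Python, swapped in for illustration only; it is not the method used by the reference implementation, and the function name, window size, and threshold are assumptions.

```python
from statistics import mean, stdev


def rolling_zscore_outliers(values, window=5, threshold=3.0):
    """Return indices whose value is more than `threshold` standard
    deviations away from the mean of the previous `window` values."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip flat windows: a zero deviation makes the z-score undefined.
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical CPU utilisation samples with one obvious spike.
cpu_percent = [50, 52, 49, 51, 50, 51, 95, 50, 52]
outliers = rolling_zscore_outliers(cpu_percent)
```

Note that the spike inflates the deviation of the windows that follow it, so values right after an anomaly are harder to flag; production systems usually handle this with more robust statistics or a trained model.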

Source Code: Anomaly Detection in Cloud Servers

2. Smart Cities Using Big Data

Project Overview:  A "smart city" is a modern urban area that gathers data through electronic means, voice activation techniques, and sensors. The data is used to better manage the city's assets, resources, and services, which in turn improves citywide operations. Data is gathered from citizens, devices, buildings, and assets, then processed and analysed to monitor and manage traffic and transportation systems, power plants, utilities, water supply networks, waste, crime detection, information systems, educational institutions, health care facilities, and more. Big data techniques handle the collection, and advanced algorithms, smart network infrastructures, and analytics platforms turn that data into smart city features. This smart city reference pipeline demonstrates how to combine several media building blocks with analytics provided by the OpenVINO Toolkit for traffic or stadium sensing, analytics, and management.

Source Code: Smart Cities Using Big Data

3. Tourist Behaviour Analysis

Project Overview:  This is one of the more forward-looking big data project ideas. The project studies visitor behaviour to determine tourists' preferences and the most visited locations, and to forecast future tourism demand.

What role does big data play in this project? Because travellers use the internet and other technologies while away from home, they leave digital traces that big data tools can readily collect and share. Most of the data comes from external sources such as social media websites. The volume is too large for a conventional database to manage, which is why big data analytics is required. The data collected from these sources can help companies in the airline, hotel, and tourism sectors expand their client base and market their products and services. It can also help tourism organisations visualise and forecast current and future trends.

Source Code: Tourist Behavior Analysis

4. Image Caption Generator

Project Overview:  With the rise of social media and the growing importance of digital marketing, businesses need to publish engaging content. Eye-catching visuals are essential, but images also need captions. Hashtags and attention-grabbing captions help reach the intended audience more effectively. The project has to manage large datasets of photos paired with captions. Image processing and deep learning are used to understand the image, and artificial intelligence is used to generate relevant and appealing captions. The source code can be written in Python. Image caption generation is a difficult project and is not suited to Big Data beginners. The project described below uses a CNN (Convolutional Neural Network) and an RNN (Recurrent Neural Network) with beam search to generate captions for an image.

Rich and colourful datasets, such as MSCOCO, Flickr8k, Flickr30k, PASCAL 1K, AI Challenger Dataset, and STAIR Captions, are currently used in the generation of image descriptions and are gradually becoming a topic of discussion. The supplied project employs cutting-edge machine learning and big data algorithms to create an efficient image caption generator.

Source Code: Image Caption Generator

Open-Source Data Engineering Project Ideas: Additional Topics

Below are some Data Engineering project topic examples:

1. Analytics Application
2. Extract, Transform, Load (ETL)
3. Extracting Inflation Data
4. Building Data Pipelines
5. Creating a Data Repository
6. Analyse Security Breach
7. Aviation Data Analysis
8. Shipping and Distribution Demand Forecasting

Why Should You Work on Data Engineering-Based Projects?

In conjunction with machine learning, data engineering enables the development of marketing plans based on forecasts of customer behaviour. Businesses that use big data analytics become more customer focused.

Learning this skill set, which is in great demand, will allow you to make rapid strides in your professional development. Because of this, the best thing you can do if you're new to big data is to think of some ideas for projects that include big data.

Data engineers are responsible for the construction and administration of computer hardware and software systems that are used for the gathering, formatting, storing, and processing of data. In addition to this, they make sure that the data is always readily accessible to consumers. The end-to-end data process is shown via data engineering projects, which range from exploratory data analysis (EDA) and data cleansing through data modelling and visualization.

Including Data Engineering projects on your resume is quite crucial if you want your application for a job to stand out from the other applicants who have applied for the same position.

Best Platforms to Work on Data Engineering Projects

The following platforms are suitable for real-time Data Engineering projects:

  • Great Expectations

Google Cloud provides all of the tools that data scientists use to extract value from data, making it one of the top data science learning platforms. Business intelligence solutions such as Power BI, Tableau, and Looker help companies mitigate operational risk and maximise operational efficiency by supporting data-driven decisions.

Learn Data Engineering the Smart Way!

A few things to keep in mind while preparing for data engineering projects and jobs:

  • Learn how to program in languages such as Python and Scala and become an expert in those languages.
  • Scripting and automation are skills you should learn.
  • Gain familiarity with database management, and work on improving your SQL skills.
  • Master data processing methods.
  • Acquire the skill of scheduling your workflows.
  • Gain experience in cloud computing by using services such as Amazon Web Services.
  • Improve your understanding of technologies used in infrastructure, such as Docker and Kubernetes, for example.
  • Maintain a current awareness of the trends in the industry.

This article examined some of the best ideas for big data projects. We began with some basic, quick-to-complete assignments and then added Data Engineering projects with source code.

The best project is one that balances industry interests with personal interests. Your personal interest will show through the topic you select, so it is essential to pick a topic that you enjoy. If you have an interest in equities, real estate, politics, or any other niche category, you can use the projects listed above as a template for your own project. Check out KnowledgeHut's best Data Science certification online for Data Engineering project ideas.

Frequently Asked Questions (FAQs)

An online portfolio is the best way to showcase your work. Document how each project was built and how it operates. Your blog entries or GitHub repositories can present the problem description, the recommended design, the data analysis approach, and the results. Including real-world Data Engineering projects makes the portfolio more convincing.

Start with a question. Next, find a relevant dataset. Kaggle, FiveThirtyEight, Google Trends, the Census Bureau, and Data.gov provide free datasets. Use an open API or web scraping tools to get website data. 

Project-worthy topics in data engineering:

  • Data pipeline development
  • Data warehousing
  • Data modeling
  • Data integration
  • Data migration 

Data engineering creates a trustworthy infrastructure for storing and processing data. This includes building and maintaining data pipelines that centralize data sources. Data engineers build and maintain the infrastructure that data scientists and analysts use to work with data.

One example of data engineering is a business that wants to know how its website visitors behave. Web logs, smartphone apps, and social media accounts provide the data, which arrives in databases, JSON files, and CSV files. This data must be collected, normalized, imported into a central data repository, and examined. Data engineers take data from multiple sources, convert it to a format such as Parquet or ORC, and then load it into a data warehouse like Amazon Redshift or Google BigQuery. Data scientists and analysts can then study the data in the warehouse.

Profile

Ritesh Pratap Arjun Singh

RiteshPratap A. Singh is an AI & DeepTech Data Scientist. His research interests include machine vision and cognitive intelligence. He is known for leading innovative AI projects for large corporations and PSUs. Collaborate with him in the fields of AI/ML/DL, machine vision, bioinformatics, molecular genetics, and psychology.

Data Engineering Capstone

Scope of Works

The purpose of this project is to demonstrate various skills associated with data engineering projects, in particular developing ETL pipelines using Airflow, constructing data warehouses with Redshift databases and S3 data storage, and defining efficient data models such as a star schema. As an example, I will perform a deep dive into US immigration, primarily focusing on the types of visas being issued and the profiles associated with them. The scope of this project is limited to the data sources listed below, with data aggregated across numerous dimensions such as visatype, gender, port_of_entry, nationality and month.

Further details and analysis can be found here

Data Description & Sources

  • I94 Immigration Data: This data comes from the US National Tourism and Trade Office found here. Each report contains international visitor arrival statistics by world regions and select countries (including top 20), type of visa, mode of transportation, age groups, states visited (first intended address only), and the top ports of entry (for select countries).
  • World Temperature Data: This dataset came from Kaggle found here.
  • U.S. City Demographic Data: This dataset contains information about the demographics of all US cities and census-designated places with a population greater than or equal to 65,000. The dataset comes from OpenSoft found here.
  • Airport Code Table: This is a simple table of airport codes and corresponding cities. The airport codes may refer to either the IATA airport code, a three-letter code used in passenger reservation, ticketing and baggage-handling systems, or the ICAO airport code, a four-letter code used by ATC systems and for airports that do not have an IATA airport code (from Wikipedia). It comes from here.

Data Storage

Data was stored in S3 buckets in a collection of CSV and PARQUET files. The immigration dataset extends to several million rows and thus this dataset was converted to PARQUET files to allow for easy data manipulation and processing through Dask and the ability to write to Redshift.

Dask is an extremely powerful and flexible library to handle parallel computing for dataframes in Python. Through this library, I was able to scale pandas and numpy workflows with minimal overhead. Whilst PySpark is a great API to Spark and tool to handle big data, I also highly recommend Dask, which you can read more about here.

ETL Pipeline

Defining the data model and creating the star schema involves various steps, made significantly easier through the use of Airflow. The process of extracting files from S3 buckets, transforming the data and then writing CSV and PARQUET files to Redshift is accomplished through the tasks highlighted in the ETL DAG graph. These steps include:

  • Extracting data from SAS Documents and writing as CSV files to S3 immigration bucket
  • Extracting remaining CSV and PARQUET files from S3 immigration bucket
  • Writing CSV and PARQUET files from S3 to Redshift

Overall this project was a small undertaking to demonstrate the steps involved in developing a data warehouse that is easily scalable. Skills include:

  • Creating a Redshift Cluster, IAM Roles, Security groups.
  • Developing an ETL Pipeline that copies data from S3 buckets into staging tables to be processed into a star schema
  • Developing a star schema with optimization to specific queries required by the data analytics team.
  • Using Airflow, Python, and Amazon Redshift to automate ETL pipelines.
  • Writing custom operators to perform tasks such as staging data, filling the data warehouse, and validation through data quality checks.
  • Transforming data from various sources into a star schema optimized for the analytics team’s use cases.
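
The "validation through data quality checks" step can be sketched as a small helper that runs SQL assertions against the warehouse. This is a minimal illustration, not the author's custom operator: it uses an in-memory SQLite database as a stand-in for Redshift, and the table name `fact_immigration`, the sample rows, and the function name are assumptions.

```python
import sqlite3


def run_quality_checks(conn, checks):
    """Run each (description, sql, expected) check and return failure messages."""
    failures = []
    for description, sql, expected in checks:
        actual = conn.execute(sql).fetchone()[0]
        if actual != expected:
            failures.append(f"{description}: expected {expected}, got {actual}")
    return failures


# SQLite stands in for Redshift here; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_immigration (visatype TEXT, gender TEXT, month INTEGER)"
)
conn.executemany(
    "INSERT INTO fact_immigration VALUES (?, ?, ?)",
    [("B2", "F", 4), ("WT", "M", 4), (None, "F", 5)],
)

checks = [
    ("table is not empty", "SELECT COUNT(*) > 0 FROM fact_immigration", 1),
    (
        "visatype has no NULLs",
        "SELECT COUNT(*) FROM fact_immigration WHERE visatype IS NULL",
        0,
    ),
]
failures = run_quality_checks(conn, checks)
```

In an Airflow setup, a task wrapping a helper like this would typically raise an exception when `failures` is non-empty so that the DAG run is marked as failed.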

Daniel Diamond

Hands-On Data Engineering: A Comprehensive Walkthrough of the IBM Capstone Project

Daniel Chaves

Embarking on a career in data engineering? The IBM Data Engineering Professional Certificate is your roadmap to success. This comprehensive program is designed to equip aspiring data engineers with the essential skills and tools they need to excel in the field. The journey culminates in a hands-on capstone project, a real-world scenario where learners can apply their data engineering knowledge and skills. In this article, we’re going to unpack this capstone project, revealing how it fosters a holistic understanding of data engineering and prepares you for a successful career in this dynamic field.

Data Engineering Project Overview

The project scenario revolves around an e-commerce platform. This platform requires a robust and efficient data infrastructure to enable advanced analytics that drive business decisions. As the data engineer working on this project, your mission is multi-fold. You need to extract data from various sources, transform this raw data into a suitable schema, load it into appropriate databases and data warehouses, orchestrate data pipelines for efficient data movement, and perform analytics to derive valuable insights.

Key Stages of the Data Engineering Project

The capstone project is strategically divided into several stages, each focusing on a specific set of data engineering tasks:

Transactional Database Setup with MySQL

The first phase of the project involves setting up a transactional database for the e-commerce platform’s transactional data. This phase allows you to dive into the world of MySQL databases, where you get hands-on experience in:

  • Designing a normalized schema, optimized for Online Transaction Processing (OLTP) workloads. This allows for efficient transaction management in the database.
  • Loading structured data from CSV files into MySQL tables. This gives you a practical understanding of how data is ingested into databases.
  • Writing SQL queries for data manipulation. This strengthens your command over SQL, the standard language for relational database management systems.
  • Creating indexes for faster data lookups, thereby improving the performance of the database.
  • Automating data dumps using bash scripts, a skill that is essential for maintaining and updating databases.
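
As a rough illustration of the schema and indexing bullets above, the sketch below builds a small normalized schema and checks that a lookup uses the index. It runs on SQLite from the Python standard library as a stand-in for MySQL, so the syntax and query-plan output differ from what the capstone uses; the table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL server
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
    -- Index the foreign key so per-customer lookups avoid a full table scan.
    CREATE INDEX idx_orders_customer ON orders(customer_id);
    """
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 25.0), (11, 1, 40.0)]
)

# Ask the engine how it would execute the lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT order_id, amount FROM orders WHERE customer_id = ?",
    (1,),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)

total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer_id = ?", (1,)
).fetchone()[0]
```

`plan_text` names `idx_orders_customer` when the index is used. In MySQL the equivalent check is an `EXPLAIN` on the same query.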

NoSQL Datastore Modeling with MongoDB

The second phase of the project requires you to work with MongoDB, a popular NoSQL database, to store the e-commerce catalog’s unstructured product metadata. During this phase, you will be involved in:

  • Modeling data using MongoDB’s flexible document schema. This flexible schema is one of the reasons MongoDB is preferred for handling unstructured data.
  • Importing and exporting JSON data, the format in which data is stored in MongoDB.
  • Writing MongoDB queries to analyze document collections. This gives you an understanding of how NoSQL databases handle data retrieval.

Data Warehouse Implementation with PostgreSQL

Following the NoSQL phase, you move towards the implementation of a data warehouse to facilitate analytics and reporting. For this, you use PostgreSQL, a powerful, open-source object-relational database system. This phase introduces you to:

  • The design of a schema with fact and dimension tables, the core of any data warehouse.
  • Modeling logical relationships between entities, providing a clear structure to your data warehouse.
  • Loading data into the analytics database, a critical step in making data available for analytical processing.
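
A fact table surrounded by dimension tables can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses SQLite instead of PostgreSQL, and the tables `dim_date`, `dim_category`, and `fact_sales` with their sample rows are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL warehouse
conn.executescript(
    """
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
    -- The fact table holds measures plus foreign keys to each dimension.
    CREATE TABLE fact_sales (
        date_id     INTEGER REFERENCES dim_date(date_id),
        category_id INTEGER REFERENCES dim_category(category_id),
        amount      REAL
    );
    INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
    INSERT INTO dim_category VALUES (1, 'Books'), (2, 'Toys');
    INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 7.5), (2, 1, 2.5);
    """
)

# A typical analytical query: join facts to dimensions, then aggregate.
rows = conn.execute(
    """
    SELECT d.month, c.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_id = f.date_id
    JOIN dim_category AS c ON c.category_id = f.category_id
    GROUP BY d.month, c.name
    ORDER BY d.month, c.name
    """
).fetchall()
```

Keeping descriptive attributes in the dimensions and only keys and measures in the fact table is what lets reports slice the same sales figures by month, category, or both.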

Data Warehousing and BI Reporting

On top of the data warehouse, the project requires you to create reports and dashboards using IBM Cognos Analytics. This stage allows you to delve into the world of business intelligence (BI), where you:

  • Connect to data sources like DB2, the database storing your data.
  • Craft visualizations like charts and graphs, the heart of any BI reporting, providing a visual representation of data.
  • Build dashboards with multiple integrated visuals, providing a consolidated view of business data.

ETL Pipeline Development with Python

To ensure synchronization of data from the transactional databases, you develop ETL (Extract, Transform, Load) jobs using Python. This phase allows you to:

  • Extract data from sources like MySQL.
  • Transform and cleanse this data using Pandas, a popular Python library for data manipulation and analysis.
  • Load the transformed analytical datasets into targets like PostgreSQL.
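
The three steps above map naturally onto three functions. The sketch below is a simplified illustration rather than the capstone's code: it reads from an in-memory CSV string instead of MySQL, cleans rows with plain Python instead of Pandas, and loads into SQLite instead of PostgreSQL. The column names and sample data are assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with untidy names and a missing amount.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,alice,4.50
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount and normalize customer names."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer"
).fetchall()
```

Keeping extract, transform, and load as separate functions makes each step easy to test on its own and easy to wrap as an individual task in an orchestrator.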

Data Pipeline Orchestration with Apache Airflow

The project also involves the modeling and orchestration of ETL workflows using Apache Airflow’s directed acyclic graphs (DAGs). Here, you learn to:

  • Author Airflow DAGs, which define a collection of tasks to be executed and the order in which they run.
  • Monitor pipeline runs, keeping track of data movement.
  • Handle errors and retries, ensuring the smooth functioning of your data pipelines.
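
The core ideas of a DAG (tasks, dependencies, execution order, retries) can be shown without installing Airflow. The following sketch uses `graphlib` from the Python standard library (Python 3.9+); it is not the Airflow API, and the task names, the `run_dag` helper, and the retry policy are assumptions for illustration.

```python
from graphlib import TopologicalSorter


def run_with_retries(task, retries=2):
    """Call task(); on failure retry up to `retries` more times.
    Returns (result, number_of_attempts)."""
    for attempt in range(1, retries + 2):
        try:
            return task(), attempt
        except Exception:
            if attempt == retries + 1:
                raise  # out of retries: surface the error


def run_dag(tasks, dependencies, retries=2):
    """Run tasks in dependency order and log how many attempts each needed."""
    order = TopologicalSorter(dependencies).static_order()
    log = []
    for name in order:
        _, attempts = run_with_retries(tasks[name], retries)
        log.append((name, attempts))
    return log


# A task that fails once before succeeding, to exercise the retry path.
calls = {"transform": 0}


def flaky_transform():
    calls["transform"] += 1
    if calls["transform"] < 2:
        raise RuntimeError("transient failure")
    return "ok"


tasks = {"extract": lambda: "ok", "transform": flaky_transform, "load": lambda: "ok"}
# Each key maps a task to the set of tasks that must finish before it.
dependencies = {"transform": {"extract"}, "load": {"transform"}}
log = run_dag(tasks, dependencies)
```

Airflow adds scheduling, persistence of run state, and a monitoring UI on top of this basic model.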

Data Analytics Engineering with Apache Spark

The final stage of the project involves performing big data analytics on e-commerce data and building a machine learning model for sales prediction using Apache Spark. This phase allows you to:

  • Understand distributed data processing using PySpark, a Python library for Apache Spark.
  • Develop and deploy machine learning pipelines, a crucial part of modern data analytics.
  • Generate business insights, the ultimate goal of any data project.

Key Takeaways

The capstone project offers several key learning outcomes. You will gain proficiency in data engineering frameworks like ETL, and understand the differences between transactional and analytical databases. You will also develop logical and physical data models and learn how to leverage data warehouses for business intelligence. Additionally, you will learn about orchestrating and monitoring data pipelines, and how to build machine learning models on big data stacks.

These skills, combined with hands-on experience, prepare you for data engineering roles. You gain a thorough understanding of the end-to-end data lifecycle and are equipped to work cross-functionally to deliver data products.

After completing this capstone project, you’re ready to embark on the next phase of your data engineering journey. You can pursue career opportunities in data engineering, undertake more advanced programs, apply for IBM data certifications, and work on personal data projects to build expertise.

In conclusion, the IBM capstone project provides a comprehensive, practical learning experience. It primes you for challenging data-centric roles and instills confidence in you to handle professional data tasks and environments. So, gear up and embark on this rewarding journey into the world of data engineering!

Written by Daniel Chaves

As a Production Engineer, I bring 4 years of experience harnessing data to fuel business growth and provide actionable insights.

Academic Writing Workspace | Apessay.net

Can you recommend any other popular capstone projects on GitHub for data engineering?

Data pipeline for Lyft trip data (18k+ stars on GitHub): This extensive project builds a data pipeline to ingest, transform, and analyze over 1.5 billion Lyft ride-hailing trips. The ETL pipeline loads raw CSV data from S3 into Redshift, enriches it with additional data from other sources, and stores aggregated metrics in a data warehouse. Visualizations of the cleaned data are then generated using Tableau. Some key aspects of the project include:

  • Building Lambda functions to load and transform data in batches using Python and AWS Glue ETL jobs
  • Designing Redshift database schemas and tables to optimize for queries
  • Calculating metrics like total rides and revenues by city and over time periods
  • Deploying the ETL pipelines, database, and visualizations on AWS
  • Documenting all steps and components of the data pipeline

This would be an excellent capstone project due to the large scale of real-world data, complex ETL process, and end-to-end deployment on cloud infrastructure. Students could learn a lot about architecting production-grade data pipelines.

Data pipeline for NYC taxi trip data (10k+ stars): Similar to the Lyft project but for NYC taxi data, this project builds a streaming real-time ETL pipeline instead of batch processing. It ingests raw taxi trip data from Kafka topics, enriches it with spatial data using Flink jobs, and loads enriched events into Druid and ClickHouse for real-time analytics. It also includes a dashboard visualizing live statistics. Key aspects include:

  • Setting up a Kafka cluster to act as the data lake
  • Developing Flink jobs that join the trip data stream with location data
  • Configuring Druid and ClickHouse databases for real-time queryability
  • Deploying the streaming pipeline on Kubernetes
  • Building a real-time dashboard using Grafana

This project focuses on streaming ETL and real-time analytics capabilities which are highly valuable skills for data engineers. It provides an end-to-end view of architecting streaming data pipelines.

Data pipeline for Wikipedia page view statistics (6k+ stars): This project builds an automated monthly pipeline to gather Wikipedia page view statistics from CSV dumps, process them through Spark jobs, and load preprocessed page view counts into Druid. Some key components:

  • Downloading and validating raw Wikipedia page view dumps
  • Developing Spark DataFrame jobs to filter, cleanse and aggregate data
  • Configuring Druid clusters and ingesting aggregated page counts
  • Running Spark jobs through Airflow and monitoring executions
  • Integrating Druid with Superset for analytics and visualizations

By utilizing Spark, Druid, Airflow and cloud infrastructure, this project showcases techniques for building scalable batch data pipelines. It also focuses on automating and monitoring the end-to-end workflow.

Each of these representative GitHub projects has received thousands of stars due to its relevance, quality, and educational value for aspiring data engineers. They demonstrate best practices for architecting, implementing and deploying real-world data pipelines on modern data infrastructure. A student undertaking one of these projects as a capstone would have the opportunity to dive deep into essential data engineering skills while gaining exposure to modern cloud technologies and following industry standards. They also provide complete documentation for replicating the systems from start to finish. Projects like these could serve as excellent foundations and inspiration for high-quality data engineering capstone projects.

The three example GitHub projects detailed above showcase important patterns for building data pipelines at scale. They involve ingesting, transforming and analyzing large volumes of real public data using modern data processing frameworks. Key aspects covered include distributed batch and stream processing, automating pipelines, deploying on cloud infrastructure, and setting up databases for analytics and visualization. By modeling a capstone project after one of these highly rated examples, a student would learn valuable skills around architecting end-to-end data workflows following best practices. The projects also demonstrate applying data engineering techniques to solve real problems with public, non-sensitive datasets.

Geographic coordinates of Elektrostal, Moscow Oblast, Russia

Coordinates of Elektrostal in decimal degrees; coordinates of Elektrostal in degrees and decimal minutes; UTM coordinates of Elektrostal; geographic coordinate systems.

The WGS 84 coordinate reference system is the latest revision of the World Geodetic System, which is used in mapping and navigation, including the Global Positioning System (GPS) satellite navigation system.

Geographic coordinates (latitude and longitude) define a position on the Earth’s surface. Coordinates are angular units. The canonical form of latitude and longitude representation uses degrees (°), minutes (′), and seconds (″). GPS systems widely use coordinates in degrees and decimal minutes, or in decimal degrees.
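
The notations mentioned above are simple arithmetic transformations of one another. The sketch below converts a decimal-degree value into degrees/minutes/seconds and into degrees with decimal minutes; the function names and the sample values are made up for illustration and are not the coordinates of any place named on this page.

```python
def to_dms(value, is_latitude):
    """Convert decimal degrees to (degrees, minutes, seconds, hemisphere),
    rounding to the nearest whole second."""
    if is_latitude:
        hemisphere = "N" if value >= 0 else "S"
    else:
        hemisphere = "E" if value >= 0 else "W"
    total_seconds = round(abs(value) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return degrees, minutes, seconds, hemisphere


def to_degrees_decimal_minutes(value):
    """Convert decimal degrees to (whole degrees, decimal minutes) of |value|."""
    magnitude = abs(value)
    degrees = int(magnitude)
    return degrees, (magnitude - degrees) * 60


lat_dms = to_dms(55.5, is_latitude=True)     # hypothetical latitude
lon_dms = to_dms(-0.25, is_latitude=False)   # hypothetical longitude
lat_ddm = to_degrees_decimal_minutes(55.5)
```

The hemisphere letter carries the sign, so the numeric parts are always non-negative; this avoids losing the sign when the degree part is zero.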

Latitude varies from −90° to 90°. The latitude of the Equator is 0°; the latitude of the South Pole is −90°; the latitude of the North Pole is 90°. Positive latitude values correspond to the geographic locations north of the Equator (abbrev. N). Negative latitude values correspond to the geographic locations south of the Equator (abbrev. S).

Longitude is counted from the prime meridian (IERS Reference Meridian for WGS 84) and varies from −180° to 180°. Positive longitude values correspond to the geographic locations east of the prime meridian (abbrev. E). Negative longitude values correspond to the geographic locations west of the prime meridian (abbrev. W).

UTM or Universal Transverse Mercator coordinate system divides the Earth’s surface into 60 longitudinal zones. The coordinates of a location within each zone are defined as a planar coordinate pair related to the intersection of the equator and the zone’s central meridian, and measured in meters.
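
Because the 60 zones are each 6° of longitude wide and zone 1 starts at −180°, the zone number and its central meridian follow directly from the longitude. The sketch below shows that arithmetic; the function names are assumptions, and it ignores the special zone boundaries the standard grid applies around Norway and Svalbard.

```python
import math


def utm_zone(longitude):
    """Return the UTM zone number (1..60) for a longitude in decimal degrees."""
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180 degrees")
    if longitude == 180:
        return 60  # the antimeridian is assigned to the last zone here
    return math.floor((longitude + 180) / 6) + 1


def central_meridian(zone):
    """Return the central meridian of a UTM zone in decimal degrees."""
    return zone * 6 - 183


zone_at_greenwich = utm_zone(0.0)
zone_example = utm_zone(38.5)  # hypothetical longitude east of Greenwich
```

The full conversion to easting and northing in meters requires the transverse Mercator projection formulas, which are best left to a tested geodesy library.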

Elevation above sea level is a measure of a geographic location’s height. We are using the global digital elevation model GTOPO30.

Elektrostal, Moscow Oblast, Russia

Demonstrate proficiency in skills required for an entry-level data engineering role. Design and implement various concepts and components in the data engineering lifecycle such as data repositories. Showcase working knowledge with relational databases, NoSQL data stores, big data engines, data warehouses, and data pipelines.

In addition to the data files, the project workspace includes: etl.py, which reads data from S3, processes that data using Spark, and writes the processed data back to S3 as a set of dimensional tables; and etl_functions.py and utility.py, modules that contain the functions for creating fact and dimension tables, data visualizations and cleaning.

IBM Capstone Data Engineering Project Overview. This project explored several data engineering technologies, concepts and skills that I acquired while completing the IBM Data Engineering Professional Certificate. You can find all the screenshots and scripts pertaining to this project on GitHub.

The project I did was part of my Udacity Data Engineering Capstone project. GitHub link has the details. The below walkthrough has snippets. Step 1: Scope the Project and Gather Data.

IBM Data Analyst Capstone Project: Week 3 Exploratory Data Analysis · GitHub. Instantly share code, notes, and snippets.

In this Capstone project you will complete numerous hands-on labs. You will create and query data repositories using relational and NoSQL databases such as MySQL and MongoDB. You'll also design and populate a data warehouse using PostgreSQL and IBM Db2 and write queries to perform Cube and Rollup operations.

You will demonstrate your knowledge of Data Engineering by assuming the role of a Junior Data Engineer who has recently joined an organization and be presented with a real-world use case that requires architecting and implementing a data analytics platform. In this Capstone project you will complete numerous hands-on labs. You will create and ...

the impact on regional demographics. The project follows these steps: Step 1: Scope the Project and Gather Data. Step 2: Explore and Assess the Data. Step 3: Define the Data Model. Step 4: Run ETL to Model the Data. Step 5: Complete Project Write Up.

Develop a data model designed for Online Analytical Processing (OLAP) to support queries analyzing US immigration data. In the data model, we complemented the US immigration data with US cities'...

Here is a list of data engineering project ideas (for beginners, intermediates, and professionals): Stock and Twitter Data Extraction Using Python, Kafka, and Spark. Use Python to Scrape Real Estate Listings and Make a Dashboard. Use Stack Overflow Data for Analytic Purposes.

Data Engineering Capstone Project. This credential earner has demonstrated a foundational knowledge of data engineering. The earner has implemented various concepts in the data engineering lifecycle and gained a working knowledge of Python, Relational Databases, NoSQL Data Stores, Big Data Engines, Data Warehouses, and Data Pipelines.

Data pipeline for Lyft trip data (18k+ stars on GitHub): This extensive project builds a data pipeline to ingest, transform, and analyze over 1.5 billion Lyft ride-hailing trips. The ETL pipeline loads raw CSV data from S3 into Redshift, enriches it with additional data from other sources, and stores aggregated metrics in a data warehouse.

Data Engineering Capstone Project - Udacity Data Engineering Expert Track. In this project, I gathered some datasets to work with, explored this data, assessed and cleaned it, defined and built the best data model to work with, and ran ETL to model the data.

The Kontragent database provides twenty-four-hour online access to financial information on Russian companies. Our database contains annual financial statements and contact details of companies. By accessing the data you also get an up-to-date record from EGRUL. Currently our database has detailed information on 12.2 million Russian companies.

The purpose of this project is to demonstrate various skills associated with data engineering projects: in particular, developing a highly scalable data ingestion architecture using Airflow and Spark, constructing cloud data warehouses with Redshift databases and S3 data storage, and defining an efficient star schema data model. - ddgope/data-engineering-capstone

All transactional data, such as inventory and sales, is stored in the MySQL database server. SoftCart's webserver is driven entirely by these two databases. Data is periodically extracted from these two databases and put into the staging data warehouse running on PostgreSQL. The production data warehouse is on a cloud instance of the IBM Db2 server.
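The periodic extraction into a staging warehouse described above can be sketched as an incremental copy that tracks the last row ID already loaded. This is a minimal sketch under stated assumptions: two in-memory SQLite databases stand in for the MySQL source and the PostgreSQL staging warehouse, and the `sales` and `staging_sales` tables, their columns, and the `extract_new_rows` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def extract_new_rows(source, staging, last_id):
    """Copy sales rows with id > last_id into staging; return the new high-water mark."""
    rows = source.execute(
        "SELECT id, product, quantity FROM sales WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    staging.executemany(
        "INSERT INTO staging_sales (id, product, quantity) VALUES (?, ?, ?)", rows
    )
    staging.commit()
    # If nothing new arrived, keep the previous watermark.
    return rows[-1][0] if rows else last_id


# In-memory SQLite stand-ins (assumption) for the source and staging servers.
source = sqlite3.connect(":memory:")
staging = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, quantity INTEGER)"
)
staging.execute(
    "CREATE TABLE staging_sales (id INTEGER PRIMARY KEY, product TEXT, quantity INTEGER)"
)
source.executemany(
    "INSERT INTO sales (product, quantity) VALUES (?, ?)",
    [("laptop", 2), ("phone", 5)],
)
source.commit()

# First run loads everything; the second run loads only the row added since.
watermark = extract_new_rows(source, staging, last_id=0)
source.execute("INSERT INTO sales (product, quantity) VALUES (?, ?)", ("tablet", 1))
source.commit()
watermark = extract_new_rows(source, staging, last_id=watermark)

staged_count = staging.execute("SELECT COUNT(*) FROM staging_sales").fetchone()[0]
print(watermark, staged_count)
```

Tracking a high-water mark keeps each run idempotent with respect to rows already staged. A real pipeline would also need to handle updates and deletes in the source, which this sketch does not attempt.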

Geographic coordinate systems. The WGS 84 coordinate reference system is the latest revision of the World Geodetic System, which is used in mapping and navigation, including by the GPS (Global Positioning System) satellite navigation system.

Data Engineering Bootcamp Capstone Project. Contribute to jg2012/JGCapstone development by creating an account on GitHub.

IMAGES

  1. Rolex

  2. Rolex

  3. 126655-0005 Rolex Yacht-Master 40 Diamond Pave Dial Mens Watch

  4. Rolex Yacht-Master Diamond Pave Dial Automatic Men's Rubber Watch

  5. Rolex Yacht-Master Diamond Pave Dial Automatic Men's Rubber Watch

  6. Rolex 126655 Yacht Master 126655 18K Rose Gold / Pave Dial 2021 UNWORN

COMMENTS

  1. Rolex Yacht-Master 40 watch: 18 ct Everose gold

    To preserve the beauty of its pink gold watches, Rolex created and patented an exclusive 18 ct pink gold alloy cast in its own foundry: Everose gold. Introduced in 2005, 18 ct Everose is used on all Rolex Oyster models in pink gold. Staying on course. Diamond-Paved Dial. Bidirectional Rotatable Bezel.

  2. Rolex Yacht-Master 40 126655

    NEW 2023 Yacht-Master 40 PAVE DIAMOND DIAL $ 48,900 + $49 for shipping. US. Promoted. Rolex Yacht-Master 40. 126655 Pave Diamond Dial Box Papers Tags 2022 $ 47,892 ... More Information About the Rolex Yacht-Master 40 126655. Basic Info; Brand: Rolex: Model: Yacht-Master 40: Reference number: 126655: Alternative reference numbers: 126655-0002 ...

  3. Rolex Yacht-Master 42 226679TBR

    18K White Gold BAGUETTE PAVE DIAMOND DIAL 42mm 226679 TBR B+P $ 99,993 + $175 for shipping. US. Rolex Yacht-Master 42. 226679TBR $ 105,715 + $114 for shipping. IT. ... More Information About the Rolex Yacht-Master 42 226679TBR. Basic Info; Brand: Rolex: Model: Yacht-Master 42: Reference number: 226679TBR: Movement: Automatic: Case material ...

  4. Rolex Yacht-Master

    Unworn Yacht-Master 42mm Black Dial White Gold Oysterflex 226659 Box & Papers 2024 $ 31,449 + $150 for shipping. US. Promoted. Rolex Yacht-Master 42. ... The Rolex Yacht-Master 40 is also available in the brand's own Rolesium material combination. The mix of stainless steel and platinum is exclusive to the Yacht-Master and gives the watch a ...

  5. 126655-0005 Rolex Yacht-Master 40 Diamond Pave Dial Mens Watch

    The Rolex Yacht-Master is an iconic timepiece that is beloved by luxury watch collectors and enthusiasts. This particular model is crafted from 18k rose gold and features a ceramic bezel and a diamond-paved dial. The date function is a handy addition, while the Oysterflex bracelet ensures a comfortable and secure fit.

  6. Rolex Yacht-Master Diamond Pave Dial Men's Watch 126655-0005

    Shop for Yacht-Master Diamond Pave Dial Men's Watch 126655-0005 by Rolex at JOMASHOP, see price in cart. WARRANTY or GUARANTEE available with every item. ... Rolex Yacht-Master Diamond Pave Dial Men's Watch 126655-0005 Item No. 126655DSR. Write a review. IN STOCK 5% Off Trade-in Eligible. Condition: New. Retail $47,800.00. $45,600.00.

  7. Rolex Yacht-Master

    Designed for navigators. Sailing occupies a special place in the world of Rolex. In 1958, the brand partnered the New York Yacht Club, creator of the legendary America's Cup. Rolex then formed partnerships with several prestigious yacht clubs around the world and became associated with major nautical events - offshore races and coastal ...

  8. 126655 Pave Diamond Rolex Yacht-Master 40mm Mens Watch

    Rolex Yacht-Master 40mm Mens Watch, Model 126655 Pave Diamond Price: $58,000.00, New and Authentic, Free Shipping ... Dial: Pave diamond set dial. Applied polished Everose gold rimmed hour markers with luminescent fill. Polished 18kt Everose gold hands with luminescent fill.

  9. Rolex Yacht-Master 40mm Pave Dial Watch Ref# 126655

    Brand Rolex Model Yacht-Master Reference # 126655 Case Size and Material 40mm / Rose Gold Bracelet Style and Material Oysterflex w/ Rose Gold Clasp Dial Factory Pave Dial with Dot Hour Markers Bezel Black Ceramic Movement Automatic Winding Movement Power Reserve 70 Hours Water Resistance 100m / 330ft Crysta ... Rolex Yacht-Master 40mm Pave Dial ...

  10. Rolex Yacht-Master 40 Diamond Pave Dial, Rose Gold, 126655

    Begin your next adventure with the rose gold Rolex Yacht-Master 40mm 126655 wristwatch. Featuring a diamond pave dial, Oysterflex bracelet, & rotatable bezel ... Rolex Yacht-Master 40 Diamond Pave Dial, Rose Gold, 126655. SKU. 5949. $50,850.00. Sign up for price alerts. In stock. $49,070 w/ bank wire. Notify me when the price drops. CASE SIZE ...

  11. Rolex Yacht-Master 126655 Rose Gold Pave Diamond Dial

    The Rolex Yacht-Master is designed in a 40mm 18k rose gold case with a bidirectional 60-minute graduated bezel with a matte black Cerachrom insert. It features a pave diamond dial with white luminous hour markers and a Chromalight display. This beauty is housed on an Oysterflex bracelet integrated with an Oysterlock safety clasp.

  12. Rolex Yacht-Master 40 watch: Oystersteel and Everose gold

    Discover the Yacht-Master 40 watch in Oystersteel and Everose gold on the Official Rolex Website Model: m126621-0002. ... Intense black dial Exceptional legibility. Like all Rolex Professional watches, the Yacht-Master 40 offers exceptional legibility in all circumstances, and especially in the dark, thanks to its Chromalight display. ...

  13. Rolex Yacht-Master 126655 Rose Gold Pave Diamond Dial

    Model: Yacht-Master 40. Case Material: Rose Gold. Bracelet Material: Black Rubber. Dial: Pave Diamond Set. Box & Papers: Original box, original papers. It features a pave diamond dial with white luminous hour markers and a Chromalight display. ... Rolex Yacht Master II 116680; Shop Luxury Watches. Best Sellers. Rolex GMT Master II 126710; Rolex ...

  14. ROLEX YACHT-MASTER 40 126655 DIAMOND PAVE DIAL FULL SET from 2021

    New arrival: the Rolex Yacht-Master 40 Diamond Pave Dial from 2021. Reference: 126655. Instagram: https://www.instagram.com/uhrenfreund24/ Webshop: https://www...

  15. Rolex Yacht-Master 116695 SATS

    Rainbow Bezel Diamond Pave Dial Rose Gold - 116695SATS 116695SATS $ 228,813 + $568 for shipping. MC. Rolex Yacht-Master 40. 116695 SATS $ 116,734 + $99 for shipping. AE. ... More Information About the Rolex Yacht-Master 116695 SATS. Basic Info; Brand: Rolex: Model: Yacht-Master 40: Reference number: 116695 SATS: Alternative reference numbers ...

  16. Rolex Yacht-Master 37 18ct Everose Gold

    Rolex Yacht-Master 37 Listing: NZ$33,865 Rolex Yacht-Master 37 18ct Everose Gold - Ref 268655, Reference number 268655; Rose gold; Automatic; Condition Very good; Year 2020; Watch with origi ... 37mm 268655 Everose Gold Oysterflex Black Dial Unworn/Complete/2023+ NZ$ 40,616 + NZ$499 for shipping. US. ... Everose Gold Pave Diamond Edition 37mm ...

  17. Rolex Yacht-Master 40 watch: 18 ct Everose gold

    Discover the Yacht-Master 40 watch in 18 ct Everose gold on the Official Rolex Website. Model: m126655-0002. ... Intense black dial Exceptional legibility. Like all Rolex Professional watches, the Yacht-Master 40 offers exceptional legibility in all circumstances, and especially in the dark, thanks to its Chromalight display. ...

  18. Buy Rolex Yacht-Master 42 White Gold watches, pawnshop Perspectiva

    Buy Rolex Yacht-Master 42 White Gold at the best price. Prices, photos, characteristics. Perspectiva pawnshop, call us: +7 (495) 959-99-99. ... Yacht-Master 42 White Gold. Reference: 226659-0002 ...

  19. Rolex Yacht-Master 40 Candy black 116695SATS

    Rolex Yacht-Master 40 Listing: $110,456 Rolex Yacht-Master Candy black 116695SATS, Reference number 116695SATS; Rose gold; Automatic; Condition Good; Watch with original box and ... Pave Dial - New $ 249,526 + $227 for shipping. NL. Over 800,000 satisfied watch buyers worldwide. Chrono24 dealers receive great ratings: 4.8 out of 5 August 19 ...

  20. yacht master dial

    Ben Bridge Jeweler will ship merchandise to United States addresses, United States P.O. Box, US Embassy / Military APO or FPO addresses. We cannot ship international orders at thi

  21. Rolex Yacht-Master 40 watch: Oystersteel and platinum

    Discover the Yacht-Master 40 watch in Oystersteel and platinum on the Official Rolex Website. Model: m126622-0001. ... Slate Dial Exceptional legibility. Like all Rolex Professional watches, the Yacht-Master 40 offers exceptional legibility in all circumstances, and especially in the dark, thanks to its Chromalight display. ...

  22. Moscow mule x white gold day-date 40mm : r/rolex

    38 votes, 13 comments. 203K subscribers in the rolex community. Reddit's go-to source for news and discussion about Rolex and Tudor watches.

  23. Rolex Yacht-Master 37 watch: Oystersteel and platinum

    Discover the Yacht-Master 37 watch in Oystersteel and platinum on the Official Rolex Website. Model: m268622-0002. ... Slate Dial Exceptional legibility. Like all Rolex Professional watches, the Yacht-Master 37 offers exceptional legibility in all circumstances, and especially in the dark, thanks to its Chromalight display. ...

  24. ROLEX BOUTIQUE GUM in Red Square 3, 109012 Moscow

    Our store in Moscow, Russia is recognized as an Official Rolex Retailer, as we only sell genuine Rolex timepieces. Official Rolex Retailer. ROLEX BOUTIQUE GUM. Red Square 3, 109012 Moscow, Russia. +7 495 937 53 73.