Datasets for Exploratory Data Analysis for Beginners


Written by Aayush Saini · 3 minute read · Jun 14, 2020 . Datasets, 6

If you are a machine learning beginner looking to finally get started with Python, this post lists some of the top datasets for beginner-level exploratory data analysis.

  1. 911 Calls Capstone Project
  2. USA House Price Prediction
  3. Iris Flower Dataset
  4. Haberman Dataset
  5. MNIST Dataset
  6. Titanic Dataset

 

 

 

911 Calls Capstone Project

The data contains the following fields:

  • lat: String variable, Latitude
  • lng: String variable, Longitude
  • desc: String variable, Description of the emergency call
  • zip: String variable, ZIP code
  • title: String variable, Title
  • timeStamp: String variable, YYYY-MM-DD HH:MM:SS
  • twp: String variable, Township
  • addr: String variable, Address
  • e: String variable, Dummy variable (always 1)
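
Because every field arrives as a string, a first EDA step is usually to convert the ones you want to analyze. The sketch below uses only the Python standard library and a few hypothetical rows (not real call records) in the column layout listed above. It assumes that `title` follows a "category: detail" pattern; check this against the actual file before relying on it. The name `load_calls` is made up for this example.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical rows in the column layout listed above; not real call records.
SAMPLE_CSV = """lat,lng,desc,zip,title,timeStamp,twp,addr,e
40.10,-75.30,Sample description A,00001,EMS: BACK PAINS/INJURY,2015-12-10 17:40:00,TOWNSHIP A,1 SAMPLE ST,1
40.20,-75.40,Sample description B,00002,Fire: GAS-ODOR/LEAK,2015-12-10 18:05:00,TOWNSHIP B,2 SAMPLE AVE,1
40.30,-75.50,Sample description C,00001,EMS: CARDIAC EMERGENCY,2015-12-11 09:15:00,TOWNSHIP A,3 SAMPLE RD,1
"""


def load_calls(text):
    """Parse CSV text into dicts, converting the string fields we want to analyze."""
    calls = []
    for row in csv.DictReader(io.StringIO(text)):
        # timeStamp is a string in YYYY-MM-DD HH:MM:SS form.
        row["timeStamp"] = datetime.strptime(row["timeStamp"], "%Y-%m-%d %H:%M:%S")
        row["lat"] = float(row["lat"])
        row["lng"] = float(row["lng"])
        # Assumption: title looks like "<category>: <detail>".
        row["reason"] = row["title"].split(":", 1)[0].strip()
        calls.append(row)
    return calls


calls = load_calls(SAMPLE_CSV)

# Simple frequency tables: calls per category, per hour of day, per township.
reason_counts = Counter(call["reason"] for call in calls)
hour_counts = Counter(call["timeStamp"].hour for call in calls)
township_counts = Counter(call["twp"] for call in calls)

print(reason_counts.most_common())
print(sorted(hour_counts.items()))
print(township_counts.most_common(1))
```

With the real CSV, you could pass the file's contents to `load_calls` in place of `SAMPLE_CSV`. A library such as pandas would make the same steps shorter, but the idea is the same: parse the timestamp, derive a category, then count.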

Get Dataset


USA House Price Prediction

The data contains the following columns:

  • 'Avg. Area Income': Average income of residents of the city the house is located in
  • 'Avg. Area House Age': Average age of houses in the same city
  • 'Avg. Area Number of Rooms': Average number of rooms for houses in the same city
  • 'Avg. Area Number of Bedrooms': Average number of bedrooms for houses in the same city
  • 'Area Population': Population of the city the house is located in
  • 'Price': Price the house sold for
  • 'Address': Address of the house
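
A common first look at a dataset like this is to check how strongly each numeric column moves with 'Price'. The sketch below computes the Pearson correlation by hand using only the standard library. The four rows are hypothetical values chosen for illustration, not taken from the dataset, and the helper name `pearson` is made up for this example.

```python
import math

# Hypothetical rows using a subset of the column names listed above.
rows = [
    {"Avg. Area Income": 60000.0, "Avg. Area House Age": 5.0, "Price": 1000000.0},
    {"Avg. Area Income": 70000.0, "Avg. Area House Age": 6.5, "Price": 1250000.0},
    {"Avg. Area Income": 80000.0, "Avg. Area House Age": 5.5, "Price": 1500000.0},
    {"Avg. Area Income": 90000.0, "Avg. Area House Age": 7.0, "Price": 1750000.0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


prices = [row["Price"] for row in rows]

# Correlation of each numeric feature with the target column.
correlations = {
    name: pearson([row[name] for row in rows], prices)
    for name in ("Avg. Area Income", "Avg. Area House Age")
}

for name, value in correlations.items():
    print(f"{name}: {value:.3f}")
```

In the sample rows, price is an exact linear function of income, so that correlation comes out at 1; the house-age column is noisier and correlates less strongly. On the real data, the same loop over all numeric columns gives a quick ranking of candidate predictors.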

Get Dataset


Iris Flower Dataset

Figure: Scatterplot of the Iris flower data set, collected by Edgar Anderson and popularized in the machine learning community by Ronald Fisher.

The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by the British statistician, eugenicist, and biologist Ronald Fisher in his 1936 paper The use of multiple measurements in taxonomic problems as an example of linear discriminant analysis.[1] It is sometimes called Anderson's Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species.[2] Two of the three species were collected in the Gaspé Peninsula "all from the same pasture, and picked on the same day and measured at the same time by the same person with the same apparatus".[3] Fisher's paper was published in the journal, the Annals of Eugenics, creating controversy about the continued use of the Iris dataset for teaching statistical techniques today. – From Wikipedia

Toy dataset: Iris

  • A simple dataset for learning the basics.
  • Three species of Iris flowers (see the images on the Wikipedia page quoted above).
  • Introduced in 1936 by Ronald Fisher.
  • Features: petal and sepal measurements (length and width).
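
To get a feel for how the petal and sepal measurements separate the species, a per-species average is a good first summary. The sketch below uses only the standard library and a handful of sample rows for illustration. The column names (`sepal_length`, `petal_length`, and so on) are an assumption; the file you download may label them differently. The function name `species_means` is made up for this example.

```python
import csv
import io
from collections import defaultdict

# A few sample rows for illustration (measurements in cm).
# Column names are an assumption; adjust them to match your copy of the file.
IRIS_SAMPLE = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def species_means(text):
    """Return {species: {feature: mean value}} for the CSV text."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        grouped[row["species"]].append([float(row[name]) for name in FEATURES])

    means = {}
    for species, measurements in grouped.items():
        means[species] = {
            name: sum(m[i] for m in measurements) / len(measurements)
            for i, name in enumerate(FEATURES)
        }
    return means


means = species_means(IRIS_SAMPLE)
for species, stats in means.items():
    print(species, {name: round(value, 2) for name, value in stats.items()})
```

Even in this tiny sample, the petal measurements differ much more between species than the sepal widths do, which is the kind of pattern a scatterplot of the full dataset makes visible.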

Get Dataset


Haberman Dataset

The dataset contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer.

Get Dataset


MNIST Dataset

MNIST ("Modified National Institute of Standards and Technology") is the de facto “hello world” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten digit images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.

Get Dataset


Titanic Dataset

The sinking of the Titanic is one of the most infamous shipwrecks in history.

On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.
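
The figures above imply an overall survival rate, which is a useful baseline to keep in mind when you later break survival down by passenger group. A quick calculation, using only the two numbers quoted in the paragraph:

```python
# Figures quoted in the paragraph above.
total_onboard = 2224
deaths = 1502

# Everyone who did not die is counted as a survivor.
survivors = total_onboard - deaths
survival_rate = survivors / total_onboard

print(f"Survivors: {survivors}")
print(f"Overall survival rate: {survival_rate:.1%}")
```

That works out to 722 survivors, or roughly one in three people on board. The dataset you download may cover only a subset of the passengers, so its own survival rate can differ from this figure.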

Get Dataset


Thanks for Reading

 

 
