Exploratory Data Analysis (EDA)

Data Quality



Now that you’ve explored and cleaned your data, the next step is to verify its quality. The data quality tools below help you determine whether your data is accurate, consistent, and reliable. High-quality data is essential for making informed decisions and for the systems and processes that rely on it; poor-quality data undermines both decision-making and business operations.

Cleanlab

Cleanlab focuses on data-centric AI (DCAI). It provides algorithms and interfaces that help companies across all industries improve the quality of their datasets and diagnose and fix issues in them. The package automatically detects problems in an ML dataset, and it facilitates machine learning with messy, real-world data by flagging errors in your data and providing clean labels for robust training.
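
To make "automatically detects problems in an ML dataset" more concrete, here is a minimal sketch of one way likely label errors can be flagged from a model's predicted probabilities. It is a simplified, pure-Python illustration and not Cleanlab's implementation or API; the function names `class_thresholds` and `flag_label_issues` and the toy data are hypothetical. The cleanlab package exposes this kind of check through `cleanlab.filter.find_label_issues(labels, pred_probs)`, which the sketch does not call.

```python
def class_thresholds(labels, pred_probs, n_classes):
    """Per-class threshold: mean predicted probability of class k
    over the examples whose given label is k."""
    thresholds = []
    for k in range(n_classes):
        probs_k = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        # If a class has no examples, use 1.0 so it is almost never "confident".
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else 1.0)
    return thresholds


def flag_label_issues(labels, pred_probs):
    """Return indices whose given label disagrees with the class
    the model is confident about (hypothetical helper)."""
    n_classes = len(pred_probs[0])
    thresholds = class_thresholds(labels, pred_probs, n_classes)
    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes whose probability reaches that class's threshold.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class
        best = max(confident, key=lambda k: p[k])
        if best != y:
            issues.append(i)
    return issues


# Toy data (made up for illustration): example 2 is labeled 0,
# but the model strongly predicts class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]

print(flag_label_issues(labels, pred_probs))
```

Flagged examples are candidates for review, not confirmed errors; in practice the predicted probabilities should come from out-of-sample (for example, cross-validated) predictions.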

Cleanlab’s Chief Scientist & Co-Founder, Jonas Mueller, will present more about the tool at ODSC East coming this May, in a session called “Improving ML Datasets with Cleanlab, a Standard Framework for Data-Centric AI.”

VisiData

VisiData is a free, open-source tool that lets you quickly open, explore, summarize, and analyze datasets in your computer’s terminal. VisiData works with CSV files, Excel spreadsheets, SQL databases, and many other data sources. It combines the clarity of a spreadsheet, the efficiency of the terminal, and the power of Python into a lightweight utility that can handle millions of rows with ease.

Great Expectations

Great Expectations (GX) helps data teams build a shared understanding of their data through quality testing, documentation, and profiling. With Great Expectations, data teams can express what they “expect” from their data using simple assertions. Great Expectations supports different data backends, such as flat file formats, SQL databases, Pandas dataframes, and Spark, and comes with built-in notification and data documentation functionality.

Sam Bail, technical lead at Superconductive (the core maintainers behind Great Expectations), delivered a talk about building a robust data pipeline during ODSC East 2021. The talk is available to watch on demand.
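
The idea of expressing what you "expect" from your data as simple assertions can be sketched in a few lines of plain Python. This is only an illustration of the concept and not the Great Expectations API: the two helper functions are hypothetical, their names are borrowed from the style of GX expectation names, and the rows are made-up sample data.

```python
def expect_column_values_to_not_be_null(rows, column):
    """Hypothetical helper: every row must have a non-null value in `column`."""
    unexpected = [r for r in rows if r.get(column) is None]
    return {
        "expectation": "not_null",
        "column": column,
        "success": not unexpected,
        "unexpected_count": len(unexpected),
    }


def expect_column_values_to_be_between(rows, column, min_value, max_value):
    """Hypothetical helper: non-null values must lie in [min_value, max_value]."""
    unexpected = [
        r for r in rows
        if r.get(column) is not None
        and not (min_value <= r[column] <= max_value)
    ]
    return {
        "expectation": "between",
        "column": column,
        "success": not unexpected,
        "unexpected_count": len(unexpected),
    }


# Made-up sample data: one missing age and one implausible age.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 212},
]

results = [
    expect_column_values_to_not_be_null(rows, "id"),
    expect_column_values_to_not_be_null(rows, "age"),
    expect_column_values_to_be_between(rows, "age", 0, 120),
]

for result in results:
    status = "PASS" if result["success"] else "FAIL"
    print(status, result["expectation"], result["column"],
          "unexpected:", result["unexpected_count"])
```

Each check returns a structured result instead of raising an error, so a whole suite can run and be reported together. That mirrors, in spirit, the data documentation role described above.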
