
Hands-On Exploratory Data Analysis with Python
Perform EDA techniques to understand, summarize, and investigate your data
About the Book
Exploratory Data Analysis (EDA) is an approach to data analysis that involves the application of diverse techniques to gain insights into a dataset. This book will help you gain practical knowledge of the main pillars of EDA: data cleaning, data preparation, data exploration, and data visualization.
You'll start by performing EDA on open source datasets, running simple to advanced analyses to turn data into meaningful insights. You'll then learn various descriptive statistical techniques to describe the basic characteristics of data and progress to performing EDA on time-series data. As you advance, you'll learn how to implement EDA techniques for model development and evaluation and build predictive models to visualize results. Using Python for data analysis, you'll work with real-world datasets, understand data, summarize its characteristics, and visualize it for business intelligence.
By the end of this EDA book, you'll have developed the skills required to carry out a preliminary investigation on any dataset, yield insights into data, present your results with visual aids, and build a model that correctly predicts future outcomes.
Suresh Kumar Mukhiya is a PhD candidate, currently affiliated with the Western Norway University of Applied Sciences (HVL). He is a big data enthusiast, specializing in information systems, model-driven software engineering, big data analysis, artificial intelligence, and frontend development. He completed a master's degree in Information Systems at the Norwegian University of Science and Technology (NTNU, Norway), with a thesis in process mining. He also holds a bachelor's degree in computer science and information technology (BSc.CSIT) from Tribhuvan University, Nepal, where he received the Vice-Chancellor's Award for obtaining the highest score. He is a passionate photographer and a resilient traveler.
Usman Ahmed is a data scientist and PhD candidate at the Western Norway University of Applied Sciences (HVL). He has rich experience in building and scaling high-performance systems based on data mining, natural language processing, and machine learning. Usman's research interests are sequential data mining, heterogeneous computing, natural language processing, recommendation systems, and machine learning. He completed a Master of Science in computer science at Capital University of Science and Technology, Islamabad, Pakistan. He was awarded a Gold Medal for his Bachelor of Computer Science from Heavy Industries Taxila Education City.
Table of Contents
Copyright and Credits
About Packt
Contributors
About the authors
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Get in touch
Section 1: The Fundamentals of EDA
Exploratory Data Analysis Fundamentals
Understanding data science
The significance of EDA
Steps in EDA
Making sense of data
Comparing EDA with classical and Bayesian analysis
Software tools available for EDA
Getting started with EDA
Summary
Further reading
Visual Aids for EDA
Technical requirements
Line chart
Bar charts
Scatter plot
Area plot and stacked plot
Pie chart
Table chart
Polar chart
Histogram
Lollipop chart
Choosing the best chart
Other libraries to explore
Summary
Further reading
EDA with Personal Email
Technical requirements
Loading the dataset
Data transformation
Data analysis
Summary
Further reading
Data Transformation
Technical requirements
Background
Merging database-style dataframes
Transformation techniques
Benefits of data transformation
Summary
Further reading
Section 2: Descriptive Statistics
Descriptive Statistics
Technical requirements
Understanding statistics
Measures of central tendency
Measures of dispersion
Summary
Further reading
Grouping Datasets
Technical requirements
Understanding groupby()
Groupby mechanics
Data aggregation
Pivot tables and cross-tabulations
Summary
Further reading
Correlation
Technical requirements
Introducing correlation
Types of analysis
Discussing multivariate analysis using the Titanic dataset
Outlining Simpson's paradox
Correlation does not imply causation
Summary
Further reading
Time Series Analysis
Technical requirements
Understanding the time series dataset
TSA with Open Power System Data
Summary
Further reading
Section 3: Model Development and Evaluation
Hypothesis Testing and Regression
Technical requirements
Hypothesis testing
p-hacking
Understanding regression
Model development and evaluation
Summary
Further reading
Model Development and Evaluation
Technical requirements
Types of machine learning
Understanding supervised learning
Understanding unsupervised learning
Understanding reinforcement learning
Unified machine learning workflow
Summary
Further reading
EDA on Wine Quality Data Analysis
Technical requirements
Disclosing the wine quality dataset
Analyzing red wine
Analyzing white wine
Model development and evaluation
Summary
Further reading
Appendix
String manipulation
Using pandas vectorized string functions
Using regular expressions
Further reading
Other Books You May Enjoy
Leave a review - let other readers know what you think