Hands-On Exploratory Data Analysis with Python

Perform EDA techniques to understand, summarize, and investigate your data

About the book

Exploratory Data Analysis (EDA) is an approach to data analysis that applies diverse techniques to gain insights into a dataset. This book will help you gain practical knowledge of the main pillars of EDA: data cleaning, data preparation, data exploration, and data visualization.

You'll start by performing EDA on open source datasets, carrying out simple to advanced analyses to turn data into meaningful insights. You'll then learn various descriptive statistical techniques to describe the basic characteristics of data and progress to performing EDA on time-series data. As you advance, you'll learn how to implement EDA techniques for model development and evaluation and build predictive models to visualize results. Using Python for data analysis, you'll work with real-world datasets, understand data, summarize its characteristics, and visualize it for business intelligence.

By the end of this EDA book, you'll have developed the skills required to carry out a preliminary investigation on any dataset, yield insights into data, present your results with visual aids, and build a model that correctly predicts future outcomes.

Suresh Kumar Mukhiya is a PhD candidate, currently affiliated with the Western Norway University of Applied Sciences (HVL). He is a big data enthusiast, specializing in information systems, model-driven software engineering, big data analysis, artificial intelligence, and frontend development. He has completed a master's degree in information systems from the Norwegian University of Science and Technology (NTNU, Norway), along with a thesis in process mining. He also holds a bachelor's degree in computer science and information technology (BSc.CSIT) from Tribhuvan University, Nepal, where he was decorated with the Vice-Chancellor's Award for obtaining the highest score. He is a passionate photographer and a resilient traveler.

Usman Ahmed is a data scientist and PhD candidate at the Western Norway University of Applied Sciences (HVL). He has rich experience in building and scaling high-performance systems based on data mining, natural language processing, and machine learning. Usman's research interests are sequential data mining, heterogeneous computing, natural language processing, recommendation systems, and machine learning. He has completed a Master of Science in computer science from Capital University of Science and Technology, Islamabad, Pakistan. He was awarded a gold medal for his Bachelor of Computer Science from Heavy Industries Taxila Education City.

Table of contents

  1. Copyright and Credits
  2. About Packt
  3. Contributors
  4. About the authors
  5. About the reviewer
  6. Packt is searching for authors like you
  7. Preface
  8. Who this book is for
  9. What this book covers
  10. To get the most out of this book
  11. Get in touch
  12. Section 1: The Fundamentals of EDA
  13. Exploratory Data Analysis Fundamentals
  14. Understanding data science
  15. The significance of EDA
  16. Steps in EDA
  17. Making sense of data
  18. Comparing EDA with classical and Bayesian analysis
  19. Software tools available for EDA
  20. Getting started with EDA
  21. Summary
  22. Further reading
  23. Visual Aids for EDA
  24. Technical requirements
  25. Line chart
  26. Bar charts
  27. Scatter plot
  28. Area plot and stacked plot
  29. Pie chart
  30. Table chart
  31. Polar chart
  32. Histogram
  33. Lollipop chart
  34. Choosing the best chart
  35. Other libraries to explore
  36. Summary
  37. Further reading
  38. EDA with Personal Email
  39. Technical requirements
  40. Loading the dataset
  41. Data transformation
  42. Data analysis
  43. Summary
  44. Further reading
  45. Data Transformation
  46. Technical requirements
  47. Background
  48. Merging database-style dataframes
  49. Transformation techniques
  50. Benefits of data transformation
  51. Summary
  52. Further reading
  53. Section 2: Descriptive Statistics
  54. Descriptive Statistics
  55. Technical requirements
  56. Understanding statistics
  57. Measures of central tendency
  58. Measures of dispersion
  59. Summary
  60. Further reading
  61. Grouping Datasets
  62. Technical requirements
  63. Understanding groupby()
  64. Groupby mechanics
  65. Data aggregation
  66. Pivot tables and cross-tabulations
  67. Summary
  68. Further reading
  69. Correlation
  70. Technical requirements
  71. Introducing correlation
  72. Types of analysis
  73. Discussing multivariate analysis using the Titanic dataset
  74. Outlining Simpson's paradox
  75. Correlation does not imply causation
  76. Summary
  77. Further reading
  78. Time Series Analysis
  79. Technical requirements
  80. Understanding the time series dataset
  81. TSA with Open Power System Data
  82. Summary
  83. Further reading
  84. Section 3: Model Development and Evaluation
  85. Hypothesis Testing and Regression
  86. Technical requirements
  87. Hypothesis testing
  88. p-hacking
  89. Understanding regression
  90. Model development and evaluation
  91. Summary
  92. Further reading
  93. Model Development and Evaluation
  94. Technical requirements
  95. Types of machine learning
  96. Understanding supervised learning
  97. Understanding unsupervised learning
  98. Understanding reinforcement learning
  99. Unified machine learning workflow
  100. Summary
  101. Further reading
  102. EDA on Wine Quality Data Analysis
  103. Technical requirements
  104. Disclosing the wine quality dataset
  105. Analyzing red wine
  106. Analyzing white wine
  107. Model development and evaluation
  108. Summary
  109. Further reading
  110. Appendix
  111. String manipulation
  112. Using pandas vectorized string functions
  113. Using regular expressions
  114. Further reading
  115. Other Books You May Enjoy
  116. Leave a review - let other readers know what you think