The more data you have, the more reliable information you can derive. This information can be used to anticipate asset or user/customer behavior and gain a competitive advantage.
Segment 1: What are Python, Jupyter Notebook and Anaconda
In this module, the participant will become familiar with the Python environment:
- What is Jupyter Notebook
- What are the basics of programming with Python:
- What variables are in data science
- What data processing can be done: indexing, extraction, replacement, modification, addition, conversion, cleaning, membership tests, sorting, data structures (sets, dictionaries, etc.), mathematical operators, comparison operators, logical operators, etc.
- How to use the debugger
- What are the rules for using if, if-else and if-elif conditions for flow control
- When to use while and for loops
- How to create your own functions (def syntax, inputs or parameters, function body and outputs via return); see the sketch after this list
- When and how to use lambda functions
- How to load or install libraries, modules and Python packages
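A minimal sketch of these basics, using only the Python standard library; all names (temperatures, celsius_to_fahrenheit, etc.) are illustrative:

```python
# Variables and basic data structures
temperatures = [12.5, 17.0, 21.3, 9.8]                # a list
units = {"input": "Celsius", "output": "Fahrenheit"}  # a dictionary

# Defining a function: def syntax, parameters, body and return
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32

# Flow control with if / elif / else inside a for loop
for t in temperatures:
    f = celsius_to_fahrenheit(t)
    if f < 50:
        label = "cold"
    elif f < 70:
        label = "mild"
    else:
        label = "warm"
    print(f"{t} {units['input']} = {f:.1f} {units['output']} ({label})")

# The same conversion as a one-line lambda function, combined with sorting
to_fahrenheit = lambda c: c * 9 / 5 + 32
print(sorted(to_fahrenheit(t) for t in temperatures))
```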
Segment 2: What libraries are essential for data science
In this module, the participant will become familiar with the different libraries specific to data science and their respective uses.
ETL (extract/transform/load) process
How to conduct a process that extracts, transforms and loads data from a raw data source to meet business needs.
- Taking advantage of the Pandas library (Panel Data or Python Data Analysis):
- How to extract data from various sources (Excel, CSV, HTML, JSON, etc.) and manipulate it (clean, filter and transform); these steps are sketched after this list
- How to identify, remove and replace missing data
- How to deal with duplicate data
- How to manage data aggregations (groupby)
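A minimal Pandas sketch of these steps, assuming a hypothetical sales.csv file; the file and column names are illustrative only:

```python
import pandas as pd

# Extract: read raw data from a CSV source (file name is hypothetical)
df = pd.read_csv("sales.csv")

# Transform: identify and handle missing data
print(df.isna().sum())                       # count missing values per column
df["quantity"] = df["quantity"].fillna(0)    # replace missing quantities
df = df.dropna(subset=["region"])            # drop rows with no region

# Remove duplicate rows
df = df.drop_duplicates()

# Aggregate: total and average quantity per region with groupby
summary = df.groupby("region")["quantity"].agg(["sum", "mean"])
print(summary)
```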
- Using the NumPy library to create or generate data (simulations); introduction to Monte Carlo simulation.
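As an illustration, a short Monte Carlo sketch with NumPy that estimates pi by sampling random points in the unit square; the sample size and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # reproducible random generator

n = 1_000_000
x = rng.random(n)                      # n points uniform in [0, 1)
y = rng.random(n)

# Fraction of points falling inside the quarter circle of radius 1
inside = (x**2 + y**2) <= 1.0
pi_estimate = 4 * inside.mean()
print(f"Monte Carlo estimate of pi: {pi_estimate:.4f}")
```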
Modeling
How to model data with the Pandas library to conceptualize the relationships between different types of information:
- How to combine data tables (concatenate and merge tables); see the sketch after this list
- How to transform data and create data tables
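A minimal sketch of combining and reshaping tables with Pandas, using hypothetical customers and orders tables:

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "name": ["Ana", "Ben", "Chloe"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3],
                       "amount": [120.0, 80.5, 42.0]})

# Merge: join two tables on a common key
merged = orders.merge(customers, on="customer_id", how="left")

# Concatenate: stack tables that share the same columns
more_orders = pd.DataFrame({"customer_id": [2], "amount": [15.0]})
all_orders = pd.concat([orders, more_orders], ignore_index=True)

# Transform / reshape: pivot into a summary table
pivot = merged.pivot_table(values="amount", index="name", aggfunc="sum")
print(pivot)
```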
How to optimize models and forecast data with the statsmodels.api and scipy.stats libraries.
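A minimal sketch of fitting and forecasting with statsmodels.api, plus a significance test from scipy.stats; the data is simulated here, so the numbers are illustrative only:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=50)  # noisy linear data

# Ordinary least squares: y ~ intercept + x
X = sm.add_constant(x)
model = sm.OLS(y, X).fit()
print(model.summary())

# Forecast y at new x values
x_new = sm.add_constant(np.array([11.0, 12.0]))
print(model.predict(x_new))

# A scipy.stats example: test whether the residuals look normal
print(stats.shapiro(model.resid))
```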
Visualization
Once the data has been extracted and modeled, it remains to visualize it in graphical form (diagram, graph, map, animation, etc.), which makes it easier to interpret and exploit.
How to use the Matplotlib and Seaborn libraries (sketched after this list) to:
- Visualize, combine and customize data: line graphs, scatterplots, boxplots, heat maps, etc.
- Save one or more graphs (PDF, JPEG, etc.)
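A minimal sketch with Matplotlib and Seaborn, assuming internet access (or a local cache) for Seaborn's built-in tips example dataset:

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")   # small example dataset shipped with Seaborn

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Scatterplot: bill vs. tip, colored by time of day
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time", ax=axes[0])
axes[0].set_title("Tips vs. total bill")

# Boxplot: distribution of bills by day
sns.boxplot(data=tips, x="day", y="total_bill", ax=axes[1])
axes[1].set_title("Bills by day")

fig.tight_layout()
fig.savefig("tips_overview.pdf")  # save to PDF (or .png, .jpeg, etc.)
plt.show()
```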