Data Science and Big Data

MarineTech

17:30:00

(0 reviews)
Sea and Beyond

The Data Science and Machine Learning course, prepared by Mr. Venkat Krishna in conjunction with Sea and Beyond, is designed to empower maritime professionals (including mariners, engineers, and analysts) to make informed, auditable decisions using data. The curriculum bridges the gap between traditional maritime expertise and modern data science, moving from foundational statistics to advanced AI-ML applications.

A unique and defining feature of this course is its focus on real-world use cases from the maritime environment. Because it uses raw data from main engine parameters, the program lets participants from a maritime background relate directly to the data modelling process. This practical approach runs through the modules: students apply algorithms such as Random Forest regression and classification to marine data and compare the results with traditional models (multiple linear regression and logistic regression) to evaluate predictions and inferences.

The course is structured into four primary phases:

  • Foundations and Statistical Analysis: The journey begins with the data lifecycle, covering collection from IoT devices, cleaning, and Exploratory Data Analysis (EDA). Participants master essential statistical tools such as hypothesis testing, p-values, Z-scores, and probability distributions (PDF and CDF) to infer population characteristics from sample data.
  • Core Machine Learning Theory: Learners are introduced to supervised and unsupervised learning. The curriculum covers simple and multiple linear regression, feature engineering, and metrics such as RMSE and R² for evaluating how well a model fits. Splitting data into training and test sets is emphasized so that students can detect overfitting and check that their models generalize to new data. This builds on statistical foundations such as degrees of freedom and the Central Limit Theorem (CLT).
  • Maritime-Specific Applications: The course dives deep into specialized techniques applied to maritime datasets, including Clustering methods, What-if analysis, and Anomaly detection using Isolation Forest. These tools are essential for operational tasks like identifying sensor faults or predicting maintenance needs. Additionally, Time series analysis and moving averages are applied to marine data to enhance predictive accuracy.
  • Low-Code Tools and Deployment: To make data science accessible, the course introduces the KNIME low-code platform, allowing users to apply complex regression and classification models to marine data without intensive coding. Finally, participants learn to build and deploy Streamlit applications and use GitHub to manage their work, transforming their models into practical, day-to-day tools for the maritime industry.

This structured roadmap ensures that maritime professionals can transition from understanding "the 3 Vs" of Big Data (Volume, Velocity, Variety) to deploying sophisticated AI agents and RAG systems in a way that is directly relevant to their field.


Sea and Beyond

At Sea and Beyond we strive for authenticity and honesty in our work. Our mission is to help you make well-informed decisions, and we support you through our services:

  • Mentoring
  • Skill enhancement/Education
  • Job opportunities
  • Your branding (CV and LinkedIn preparation)

We have a team of writers who research thoroughly to ensure the accuracy and clarity of our content. We will be happy to support you wherever required.

Rating:

(0 Reviews)


Price:

15000


Language:

English


Length:

17:30:00


Topics:
4 sections, 51 lectures, 17h 30m total length

7 min

1.1 Introduction to Data Science & Big Data

1. Course Vision

  • Purpose: Empower marine professionals to make informed, auditable decisions using data.
  • Audience: Mariners, engineers, analysts, and curious minds at the intersection of ocean and code.

Foundations

  • Data Science: Extracts insights using statistics, machine learning, and programming.
  • Big Data: Characterized by the 3 Vs — Volume, Velocity, Variety.

Free

11 min

1.2 Data Analysis vs Analytics

2. Core Concepts

  • Data: Recorded measurements.
  • Variables: Characteristics that vary across entities.
  • Sources: Humans, machines, and their combinations.
  • Forms: Structured vs. unstructured data.

Analytics vs. Analysis

  • Qualitative:
    o Analysis (past) – Storytelling
    o Analytics (future) – Intuition-based analysis
  • Quantitative:
    o Analysis (past) – Data insights
    o Analytics (future) – Formulas, algorithms

Locked

13 min

1.3 Types of Variables

3. Types of Variables

  • Categorical – Nominal, Ordinal
  • Numerical – Discrete, Continuous (Interval & Ratio)

With examples of each type of variable.

Locked

9 min

1.4 Distributions

4. Distributions

  • Normal: Bell curve, symmetric about the mean.
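  • Uniform: Every outcome is equally likely (e.g., the roll of a fair die).
  • Binomial: Number of successes in a fixed number of independent trials, each with two outcomes (e.g., heads in 10 coin tosses).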
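The following is a minimal Python sketch, using only the standard library, of the two discrete examples above. It computes the uniform probability of one face of a fair die and the binomial probability of exactly k heads in n fair coin tosses. The function names are illustrative and do not come from the course.

```python
from fractions import Fraction
from math import comb


def uniform_pmf(num_outcomes: int) -> Fraction:
    """Probability of any one outcome when all num_outcomes are equally likely."""
    return Fraction(1, num_outcomes)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


die_face = uniform_pmf(6)                               # one face of a fair die
heads = [binomial_pmf(k, 10, 0.5) for k in range(11)]   # 0..10 heads in 10 fair tosses

print(die_face)       # 1/6
print(heads[5])       # 0.24609375, the most likely count
print(sum(heads))     # 1.0: probabilities over all outcomes add up to one
```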

Locked

17 min

1.5 Population & Samples

5. Population vs. Sample

  • Sampling Types:
    o Probability: Simple random, stratified, cluster
    o Non-probability: Convenience, judgmental, snowball
  • Statistics:
    o Descriptive: Charts, mean, median, mode
    o Inferential: Hypothesis testing, confidence intervals

With a demonstration using a web application.
(link provided for additional practice)
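The sketch below illustrates two of the probability-sampling schemes. It draws a simple random sample and a proportionally stratified sample from a made-up population of 60 "tanker" and 40 "bulker" records. The population, the seed and the helper names are assumptions chosen for demonstration only.

```python
import random
from collections import defaultdict

# Hypothetical population: 60 records from tankers and 40 from bulkers.
population = [("tanker", i) for i in range(60)] + [("bulker", i) for i in range(40)]
rng = random.Random(42)  # fixed seed so the draw is reproducible


def simple_random_sample(items, n, rng):
    """Every item has the same chance of selection; group membership is ignored."""
    return rng.sample(items, n)


def stratified_sample(items, n, rng):
    """Sample each stratum in proportion to its share of the population.

    Note: rounding may not add up to exactly n for every population split.
    """
    strata = defaultdict(list)
    for stratum, value in items:
        strata[stratum].append((stratum, value))
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(items))  # proportional allocation
        sample.extend(rng.sample(members, k))
    return sample


srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(population, 10, rng)
print(sum(1 for s, _ in strat if s == "tanker"))  # 6: the 60% stratum gets 6 of 10
```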

Locked

12 min

1.6 Measures of central tendency

6. Measures of central tendency

Dispersion

  • Standard Deviation: Spread around the mean, in the same units as the data (the square root of the variance).
  • Variance: Average squared deviation from the mean.
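A few lines of Python are enough to show the difference between population variance (divide by N) and sample variance (divide by n − 1). The readings below are made-up values. The standard-library `statistics` module is used only to cross-check the hand calculation.

```python
import math
import statistics

# Hypothetical exhaust-temperature readings, for illustration only.
readings = [72.0, 75.0, 71.0, 78.0, 74.0]

mean = sum(readings) / len(readings)
squared_devs = [(x - mean) ** 2 for x in readings]

pop_variance = sum(squared_devs) / len(readings)           # divide by N
sample_variance = sum(squared_devs) / (len(readings) - 1)  # divide by n - 1
sample_sd = math.sqrt(sample_variance)                     # back in the original units

print(mean, pop_variance, sample_variance)                            # 74.0 6.0 7.5
print(round(sample_sd, 4))                                            # 2.7386
print(statistics.pvariance(readings), statistics.variance(readings))  # 6.0 7.5
```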

Locked

12 min

1.7 Correlation

7. Correlation

  • Correlation: Measures the strength and direction of a relationship (−1 to +1).

Demonstration of how to calculate correlations in Excel.
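Excel's CORREL function returns the Pearson correlation. The same value can be computed by hand as the covariance divided by the product of the standard deviations. The sketch below does this with made-up values. In Python 3.10+, `statistics.correlation` gives the same number.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical paired values (e.g. engine load vs. fuel flow), for illustration only.
load = [50, 60, 70, 80, 90]
fuel = [10, 12, 14, 16, 18]       # rises exactly linearly with load
cooling = [30, 28, 26, 24, 22]    # falls exactly linearly as load rises

print(round(pearson_r(load, fuel), 6), round(pearson_r(load, cooling), 6))  # 1.0 -1.0
```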

Locked

14 min

1.8 Causality

8. Causality

  • Causality: Requires deeper investigation — correlation ≠ causation.

Demonstration using a web application explaining causality.
(link provided for additional practice)

Locked

13 min

1.9 Central Limit Theorem

9. Central Limit Theorem (CLT)

  • Insight: For sufficiently large samples (n ≥ 30 is a common rule of thumb), the distribution of sample means is approximately normal, regardless of the population distribution.
  • Assumptions: Random sampling, independence, finite variance.

Demonstration using a web application explaining how the CLT is applied.
(link provided for additional practice)

Additional note explaining the difference between standard deviation and standard error.
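The standard deviation describes the spread of individual values. The standard error (σ/√n) describes the spread of sample means. This sketch assumes an exponential population, which is strongly skewed, and draws many samples of size 30. It then checks that the sample means cluster around the population mean with a spread close to σ/√n, as the CLT predicts.

```python
import math
import random
import statistics

rng = random.Random(1)          # fixed seed for reproducibility
n, trials = 30, 2000            # sample size and number of repeated samples

# Exponential population with rate 1: heavily right-skewed, mean 1 and SD 1.
population_mean, population_sd = 1.0, 1.0
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)
]

standard_error = population_sd / math.sqrt(n)   # SD of the sample mean: sigma / sqrt(n)

print(round(statistics.fmean(sample_means), 3))  # close to the population mean (1.0)
print(round(statistics.stdev(sample_means), 3))  # close to the standard error
print(round(standard_error, 3))                  # 0.183
```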

Locked

20 min

1.10 Measures of central tendency demo

10. Measures of Central tendency demonstration

  • Calculations: Mean, median, mode, standard deviation, variance, coefficient of variation

Demonstration in a web application explaining skewness, kurtosis, etc.
(link provided for additional practice).
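For reference, these summary statistics can be computed in a few lines of Python. The data set is a small made-up example. Skewness and kurtosis use the simple moment-based (population) formulas. Excel's SKEW and KURT, and many web calculators, apply small-sample corrections, so their values can differ slightly.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # small illustrative data set, not course data

mean = statistics.fmean(data)
median = statistics.median(data)
mode = statistics.mode(data)
sd = statistics.pstdev(data)      # population SD (divide by N)
cv = sd / mean                    # coefficient of variation: spread relative to the mean

# Moment-based (population) skewness and excess kurtosis.
m2 = sum((x - mean) ** 2 for x in data) / len(data)
m3 = sum((x - mean) ** 3 for x in data) / len(data)
m4 = sum((x - mean) ** 4 for x in data) / len(data)
skewness = m3 / (m2 * math.sqrt(m2))   # > 0 means a longer right tail
excess_kurtosis = m4 / m2 ** 2 - 3     # 0 for a normal distribution

print(mean, median, mode, sd, cv)      # 5.0 4.5 4 2.0 0.4
print(skewness, excess_kurtosis)       # 0.65625 -0.21875
```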

Locked

5 min

2.1 Introduction

1. Data Lifecycle

  • Collection & Storage: From databases, APIs, IoT devices.
  • Cleaning & Preprocessing: Handling missing values, outliers, scaling issues.
  • Exploratory Data Analysis (EDA): Visual summaries and trend identification.
  • Machine Learning & AI: Predictive modeling for decision-making.

Free

30 min

2.2 Hypothesis tests

2. Hypothesis Testing

  • Null Hypothesis (H₀): Assumes no effect or change.
  • Alternative Hypothesis (H₁): States that there is an effect or difference.
  • Purpose: To infer population characteristics from sample data using statistical tests.

Locked

21 min

2.3 Z-scores, p-values | Part 1

3. Statistical Tools & Concepts

  • Z-Scores: Standardize data for comparison across datasets.
  • p-values: Measure the strength of evidence against H₀ (smaller p-values mean stronger evidence).
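The sketch below shows both ideas. Z-scores put measurements recorded in different units on a common scale. A two-sided p-value converts a z statistic into the probability of a result at least that extreme if H₀ were true. The readings are hypothetical, and `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist, fmean, pstdev


def z_scores(values):
    """Standardize: how many standard deviations each value lies from the mean."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, assuming H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical readings on two different scales (degrees C vs. bar).
temps = [340, 350, 360, 370, 380]
pressures = [1.8, 1.9, 2.0, 2.1, 2.2]

# After standardizing, both series sit on the same unitless scale.
print([round(z, 3) for z in z_scores(temps)])
print([round(z, 3) for z in z_scores(pressures)])
print(round(two_sided_p_value(1.96), 4))   # about 0.05
```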

Locked

21 min

2.4 Z-scores, p-values | Part 2

4. Statistical Tools & Concepts

  • Z-Scores: Standardize data for comparison across datasets.
  • p-values: Measure the strength of evidence against H₀ (smaller p-values mean stronger evidence).

Locked

30 min

2.5 PDF, CDF, use cases

5. Density functions

PDF (Probability Density Function):

  • Describes how the probability of a continuous random variable is spread over its values; the probability of any single exact value is zero.
  • The area under the curve over an interval gives the probability of falling in that interval.

CDF (Cumulative Distribution Function):

  • Gives the probability that a variable takes a value less than or equal to a given point.
  • Useful for calculating p-values and tail probabilities.
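Python's standard `statistics.NormalDist` provides both functions directly. Purely for illustration, the sketch assumes a normally distributed reading with mean 80 and standard deviation 5.

```python
from statistics import NormalDist

# Assumed model, for illustration only: a reading ~ Normal(mean=80, SD=5).
dist = NormalDist(mu=80, sigma=5)

density_at_mean = dist.pdf(80)              # a density value, not a probability
p_between = dist.cdf(85) - dist.cdf(75)     # area under the PDF between 75 and 85
p_upper_tail = 1 - dist.cdf(90)             # P(X > 90), a right-tail probability

print(round(density_at_mean, 4))   # 0.0798
print(round(p_between, 4))         # 0.6827 (within one SD of the mean)
print(round(p_upper_tail, 4))      # 0.0228
```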

Locked

18 min

2.6 Z-test, T-test, Degrees of freedom

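6. Z-Test vs T-Test

  • Use a Z-test when the population standard deviation is known and the sample size is ≥ 30 (or the population is normal).
  • Use a T-test when the population standard deviation is unknown and must be estimated from the sample; a one-sample t-test has n − 1 degrees of freedom.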
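The difference between the two tests shows up in the statistic itself. The Z-statistic uses a known population SD. The T-statistic substitutes the sample SD and is judged against a t distribution with n − 1 degrees of freedom. The sample and the "known" SD below are made-up values. The critical values are the usual two-tailed 5% table values.

```python
import math
import statistics

# Hypothetical sample of 10 measurements, testing H0: population mean = 100.
sample = [102, 98, 105, 101, 99, 104, 103, 100, 106, 102]
mu0 = 100
n = len(sample)
xbar = statistics.fmean(sample)

# Z-statistic: valid only if the population SD is known (assumed to be 3.0 here).
sigma_known = 3.0
z = (xbar - mu0) / (sigma_known / math.sqrt(n))

# T-statistic: population SD unknown, so use the sample SD (n - 1 in its denominator).
s = statistics.stdev(sample)
t = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1                     # degrees of freedom for a one-sample t-test

# Two-tailed 5% critical values: 1.96 for Z, 2.262 for t with 9 df (standard tables).
print(round(z, 3), abs(z) > 1.96)        # 2.108 True
print(round(t, 3), df, abs(t) > 2.262)   # 2.449 9 True
```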

Locked

26 min

2.7 One tailed test

7. One-tailed & two-tailed tests:

  • Two-Tailed Test: Checks for any difference from the hypothesized mean.
  • Left-Tailed Test: Tests if sample mean is significantly lower.
  • Right-Tailed Test: Tests if sample mean is significantly higher.
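For a given test statistic, the three alternatives differ only in which tail area is used as the p-value. The sketch below assumes a standard normal test statistic, as in a Z-test.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, SD 1


def p_values(z):
    """p-values for a test statistic z under H0, for each type of alternative."""
    return {
        "two-tailed": 2 * (1 - std_normal.cdf(abs(z))),   # H1: mean != mu0
        "left-tailed": std_normal.cdf(z),                  # H1: mean < mu0
        "right-tailed": 1 - std_normal.cdf(z),             # H1: mean > mu0
    }


for name, p in p_values(1.8).items():
    print(f"{name:13s} p = {p:.4f}")
```

With z = 1.8, the right-tailed p-value is about 0.036 and the two-tailed one is about 0.072. At α = 0.05, the same data therefore reject H₀ only under the one-sided alternative.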

Locked

17 min

2.8 Errors in statistical testing

8. Errors in Testing

  • Type I Error (False Positive): Rejecting a true H₀.
  • Type II Error (False Negative): Failing to reject a false H₀.
  • Trade-off: For a fixed sample size, reducing one type of error increases the other; a larger sample can reduce both.
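A Monte Carlo sketch makes the two error rates tangible. It assumes a two-tailed Z-test with known σ = 1, n = 25 and α = 0.05, all illustrative choices. It estimates how often a true H₀ is rejected (Type I error). It also estimates how often a false H₀ is not rejected when the true mean is 0.3 (Type II error).

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(7)
alpha, n, sigma, mu0 = 0.05, 25, 1.0, 0.0
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for a two-tailed test


def rejects_h0(true_mean):
    """Draw one sample and run a two-tailed z-test of H0: mean = mu0."""
    xs = [rng.gauss(true_mean, sigma) for _ in range(n)]
    z = (fmean(xs) - mu0) / (sigma / math.sqrt(n))
    return abs(z) > z_crit


trials = 4000
type_1 = sum(rejects_h0(0.0) for _ in range(trials)) / trials      # H0 true but rejected
type_2 = sum(not rejects_h0(0.3) for _ in range(trials)) / trials  # H0 false but kept

print(round(type_1, 3), round(type_2, 3))   # expect roughly 0.05 and roughly 0.68
```

Raising n while keeping α fixed shrinks the Type II rate. This is how a larger sample reduces both errors.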

Locked

25 min

2.9 Outliers & detection

9. Data Issues

  • Common Problems: Missing values, measurement errors, sensor faults, manual manipulation.
  • Outliers: Identified using the interquartile range (IQR), typically as values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.
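The following is a minimal sketch of the IQR rule using Tukey's 1.5 × IQR fences. The readings are invented, and the multiplier k is the conventional default rather than a value specified by the course. Tools compute quartiles in slightly different ways, so the fences can differ a little between Excel, Python and other software.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high], (low, high)


# Hypothetical sensor readings with one suspicious spike.
readings = [71, 72, 72, 73, 74, 74, 75, 76, 77, 120]
outliers, fences = iqr_outliers(readings)
print(outliers, fences)   # [120] (67.0, 81.0)
```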

Locked

22 min

2.10 Types of plots

10. Visualization Techniques

  • Plot Types: Scatter, Line, Bar, Histogram, Pie, Box plots.
  • Tools: Excel, Python (Matplotlib, Seaborn).

Locked

13 min

3.1 Introduction to Machine Learning

Module 1: Introduction to Machine Learning

  • ML is a subset of AI that learns patterns from data to make predictions/decisions.
  • Relies on data, algorithms, and iterative improvement.
  • Applications span industries: predictive maintenance, customer segmentation, fraud detection, NLP, computer vision.

Free

20 min

3.2 Independent & Dependent Variables

Module 2: Independent & dependent variables

  • Independent variables (predictors) vs Dependent variables (outcomes).
  • Regression models relationships between them.
  • Purpose: infer/predict one variable from others and understand how strongly they are related.

Locked

9 min

3.3 Dummy Variables

Module 3: Dummy variables

  • Used when predictors are categorical (e.g., weather: good/bad).
  • Encoded into binary variables for regression models.
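The sketch below shows dummy encoding in plain Python, reusing the good/bad weather example. The helper is hypothetical. In pandas, `pd.get_dummies(..., drop_first=True)` does the same job.

```python
def dummy_encode(values, categories, drop_first=True):
    """Turn one categorical column into 0/1 indicator columns.

    With drop_first=True the first category is the baseline (all zeros), which
    avoids perfectly collinear columns when the regression has an intercept
    (the "dummy variable trap").
    """
    kept = categories[1:] if drop_first else categories
    return [{f"is_{c}": int(v == c) for c in kept} for v in values]


weather = ["good", "bad", "good", "bad"]
rows = dummy_encode(weather, categories=["good", "bad"])
print(rows)   # [{'is_bad': 0}, {'is_bad': 1}, {'is_bad': 0}, {'is_bad': 1}]
```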

Locked

29 min

3.4 Linear Regression

Module 4: Linear regression, Multi linear regression

  • Supervised ML algorithm that assumes a linear relationship between the inputs and the output.
  • Types:
    • Simple Linear Regression: one predictor.
    • Multiple Linear Regression: multiple predictors.
  • Assumptions: linearity, independence, homoscedasticity, normality of errors, no multicollinearity, no autocorrelation, additivity.
  • Real-world use cases: stock market, real estate, medical risk, sales forecasting.
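For one predictor, the least-squares line has a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and intercept = ȳ − slope · x̄. Here is a sketch with made-up shaft-speed and fuel figures.

```python
def fit_simple_ols(xs, ys):
    """Least-squares line y = intercept + slope * x for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx                 # change in y per unit change in x
    intercept = my - slope * mx       # the line passes through (mean x, mean y)
    return intercept, slope


# Hypothetical data: shaft speed (rpm) vs. fuel consumption, for illustration only.
rpm = [60, 70, 80, 90, 100]
fuel = [20.5, 24.0, 28.5, 32.0, 36.0]

b0, b1 = fit_simple_ols(rpm, fuel)
predicted_at_85 = b0 + b1 * 85
print(round(b0, 3), round(b1, 3), round(predicted_at_85, 2))
```

Multiple linear regression generalizes this to several predictors. It is usually fitted with a library such as scikit-learn's `LinearRegression` or statsmodels' `OLS`.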

Locked

22 min

3.5 Metrics, Multi Linear Regression

Module 5: Metrics, Multi linear regression demo

  • Limitations: assumes linearity, sensitive to outliers.
  • Metrics: calculating error metrics for the fitted model.

Locked

22 min

3.6 Understanding the Metrics

Module 6: Understanding the Metrics

  • MSE, MAE, RMSE for error measurement.
  • R² & Adjusted R² for explanatory power.

Using these metrics to compare different models and select the better one for predictions and analytics.
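All of the metrics above follow directly from the residuals. This sketch computes them for a hypothetical set of observed values and predictions. The `n_predictors` argument (the number of input features, p) is needed only for adjusted R².

```python
import math


def regression_metrics(y_true, y_pred, n_predictors):
    """MSE, MAE, RMSE, R^2 and adjusted R^2 for one set of predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)                      # same units as the target
    mean_t = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R^2 penalizes extra predictors: 1 - (1 - R^2)(n - 1)/(n - p - 1)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2, "AdjR2": adj_r2}


y_true = [3.0, 5.0, 7.0, 9.0, 11.0]   # hypothetical observed values
y_pred = [2.5, 5.5, 7.0, 8.0, 12.0]   # hypothetical model predictions
print(regression_metrics(y_true, y_pred, n_predictors=1))
```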

Locked

20 min

3.7 Splitting Data for ML

Module 7: Splitting data for ML
Splitting data into training and test sets

  • Training data: used to build the model; usually the larger share of the data.
  • Test data: unseen data used to evaluate performance and check generalization.
  • Helps detect overfitting and validates accuracy on data the model has not seen.
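Here is a minimal sketch of a reproducible 80/20 split in plain Python. The helper name is hypothetical. scikit-learn's `train_test_split` is the usual library equivalent.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then hold out the first test_fraction of them as test data."""
    shuffled = list(rows)                     # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)     # fixed seed makes the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)


records = list(range(100))                    # stand-in for 100 data records
train, test = split_train_test(records)
print(len(train), len(test))                  # 80 20
```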

Locked

28 min

3.8 Feature engineering and scaling

Module 8: Feature engineering and feature scaling

  • Feature engineering: cleansing, transformation, extraction, selection, iteration, imputing missing values.
  • Feature scaling: min-max normalization (0–1) or standardization (mean = 0, SD = 1).
  • Ensures equal contribution of features, avoids bias in distance-based algorithms.
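The sketch below applies both scaling formulas to a hypothetical RPM feature. Standardization here uses the population SD, which is also what scikit-learn's `StandardScaler` uses.

```python
import statistics


def min_max_normalize(values):
    """Rescale to the 0-1 range: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale to mean 0 and SD 1: (x - mean) / SD, using the population SD."""
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


engine_rpm = [60, 70, 80, 90, 100]          # hypothetical feature on a large scale
print(min_max_normalize(engine_rpm))        # [0.0, 0.25, 0.5, 0.75, 1.0]
print([round(z, 3) for z in standardize(engine_rpm)])
```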

Locked

22 min

3.9 Linear Regression applying features

Module 9: Linear regression applying features

  • Linear regression after splitting the data and scaling the features.
  • Understanding the correct order of splitting and scaling the data.

Outcome: able to evaluate the model on test data after building it for predictions.
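A common practice is to split first and then fit the scaler on the training data only. The same parameters are then applied to the test data, so no information from the test set leaks into training. Here is a sketch with made-up values.

```python
import statistics

train = [60, 70, 80, 90, 100]    # hypothetical training values of one feature
test = [75, 110]                 # hypothetical unseen test values

# Fit the scaling parameters on the training set only...
mu = statistics.fmean(train)
sd = statistics.pstdev(train)

# ...then apply the same transformation to both sets. Computing mu or sd from the
# test data would leak information about the "unseen" data into the model.
train_scaled = [(x - mu) / sd for x in train]
test_scaled = [(x - mu) / sd for x in test]
print([round(v, 3) for v in test_scaled])   # test values may fall outside the training range
```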

Locked

31 min

3.10 Regression vs Classification

Module 10: Regression vs Classification

  • Other regression & classification algorithms:
    o Other regressions: polynomial, ridge, lasso, SVR, decision tree, random forest, gradient boosting, XGBoost.
  • Classification vs. Regression:
    o Classification → discrete outcomes (yes/no, categories).
    o Regression → continuous outcomes (price, temperature).
  • Classification algorithms: decision tree, random forest, KNN, logistic regression, SVM, Naïve Bayes.

Locked

26 min

3.11 ANOVA

Module 11: ANOVA (One way ANOVA)

  • Analysis of Variance: compares means across 3+ groups.
  • Tests null hypothesis (all means equal).
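  • Assumptions: normality, independence, equal variance, no overlap between groups (each observation belongs to one group only).
  • Compares the variability between groups with the variability within groups (the F-statistic).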
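The F-statistic compares the between-group mean square with the within-group mean square. Below is a from-scratch sketch with three small made-up groups. For the p-value you would use the F distribution, for example via `scipy.stats.f_oneway`.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)

    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)   # df between = k - 1
    ms_within = ss_within / (n - k)     # df within = n - k
    return ms_between / ms_within, (k - 1, n - k)


# Three hypothetical groups (e.g. readings from three vessels).
f_stat, dfs = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_stat, dfs)   # 27.0 (2, 6)
```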

Locked

10 min

4.1 Introduction to AI ML applications

Module 1: Introduction to AI ML applications

  • Module overview, areas of deployment along with course introduction
    and learning curve.

Free

22 min

4.2 Supervised & Unsupervised learning

Module 2: Supervised & Unsupervised learning

  • Understanding the differences between them, with an application demo in Python
  • Differentiating between regression, classification and clustering methods and their application areas.
  • Purpose: How and where to apply, choosing variables based on the method.

Locked

17 min

4.3 Random Forest Intuition

Module 3: Random Forest intuition

  • Fundamentals of decision trees and random forests
  • Applying them to regression & classification
  • Importance: a working knowledge of how decision trees work and how they are applied to regression in ML.
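A regression tree is built by repeating one step: choose the split that most reduces squared error, then predict the mean of each side. A random forest grows many such trees on bootstrap samples (with random feature subsets) and averages their predictions. The sketch below shows only the single-split step, a decision stump, on made-up data.

```python
def best_split(xs, ys):
    """Find the threshold on x that minimizes the total squared error of the two
    leaf means: the step a regression tree repeats at every node."""
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for threshold in sorted(set(xs))[:-1]:        # splitting at the max leaves one side empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost, sum(left) / len(left), sum(right) / len(right))
    return best   # (threshold, cost, left prediction, right prediction)


# Hypothetical data with a clear jump between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [10.0, 11.0, 10.5, 20.0, 21.0, 19.5]
threshold, cost, left_pred, right_pred = best_split(xs, ys)
print(threshold, left_pred, round(right_pred, 2))   # 3 10.5 20.17
```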

Locked

21 min

4.4 Logistic Regression Intuition

Module 4: Logistic regression intuition

  • Fundamentals of logistic regression
  • Types of logistic regression, difference between linear regression & logistic regression
  • Sigmoid function, accuracy metrics, ROC curves.
  • Importance: applications for Classification in ML
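The following sketches the sigmoid and the quantities behind accuracy and the ROC curve. The scores and labels are hypothetical, and 0.5 is the usual default threshold. Sweeping the threshold from 0 to 1 and plotting the (FPR, TPR) pairs traces the ROC curve.

```python
import math


def sigmoid(z):
    """Map any real number into (0, 1): 1 / (1 + e^-z)."""
    return 1 / (1 + math.exp(-z))


def classify(score, threshold=0.5):
    return int(sigmoid(score) >= threshold)


# Hypothetical linear scores (b0 + b1*x) from a fitted model, with true labels.
scores = [-2.0, -0.5, 0.3, 1.2, 2.5, -1.0]
labels = [0, 0, 1, 1, 1, 1]

preds = [classify(s) for s in scores]
tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))

accuracy = (tp + tn) / len(labels)
tpr = tp / (tp + fn)   # true positive rate (y-axis of an ROC curve)
fpr = fp / (fp + tn)   # false positive rate (x-axis of an ROC curve)
print(preds, accuracy, tpr, fpr)
```

This simple form of `sigmoid` raises an OverflowError for very negative inputs (below about −709). Library implementations use a numerically stable variant.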

Locked

25 min

4.5 Random Forest regression Use case

Module 5: Random Forest regression Use case

  • Applying Random Forest regression to marine data
  • Comparing the results with multi linear regression applied to same data
  • Importance: Model evaluation, predictions and inferences.

Locked

18 min

4.6 Random Forest classification Use case

Module 6: Random Forest classification Use case

  • Applying Random Forest classification to marine data
  • Comparing results with Logistic regression applied to same data
  • Importance: Model evaluation, predictions and inferences.

Locked

26 min

4.7 Clustering Methods

Module 7: Clustering methods

  • Fundamentals and uses of clustering.
  • Applying clustering methods to marine data
  • Importance: Evaluation, inferences
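The module does not name specific clustering algorithms, so k-means is used here as a common example. The sketch is a plain k-means on made-up 2-D points. A simple farthest-first initialization stands in for k-means++. scikit-learn's `KMeans` is the usual library choice.

```python
import math
import random


def k_means(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, move every centroid
    to the mean of its assigned points, and repeat."""
    rng = random.Random(seed)
    # Farthest-first initialization (a simple stand-in for k-means++): start from a
    # random point, then repeatedly add the point farthest from all chosen centroids.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]

    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels


# Hypothetical 2-D features (e.g. scaled load and temperature) forming two groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = k_means(points, k=2)
print(labels)
```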

Locked

19 min

4.8 What-if analysis using AI ML

Module 8: What-if analysis using AI ML

  • Significance of what-if analysis
  • Application to marine data
  • Importance: Predictions, inferences
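What-if analysis varies one input of a fitted model while holding the others at a baseline. In this sketch the model is a hypothetical linear fuel model. Its coefficients are invented for illustration and are not fitted to real data.

```python
def predict_fuel(speed_knots, draft_m):
    """Hypothetical fitted model; the coefficients are invented for illustration."""
    return 2.0 + 0.9 * speed_knots + 1.5 * draft_m


def what_if(model, baseline, feature, new_values):
    """Change one input at a time, keep the others at baseline, record predictions."""
    results = {}
    for value in new_values:
        scenario = dict(baseline, **{feature: value})   # copy; baseline is not modified
        results[value] = model(**scenario)
    return results


baseline = {"speed_knots": 12.0, "draft_m": 10.0}
for speed, fuel in what_if(predict_fuel, baseline, "speed_knots", [10.0, 12.0, 14.0]).items():
    print(f"speed {speed:>4} kn -> predicted fuel {fuel:.1f}")
```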

Locked

25 min

4.9 Anomaly detection using AI ML

Module 9: Anomaly detection using AI ML

  • Anomaly detection methods
  • Significance of anomaly detection
  • Fundamentals of Isolation Forest for anomaly detection
  • Applying isolation forest on marine data
  • Importance: significance of anomaly detection, inferences
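Below is a simplified one-dimensional sketch of the Isolation Forest idea. Random splits isolate rare, far-away values in fewer steps, so a short average path length means a high anomaly score. The normalization c(n) and the score 2^(−E[h]/c(ψ)) follow the Isolation Forest formulation. The data, tree count and seed are illustrative assumptions. scikit-learn's `IsolationForest` is the production-ready implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful search in a binary search tree of n
    points; used to normalize path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2 * (math.log(n - 1) + EULER_GAMMA) - 2 * (n - 1) / n


def build_tree(data, rng, depth, max_depth):
    """Split at a random value between min and max until points are isolated."""
    if len(data) <= 1 or depth >= max_depth or min(data) == max(data):
        return {"size": len(data)}                       # external (leaf) node
    split = rng.uniform(min(data), max(data))
    return {
        "split": split,
        "left": build_tree([v for v in data if v < split], rng, depth + 1, max_depth),
        "right": build_tree([v for v in data if v >= split], rng, depth + 1, max_depth),
    }


def path_length(x, node, depth=0):
    """Edges from the root to x's leaf, plus c(size) for leaves holding several points."""
    if "split" not in node:
        return depth + c(node["size"])
    child = node["left"] if x < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_scores(data, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1); values close to 1 are isolated quickly (likely anomalies)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), rng, 0, max_depth) for _ in range(n_trees)]
    scores = []
    for x in data:
        mean_path = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2 ** (-mean_path / c(psi)))
    return scores


# Hypothetical 1-D sensor readings: 20 values between 10 and 12, plus one spike at 25.
readings = [10 + 0.1 * i for i in range(20)] + [25.0]
scores = isolation_scores(readings)
print(max(range(len(readings)), key=scores.__getitem__))   # index of the most anomalous reading
```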

Locked

31 min

4.10 Time Series & Moving Averages

Module 10: Time Series & Moving Averages

  • Fundamentals of time series and moving averages
  • Application to marine data
  • Importance: significance & inferences
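The sketch below computes a simple and an exponential moving average on a hypothetical fuel-consumption series. In pandas, the equivalents are `Series.rolling(window).mean()` and `Series.ewm(alpha=..., adjust=False).mean()`.

```python
def simple_moving_average(series, window):
    """Mean of each run of `window` consecutive values (window - 1 fewer outputs)."""
    return [sum(series[i - window + 1 : i + 1]) / window for i in range(window - 1, len(series))]


def exponential_moving_average(series, alpha):
    """EMA: new = alpha * value + (1 - alpha) * previous; recent values weigh more."""
    ema = [series[0]]
    for value in series[1:]:
        ema.append(alpha * value + (1 - alpha) * ema[-1])
    return ema


# Hypothetical daily fuel-consumption figures (tonnes/day).
fuel = [30.0, 32.0, 31.0, 35.0, 33.0, 36.0]
print(simple_moving_average(fuel, 3))
print(exponential_moving_average(fuel, 0.5))   # [30.0, 31.0, 31.0, 33.0, 33.0, 34.5]
```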

Locked

28 min

4.11 The Rise of AI (GenAI, AI agents, RAG, MCP servers, Agentic AI)

Module 11: The Rise of AI

  • GenAI, AI agents, Agentic AI, RAG systems, MCP servers
  • Application areas and their importance.

Locked

23 min

4.12 AI agents Use Case

Module 12: AI agents Use case

  • AI agents’ fundamentals and applications
  • Demo of using AI agents

Locked

33 min

4.13 RAG Systems

Module 13: RAG Systems

  • RAG systems fundamentals and applications
  • Demo of using RAG systems
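The following is a toy sketch of the retrieval step in a RAG system. Stored text chunks are ranked by similarity to the question, and the best match is inserted into the prompt. Bag-of-words cosine similarity stands in for the embedding model and vector database a real system would use. The documents are invented.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge-base snippets; a real system would index manuals, logs, etc.
documents = [
    "Check the fuel injector if exhaust temperature rises on one cylinder.",
    "Lube oil pressure alarm: inspect the filter and the pump.",
    "Scavenge air temperature depends on the charge air cooler condition.",
]


def vectorize(text):
    """Bag-of-words term counts (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)   # Counter returns 0 for missing terms
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, docs, top_k=1):
    """Rank documents by similarity to the question and keep the best top_k."""
    q = vectorize(question)
    return sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:top_k]


question = "Why is the exhaust temperature high on one cylinder?"
context = retrieve(question, documents)
# The retrieved text is placed into the prompt that would be sent to the language model.
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: {question}"
print(prompt)
```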

Locked

23 min

4.14 MCP servers and use case

Module 14: MCP servers and use case

  • MCP servers: fundamentals and applications
  • Demo of using MCP servers

Locked

25 min

4.15 KNIME low code platform

Module 15: KNIME low code platform

  • KNIME fundamentals, applications
  • Application on marine data

Locked

21 min

4.16 Regression models in KNIME

Module 16: Regression models in KNIME

  • Applying regression in KNIME on marine data
  • Comparing Random Forest regression with multi linear regression in KNIME on same dataset. Model evaluation and comparison in KNIME.
  • Importance & ease of low-code/no-code platforms, predictions, inferences.

Locked

26 min

4.17 Classification models in KNIME

Module 17: Classification models in KNIME

  • Applying classification in KNIME on marine data
  • Comparing Random Forest classifier, decision tree classifier, naïve Bayes classifier and logistic regression in KNIME on the same dataset. Model evaluation & comparison in KNIME.
  • Importance & ease of low-code/no-code platforms, predictions, inferences.

Locked

25 min

4.18 Optimisation in modelling

Module 18: Optimization in modelling

  • Optimization fundamentals, necessity to optimize.
  • Comparing various models on their evaluation metrics and optimizing the best model.
  • Importance of optimization, applying on marine data.

Locked

19 min

4.19 Streamlit applications and Use case

Module 19: Streamlit applications and Use case

  • Significance of Streamlit application
  • Creating an application using marine data
  • Deploying it locally on your system using Streamlit
  • Importance of Streamlit for building simple, effective day-to-day applications that simplify tasks.

Locked

28 min

4.20 VS Code, GitHub

Module 20: VS Code, GitHub

  • Significance of VS Code (with a Python interpreter) & GitHub
  • Saving your work in VS Code and pushing it to a GitHub repository in your account.
  • Deploying it on Streamlit Community Cloud, pulling the saved work from GitHub to deploy a web application using marine data.

Locked

Successful completion of this course will earn you a certificate of completion from Sea and Beyond. The certificate will be emailed to you, and you can also share it on LinkedIn. Please click the button below to purchase the certificate.

Upgrade to certification for free
