Sea and Beyond


Data Science and Big Data

MarineTech

17:30:00


The Data Science and Big Data course for maritime professionals, prepared by Mr. Venkat Krishna in conjunction with Sea and Beyond, is designed to empower mariners, marine engineers, shipping professionals, and analysts to make informed, auditable decisions using data. The curriculum bridges the gap between traditional maritime expertise and modern technologies by progressing from foundational statistics to advanced AI and machine learning applications relevant to the shipping industry.

A unique feature of this maritime data analytics course is its focus on real-world use cases from the maritime environment. By utilizing raw data from main engine parameters, the program allows maritime professionals to directly relate to the data modelling process. This practical approach is integrated throughout the modules, where students apply advanced algorithms such as Random Forest regression and classification specifically to marine datasets, helping them evaluate predictions and improve operational decision-making.

The course is structured into four primary phases:

Foundations and Statistical Analysis

The journey begins with the data lifecycle, covering collection from IoT devices, cleaning, and Exploratory Data Analysis (EDA). Participants master essential statistical tools such as hypothesis testing, p-values, Z-scores, and probability distributions (PDF and CDF) to infer population characteristics from sample data.

Core Machine Learning Theory

Learners are introduced to Supervised and Unsupervised learning. The curriculum covers linear and multiple regression, feature engineering, and metrics like RMSE and R² to ensure model reliability. Concepts like degrees of freedom and the Central Limit Theorem (CLT) are emphasized to help students avoid overfitting and ensure their models generalize well to new data.

Maritime-Specific Applications

The course dives deep into specialized techniques applied to maritime datasets, including Clustering methods, What-if analysis, and Anomaly detection using Isolation Forest. These tools are essential for operational tasks like identifying sensor faults or predicting maintenance needs. Additionally, time series analysis and moving averages are applied to marine data to enhance predictive accuracy.

Low-Code Tools and Deployment

To make maritime data science accessible, the course introduces the KNIME low-code platform, allowing users to apply complex regression and classification models to marine data without intensive coding. Finally, participants learn to build and deploy Streamlit applications and use GitHub to manage their work, transforming their models into practical, day-to-day tools for the maritime industry.

This structured roadmap ensures that maritime professionals can transition from understanding the “3 Vs” of Big Data (Volume, Velocity, Variety) to deploying advanced AI agents, RAG systems, and maritime data analytics solutions that support smarter decision-making in modern shipping operations.

Sea and Beyond

At Sea and Beyond we strive for authenticity and honesty in our work. Our mission is to help you make a well-informed decision, and we support you through services such as:

Mentoring
Skill enhancement/Education
Job Opportunities and
Your branding (CV and LinkedIn preparation)

We have a team of writers who conduct thorough research to ensure the accuracy of the content and clarity of communication. We will be happy to support you wherever required.


Price:

₹ 15000

Language:

English

Length:

17:30:00


Topics:

MarineTech

4 Sections . 51 lectures . 17h 30m total length

7 min

1.1 Introduction to Data Science & Big Data

1. Course Vision

Purpose: Empower marine professionals to make informed, auditable decisions using data.
Audience: Mariners, engineers, analysts, and curious minds at the intersection of ocean and code.

Foundations

Data Science: Extracts insights using statistics, machine learning, and programming.
Big Data: Characterized by the 3 Vs — Volume, Velocity, Variety.

Free

11 min

1.2 Data Analysis vs Analytics

2. Core Concepts

Data: Recorded measurements.
Variables: Characteristics that vary across entities.
Sources: Humans, machines, and their combinations.
Forms: Structured vs. unstructured data.

Analytics vs. Analysis

Qualitative:
Analysis (past) – Storytelling
Analytics (future) – Intuition-based analysis
Quantitative:
Analysis (past) – Data insights
Analytics (future) – Formulas, algorithms

13 min

1.3 Types of Variables

3. Types of Variables

Categorical – Nominal, Ordinal
Numerical – Discrete, Continuous (Interval & Ratio)

With examples for each of the types of variables

9 min

1.4 Distributions

4. Distributions

Normal: Bell curve, symmetric about the mean.
Uniform: Equal probability for every outcome (e.g., rolls of a fair die).
Binomial: Number of successes in repeated two-outcome trials (e.g., coin tosses).

17 min

1.5 Population & Samples

5. Population vs. Sample

Sampling Types:
- Probability: Random, stratified, clustered
- Non-probability: Convenience, judgmental, snowball
Statistics:
- Descriptive: Charts, mean, median, mode
- Inferential: Hypothesis testing, confidence intervals

With demonstration using a web application.
(link provided for additional practice)

12 min

1.6 Measures of central tendency

6. Measures of central tendency

Dispersion

Standard Deviation: Spread around the mean.
Variance: Average squared deviation from the mean.
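
As a minimal sketch of these two dispersion measures, the Python snippet below computes the population variance and standard deviation by hand and compares them with the standard library. The temperature readings are made-up illustrative values, not course data.

```python
import math
import statistics

# Hypothetical exhaust gas temperatures in deg C (illustrative values only)
temps = [350.0, 360.0, 355.0, 365.0, 370.0]

mean = sum(temps) / len(temps)

# Population variance: average squared deviation from the mean
variance = sum((t - mean) ** 2 for t in temps) / len(temps)

# Standard deviation: square root of the variance, in the same unit as the data
std_dev = math.sqrt(variance)

# The standard library gives the same population figures
library_variance = statistics.pvariance(temps)
library_std_dev = statistics.pstdev(temps)

print(mean, variance, round(std_dev, 3))
```

Note that `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n, which is the usual choice when the data is a sample rather than the whole population.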

12 min

1.7 Correlation

7. Correlation

Correlation: Measures relationship strength (−1 to +1).

Demonstration of how to calculate correlations in Excel.
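
For readers who prefer code to a spreadsheet, the same coefficient can be sketched in Python. The function name `pearson_r` and the engine readings are illustrative assumptions; the lesson itself uses Excel.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical engine speed (rpm) and fuel consumption (t/day) readings
rpm = [60, 70, 80, 90, 100]
fuel = [18.0, 22.5, 27.0, 33.0, 40.0]

r = pearson_r(rpm, fuel)
print(round(r, 3))
```

A value close to +1 indicates that the two variables rise together; it does not by itself show that one causes the other.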

14 min

1.8 Causality

8. Causality

Causality: Requires deeper investigation — correlation ≠ causation.

Demonstration using web application explaining causality.
(link provided for additional practice)

13 min

1.9 Central Limit Theorem

9. Central Limit Theorem (CLT)

Insight: Sample means approximate a normal distribution for large samples (a common rule of thumb is n ≥ 30), regardless of the population distribution.
Assumptions: Random sampling, independence, finite variance

Demonstration using a web application explaining CLT application.
(link provided for additional practice)

Additional note explaining difference between Std. deviation & Std. error.
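
The theorem can be illustrated with a small simulation. The sketch below assumes a skewed exponential population with mean 1 and standard deviation 1 (an arbitrary choice for illustration) and draws many samples of size 30.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, std. deviation 1)."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

n = 30
means = [sample_mean(n) for _ in range(5000)]

# The sample means cluster around the population mean of 1
mean_of_means = statistics.fmean(means)

# Their spread is the standard error, roughly population_sd / sqrt(n)
std_error = statistics.stdev(means)
expected_std_error = 1.0 / n ** 0.5

print(round(mean_of_means, 3), round(std_error, 3), round(expected_std_error, 3))
```

This also shows the difference between the two terms in the note above: the standard deviation describes the spread of individual values in the population (1 here), while the standard error describes the spread of the sample means (about 0.18 for n = 30).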

20 min

1.10 Measures of central tendency demo

10. Measures of Central tendency demonstration

Calculations: Mean, median, mode, standard deviation, variance, coefficient of variation

Demonstration in a web application explaining skew, kurtosis, etc.
(link provided for additional practice).

5 min

2.1 Introduction

1. Data Lifecycle

Collection & Storage: From databases, APIs, IoT devices.
Cleaning & Preprocessing: Handling missing values, outliers, scaling issues.
Exploratory Data Analysis (EDA): Visual summaries and trend identification.
Machine Learning & AI: Predictive modeling for decision-making.

Free

30 min

2.2 Hypothesis tests

2. Hypothesis Testing

Null Hypothesis (H₀): Assumes no effect or change.
Alternative Hypothesis (H₁): Suggests a significant effect or difference.
Purpose: To infer population characteristics from sample data using statistical tests.

21 min

2.3 Z-scores, p-values | Part 1

3. Statistical Tools & Concepts

Z-Scores: Standardize data for comparison across datasets.
p-Values: Measure evidence against H₀.
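
A minimal sketch of both ideas in Python, assuming the data is normally distributed and using made-up numbers for the population mean, standard deviation, and observed reading:

```python
from statistics import NormalDist

# Hypothetical figures: a fleet-wide mean and standard deviation, plus one reading
population_mean = 180.0  # e.g. specific fuel consumption in g/kWh
population_sd = 5.0
observed = 190.0

# Z-score: how many standard deviations the reading lies from the mean
z = (observed - population_mean) / population_sd

# Two-sided p-value: probability of a value at least this extreme if H0 is true
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(z, round(p_value, 4))
```

A small p-value means the reading would be unusual if H₀ were true. Whether it is small enough to reject H₀ depends on the significance level chosen beforehand, commonly 0.05.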

21 min

2.4 Z-scores, p-values | Part 2

4. Statistical Tools & Concepts

Z-Scores: Standardize data for comparison across datasets.
p-Values: Measure evidence against H₀.

30 min

2.5 PDF, CDF, use cases

5. Density functions

PDF (Probability Density Function):

Describes the relative likelihood (density) of a continuous random variable around a given value.
The area under the curve over an interval represents the probability of falling in that interval.

CDF (Cumulative Distribution Function):

Represents the probability that a variable takes a value less than or equal to a given point.
Useful for calculating p-values and tail probabilities.
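
The relationship between the two functions can be sketched with the standard normal distribution from Python's `statistics` module. The points 0, 1, and 1.96 are chosen only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# PDF value: a density, not a probability
density_at_zero = std_normal.pdf(0.0)

# CDF value: P(X <= 1.96)
prob_below_196 = std_normal.cdf(1.96)

# Upper tail probability: P(X > 1.96)
upper_tail = 1 - prob_below_196

# Area under the PDF between -1 and 1, obtained as a difference of CDF values
prob_within_1sd = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

print(round(density_at_zero, 4), round(upper_tail, 4), round(prob_within_1sd, 4))
```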

18 min

2.6 Z-test, T-test, Degrees of freedom

6. Z-Test vs T-Test:

Use Z-Test when population standard deviation is known and sample size ≥ 30.
Use T-Test when population standard deviation is unknown.

26 min

2.7 One tailed test

7. One tailed & two tailed tests:

Two-Tailed Test: Checks for any difference from hypothesized mean.
Left-Tailed Test: Tests if sample mean is significantly lower.
Right-Tailed Test: Tests if sample mean is significantly higher.
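
The three variants differ only in which tail of the distribution is counted. The sketch below uses a one-sample Z-test with a known population standard deviation; the function name and all numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist

def z_test_p_values(sample_mean, hypothesized_mean, population_sd, n):
    """Return (z, left-tailed p, right-tailed p, two-tailed p) for a one-sample Z-test."""
    z = (sample_mean - hypothesized_mean) / (population_sd / math.sqrt(n))
    cdf = NormalDist().cdf(z)
    left = cdf                   # is the sample mean significantly lower?
    right = 1 - cdf              # is the sample mean significantly higher?
    two = 2 * min(left, right)   # is there any difference at all?
    return z, left, right, two

# Hypothetical sample: 36 readings averaging 182 against a hypothesized mean of 180
z, p_left, p_right, p_two = z_test_p_values(182.0, 180.0, 5.0, 36)

print(round(z, 2), round(p_left, 4), round(p_right, 4), round(p_two, 4))
```

With these numbers the right-tailed and two-tailed p-values are small while the left-tailed one is close to 1, which is why the direction of the test has to be fixed before looking at the data.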

17 min

2.8 Errors in statistical testing

8. Errors in Testing

Type I Error (False Positive): Rejecting a true H₀.
Type II Error (False Negative): Failing to reject a false H₀.
Trade-off: Reducing one increases the other unless sample size increases.

25 min

2.9 Outliers & detection

9. Data Issues

Common Problems: Missing values, measurement errors, sensor faults, manual manipulation.
Outliers: Identified using Interquartile Range (IQR).
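
A minimal sketch of IQR-based outlier detection follows. The multiplier 1.5 is the widely used convention and is an assumption here, as are the function name and the sensor readings.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical cooling water temperatures with one suspicious sensor reading
readings = [78, 80, 81, 79, 82, 80, 150, 81, 79, 80]

outliers = iqr_outliers(readings)
print(outliers)
```

A flagged value is only a candidate for investigation. It may be a sensor fault, a manual entry error, or a genuine operating event.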

22 min

2.10 Types of plots

10. Visualization Techniques

Plot Types: Scatter, Line, Bar, Histogram, Pie, Box plots.
Tools: Excel, Python (Matplotlib, Seaborn).

13 min

3.1 Introduction to Machine Learning

Module 1: Introduction to Machine Learning

ML is a subset of AI that learns patterns from data to make predictions/decisions.
Relies on data, algorithms, and iterative improvement.
Applications span industries: predictive maintenance, customer segmentation, fraud detection, NLP, computer vision.

Free

20 min

3.2 Independent & Dependent Variables

Module 2: Independent & dependent variables

Independent variables (predictors) vs Dependent variables (outcomes).
Regression models relationships between them.
Purpose: infer/predict one variable from others, understand correlation impact.

9 min

3.3 Dummy Variables

Module 3: Dummy variables

Used when predictors are categorical (e.g., weather: good/bad).
Encoded into binary variables for regression models.
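
The encoding step can be sketched in plain Python as below. The helper name `dummy_encode` and the column naming scheme are hypothetical; one category is dropped as the baseline so that the dummy columns are not perfectly collinear.

```python
def dummy_encode(values, baseline):
    """Encode a categorical column as 0/1 columns, leaving out the baseline category."""
    categories = sorted(set(values) - {baseline})
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

# Two categories need a single dummy column
weather = ["good", "bad", "good", "bad", "bad"]
encoded = dummy_encode(weather, baseline="good")

# Three categories need two dummy columns
sea_state = ["calm", "moderate", "rough"]
encoded_sea = dummy_encode(sea_state, baseline="calm")

print(encoded)
print(encoded_sea)
```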

29 min

3.4 Linear Regression

Module 4: Linear regression, Multi linear regression

Supervised ML algorithm assuming linear relationship between input & output.
Types:
- Simple Linear Regression: one predictor.
- Multiple Linear Regression: multiple predictors.
Assumptions: linearity, independence, homoscedasticity, normality of errors, no multicollinearity, no autocorrelation, additivity.
Real-world use cases: stock market, real estate, medical risk, sales forecasting.
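
For the single-predictor case, the least squares fit can be written out in a few lines. This is a sketch with made-up values chosen to lie exactly on a line, not the dataset used in the lesson.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line passes through the point of means
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical predictor and outcome values (illustrative only)
x = [10, 12, 14, 16, 18]
y = [2100, 2500, 2900, 3300, 3700]

slope, intercept = fit_simple_linear(x, y)
prediction = slope * 15 + intercept

print(slope, intercept, prediction)
```

Multiple linear regression follows the same principle with several predictors and is usually fitted with a library rather than by hand.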

22 min

3.5 Metrics, Multi Linear Regression

Module 5: Metrics, Multi linear regression demo

Limitations: assumes linearity, sensitive to outliers.
Metrics: calculations

22 min

3.6 Understanding the Metrics

Module 6: Understanding the Metrics

MSE, MAE, RMSE for error measurement.
R² & Adjusted R² for explanatory power.

Using these metrics to compare models and select the better one for predictions and analytics.
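
The metrics named in this lesson can be computed directly from actual and predicted values. The sketch below uses a hypothetical helper `regression_metrics` and made-up numbers.

```python
import math

def regression_metrics(y_true, y_pred, n_features):
    """Return MSE, MAE, RMSE, R-squared and adjusted R-squared as a dict."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                 # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total variation
    r2 = 1 - ss_res / ss_tot
    # Adjusted R-squared penalizes adding predictors that do not help
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"mse": mse, "mae": mae, "rmse": rmse, "r2": r2, "adj_r2": adj_r2}

actual = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 12.0, 13.0, 16.0, 18.0]

metrics = regression_metrics(actual, predicted, n_features=1)
print(metrics)
```

When comparing two models on the same test data, lower error metrics and a higher adjusted R² generally point to the better model.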

20 min

3.7 Splitting Data for ML

Module 7: Splitting data for ML
Splitting data to training and test data

Training data: builds the model, larger dataset.
Test data: evaluates performance, unseen dataset to check generalization.
Helps detect overfitting and validates accuracy on unseen data.
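
A random split can be sketched with the standard library alone. The helper name, the 80/20 ratio, and the seed are assumptions for illustration; libraries such as scikit-learn provide an equivalent `train_test_split` function.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in for 100 rows of data
rows = list(range(100))

train, test = split_train_test(rows, test_fraction=0.2, seed=42)
print(len(train), len(test))
```

The test rows are kept aside until the model has been built, so the score on them reflects performance on data the model has not seen.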

28 min

3.8 Feature engineering and scaling

Module 8: Feature engineering and feature scaling

Feature engineering: cleansing, transformation, extraction, selection, iteration, imputing missing values.
Feature scaling: normalization (0–1) or standardization (mean=0, SD=1).
Ensures equal contribution of features, avoids bias in distance-based algorithms.
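
Both scaling methods are short enough to sketch by hand. The function names and the rpm values below are illustrative assumptions.

```python
import statistics

def min_max_scale(values):
    """Normalization: rescale values to the range 0 to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization: rescale values to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical engine speed readings in rpm
rpm = [60.0, 70.0, 80.0, 90.0, 100.0]

normalized = min_max_scale(rpm)
standardized = standardize(rpm)

print(normalized)
print([round(v, 3) for v in standardized])
```

In a modelling workflow, the minimum, maximum, mean, and standard deviation are typically computed on the training data only and then reused to scale the test data.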

22 min

3.9 Linear Regression applying features

Module 9: Linear regression applying features

Linear regression with data splitting & feature scaling applied.
Understanding the order in which splitting & scaling are applied.

Outcome: able to evaluate the constructed model on test data before using it for predictions.

31 min

3.10 Regression vs Classification

Module 10: Regression vs Classification

Other Regression & Classification
Other regressions: polynomial, ridge, lasso, SVR, decision tree, random forest, logistic, gradient boosting, XGB.
Classification vs Regression:
Classification → discrete outcomes (yes/no, categories).
Regression → continuous outcomes (price, temperature).
Algorithms: decision tree, random forest, KNN, logistic regression, SVM, Naïve Bayes.

26 min

3.11 ANOVA

Module 11: ANOVA (One way ANOVA)

Analysis of Variance: compares means across 3+ groups.
Tests null hypothesis (all means equal).
Assumptions: normality, independence, equal variance, no overlap.
Examines variability within vs among groups.
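
The comparison of variability among and within groups boils down to the F statistic, sketched below for three hypothetical vessels with made-up daily fuel consumption figures.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation among group means
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical daily fuel consumption (t/day) for three vessels
vessel_a = [21.0, 22.0, 23.0]
vessel_b = [22.0, 23.0, 24.0]
vessel_c = [25.0, 26.0, 27.0]

f_stat = one_way_anova_f([vessel_a, vessel_b, vessel_c])
print(f_stat)
```

A large F value suggests that at least one group mean differs from the others. Turning F into a p-value requires the F distribution, which is not in the Python standard library; SciPy's `scipy.stats.f_oneway` returns both.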

10 min

4.1 Introduction to AI ML applications

Module 1: Introduction to AI ML applications

Module overview and areas of deployment, along with the course introduction and learning curve.

Free

22 min

4.2 Supervised & Unsupervised learning

Module 2: Supervised & Unsupervised learning

Understanding the differences between them, with an application demo in Python.
Differentiating between regression, classification, and clustering methods and their application areas.
Purpose: How and where to apply, choosing variables based on the method.

17 min

4.3 Random forest Intuition

Module 3: Random Forest intuition

Fundamentals of decision trees, random forest
Applying in regression & classification
Importance: brief knowledge of how decision trees work and application for Regression in ML.

21 min

4.4 Logistic Regression Intuition

Module 4: Logistic regression intuition

Fundamentals of logistic regression
Types of logistic regression, difference between linear regression & logistic regression
Sigmoid function, accuracy metrics, ROC curves.
Importance: applications for Classification in ML
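
The sigmoid function is what turns a linear score into a probability between 0 and 1. In the sketch below the weight, bias, input value, and the 0.5 decision threshold are all made-up assumptions rather than fitted values.

```python
import math

def sigmoid(z):
    """Map any real number to the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x, weight, bias):
    """Logistic regression with one predictor: sigmoid of a linear score."""
    return sigmoid(weight * x + bias)

# Hypothetical coefficients and input value (not fitted to real data)
probability = predict_probability(x=95.0, weight=0.2, bias=-18.0)

# Classify as 1 when the probability reaches the chosen threshold
label = int(probability >= 0.5)

print(round(probability, 4), label)
```

Linear regression would return the raw score directly, whereas logistic regression passes it through the sigmoid so that the output can be read as a class probability.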

25 min

4.5 Random Forest regression Use case

Module 5: Random Forest regression Use case

Applying Random Forest regression to marine data
Comparing the results with multi linear regression applied to same data
Importance: Model evaluation, predictions and inferences.

18 min

4.6 Random Forest classification Use case

Module 6: Random Forest classification Use case

Applying Random Forest classification to marine data
Comparing results with Logistic regression applied to same data
Importance: Model evaluation, predictions and inferences.

26 min

4.7 Clustering Methods

Module 7: Clustering methods

Fundamentals of clustering, uses.
Applying clustering methods to marine data
Importance: Evaluation, inferences

19 min

4.8 What-if analysis using AI ML

Module 8: What-if analysis using AI ML

Significance of What-if analysis
Application to marine data
Importance: Predictions, inferences

25 min

4.9 Anomaly detection using AI ML

Module 9: Anomaly detection using AI ML

Anomaly detection methods
Significance of anomaly detection
Fundamentals of Isolation Forest for anomaly detection
Applying isolation forest on marine data
Importance: significance of anomaly detection, inferences

31 min

4.10 Time Series & Moving Averages

Module 10: Time Series & Moving Averages

Fundamentals of time series and moving averages
Application to marine data
Importance: significance & inferences
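
A simple moving average smooths a series by averaging each run of consecutive readings. The sketch below assumes a window of 3 and made-up values.

```python
def moving_average(values, window):
    """Simple moving average: the mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical hourly readings (illustrative values only)
series = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]

smoothed = moving_average(series, window=3)
print(smoothed)
```

The smoothed series is shorter than the original by window - 1 points, and a wider window gives a smoother line at the cost of reacting more slowly to real changes.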

28 min

4.11 The Rise of AI (GenAI, AI agents, RAG, MCP servers, Agentic AI)

Module 11: The Rise of AI

GenAI, AI agents, Agentic AI, RAG systems, MCP servers
Application areas and its importance.

23 min

4.12 AI agents Use Case

Module 12: AI agents Use case

Fundamentals and applications of AI agents
Demo of using AI agents

33 min

4.13 RAG Systems

Module 13: RAG Systems

RAG systems fundamentals and applications
Using RAG systems demo

23 min

4.14 MCP servers and Use case

Module 14: MCP servers and Use case

Fundamentals and applications of MCP servers
Demo of using MCP servers

25 min

4.15 KNIME low code platform

Module 15: KNIME low code platform

KNIME fundamentals, applications
Application on marine data

21 min

4.16 Regression models in KNIME

Module 16: Regression models in KNIME

Applying regression in KNIME on marine data
Comparing Random Forest regression with multi linear regression in KNIME on same dataset. Model evaluation and comparison in KNIME.
Importance & ease of low code no code platforms, predictions, inferences.

26 min

4.17 Classification models in KNIME

Module 17: Classification models in KNIME

Applying classification in KNIME on marine data
Comparing Random Forest classifier, decision tree classifier, Naïve Bayes classifier, and logistic regression in KNIME on the same dataset. Model evaluation & comparison in KNIME.
Importance & ease of low code no code platforms, predictions, inferences.

25 min

4.18 Optimisation in modelling

Module 18: Optimization in modelling

Optimization fundamentals, necessity to optimize.
Comparing various models on evaluation metrics to select and optimize the best model.
Importance of optimization, applying on marine data.

19 min

4.19 Streamlit applications and Use case

Module 19: Streamlit applications and Use case

Significance of Streamlit application
Creating an application using marine data
Deploying it locally on your system using Streamlit
Importance of Streamlit application for making simple & effective day to day applications for simplifying tasks.

28 min

4.20 VS Code, GitHub

Module 20: VS Code, GitHub

Significance of VS Code & GitHub
Saving your work in VS Code and pushing it to a GitHub repository in your account.
Deploying it on Streamlit Community Cloud, pulling saved data from GitHub to deploy a web application using marine data.

Successful completion of this course will earn you a certificate of completion from Sea and Beyond. The certificate will be emailed to you, and you can also share it on LinkedIn. Please click on the button below to purchase the certificate.

Upgrade to certification for free