Data Analyst Interview Questions 2025 with Answers
Prepare for your Data Analyst interview with ONLEI Technologies: Data Analyst and Data Analytics interview questions with answers.
Data Analytics Preparation Material Updated 2025.
What is Data Analytics?
Data Analytics is the process of inspecting, cleansing, transforming, and modeling data to discover useful information, inform conclusions, and support decision-making.
What are the types of Data Analytics?
Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics

Difference between Data Analytics and Data Science?
Data Analytics focuses on analyzing existing data to answer specific business questions, while Data Science is broader: it also covers predictive modeling, algorithm development, and often machine learning.
What is the lifecycle of Data Analytics?
Data Discovery
Data Preparation
Data Modeling
Data Validation
Deployment
Monitoring & Optimization
What are structured and unstructured data?
Structured: Tabular data (e.g., SQL databases)
Unstructured: Images, videos, text data (e.g., emails, social media)
What is Data Cleansing?
It’s the process of detecting and correcting (or removing) corrupt or inaccurate records from a dataset.
What tools are used for Data Analytics?
Excel
SQL
Python
R
Tableau
Power BI
SAS
Apache Spark
What is the difference between OLAP and OLTP?
OLAP (Online Analytical Processing): Used for complex queries and data analysis.
OLTP (Online Transaction Processing): Used for day-to-day operations.
What are KPIs in analytics?
Key Performance Indicators are measurable values that indicate how effectively a company is achieving key business objectives.
What is the difference between data mining and data profiling?
Data Mining: Extracts patterns from data.
Data Profiling: Summarizes data’s characteristics (e.g., range, frequency).
What is a primary key?
A column (or set of columns) that uniquely identifies each record in a table; it cannot contain NULL values.
What is a foreign key?
A field in a table that links to the primary key of another table.
Difference between INNER JOIN and LEFT JOIN?
INNER JOIN: Returns records with matching values in both tables.
LEFT JOIN: Returns all records from the left table plus the matched records from the right; where there is no match, the right-side columns are NULL.
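As a minimal sketch of the two joins, the snippet below runs both against an in-memory SQLite database from Python. The `customers` and `orders` tables and their rows are hypothetical, made up only for illustration.

```python
import sqlite3

# Hypothetical tables: three customers, two of whom have an order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 3, 90.0);
""")

# INNER JOIN: only customers that have a matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()

# LEFT JOIN: every customer; amount is NULL (None in Python) without a match.
left_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()
conn.close()

print(inner_rows)  # [('Asha', 250.0), ('Chen', 90.0)]
print(left_rows)   # [('Asha', 250.0), ('Ben', None), ('Chen', 90.0)]
```

The customer without an order disappears from the inner join but survives the left join with an empty amount.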
What is normalization?
The process of organizing data to reduce redundancy.
What is denormalization?
Introducing redundancy to improve read performance.
How do you find duplicate records in a SQL table?
SELECT column_name, COUNT(*)
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1;
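To try the duplicate query above, one option is an in-memory SQLite database driven from Python. The `contacts` table and its email values are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT);
    INSERT INTO contacts (email) VALUES
        ('a@example.com'), ('b@example.com'), ('a@example.com'),
        ('c@example.com'), ('a@example.com');
""")

# Group by the column of interest and keep only groups seen more than once.
duplicates = conn.execute("""
    SELECT email, COUNT(*) AS occurrences
    FROM contacts
    GROUP BY email
    HAVING COUNT(*) > 1
""").fetchall()
conn.close()

print(duplicates)  # [('a@example.com', 3)]
```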
How do you get the second highest salary in SQL?
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
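The sketch below checks the query above on a small hypothetical `employees` table in which the top salary is tied, and compares it with a common alternative based on DISTINCT, ORDER BY, and LIMIT/OFFSET (SQLite syntax; other databases spell this differently).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('A', 50000), ('B', 70000), ('C', 90000), ('D', 90000);
""")

# Approach from the answer: the largest salary strictly below the maximum.
second_highest = conn.execute("""
    SELECT MAX(salary)
    FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
""").fetchone()[0]

# Alternative: sort the distinct salaries and skip the first one.
alternative = conn.execute("""
    SELECT DISTINCT salary
    FROM employees
    ORDER BY salary DESC
    LIMIT 1 OFFSET 1
""").fetchone()[0]
conn.close()

print(second_highest, alternative)  # 70000 70000
```

Both approaches ignore the tie at the top because they work on distinct salary values.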
What is a subquery?
A query nested within another query.
What is a CTE (Common Table Expression)?
A named, temporary result set defined with the WITH clause and referenced within a single SELECT, INSERT, UPDATE, or DELETE statement.
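A short illustration of a CTE, run against in-memory SQLite from Python. The `sales` table, the `region_totals` name, and the 150 threshold are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('North', 100), ('North', 300), ('South', 50), ('South', 70);
""")

# The CTE computes per-region totals; the outer query then filters them.
rows = conn.execute("""
    WITH region_totals AS (
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
    )
    SELECT region, total
    FROM region_totals
    WHERE total > 150
""").fetchall()
conn.close()

print(rows)  # [('North', 400)]
```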
What is the difference between DELETE and TRUNCATE?
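DELETE: Removes rows one at a time, optionally filtered by a WHERE condition; can be rolled back inside a transaction.
TRUNCATE: Removes all rows at once without a WHERE condition and is usually faster; whether it can be rolled back depends on the database (for example, it is transactional in PostgreSQL and SQL Server but commits implicitly in MySQL and Oracle).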
What are pivot tables?
An Excel tool that summarizes data by grouping rows and aggregating values (e.g., sum, count, average) without writing formulas.
What are slicers in Excel?
Visual filters that allow users to filter data in pivot tables easily.
What is VLOOKUP and INDEX-MATCH?
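Functions to look up and retrieve data from a table. VLOOKUP searches the first column of a range and returns a value from a column to its right; INDEX-MATCH is more flexible because it can look up in any direction and does not break when columns are inserted, and it can be faster on large worksheets.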
Difference between COUNT, COUNTA, COUNTBLANK?
COUNT: Numbers only
COUNTA: All non-blank values
COUNTBLANK: Blank cells
What is conditional formatting?
A feature to apply formatting based on cell values.
What is a dashboard?
A visual representation of data using charts, graphs, KPIs.
What is Power BI?
A business analytics tool for visualizing data and sharing insights.
Difference between Power BI and Tableau?
Power BI: Microsoft ecosystem, lower cost
Tableau: Advanced visualizations; often regarded as handling large datasets well
What is DAX in Power BI?
Data Analysis Expressions – used to create custom calculations.
What are Measures and Calculated Columns in Power BI?
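Measure: Calculated at query time in the current filter context (e.g., SUM, AVERAGE); not stored in the model
Calculated Column: Computed row by row when the data is refreshed and stored as a new column in the data model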
What libraries do you use for Data Analysis in Python?
Pandas
NumPy
Matplotlib
Seaborn
Scikit-learn
What is Pandas used for?
Data manipulation and analysis in tabular form.
What is the difference between a Series and a DataFrame in Pandas?
Series: One-dimensional
DataFrame: Two-dimensional (table)
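A small sketch of the difference, using made-up fruit prices and stock counts:

```python
import pandas as pd

labels = ["apple", "mango", "pear"]

# A Series is a single labeled column of values.
prices = pd.Series([9.5, 12.0, 7.25], index=labels, name="price")

# A DataFrame is a table: several columns sharing one index.
df = pd.DataFrame({"price": [9.5, 12.0, 7.25], "stock": [40, 15, 0]}, index=labels)

print(prices.ndim, prices.shape)  # 1 (3,)
print(df.ndim, df.shape)          # 2 (3, 2)

# Selecting one column of a DataFrame gives back a Series.
print(type(df["price"]).__name__)  # Series
```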
How to handle missing data in Python?
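Detect missing values with isna(), then either drop them with dropna() or fill them with fillna() (for example with the column mean or a placeholder).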
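A minimal pandas sketch of both options. The `age` and `city` columns and the choice of mean and "Unknown" as fill values are assumptions for illustration; the right strategy depends on the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 31], "city": ["Pune", "Delhi", None]})

# Count missing values per column before deciding what to do.
missing_per_column = df.isna().sum()

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: fill each column with a column-specific replacement.
filled = df.fillna({"age": df["age"].mean(), "city": "Unknown"})

print(missing_per_column)
print(dropped)
print(filled)
```

Dropping keeps only the first row here, while filling keeps all three rows.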
What is groupby() in Pandas?
It splits the data into groups based on criteria and allows aggregation.
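As an illustration of split, aggregate, and combine, the sketch below groups a hypothetical `sales` table by region and computes three aggregates per group.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "amount": [100, 50, 300, 70, 20],
})

# Split rows by region, then aggregate the amount column within each group.
summary = sales.groupby("region")["amount"].agg(["sum", "mean", "count"])

print(summary)
```

The result has one row per region, indexed by the grouping key.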
What is Seaborn used for?
For statistical data visualization in Python.
Difference between NumPy and Pandas?
NumPy: Numerical operations on arrays
Pandas: Data manipulation with labeled data
What are lambda functions in Python?
Anonymous, inline functions using the lambda keyword.
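A brief sketch of typical uses, with made-up salary records: a lambda as a sort key and as the function passed to map().

```python
records = [("Asha", 72000), ("Ben", 58000), ("Chen", 91000)]

# Sort by the second element (salary), highest first.
by_salary = sorted(records, key=lambda record: record[1], reverse=True)

# Apply a flat raise of 5000 to every record.
raised = list(map(lambda record: (record[0], record[1] + 5000), records))

print(by_salary)  # [('Chen', 91000), ('Asha', 72000), ('Ben', 58000)]
print(raised)     # [('Asha', 77000), ('Ben', 63000), ('Chen', 96000)]
```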
What is a dictionary in Python?
A collection of key-value pairs.
What is a list comprehension in Python?
A concise way to create lists using a single line of code.
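For example, with a hypothetical list of order amounts, a comprehension can filter and transform in one expression; the explicit loop underneath is equivalent.

```python
amounts = [120, 45, 300, 80]

# Keep only amounts of at least 100.
large = [a for a in amounts if a >= 100]

# Transform every element.
doubled = [a * 2 for a in amounts]

# The same filter written as an explicit loop.
large_loop = []
for a in amounts:
    if a >= 100:
        large_loop.append(a)

print(large)    # [120, 300]
print(doubled)  # [240, 90, 600, 160]
```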
What is the difference between mean, median, and mode?
Mean: Average
Median: Middle value
Mode: Most frequent value
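A quick sketch with Python's standard `statistics` module, on made-up values that include one large outlier:

```python
import statistics

values = [2, 3, 3, 5, 7, 10, 40]

mean = statistics.mean(values)      # sum of values divided by their count
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequent value

print(mean, median, mode)  # 10 5 3
```

The single value 40 pulls the mean up to 10, while the median stays at 5, which is why the median is often preferred for skewed data.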
What is standard deviation?
A measure of the amount of variation or dispersion in a dataset.
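A small worked example with the standard `statistics` module, on illustrative values. It distinguishes the population form (divide by n) from the sample form (divide by n - 1).

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5

# Population standard deviation: sqrt(sum of squared deviations / n).
population_sd = statistics.pstdev(values)

# Sample standard deviation: sqrt(sum of squared deviations / (n - 1)).
sample_sd = statistics.stdev(values)

print(population_sd)  # 2.0
print(sample_sd)      # slightly larger than the population value
```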
What is correlation?
A statistical measure that indicates the extent to which two variables fluctuate together.
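One common measure is the Pearson correlation coefficient, which ranges from -1 to 1. The helper below is a plain-Python sketch of its formula; `pearson` is a name chosen here, and in practice a library routine such as `numpy.corrcoef` would usually be used instead.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

hours = [1, 2, 3, 4, 5]
positive = pearson(hours, [2, 4, 6, 8, 10])  # moves together: close to +1
negative = pearson(hours, [10, 8, 6, 4, 2])  # moves oppositely: close to -1

print(positive, negative)
```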
What is regression analysis?
A predictive modeling technique to estimate relationships between variables.
What is hypothesis testing?
A statistical method to test assumptions (null vs. alternate hypothesis).
What is p-value?
The probability of obtaining test results at least as extreme as the observed results, assuming the null hypothesis is true.
Explain A/B testing.
A method to compare two versions of a product to determine which performs better.
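One way to decide whether version B really outperforms version A is a two-proportion z-test on conversion rates, which ties A/B testing back to hypothesis testing and p-values. The sketch below assumes large samples (normal approximation); the function name and the visitor and conversion counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both conversion rates are equal."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 200 of 2000 visitors convert on A, 260 of 2000 on B.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
significant = p_value < 0.05

print(round(z, 2), round(p_value, 4), significant)
```

A p-value below the chosen significance level (0.05 here) suggests the difference is unlikely to be due to chance alone.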
What is outlier detection?
The process of identifying abnormal or rare data points.
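A simple and widely used rule flags values that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies it with the standard library; `iqr_outliers` is a hypothetical helper name, the readings are made up, and the 1.5 multiplier is a convention rather than a fixed rule.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)

print(outliers)  # [95]
```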
Scenario: You see a sudden drop in website traffic. What will you do?
Check Google Analytics
Check for broken links, SEO issues
Review marketing activities
Analyze server logs for downtime
Scenario: A dashboard suddenly shows 0 sales. How do you debug?
Check data source connection
Verify refresh schedule
Confirm data pipeline is running
Validate raw data in database
How would you handle data drift in a machine learning model used in production?
Continuously monitor data distribution.
Apply statistical tests such as the Kolmogorov-Smirnov (K-S) test to compare training and production distributions.
Use retraining pipelines when significant drift is detected.
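The monitoring step can be sketched with a two-sample K-S statistic: the largest gap between the empirical distribution functions of a reference sample and a live sample. The version below is a plain-Python illustration with hypothetical function names and data; it uses the large-sample critical value 1.358 * sqrt((n + m) / (n * m)) for a 5% significance level. In practice `scipy.stats.ks_2samp` is the usual choice.

```python
import bisect
import math

def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

def drift_detected(reference, live, coefficient=1.358):
    """Compare the K-S statistic with an approximate 5% critical value."""
    n, m = len(reference), len(live)
    critical = coefficient * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, live) > critical

reference = list(range(100))          # feature values seen at training time
shifted = [x + 50 for x in reference]  # production values after a shift

print(ks_statistic(reference, shifted))  # about 0.5
print(drift_detected(reference, shifted))    # True
print(drift_detected(reference, reference))  # False
```

A detected drift would then trigger the retraining pipeline mentioned above.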
Explain the difference between supervised, unsupervised, and semi-supervised learning.
Supervised: Labeled data (e.g., regression, classification)
Unsupervised: No labels (e.g., clustering, association)
Semi-supervised: Small labeled + large unlabeled dataset
What is the curse of dimensionality and how do you address it?
It refers to the problems that arise as the number of features grows: the volume of the feature space increases exponentially, so data become sparse and distance-based methods lose discriminating power.
Use dimensionality reduction techniques (PCA, t-SNE, autoencoders).
Explain feature engineering and its importance.
It is the process of creating new input features from existing ones to improve model performance.
Involves transformations, combinations, discretization, encoding.
What is cross-validation and why is it used?
A technique to assess model performance by splitting data into training and validation sets multiple times.
Helps detect overfitting and gives a more reliable estimate of performance on unseen data than a single train/validation split.
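The mechanics of k-fold cross-validation can be sketched without any library. In the example below the "model" is deliberately trivial (it predicts the training mean), the target values are made up, and `k_fold_indices` is a hypothetical helper. Real projects would typically shuffle the data first and use something like scikit-learn's `KFold` or `cross_val_score`.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for k contiguous, non-overlapping folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

y = [3, 5, 4, 6, 8, 7, 9, 10, 12, 11]  # hypothetical target values
folds = list(k_fold_indices(len(y), 5))

scores = []
for train_idx, val_idx in folds:
    # "Train": the model simply memorizes the mean of the training targets.
    prediction = sum(y[i] for i in train_idx) / len(train_idx)
    # "Validate": mean absolute error on the held-out fold.
    mae = sum(abs(y[i] - prediction) for i in val_idx) / len(val_idx)
    scores.append(mae)

cv_score = sum(scores) / len(scores)
print(scores, cv_score)
```

Every observation is used for validation exactly once, and the averaged score is less dependent on one lucky or unlucky split.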
What is the difference between Type I and Type II error?
Type I: False positive (rejecting true null hypothesis)
Type II: False negative (failing to reject false null hypothesis)
Explain ROC curve and AUC.
ROC curve: Plot of True Positive Rate vs. False Positive Rate.
AUC: Area under the ROC curve; summarizes classifier performance across all thresholds (0.5 is equivalent to random guessing, 1.0 is a perfect ranking).
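A minimal sketch of both ideas in plain Python, with hypothetical labels and scores. `tpr_fpr` gives one point of the ROC curve for a chosen threshold, and `roc_auc` uses the equivalent ranking view of AUC: the probability that a random positive is scored above a random negative, with ties counted as half. Libraries such as scikit-learn provide `roc_curve` and `roc_auc_score` for real use.

```python
def tpr_fpr(labels, scores, threshold):
    """True positive rate and false positive rate at one threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives

def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

point = tpr_fpr(labels, scores, threshold=0.5)
auc = roc_auc(labels, scores)

print(point)  # (0.5, 0.0)
print(auc)    # 0.75
```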
How do you select important features in a dataset?
Using methods like correlation analysis, mutual information, recursive feature elimination, Lasso regularization.
What is time series decomposition?
Splitting a time series into trend, seasonality, and residual components.
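The sketch below shows a simplified additive decomposition (series = trend + seasonality + residual) in plain Python. It assumes an odd seasonal period so that a centered moving average can serve as the trend; the function name and the synthetic series are hypothetical. Libraries such as statsmodels (`seasonal_decompose`) handle even periods and other details.

```python
def decompose_additive(series, period):
    """Additive decomposition for an odd period; trend edges are None."""
    half = period // 2
    n = len(series)

    # Trend: centered moving average over one full seasonal cycle.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(series[i - half : i + half + 1]) / period

    # Seasonality: average detrended value for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(series[i] - trend[i])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # center the seasonal effects around zero
    seasonal = [means[i % period] - offset for i in range(n)]

    # Residual: whatever trend and seasonality do not explain.
    residual = [
        None if trend[i] is None else series[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, residual

# Synthetic series: linear trend 2*t plus a repeating pattern (+3, 0, -3).
pattern = [3, 0, -3]
series = [2 * t + pattern[t % 3] for t in range(12)]

trend, seasonal, residual = decompose_additive(series, period=3)
print(trend)
print(seasonal[:3])
print(residual)
```

On this noise-free series the recovered trend matches 2*t at interior points, the seasonal component matches the pattern, and the residuals are zero.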
Scenario: Your model has high accuracy but performs poorly in real-world usage. What could be the issue?
Possible data leakage, overfitting, imbalance in classes, or unrepresentative test data.