44 TOP Data warehousing interview questions and answers pdf download

Latest Data warehousing Interview Questions and Answers for freshers and experienced pdf free download

Data warehousing interview Questions and Answers List

1. What is a data warehouse?
A data warehouse is a collection of data marts representing historical data from different operations in the company. This data is stored in a structure optimized for querying and data analysis as a data warehouse. Table design, dimensions and organization should be consistent throughout a data warehouse so that reports or queries across the data warehouse are consistent. A data warehouse can also be viewed as a database for historical data from different functions within a company.

2. What is difference between drill & scope of analysis?
Drilling can be done in drill down, up, through, and across; scope is the overall view of the drill exercise.

3. What is type 2 version dimension?
Version dimension is the SCD type II in real time it using because of it will maintain the current data and full historical data.

4. What is data warehouse architecture?
Data warehousing is the repository of integrated information data will be extracted from the heterogeneous sources. Data warehousing architecture contains the different; sources like oracle, flat files and ERP then after it have the staging area and Data warehousing, after that it has the different Data marts then it have the reports and it also have the ODS – Operation Data Store. This complete architecture is called the Data warehousing Architecture.

5. What is data analysis? Where it will be used?
Data analysis: consider that you are running a business and u store the data of that; in some form say in register or in a comp and at the year end you want know the profit or loss then it called data analysis .Data analysis use: then u want to know which product was sold the highest and if the business is running in a loss then finding, where we went wrong we do analysis.

6. What is a data mart?
A data mart is a segment of a data warehouse that can provide data for reporting and analysis on a section, unit, department or operation in the company, e.g. sales, payroll, production. Data marts are sometimes complete individual data warehouses which are usually smaller than the corporate data warehouse.

7. What are the benefits of data warehousing?
=> Data warehouses are designed to perform well with aggregate queries running on large amounts of data.
=> The structure of data warehouses is easier for end users to navigate, understand and query against unlike the relational databases primarily designed to handle lots of transactions.
=> Data warehouses enable queries that cut across different segments of a company’s operation. E.g. production data could be compared against inventory data even if they were originally stored in different databases with different structures.
=> Queries that would be complex in very normalized databases could be easier to build and maintain in data warehouses, decreasing the workload on transaction systems.
=> Data warehousing is an efficient way to manage and report on data that is from a variety of sources, non uniform and scattered throughout a company.
=> Data warehousing is an efficient way to manage demand for lots of information from lots of users.
=> Data warehousing provides the capability to analyze large amounts of historical data for nuggets of wisdom that can provide an organization with competitive advantage.

8. What are data modeling and data mining? Where it will be used?
Data modeling is the process of designing a data base model. In this data model data will be stored in two types of table fact table and dimension table.
Fact table contains the transaction data and dimension table contains the master data. Data mining is process of finding the hidden trends is called the data mining.

9. Where the Data cube technology is used?
A multi-dimensional structure called the data cube. A data abstraction allows one to view aggregated data from a number of perspectives. Conceptually, the cube consists of a core or base cuboids, surrounded by a collection of sub-cubes/cuboids that represent the aggregation of the base cuboids along one or more dimensions. We refer to the dimension to be aggregated as the measure attribute, while the remaining dimensions are known as the feature attributes.

10. What is the difference between dependent data warehouse and independent data warehouse?
Dependent departments are those, which depend on a data ware to for their data.Independent department are those, which get their data directly from the operational data sources in the organization.

11. What is Virtual Data Warehousing?
A virtual or point-to-point data warehousing strategy means that end-users are allowed to get at operational databases directly using whatever tools are enabled to the “data access network”.

12. What is the difference between metadata and data dictionary?
Meta data is nothing but information about data. It contains the information (i.e. data) about the graphs, its related files, abinitio commands, server information etc i.e. all kinds of information about project related information etc.

13. What is a cube in data warehousing concept?
Cubes are logical representation of multidimensional data. The edge of the cube contains dimension members and the body of the cube contains data values.

14. What are the different types of data warehousing?
Types of data warehousing are:
1. Enterprise Data warehousing
2. ODS (Operational Data Store)
3. Data Mart

15. What are the steps to build the data warehouse?
Gathering business requirements>>Identifying Sources>>Identifying Facts>>Defining Dimensions>>Define Attributes>>Redefine Dimensions / Attributes>>Organize Attribute Hierarchy>>Define Relationship>>Assign Unique Identifiers

16. What is OLAP?
OLAP stands for Online Analytical Processing.
It uses database tables (fact and dimension tables) to enable multidimensional viewing, analysis and querying of large amounts of data.
E.g. OLAP technology could provide management with fast answers to complex queries on their operational data or enable them to analyze their company’s historical data for trends and patterns.

17. What is OLTP?
OLTP stands for Online Transaction Processing.
OLTP uses normalized tables to quickly record large amounts of transactions while making sure that these updates of data occur in as few places as possible. Consequently OLTP database are designed for recording the daily operations and transactions of a business.
E.g. a time card system that supports a large production environment must record successfully a large number of updates during critical periods like lunch hour, breaks, startup and close of work.

18. What are dimensions?
Dimensions are categories by which summarized data can be viewed. E.g. a profit summary in a fact table can be viewed by a Time dimension (profit by month, quarter, year), Region dimension (profit by country, state, city), Product dimension (profit for product1, product2).

19. What are virtual cubes?
These are combinations of one or more real cubes and require no disk space to store them. They store only the definitions and not the data of the referenced source cubes. They are similar to views in relational databases.

20. What are MOLAP cubes?
MOLAP Cubes: stands for Multidimensional OLAP. In MOLAP cubes the data aggregations and a copy of the fact data are stored in a multidimensional structure on the Analysis Server computer. It is best when extra storage space is available on the Analysis Server computer and the best query performance is desired. MOLAP local cubes contain all the necessary data for calculating aggregates and can be used offline. MOLAP cubes provide the fastest query response time and performance but require additional storage space for the extra copy of data from the fact table.

21. What is Dimensional Modeling?
Dimensional Modeling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables – Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measurements i.e., the dimensions on which the facts are calculated.Dimension modeling is a method for designing data warehouse. Three types of modeling are there
1. Conceptual modeling
2. Logical modeling
3. Physical modeling

22. What is VLDB?
The perception of what constitutes a VLDB continues to grow. A one-terabyte database would normally be considered VLDB.Degenerate dimension: it does not have any link with dimensions and it will not have any attribute.

23. What is degenerate dimension table?
Degenerate Dimensions: If a table contains the values, which r neither dimension nor measures is called degenerate dimensions. For example invoice id, employee no.A degenerate dimension is data that is dimensional in nature but stored in a fact table.

24. What is a CUBE in data warehousing concept?
Cubes are logical representation of multidimensional data. The edge of the cube contains dimension members and the body of the cube contains data values.

25. What is the difference between star and snowflake schemas?
Star schema:
A single fact table with N number of DimensionSnowflake schema: Any dimensions with extended dimensions are known as snowflake schema.

26. What is loop in Data warehousing?
In DWH loops may exist between the tables. If loops exist, then query generation will take more time, because more than one path is available. It creates ambiguity also. Loops can be avoided by creating aliases of the table or by context.

27. Briefly state different between data ware house & data mart?
Data warehouse is made up of many datamarts. DWH contain many subject areas. However, data mart focuses on one subject area generally.

E.g. If there will be DHW of bank then there can be one data mart for accounts, one for Loans etc. This is high-level definitions.

28. What are ROLAP cubes?
ROLAP Cubes: stands for Relational OLAP. In ROLAP cubes a copy of data from the fact table is not made and the data aggregates are stored in tables in the source relational database. A ROLAP cube is best when there is limited space on the Analysis Server and query performance is not very important. ROLAP local cubes contain the dimensions and cube definitions but aggregates are calculated when they are needed. ROLAP cubes requires less storage space than MOLAP and HOLAP cubes.

29. What are HOLAP cubes?
HOLAP Cubes: stands for Hybrid OLAP. A ROLAP cube has a combination of the ROLAP and MOLAP cube characteristics. It does not create a copy of the source data however, data aggregations are stored in a multidimensional structure on the Analysis Server computer. HOLAP cubes are best when storage space is limited but faster query responses are needed.

30. What is the difference between OLAP and data warehouse?
Datawarehouse is the place where the data is stored for analyzing where as OLAP is the process of analyzing the data,managing aggregations, partitioning information into cubes for in depth visualization.

31. What is a Cube and Linked Cube with reference to data warehouse?
Cubes are logical representation of multidimensional data.The edge of the cube contains dimension members and the body of the cube contains data values. The linking in cube ensures that the data in the cubes remain consistent.

32. What is junk dimension?
A number of very small dimensions might be lumped together to form a single dimension, a junk dimension – the attributes are not closely related. Grouping of Random flags and text Attributes in a dimension and moving them to a separate sub dimension is known as junk dimension.

33. What is snapshot with reference to data warehouse?
You can disconnect the report from the catalog to which it is attached by saving the report with a snapshot of the data.

34. What is active data warehousing?
An active data warehouse provides information that enables decision-makers within an organization to manage customer relationships nimbly, efficiently and proactively.

35. Differences between star and snowflake schema?
Star schema – A single fact table with N number of Dimension, all dimensions will be linked directly with a fact table. This schema is de-normalized and results in simple join and less complex query as well as faster results.
Snow schema – Any dimensions with extended dimensions are know as snowflake schema, dimensions maybe interlinked or may have one to many relationship with other tables. This schema is normalized and results in complex join and very complex query as well as slower results.

36. Define schema hierarchy?
A concept hierarchy that is a total (or) partial order among attributes in a database
schema is called a schema hierarchy.

37. List out the OLAP operations in multidimensional data model?
 Slice and dice
 Pivot (or) rotate

38. What is roll-up operation?
The roll-up operation is also called drill-up operation which performs aggregation on a data cube either by climbing up a concept hierarchy for a dimension (or) by dimension reduction.

39.What are dependent and independent data marts?
Dependent data marts are sourced directly from enterprise data warehouses.
Independent data marts are data captured from one (or) more operational systems (or) external information providers (or) data generated locally with in particular department (or) geographic area.

40.What is virtual warehouse?
A virtual warehouse is a set of views over operational databases. For efficient
query processing, only some of the possible summary views may be materialized. A virtual warehouse is easy to build but requires excess capability on operational database servers.

41.Define indexing?
Indexing is a technique, which is used for efficient data retrieval (or) accessing data in a faster manner. When a table grows in volume, the indexes also increase in size requiring more storage.

42. Explain regression in predictive modeling?
Regression definition
Linear regression
Multiple regression
Non-linear regression
Other regression models

43. Explain statistical perspective in data mining?
Point estimation
Data summarization
Bayesian techniques
Hypothesis testing

44. Explain Bayesian classification.
Bayesian theorem
Naïve Bayesian classification
Bayesian belief networks
Bayesian learning

Leave a Reply

Your email address will not be published. Required fields are marked *