
Database Interview Questions

August 9, 2023

What is normalization?

Normalization, in the context of databases, is the process of organizing and structuring relational database tables to minimize redundancy and improve data integrity. It eliminates data anomalies and inconsistencies by breaking data down into smaller, related tables that adhere to specific rules, which reduces duplication and makes the database easier to maintain and query efficiently.

Normalization involves several levels, known as normal forms, each with its own set of rules to achieve progressively higher levels of data organization. The most common normal forms include:

  1. First Normal Form (1NF): Ensures that each column contains only atomic (indivisible) values and each row is uniquely identifiable.
  2. Second Normal Form (2NF): Builds on 1NF and eliminates partial dependencies by ensuring that non-key attributes are functionally dependent on the entire primary key.
  3. Third Normal Form (3NF): Builds on 2NF and eliminates transitive dependencies by ensuring that non-key attributes are not dependent on other non-key attributes.

There are higher normal forms as well, but these are the most commonly discussed ones. By applying normalization, data redundancy is reduced, data integrity is improved, and it becomes easier to maintain and update the database.
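As a minimal sketch of what normalization looks like in practice, the following uses Python's built-in `sqlite3` module (the table and column names are illustrative). Instead of repeating a customer's details on every order row, the customer data lives in its own table and orders reference it by key:

```python
import sqlite3

# Normalized schema: customer details are stored once, and orders
# reference the customer by key instead of duplicating name and city.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Alice', 'Berlin')")
conn.execute("INSERT INTO orders VALUES (10, 1, 99.50)")
conn.execute("INSERT INTO orders VALUES (11, 1, 25.00)")

# The customer's city is stored exactly once; a join reconstructs
# the combined view whenever it is needed.
row = conn.execute("""
    SELECT c.name, c.city, COUNT(o.order_id)
    FROM customers c JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
""").fetchone()
print(row)  # ('Alice', 'Berlin', 2)
```

If the customer moves to a new city, only one row changes, which is exactly the update anomaly normalization is designed to prevent.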

For a software engineer, understanding normalization is crucial when designing and developing databases to ensure optimal performance and data consistency.

How can we design a database?

Designing a database involves several steps and considerations to ensure that the database structure is efficient, scalable, and capable of meeting the requirements of the application it supports. Here’s a general guide on how to design a database:

  1. Requirements Gathering:
    • Understand the requirements of the application or system that the database will support.
    • Identify the data that needs to be stored, organized, and retrieved.
  2. Conceptual Design:
    • Create an Entity-Relationship Diagram (ERD) to visualize the entities (objects), their attributes, and the relationships between them.
    • Define primary keys for each entity to ensure uniqueness.
  3. Logical Design:
    • Refine the ERD and translate it into a logical data model.
    • Identify relationships, such as one-to-many, many-to-many, and one-to-one.
    • Normalize the data to eliminate redundancy and ensure data integrity.
  4. Physical Design:
    • Translate the logical model into the physical database schema.
    • Choose a database management system (DBMS) that suits your needs (e.g., MySQL, PostgreSQL, MongoDB).
    • Define data types for each attribute (e.g., integers, strings, dates).
    • Create tables based on the entities in your logical model.
  5. Indexing and Optimization:
    • Identify the columns that will be frequently used for searching and sorting.
    • Create indexes on those columns to improve query performance.
    • Consider database performance optimization techniques such as query optimization and caching.
  6. Security and Access Control:
    • Define user roles and permissions to restrict access to sensitive data.
    • Implement encryption for sensitive data at rest and in transit.
  7. Data Integrity and Validation:
    • Implement constraints to enforce data integrity, such as unique constraints, foreign key constraints, and check constraints.
    • Implement data validation to ensure that only valid data is entered into the database.
  8. Backup and Recovery:
    • Set up regular backup and recovery procedures to protect against data loss.
    • Consider disaster recovery plans and off-site backups.
  9. Testing and Quality Assurance:
    • Test the database design with sample data to ensure it functions as expected.
    • Perform stress testing and load testing to evaluate its performance under different conditions.
  10. Documentation:
    • Document the database schema, relationships, and any business rules.
    • Provide documentation for developers, administrators, and other stakeholders.
  11. Maintenance and Evolution:
    • Regularly monitor and maintain the database for performance, security, and updates.
    • Be prepared to make modifications as the application’s requirements evolve over time.

Remember that database design is iterative, and you might need to revisit and refine your design based on changing requirements or performance issues. It’s also important to collaborate with other team members, such as software developers and domain experts, to ensure the database meets the needs of the application.
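A few of the steps above (physical design, indexing, and integrity constraints) can be sketched concretely with Python's built-in `sqlite3` module; the schema and names here are purely illustrative:

```python
import sqlite3

# Physical design (step 4): typed columns, a foreign key, plus an
# index (step 5) and a CHECK constraint (step 7) for data integrity.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE authors (
    author_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE books (
    book_id   INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors(author_id),
    title     TEXT NOT NULL,
    price     REAL CHECK (price >= 0)      -- integrity rule (step 7)
);
CREATE INDEX idx_books_author ON books(author_id);  -- step 5
""")

conn.execute("INSERT INTO authors VALUES (1, 'Ursula K. Le Guin')")
conn.execute("INSERT INTO books VALUES (1, 1, 'The Dispossessed', 12.0)")

# The foreign key rejects a book that references a missing author.
try:
    conn.execute("INSERT INTO books VALUES (2, 99, 'Ghost Book', 5.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)  # True
```

The same design decisions apply in any DBMS; only the syntax details (data types, index options) differ.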

What is a trigger?

A trigger in the context of databases is a set of instructions or a piece of code that is automatically executed in response to a specific event or action that occurs within a database. Triggers are used to enforce business rules, maintain data integrity, and automate certain tasks without requiring manual intervention. They are typically associated with specific tables and are triggered by events such as INSERT, UPDATE, DELETE, or other database-related actions.

Triggers can be categorized into two main types based on when they are executed:

  1. Before Triggers (or “BEFORE” Triggers): These triggers are executed before the triggering event occurs. They can be used to validate or modify data before it’s actually inserted, updated, or deleted. For example, you could use a “BEFORE INSERT” trigger to automatically set a creation timestamp before a new record is added to a table.
  2. After Triggers (or “AFTER” Triggers): These triggers are executed after the triggering event has taken place. They are often used to perform actions based on changes made to the data. An “AFTER UPDATE” trigger, for instance, could be used to log the changes made to a specific table in a separate audit log table.

Triggers are implemented using procedural languages such as PL/SQL (Oracle), T-SQL (Microsoft SQL Server), or PL/pgSQL (PostgreSQL). The code within a trigger defines what actions should be taken when the trigger’s associated event occurs. Triggers can perform a variety of tasks, such as enforcing referential integrity, logging changes, updating related records, sending notifications, and more.

While triggers can be powerful tools, they should be used judiciously. Poorly designed triggers can impact database performance, introduce complexity, and make debugging more challenging. Therefore, it’s important to carefully plan and test triggers before implementing them in a production database.
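As a small sketch of the audit-log use case described above, here is an AFTER INSERT trigger in SQLite via Python's `sqlite3` module (table and trigger names are illustrative):

```python
import sqlite3

# An AFTER INSERT trigger: every new row in accounts is logged to an
# audit table automatically, with no application code involved.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT);
CREATE TABLE audit_log (
    account_id INTEGER,
    action     TEXT,
    logged_at  TEXT DEFAULT (datetime('now'))
);
CREATE TRIGGER log_new_account
AFTER INSERT ON accounts
BEGIN
    -- NEW refers to the row that was just inserted
    INSERT INTO audit_log (account_id, action) VALUES (NEW.id, 'created');
END;
""")

conn.execute("INSERT INTO accounts (owner) VALUES ('alice')")
entry = conn.execute("SELECT account_id, action FROM audit_log").fetchone()
print(entry)  # (1, 'created')
```

SQLite uses plain SQL in the trigger body; in Oracle, SQL Server, or PostgreSQL the body would be written in PL/SQL, T-SQL, or PL/pgSQL respectively, as noted above.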

What is a primary key in a database?

A primary key is a unique identifier for each record in a database table. It ensures that each row has a distinct identity and helps maintain data integrity.

What is a foreign key?

A foreign key is a column or set of columns in one table that refers to the primary key of another table. It establishes a relationship between the tables and maintains data integrity.
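Both constraints can be seen in action with a minimal `sqlite3` sketch (the schema is illustrative; note SQLite requires foreign keys to be switched on per connection):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES departments(dept_id)
);
""")
conn.execute("INSERT INTO departments VALUES (1, 'Engineering')")
conn.execute("INSERT INTO employees VALUES (1, 1)")

# A duplicate primary key is rejected.
try:
    conn.execute("INSERT INTO employees VALUES (1, 1)")
    pk_dup_allowed = True
except sqlite3.IntegrityError:
    pk_dup_allowed = False

# A foreign key pointing at a missing department is rejected.
try:
    conn.execute("INSERT INTO employees VALUES (2, 42)")
    fk_dangle_allowed = True
except sqlite3.IntegrityError:
    fk_dangle_allowed = False

print(pk_dup_allowed, fk_dangle_allowed)  # False False
```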

What is an index in a database?

An index is a database structure that improves the speed of data retrieval operations by providing a quick way to access rows based on specific columns.
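A quick way to see an index at work is SQLite's `EXPLAIN QUERY PLAN` (a sketch with illustrative names): once the index exists, the planner looks rows up through it instead of scanning the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(1000)])
conn.execute("CREATE INDEX idx_users_email ON users(email)")

# The last field of the plan row describes the chosen access path.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = ?",
    ("user500@example.com",)
).fetchone()
print(plan[-1])  # mentions idx_users_email rather than a full table scan
```

The trade-off: each index speeds up reads on its columns but adds storage and slows down writes, since the index must be updated on every INSERT, UPDATE, or DELETE.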

What is the purpose of a stored procedure?

A stored procedure is a precompiled set of SQL statements that can be executed as a single unit. It’s used to encapsulate business logic and perform tasks within the database.

What is a view in a database?

A view is a virtual table derived from one or more tables in the database. It presents data from the underlying tables in a different format, often for simplifying queries or enforcing security.
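As a minimal sketch of the security use (names illustrative), a view can expose only the non-sensitive columns of a table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, salary REAL);
-- The view hides the salary column from its consumers.
CREATE VIEW staff_public AS SELECT id, name FROM staff;
""")
conn.execute("INSERT INTO staff VALUES (1, 'Bea', 90000)")

cols = [d[0] for d in conn.execute("SELECT * FROM staff_public").description]
print(cols)  # ['id', 'name'] -- salary never appears through the view
```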

What is ACID in the context of database transactions?

ACID stands for Atomicity (a transaction either completes fully or has no effect), Consistency (a transaction moves the database from one valid state to another), Isolation (concurrent transactions do not interfere with each other), and Durability (committed changes survive crashes and power failures). Together, these properties ensure reliable processing of database transactions.

What is normalization in the context of database design?

Normalization is the process of organizing and structuring a database to minimize redundancy and improve data integrity. It involves breaking down data into smaller tables and adhering to specific rules.

What is denormalization?

Denormalization is the process of intentionally introducing redundancy into a database to improve performance by reducing the need for complex joins and queries.
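A small `sqlite3` sketch of the trade-off (names illustrative): the order row deliberately stores a redundant copy of the customer's name, so the common "list orders" read path needs no join, at the cost of keeping that copy in sync on updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    customer_id   INTEGER,
    customer_name TEXT,   -- redundant copy, denormalized on purpose
    amount        REAL
);
""")
conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (10, 1, 'Alice', 42.0)")

# No join needed on the hot read path.
row = conn.execute(
    "SELECT customer_name, amount FROM orders WHERE order_id = 10"
).fetchone()
print(row)  # ('Alice', 42.0)
```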

What is the difference between a clustered and a non-clustered index?

A clustered index determines the physical order of data in a table, while a non-clustered index provides a separate structure for quick data retrieval without affecting the physical order.

What is a composite key?

A composite key is a key that consists of two or more columns used together to uniquely identify rows in a table. It is typically used as the primary key when no single column is unique on its own.
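A minimal sketch (names illustrative): in an enrollments table, neither the student nor the course is unique on its own, but the pair is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollments (
        student_id INTEGER,
        course_id  INTEGER,
        grade      TEXT,
        PRIMARY KEY (student_id, course_id)   -- composite key
    )
""")
conn.execute("INSERT INTO enrollments VALUES (1, 101, 'A')")
conn.execute("INSERT INTO enrollments VALUES (1, 102, 'B')")  # same student: fine

try:
    conn.execute("INSERT INTO enrollments VALUES (1, 101, 'C')")  # same pair
    dup_allowed = True
except sqlite3.IntegrityError:
    dup_allowed = False
print(dup_allowed)  # False -- duplicate (student, course) pairs are rejected
```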

What is the purpose of the SQL SELECT statement?

The SELECT statement is used to retrieve data from a database. It allows you to specify which columns to retrieve, which table to retrieve them from, and conditions for filtering the data.
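A basic sketch of those three parts, columns, source table, and a filter (plus an ordering), using illustrative data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("pen", 1.5), ("notebook", 4.0), ("stapler", 9.0)])

# Columns to retrieve, the table, a WHERE filter, and an ordering.
rows = conn.execute(
    "SELECT name, price FROM products WHERE price < ? ORDER BY price",
    (5.0,)
).fetchall()
print(rows)  # [('pen', 1.5), ('notebook', 4.0)]
```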

What is data warehousing?

Data warehousing involves the process of collecting, storing, and managing data from various sources for analytical purposes. It often involves transforming and aggregating data to support business intelligence and decision-making.

What is the difference between a database and a database management system (DBMS)?

A database is a structured collection of data, while a DBMS is software that manages, stores, retrieves, and manipulates that data. The DBMS provides tools for creating, accessing, and maintaining databases.

What is a transaction in a database?

A transaction is a sequence of one or more database operations treated as a single unit of work. It ensures that either all operations are completed successfully, or none are applied.
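The classic example is a money transfer, sketched here with `sqlite3` (schema illustrative): the debit and credit must happen together, so a failure between them rolls everything back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (name TEXT PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO balances VALUES (?, ?)",
                 [("alice", 100.0), ("bob", 50.0)])
conn.commit()

try:
    conn.execute(
        "UPDATE balances SET amount = amount - 30 WHERE name = 'alice'")
    # ...the matching credit to bob would go here; simulate a crash first:
    raise RuntimeError("simulated failure mid-transfer")
except RuntimeError:
    conn.rollback()  # undo the half-finished transfer

totals = dict(conn.execute("SELECT name, amount FROM balances"))
print(totals)  # {'alice': 100.0, 'bob': 50.0} -- nothing was applied
```

Had both updates succeeded, `conn.commit()` would make the transfer durable; the all-or-nothing behavior shown here is the atomicity property from ACID.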

What is a NoSQL database?

A NoSQL (Not Only SQL) database is a type of database that provides a non-relational approach to data storage and retrieval. It’s suitable for handling large volumes of unstructured or semi-structured data and offers high scalability and flexibility.
