Storage solutions

Azure Storage is one of Azure's core services. Most other Azure services depend on it, directly or indirectly, so it is worth knowing the storage options Azure offers.
At a minimum, a storage solution should provide:

  1. Durability
  2. High availability
  3. Security
  4. Sharing mechanism
  5. Accessibility

Case study 1 : Azure Migration

Company details: Kiran Technologies uses

  1. An on-premises SQL Server to store transactional data
  2. An on-premises server with disks to store customer images and videos
  3. An on-premises disk to store all application log data
  4. A network file system to share data with the internal analytics team, which generates business metrics
  5. A Windows folder containing CSV data about people, gathered from social media. (This data can’t be inserted into SQL Server because it is pulled from various sources and doesn’t have a single schema.)
Kiran Technologies' business grew 200% over the last year. Good for them! But their systems struggled to scale, and the most common problems they faced were
  1. Frequent power outages in the data center, which reduced availability.
  2. Disks running out of space due to high data volume, with high maintenance overhead.
  3. Managing permissions and licensing became a pain as the number of users and third-party collaborators increased.
  4. Processing the CSV data became slow as the volume grew, with no way to scale.
Finally, they decided to move to the cloud! They researched the storage solutions below.

Azure Blob

  1. Optimized for storing massive amounts of unstructured data.
  2. Storing files for distributed access.
  3. Streaming video and audio.
  4. Storing data for log analysis.
  5. Files can be accessed over HTTP/HTTPS from anywhere in the world.
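
To make the HTTP access point concrete, here is a minimal sketch that builds the URL of a blob in the standard endpoint format, https://<account>.blob.core.windows.net/<container>/<blob>. The account, container, and file names are hypothetical, and the sketch only builds the address: actually fetching the blob also requires the container to allow public access or the request to carry valid credentials (for example a SAS token).

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the HTTPS URL of a blob in the standard Azure endpoint format."""
    # Keep "/" unescaped so folder-like prefixes stay readable in the path.
    encoded_name = quote(blob_name, safe="/")
    return f"https://{account}.blob.core.windows.net/{container}/{encoded_name}"


# Hypothetical names, used only for illustration.
print(blob_url("kiranstorage", "media", "videos/launch day.mp4"))
# prints https://kiranstorage.blob.core.windows.net/media/videos/launch%20day.mp4
```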

Azure files

  1. Files can be accessed using the SMB (Server Message Block) protocol.

Azure Queue Service

  1. Used to store and retrieve messages.
  2. A message can be up to 64 KB in size, and a queue can hold millions of messages.
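
The size limit matters when deciding what to put on a queue. The sketch below checks whether a text payload fits in a single message; it is an illustration rather than part of any Azure library. It measures raw UTF-8 bytes only, so any encoding a client library applies before sending (for example Base64) would make the real payload larger.

```python
MAX_MESSAGE_BYTES = 64 * 1024  # the 64 KB per-message limit mentioned above


def fits_in_queue_message(text: str) -> bool:
    """Return True if the UTF-8 encoding of text fits in one queue message."""
    return len(text.encode("utf-8")) <= MAX_MESSAGE_BYTES


print(fits_in_queue_message("order-12345 created"))  # small payload fits
print(fits_in_queue_message("x" * (64 * 1024 + 1)))  # one byte over the limit
```

Payloads that do not fit are commonly stored elsewhere (for example as a blob), with only a reference placed on the queue.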

Azure table storage

  1. Key / attribute store with a schemaless design.
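
A schemaless key/attribute store can be pictured with plain dictionaries. The sketch below is an in-memory stand-in, not the Azure SDK. It assumes the Table storage convention that each entity is identified by a PartitionKey and a RowKey; all other attribute names and values are made up for illustration. Note that the two entities carry different attributes, which is exactly what the social media CSV data in the case study needs.

```python
# In-memory stand-in for a table: (PartitionKey, RowKey) -> entity.
table = {}


def upsert(entity: dict) -> None:
    """Insert or replace an entity, keyed by its PartitionKey and RowKey."""
    key = (entity["PartitionKey"], entity["RowKey"])
    table[key] = entity


def query_partition(partition_key: str) -> list:
    """Return all entities in one partition, ordered by RowKey."""
    return [e for (pk, _), e in sorted(table.items()) if pk == partition_key]


# Two entities with different attributes: no shared schema is required.
upsert({"PartitionKey": "twitter", "RowKey": "u1", "handle": "@asha", "followers": 120})
upsert({"PartitionKey": "linkedin", "RowKey": "u2", "name": "Ravi", "company": "Contoso"})

print(query_partition("twitter"))
```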

Azure Data Lake Storage Gen2 vs. Azure Blob

Azure Blob storage has a flat namespace: it lacks a true directory hierarchy and a query syntax.
To fetch a file, a specific GET operation must be performed on that file.
Azure Data Lake Storage Gen2 (ADLS Gen2) builds on Azure Blob storage and adds the capabilities of Azure Data Lake Storage Gen1:
  1. Hadoop-compatible access.
  2. A superset of POSIX permissions.
  3. Cost effectiveness.
  4. An optimized driver.
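
One practical consequence of the hierarchy can be shown with a toy model. In a flat namespace a "folder" is only a name prefix, so renaming it means touching every object under that prefix; with a real directory tree it is a single operation on the directory node. The sketch below simulates both with dictionaries. It does not call Azure, and the file and folder names are hypothetical.

```python
def rename_prefix_flat(blobs: dict, old: str, new: str) -> int:
    """Rename a virtual folder in a flat namespace; returns objects touched."""
    moved = [name for name in blobs if name.startswith(old)]
    for name in moved:
        blobs[new + name[len(old):]] = blobs.pop(name)
    return len(moved)


def rename_dir_hierarchical(tree: dict, old: str, new: str) -> int:
    """Rename a directory node in a tree; always a single operation."""
    tree[new] = tree.pop(old)
    return 1


flat = {
    "images/mumbai/a.jpg": b"...",
    "images/mumbai/b.jpg": b"...",
    "images/delhi/c.jpg": b"...",
}
tree = {"mumbai": {"a.jpg": b"...", "b.jpg": b"..."}, "delhi": {"c.jpg": b"..."}}

print(rename_prefix_flat(flat, "images/mumbai/", "images/pune/"))  # prints 2
print(rename_dir_hierarchical(tree, "mumbai", "pune"))  # prints 1
```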
Finally, Kiran Technologies decided to use
  1. ADLS Gen2 to store videos and images. (ADLS was chosen because they might need to search videos and images by location, and Blob storage doesn’t provide that hierarchy.)
  2. Azure Blob to store log data (because Blob storage is a low-cost option for this kind of data).
  3. Azure Files to keep the shared files, for sharing purposes.
  4. ADLS Gen2 again for the CSV data, because of the high availability it offers.
  5. Azure SQL Database for the SQL Server data. (Azure also offers separate managed database services for other engines such as MySQL and PostgreSQL.) A relational database generally suits data that is normalized, has schemas and many-to-many relationships, and requires high integrity and strong consistency.

The need for redundancy and disaster recovery

  1. A power outage could affect a whole region.
  2. DNS might fail for a region.
  3. Internet connectivity might go down for a region.
Azure offers redundancy options that copy the data to a secondary location, which can be hundreds of miles away from the primary data center, so that a disaster like those above does not make the data unavailable.

Types of Redundant storage

LRS (Locally redundant storage)

Copies the data synchronously 3 times within a single physical location in the same primary region.

ZRS (Zone-redundant storage)

Copies the data synchronously across 3 different availability zones in the same primary region. Microsoft recommends this for workloads that need consistency and high availability.

GRS (Geo-redundant storage)

GRS first applies LRS: it copies the data synchronously 3 times within a single physical location in the primary region.
The data is then copied asynchronously to a secondary region, where it is again replicated 3 times using LRS. (A related option, GZRS, uses ZRS in the primary region instead.)
By default, reads and writes go to the primary region only, but read access to the secondary region can be enabled (RA-GRS) so that reads can be routed there.
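
Routing reads to the secondary can be sketched as a simple fallback. The example below assumes read access to the secondary is enabled, in which case the secondary Blob endpoint uses the account name with a "-secondary" suffix. The account name, path, and fake_read function are hypothetical stand-ins; a real application would issue HTTP requests or use a client library instead.

```python
def blob_endpoints(account: str) -> dict:
    """Primary and secondary Blob endpoints for a storage account."""
    return {
        "primary": f"https://{account}.blob.core.windows.net",
        "secondary": f"https://{account}-secondary.blob.core.windows.net",
    }


def read_with_fallback(read, endpoints: dict, path: str):
    """Try the primary endpoint first; on a connection failure, use the secondary."""
    try:
        return read(endpoints["primary"], path)
    except ConnectionError:
        return read(endpoints["secondary"], path)


# Hypothetical reader that simulates an outage in the primary region.
def fake_read(endpoint: str, path: str) -> str:
    if "-secondary" not in endpoint:
        raise ConnectionError("primary region unavailable")
    return f"contents of {path} from {endpoint}"


endpoints = blob_endpoints("kiranstorage")
print(read_with_fallback(fake_read, endpoints, "logs/app.log"))
```

Because the copy to the secondary region is asynchronous, a read served from the secondary may return slightly older data than the primary holds.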

Access tiers

Hot :

Frequently accessed data is stored here.

Cool :

Infrequently accessed data is stored here.

Archive :

Data that is rarely or never accessed and is kept for backup purposes. Archived data should stay in this tier for at least 180 days; deleting it earlier incurs an early deletion charge.
In terms of storage cost, Hot > Cool > Archive. Access costs run in the opposite direction.
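
The 180-day rule for the archive tier is easy to check before deleting anything. The sketch below is only date arithmetic under that assumption: the function name and dates are illustrative, and it does not compute the actual charge, which depends on current pricing.

```python
from datetime import date

ARCHIVE_MIN_DAYS = 180  # minimum retention for the archive tier, from the text above


def early_deletion_days_remaining(archived_on: date, today: date,
                                  min_days: int = ARCHIVE_MIN_DAYS) -> int:
    """Days left before archived data can be deleted without an early deletion charge."""
    days_held = (today - archived_on).days
    return max(0, min_days - days_held)


# Data archived on 1 January 2024 and checked on 1 March 2024 (60 days later).
print(early_deletion_days_remaining(date(2024, 1, 1), date(2024, 3, 1)))  # prints 120
```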