Data, in simple terms, is anything that provides information about a particular thing and can be used for analysis. Data comes in different sizes and formats.
For example, a resume or CV holds all the information about a particular person, including educational details, personal interests, work experience, address and so on, in a PDF or DOCX file only a few kilobytes in size.
Such small-sized data can be easily retrieved and analyzed. But with the advent of newer technologies in this digital era, there has been a tremendous rise in data size.
Data has grown from kilobytes (KB) to petabytes (PB). This huge amount of data is referred to as big data, and it requires advanced tools and software for processing, analysis and storage.
Big Data can be divided into the following three categories.
- Structured Data
- Unstructured Data
- Semi-structured Data
Structured Data
Data that has a well-defined structure and is well organized, typically in the form of tables, and can be easily operated on is known as structured data. Searching and accessing information in this type of data is very easy.
For example, data stored in a relational database takes the form of tables with multiple rows and columns. A spreadsheet is another good example of structured data.
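The point that structured data is easy to search can be sketched with an in-memory relational table. The table name and columns below (`people`, name/age/city) are purely illustrative, not taken from the text:

```python
import sqlite3

# A minimal sketch of structured data: a relational table with a fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Alice", 30, "Delhi"), ("Bob", 25, "Mumbai")],
)

# Because every row follows the same schema, searching is a simple
# declarative query over named columns.
rows = conn.execute("SELECT name FROM people WHERE age > 26").fetchall()
print(rows)  # [('Alice',)]
conn.close()
```

The fixed rows-and-columns layout is exactly what makes retrieval this direct; the later categories lack it.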
Unstructured Data
Unstructured data has no predefined structure or organization. Operating on this type of data is difficult and requires advanced tools and software to access the information it contains.
For example: images and graphics, PDF files, Word documents, audio, video, emails, PowerPoint presentations, web pages and web content, wikis, streaming data, location coordinates, etc.
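To see why unstructured data needs extra work to query, consider pulling a specific fact out of free text. The note below and the task of extracting email addresses are hypothetical; with no schema, we must fall back on pattern matching rather than a simple column lookup:

```python
import re

# Free-form text: no rows, no columns, no field names.
text = """Meeting notes: contact Alice at alice@example.com
about the quarterly report; Bob (bob@example.com) will join later."""

# Without a schema, extracting a fact means pattern-matching the raw text.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
print(emails)  # ['alice@example.com', 'bob@example.com']
```

A regular expression works for this small case; at big-data scale, the same idea motivates the heavier text-processing tooling the section mentions.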
Semi-Structured Data
Semi-structured data carries some organizing markers, such as tags or key-value pairs, but does not conform to the rigid schema of a relational table. Web data such as JSON (JavaScript Object Notation) files, BibTeX files, .csv files, tab-delimited text files, XML and other markup languages are examples of semi-structured data found on the web.
Because its organization is looser, semi-structured data is more difficult to retrieve, analyze and store than structured data, and it often requires a software framework like Apache Hadoop to do so.
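The tagged-but-schemaless character of semi-structured data shows up clearly in JSON. The records below are invented for illustration; note that the second record has a field the first lacks, which a relational table would not allow:

```python
import json

# Semi-structured records: each field is tagged with a key,
# but there is no fixed schema -- fields can vary per record.
raw = """
[
  {"name": "Alice", "city": "Delhi"},
  {"name": "Bob", "city": "Mumbai", "skills": ["Hadoop", "Spark"]}
]
"""
records = json.loads(raw)

# Access is by key, with a default for fields that may be absent.
for r in records:
    print(r["name"], r.get("skills", []))
```

The keys make the data self-describing, so it can still be parsed mechanically, but every consumer must cope with missing or extra fields rather than relying on a schema.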