Big data refers to a volume of data that cannot be stored and processed within an acceptable time frame in a traditional file system. The next question that comes to mind is how big this data needs to be in order to be classified as big data. There is a lot of misconception around the term big data. We usually refer to data as big if its size is in gigabytes, terabytes, petabytes, exabytes or anything larger. Size alone, however, does not define big data: even a small file can be treated as big data, depending on the context in which it is being used. Let's take an example to make this clear. If we attach a 100 MB file to an email, we cannot send it, because the email system does not support an attachment of this size. Therefore, with respect to email, this 100 MB file can be referred to as big data. Similarly, if we want to process 1 TB of data in a given time frame, we cannot do so on a traditional system, since the resources available to it are not sufficient to accomplish the task.
As you are aware, social sites such as Facebook, Twitter, Google+, LinkedIn and YouTube contain data in huge amounts, and as the number of users on these sites grows, storing and processing this enormous data is becoming a challenging task. This data is important for various firms to generate huge revenue, which is not possible with a traditional file system. This is where Hadoop comes into the picture.
Big data simply means huge amounts of structured, unstructured and semi-structured data that can be processed for information. Nowadays, massive amounts of data are produced because of growth in technology, digitalization and a variety of sources, including business application transactions, videos, pictures, electronic mails, social media, and so on. The big data concept exists to process such data.
Structured data: data that has a proper format associated with it is known as structured data.
For example, the data stored in database files. Semi-structured data: data that has only a partial format associated with it is known as semi-structured data. For example, the data stored in mail files. Unstructured data: data that does not have any format associated with it is known as unstructured data. For example, image files, audio files and video files.
Big data is characterized by the 3 V's associated with it, which are as follows: [1]
Volume: the amount of data being generated, which is huge.
Velocity: the speed at which the data is generated.
Variety: the different kinds of data being generated.
Challenges Faced by Big Data
There are two main challenges faced by big data: [2]
i. How to store and manage huge volumes of data.
ii. How to process and extract valuable information from huge volumes of data within a given time frame.
These two challenges led to the development of Hadoop.
Hadoop is an open source framework developed by Doug Cutting in 2006 and managed by the Apache Software Foundation. Hadoop was named after a yellow toy elephant and was designed to store and process data efficiently. The Hadoop framework comprises two main components:
i. HDFS: Hadoop Distributed File System, which takes care of the storage of data within a Hadoop cluster.
ii. MapReduce: which takes care of the processing of the data that is present in HDFS.
Now let's have a look at a Hadoop cluster. There are two kinds of nodes: the master node and the slave node. The master node is responsible for running the NameNode and JobTracker daemons. Here, node is the technical term used to denote a machine present in the cluster, and daemon is the technical term for a background process running on a Linux machine. The slave node, on the other hand, is responsible for running the DataNode and TaskTracker daemons. The NameNode and DataNode are responsible for storing and managing the data and are commonly referred to as storage nodes, whereas the JobTracker and TaskTracker are responsible for processing and computing the data and are commonly known as compute nodes. Normally, the NameNode and JobTracker run on a single machine, whereas the DataNodes and TaskTrackers run on different machines.
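The division of labor between the two components can be illustrated with the classic word-count example. The sketch below is a minimal single-machine illustration of the MapReduce idea in Python, not Hadoop's actual API (real Hadoop jobs are typically written in Java against the MapReduce framework); the function names here are our own.

```python
from collections import defaultdict

# A toy sketch of the MapReduce model: map emits (key, value) pairs,
# the framework groups the pairs by key (shuffle), and reduce
# aggregates each group. In real Hadoop, map and reduce tasks run
# on many TaskTracker nodes in parallel.

def map_phase(line):
    # Map: emit (word, 1) for every word in one input line.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle/sort: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    # Reduce: sum the counts for one word.
    return (word, sum(counts))

def word_count(lines):
    pairs = [p for line in lines for p in map_phase(line)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())

print(word_count(["big data needs big tools", "hadoop processes big data"]))
# → {'big': 3, 'data': 2, 'needs': 1, 'tools': 1, 'hadoop': 1, 'processes': 1}
```

Because each map call touches only one line and each reduce call only one key's group, both phases can be distributed across the cluster's compute nodes independently.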
Features of Hadoop: [3]
i. Cost effective system: Hadoop does not require any special hardware. It can simply be implemented on common machines, technically known as commodity hardware.
ii. Large cluster of nodes: a Hadoop cluster can support a large number of nodes, which provides huge storage and processing capacity.
iii. Parallel processing: a Hadoop cluster can access and process data in parallel, which saves a lot of time.
iv. Distributed data: Hadoop takes care of splitting and distributing data across all nodes within a cluster. It also replicates the data over the entire cluster.
v. Automatic failover management: once automatic failover management (AFM) is configured on a cluster, the admin need not worry about a failed machine. Hadoop replicates the data: one copy of each data block is replicated to a node in the same rack, and Hadoop takes care of the internetworking between racks.
vi. Data locality optimization: this is the most powerful feature of Hadoop and makes it highly efficient. If a program needs a huge amount of data that resides on some other machine, Hadoop sends the program code to the machine where the data resides and executes it there, instead of moving the data over the network to the program, which saves a lot of bandwidth.
vii. Heterogeneous cluster: nodes or machines can be from different vendors and can run different flavors of operating system.
viii. Scalability: in Hadoop, adding or removing a machine does not affect the cluster, nor does adding or removing a component of a machine.
Hadoop Architecture
Hadoop comprises two components:
i. HDFS
ii. MapReduce
Hadoop splits big data into several chunks and stores the data on several nodes within a cluster, which significantly reduces the processing time. Hadoop replicates each part of the data onto machines that are present within the cluster. The number of copies replicated depends on the replication factor. By default the replication factor is 3; therefore, in this case, there are 3 copies of each data block on 3 different machines.
Reference:
Mahajan, P., Gaba, G., & Chauhan, N. S. (2016). Big Data Security.
IITM Journal of Management and IT, 7(1), 89-94.
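The block-splitting and replication behaviour described under Hadoop Architecture can be sketched as follows. This is a toy model, not the real HDFS protocol: the tiny block size, the node names and the round-robin placement are illustrative assumptions (the real HDFS default block size is 64 or 128 MB depending on version, and placement is rack-aware rather than round-robin).

```python
# Toy model of HDFS storage: split a file into fixed-size blocks,
# then place REPLICATION_FACTOR copies of each block on distinct nodes.

BLOCK_SIZE = 4          # bytes per block (tiny, for illustration only)
REPLICATION_FACTOR = 3  # HDFS default replication factor

def split_into_blocks(data, block_size=BLOCK_SIZE):
    # Chunk the raw bytes into fixed-size blocks.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes, rf=REPLICATION_FACTOR):
    # Assign rf distinct nodes to each block, round-robin.
    # Real HDFS instead places one replica on the local rack and
    # the remaining copies on a different rack.
    placement = {}
    for i, _block in enumerate(blocks):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(rf)]
    return placement

blocks = split_into_blocks(b"hello hadoop cluster")
layout = place_replicas(blocks, ["node1", "node2", "node3", "node4"])
for block_id, replicas in layout.items():
    print(block_id, replicas)
```

With 3 replicas per block, any single machine (or even two) can fail without losing data, which is what makes the automatic failover management described above possible.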