Big Data & Analytics (Version 2) – IoT Fundamentals: Big Data and Analytics End of Course Assessment Final Exam Answers Full Questions
1. A patient who lives in Northern Canada has an MRI taken. The results of the medical procedure are immediately transmitted to a specialist in Toronto who will review the findings. Which three characteristics would best describe the patient data being transmitted? (Choose three.)

* private
* unstructured
* random
* structured
* in motion
* at rest

Explanation: Electronic medical information is sensitive private information. The digital results of tests such as x-rays, MRIs, and ultrasounds do not have a format of fixed fields, so they are considered unstructured. Because the data is transmitted from one place to another for review, it is in motion. Data is at rest once it is stored in a data center.

2. Which three key words are used to describe the difference between Big Data and data? (Choose three.)

* volume
* vigor
* variety
* value
* velocity
* vibrancy

Explanation: Three key words can help distinguish data from Big Data:

Volume – describes the amount of Big Data being transported and stored
Velocity – describes the rapid rate at which Big Data is moving
Variety – describes the type of Big Data, which is rarely in a state that is perfectly ready for processing and analysis

3. What are three types of structured data? (Choose three.)

* e-commerce user accounts
* spreadsheet data
* blogs
* white papers
* newspaper articles
* data in relational databases

Explanation: Structured data is entered and maintained in fixed fields within a file or record, such as data found in relational databases and spreadsheets. Structured data entry requires a certain format to minimize errors and make it easier for computer interpretation.

4. What are two plain-text file types that are compatible with numerous applications and use a standard method of representing data records? (Choose two.)

Explanation: As data is collected from various sources and in varying formats, it is beneficial to use specific file types that allow easy conversion and universal application support. CSV, JSON, and XML are plain-text file types that allow data to be collected and analyzed in a format that is widely compatible and suitable for analysis.
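As a rough illustration of why these plain-text formats convert easily, the sketch below reads the same record from CSV, JSON, and XML using only the Python standard library. The sensor name and temperature value are hypothetical sample data invented for this example.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample record, expressed in three plain-text formats.
csv_text = "sensor,temperature\nA1,21.5\n"
json_text = '[{"sensor": "A1", "temperature": 21.5}]'
xml_text = "<readings><reading sensor='A1' temperature='21.5'/></readings>"

# Parse each format into a list of dictionaries.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
json_rows = json.loads(json_text)
xml_rows = [dict(el.attrib) for el in ET.fromstring(xml_text).iter("reading")]

# CSV and XML values arrive as strings; JSON preserves the numeric type.
csv_temp = float(csv_rows[0]["temperature"])
json_temp = json_rows[0]["temperature"]
xml_temp = float(xml_rows[0]["temperature"])
```

After the string values are converted, all three sources yield the same record, which is what makes these formats convenient for exchanging data between applications.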

5. Match the algorithm to the type of learning algorithm.

6. Which two tasks are part of the transforming data process? (Choose two.)

* creating visual representations of the data
* collecting data required to perform the analysis
* joining data from multiple sources
* using rules to change the source data to the type of data needed for a target database
* presenting the information gained from the data

Explanation: Transforming data is the process of modifying data into a usable form. This includes tasks such as aggregating data and sorting it. Collecting data is the process of extracting data. Creating visual representations of data and presenting the knowledge gained from the data are examples of the final steps that may be used in data analysis.
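The transformation tasks named above (joining, aggregating, sorting) can be sketched in a few lines of plain Python. The store and sales records are hypothetical, and a real pipeline would more likely use a database or a library such as pandas.

```python
from collections import defaultdict

# Hypothetical source data from two separate systems.
sales = [
    {"store_id": 1, "amount": 120.0},
    {"store_id": 2, "amount": 80.0},
    {"store_id": 1, "amount": 30.0},
]
stores = {1: "Toronto", 2: "Ottawa"}

# Join: attach the city name from the second source to each sale.
joined = [{**row, "city": stores[row["store_id"]]} for row in sales]

# Aggregate: total sales per city.
totals = defaultdict(float)
for row in joined:
    totals[row["city"]] += row["amount"]

# Sort: highest total first.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Here `ranked` holds the per-city totals in descending order, a form that is ready for analysis or loading into a target database.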

7. Match the type of chart with the best use.

8. What two benefits are gained when an organization adopts cloud computing and virtualization? (Choose two.)

* elimination of vulnerabilities to cyber attacks
* provides a “pay-as-you-go” model, allowing organizations to treat computing and storage expenses as a utility
* enables rapid responses to increasing data volume requirements
* distributed processing of large data sets in the size of terabytes
* increases the dependence on onsite IT resources

Explanation: Organizations can use virtualization to consolidate the number of required servers by running many virtual servers on a single physical server. Cloud computing allows organizations to scale their solutions as required and to pay only for the resources they require.

9. Match the type of error to the corresponding source of the error.

10. What are two features supported by NoSQL databases? (Choose two.)

* establishing relationships within stored data
* relying on the relational database approach of linked tables
* importing unstructured data
* organizing data in columns, tables, and rows
* using the key-value storing approach

Explanation: NoSQL databases can use a key-value pair approach to store data. A NoSQL database can import unstructured data. Organizing data in columns, tables, and rows and establishing relationships within the stored data are characteristics of relational databases. NoSQL does not rely on the relational database approach of linked tables.

11. Match the terms to the definition. (Not all options are used.)

12. What are two advantages of using CFS over HDFS? (Choose two.)

* low-cost storage solution
* specialized hardware
* ability to run a single database across multiple data centers
* automatic failover of nodes, clusters, and data centers
* master-slave architecture

Explanation: Some of the advantages of using CFS over HDFS are as follows:
Better availability – CFS does not require shared storage solutions.
Basic hardware support – No special servers and no special network devices are needed for CFS.
Data integration – All data that is written to CFS is replicated to both analytics and search nodes.
Automatic failover – As with availability, failover is automatic because of replication.
Easier deployment – Clusters are simple to set up and can be running in a matter of minutes. CFS does not require complicated storage requirements or master-slave configurations.
Supports multiple data centers – CFS can run a single database across multiple data centers.

13. Match each term to the correct definition. (Not all options are used.)

14. With the number of sensors and other end devices growing exponentially, which type of device is increasingly used to better manage Internet traffic for systems that are in motion?

* proxy servers
* cellular towers
* mobile routers
* Wi-Fi access points

Explanation: The rapid increase of devices in the IoT is one of the primary reasons for the exponential growth in data generation. With the number of sensors and other end devices growing exponentially, mobile routers are increasingly used to better manage Internet traffic for systems that are in motion.

15. Match the statistical term with the description.

16. Which type of information supports managerial analysis in determining whether the company should expand its manufacturing facility?

* transactional
* analytical
* comparative
* capital

Explanation: The two primary types of business information useful to an organization are transactional information and analytical information. Transactional information is captured and stored as events happen. Transactional information can be used to analyze daily sales reports and production schedules to determine how much inventory to carry. Analytical information supports managerial analysis tasks like determining whether the organization should build a new manufacturing plant or hire additional sales personnel.

17. What networking technology is used when a company with multiple locations requires data and analysis available close to their network edge?

* fog computing
* Hadoop
* NoSQL
* virtualization

Explanation: Fog computing provides data, compute, storage, and application services to end-users. Fog characteristics include proximity to end-users, dense geographical distribution, and support for mobility. Services are hosted at the network edge or even on end devices such as set-top boxes or access points.

18. How is the Big Data infrastructure different from the traditional data infrastructure?

* Big Data platforms distribute data on several computing and storage nodes.
* Security is integrated in all elements related to Big Data.
* Big Data involves fewer people in the organization who can access the data.
* The Big Data infrastructure requires proprietary products and protocols to implement.

Explanation: In the Big Data infrastructure, applications, logs, event data, sensor data, mobility data, social media, and stream data may all feed data into the Big Data infrastructure, which could involve data centers, NoSQL, traditional database servers, storage, and Hadoop-based technology.

19. What is an example of a relational database?

* Excel spreadsheet
* Hadoop
* Visual Network Index
* SQL server

Explanation: Two popular relational database management systems are Oracle and SQL Server.

20. Match the variable with the description.

21. What is a purpose of descriptive statistics?

* to compare groups of data sets
* to make predictions about other values
* to summarize findings within a data set
* to make generalizations about a population

Explanation: There are two types of statistics used in the analysis of data: descriptive and inferential. Descriptive statistics are used to describe or summarize values in a data set. Inferential statistics are used to make predictions about data.
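A minimal sketch of descriptive statistics, using Python's standard `statistics` module to summarize a small data set. The daily sales figures are hypothetical values chosen for illustration.

```python
import statistics

# Hypothetical data set: units sold on five consecutive days.
daily_sales = [12, 15, 11, 15, 17]

# Descriptive statistics summarize the values that are actually present.
summary = {
    "mean": statistics.mean(daily_sales),
    "median": statistics.median(daily_sales),
    "mode": statistics.mode(daily_sales),
    "stdev": statistics.stdev(daily_sales),  # sample standard deviation
}
```

Each entry in `summary` describes the data set itself; none of them makes a claim about values outside it, which is the job of inferential statistics.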

22. Five hundred people are working in an office. For a study, which term describes a group of fifty people who have been chosen to represent the entire office?

* class
* sample
* cluster
* group

Explanation: A population shares a common set of characteristics. Because it is often not practical to study an entire population, a representative portion of the population, called a sample, is chosen for analysis.
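The scenario in this question can be sketched with the standard `random` module: a simple random sample of 50 is drawn from a population of 500. The employee ID numbers and the seed value are assumptions made for the example, and simple random sampling is only one of several ways to choose a representative sample.

```python
import random

# Hypothetical population: employee IDs 1 through 500.
population = list(range(1, 501))

# A seeded generator keeps the example reproducible.
rng = random.Random(42)

# Draw 50 distinct people without replacement.
sample = rng.sample(population, k=50)
```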

23. Which functionality does pandas provide to a Python environment?

* a set of APIs to allow sensors to send data to a Raspberry Pi
* an enhanced chip for processing graphical information
* a set of data structures and tools for data analysis
* an algorithm to generate random numbers

Explanation: Pandas is an open source library with high-performance data structures and tools for analysis of large data sets.

24. A data analyst performs a correlation analysis between two quantities. The result of the analysis is an r value of 0.9. What does this mean?

* The two variables have nearly the same values.
* One variable keeps its value at 90% of the other variable.
* When one variable increases its value, the other variable decreases its value.
* When one variable increases its value, the other variable increases its value in a very similar fashion.

Explanation: The commonly used correlation coefficient, Pearson r (or r value), is a quantity that is expressed as a value between -1 and 1. Positive values indicate a positive relationship between the changes in two quantities. Negative values indicate an inverse relationship. The magnitude of either the positive or negative values indicates the degree of correlation. The closer the value is to 1 or -1, the stronger the relationship.
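To make the meaning of the r value concrete, here is a small sketch that computes Pearson r directly from its definition (covariance divided by the product of the standard deviations). The function name `pearson_r` and the sample lists are hypothetical, not part of any course material.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases whenever hours increases
falling = [10, 8, 6, 4, 2]   # decreases whenever hours increases

r_positive = pearson_r(hours, rising)
r_negative = pearson_r(hours, falling)
```

A perfectly linear increasing pair gives r close to 1 and a perfectly linear decreasing pair gives r close to -1, so a value of 0.9 indicates a strong positive relationship that is not quite perfectly linear.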

25. A data analyst is processing a data set with pandas and notices a NaT. Which data type is expected for the missing data?

* string
* timestamp
* object
* integer
* float

Explanation: In a pandas data set, NaN is used to indicate an undefined string, integer, or float. NaT is used to indicate a missing timestamp.
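A short sketch of how the two missing-value markers show up in practice, assuming pandas is installed. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical readings: the second row is missing both its time and its value.
readings = pd.DataFrame({
    "taken_at": pd.to_datetime(["2024-01-01", None]),
    "value": [1.5, None],
})

missing_time = readings["taken_at"].iloc[1]   # NaT in a datetime column
missing_value = readings["value"].iloc[1]     # NaN in a float column

# pd.isna treats both markers as missing.
both_missing = pd.isna(missing_time) and pd.isna(missing_value)
```

Because the `taken_at` column holds timestamps, pandas marks its gap with NaT, while the gap in the numeric column appears as NaN.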

26. Which type of learning algorithm can predict the value of a variable, such as a loan interest rate, based on the value of other variables?

* classification
* regression
* clustering
* association

Explanation: An example of how a regression algorithm might be used is to predict the cost of a house by looking at variables such as crime rate, average income level in the neighborhood, and how far the house is from a school.
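The idea can be sketched with ordinary least squares on a single predictor. This is a simplification of the multi-variable house-price example: the function name `fit_line`, the distances, and the prices are all hypothetical, and the sample data is deliberately exactly linear.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: distance from a school (km) vs. price (thousands).
distance_km = [1.0, 2.0, 3.0, 4.0]
price = [300.0, 280.0, 260.0, 240.0]

slope, intercept = fit_line(distance_km, price)

# Predict the price of a house 5 km from the school.
predicted = slope * 5.0 + intercept
```

The fitted line turns known values of the independent variable into a numeric prediction of the dependent variable, which is what distinguishes regression from classification or clustering.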

27. In a regression analysis, which variable is called the predictor or explanatory variable?

* independent
* first
* prime
* dependent

Explanation: The dependent variable is also known as the target or response variable. The independent variable is also known as the predictor or explanatory variable.

28. When you perform an experiment and follow the scientific method, what is the first step that you should take?

* Analyze gathered data.
* Ask questions about an observation.
* Form a hypothesis.
* Perform research.

Explanation: The scientific method is commonly used in scientific discovery and contains the following steps:
Step 1. Ask a question about an observation such as what, when, how, or why.
Step 2. Perform research.
Step 3. Form a hypothesis from this research.
Step 4. Test the hypothesis through experimentation.
Step 5. Analyze the data from the experiments to draw a conclusion.
Step 6. Communicate the results of the process.

29. Which type of validity is being used when a researcher compares the original conclusion against other people in other places at other times?

* construct
* conclusion
* internal
* external

Explanation: Researchers generally perform verification checks using four types of validity:
Construct validity – Does the study really measure what it claims to measure?
Internal validity – Was the experiment really designed correctly? Did it include all the steps of the scientific method?
External validity – Can the conclusions apply to other situations or other people in other places at other times? Are there any other causal relationships in the study that might cause the results?
Conclusion validity – Based on the relationships in the data, are the conclusions of the study reasonable?

30. Refer to the exhibit. What type of data exists outside of the decision boundary?

* historical
* big
* normal
* anomalous

Explanation: A scientist must calculate a decision boundary to detect anomalies. Anomalous data points are points that lie beyond the decision boundary sphere.
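One simple way to picture a spherical decision boundary is as a center point plus a radius, with every point farther from the center than the radius flagged as anomalous. The sketch below assumes the center and radius are already known; the function name `find_anomalies` and the sample points are hypothetical, and the exhibit's actual boundary may have been calculated differently.

```python
import math

def find_anomalies(points, center, radius):
    """Return the points that lie outside a spherical decision boundary."""
    return [p for p in points if math.dist(p, center) > radius]

# Hypothetical two-dimensional data points.
points = [(0.0, 0.0), (1.0, 1.0), (0.5, -0.5), (4.0, 3.0)]

# Assumed boundary: a circle of radius 2 centered on the origin.
anomalies = find_anomalies(points, center=(0.0, 0.0), radius=2.0)
```

Only the point at distance 5 from the center falls outside the boundary, so it is the one reported as anomalous.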

31. What is a matplotlib module that contains a collection of style functions?

Explanation: Pyplot is a matplotlib module that contains a collection of style functions. It can be used to create and customize a plot.

32. Which tool is available online and is used to create data visualizations that include API libraries, figure converters, apps, and an open source JavaScript library?

Explanation: Plotly is an online tool that can be used to quickly generate data visualizations. Plotly provides a variety of resources for data analysts and web developers including API libraries, figure converters, apps for Google Chrome, and an open source JavaScript library.

33. Which services are provided by a private cloud?

* online services to trusted vendors
* multiple internal IT services in an enterprise
* secure communications between sensors and actuators
* encrypted data storage in cloud computing

Explanation: Large enterprises usually have their own data center to manage data storage and data processing needs. The data center can be used to serve internal IT needs. In other words, the data center becomes a private cloud, a cloud computing infrastructure only for internal services.

34. Match the task and purpose to the appropriate Big Data analytics method. (Not all options are used.)

Explanation: Data analytics, applied to Big Data, can be classified into three major types:

* Descriptive – provides information about the past state or performance of a person or an organization
* Predictive – attempts to predict the future, based on data and analysis, or what will happen next
* Prescriptive – predicts outcomes and suggests courses of action that will hold the greatest benefit for an organization

35. Which service is an example of an extension to the cloud computing services outlined by the National Institute of Standards and Technology?

Explanation: The National Institute of Standards and Technology (NIST) defines three main cloud computing services, IaaS, PaaS, and SaaS, in their Special Publication. Cloud service providers have extended this model to also provide IT support for each of the cloud computing services (ITaaS).

36. What is the main function of a hypervisor?

* It is used to create and manage multiple VM instances on a host machine.
* It is a device that filters and checks security credentials.
* It is software used to coordinate and prepare data for analysis.
* It is a device that synchronizes a group of sensors.
* It is used by ISPs to monitor cloud computing resources.

Explanation: A hypervisor is a key component of virtualization. A hypervisor is often software-based and is used to create and manage multiple VM instances.

37. Which solution improves the availability of big data applications by keeping frequently requested data in memory for fast access?

* sharding
* load balancing
* distributed databases
* memcaching

Explanation: Maintaining availability is the primary concern for businesses working with big data. Some solutions to improve availability include the following:
Load Balancing – deploying multiple web servers and DNS servers to respond to requests simultaneously
Distributed Databases – improving database access speed and handling of demand
Memcaching – offloading demand on database servers by keeping frequently requested data available in memory for fast access
Sharding – partitioning a large relational database across multiple servers to improve search speed
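The memcaching idea can be sketched with an in-process cache. This is only a stand-in for a real caching server such as Memcached: `functools.lru_cache` keeps recent results in the memory of the current Python process, and `slow_database_lookup` is a hypothetical function that simulates a database query.

```python
from functools import lru_cache

db_calls = 0  # counts how often the simulated database is actually queried

def slow_database_lookup(key):
    """Hypothetical stand-in for an expensive database query."""
    global db_calls
    db_calls += 1
    return f"value-for-{key}"

@lru_cache(maxsize=128)
def cached_lookup(key):
    # Results are kept in memory, so repeated keys skip the database.
    return slow_database_lookup(key)

first = cached_lookup("user:1")    # cache miss: the database is queried
second = cached_lookup("user:1")   # cache hit: served from memory
```

Both calls return the same value, but the simulated database is queried only once, which is how caching offloads demand from database servers.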

38. What is the first component in the big data pipeline?

* data processing
* data storage
* data transportation
* data ingestion

Explanation: The three fundamental components of the big data pipeline are data ingestion, data storage, and data processing or compute.

39. Match the description to the correct type of data security. (Not all options are used.)

40. How are file changes dealt with by Cassandra?

* A new file is created and the old deleted.
* Both variations are maintained.
* Changes are prepended.
* Changes are appended.

Explanation: Cassandra uses sequential reads and writes to maintain fast speeds. Instead of appending data, when an addition to a file or a removal of data from a file occurs, a new file is created, and the old file or files are deleted.