Frank Kane's Taming Big Data with Apache Spark and Python

By Frank Kane

ISBN-10: 1787287947

ISBN-13: 9781787287945

Key Features

  • Understand how Spark can be distributed across computing clusters
  • Develop and run Spark jobs efficiently using Python
  • A hands-on tutorial by Frank Kane with over 15 real-world examples teaching you big data processing with Spark

Book Description

Frank Kane's Taming Big Data with Apache Spark and Python is your companion to learning Apache Spark in a hands-on manner. Frank will start you off by teaching you how to set up Spark on a single system or on a cluster, and you will soon move on to analyzing large data sets using Spark RDD, and developing and running effective Spark jobs quickly using Python.

Apache Spark has emerged as the next big thing in the big data domain, rising quickly from an ascending technology to an established superstar in just a matter of years. Spark allows you to quickly extract actionable insights from large amounts of data, on a real-time basis, making it an essential tool in many modern businesses.

Frank has packed this book with over 15 interactive, fun-filled examples relevant to the real world, and he will empower you to understand the Spark ecosystem and implement production-grade real-time Spark projects with ease.

What you will learn

  • Find out how you can identify big data problems as Spark problems
  • Install and run Apache Spark on your computer or on a cluster
  • Analyze large data sets across many CPUs using Spark's Resilient Distributed Datasets
  • Implement machine learning on Spark using the MLlib library
  • Process continuous streams of data in real time using the Spark Streaming module
  • Perform complex network analysis using Spark's GraphX library
  • Use Amazon's Elastic MapReduce service to run your Spark jobs on a cluster

About the Author

My name is Frank Kane. I spent nine years at Amazon and IMDb, wrangling millions of customer ratings and customer transactions to produce things such as personalized recommendations for movies and products and "people who bought this also bought." I tell you, I wish we had Apache Spark back then, when I spent years trying to solve these problems there. I hold 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, I left to start my own successful company, Sundog Software, which focuses on virtual reality environment technology, and teaching others about big data analysis.

Table of Contents

  1. Getting Started with Spark
  2. Spark Basics and Simple Examples
  3. Advanced Examples of Spark Programs
  4. Running Spark on a Cluster
  5. SparkSQL, DataFrames and Datasets
  6. Other Spark Technologies and Libraries
  7. Where to Go From Here? - Learning More About Spark and Data Science


Similar data modeling & design books

Integrating Geographic Information Systems and Agent-Based

This volume presents a set of coherent, cross-referenced perspectives on incorporating the spatial representation and analytical power of GIS with agent-based modelling of evolutionary and non-linear processes and phenomena. Many recent advances in software algorithms for incorporating geographic data in modeling social and ecological behaviors, and successes in applying such algorithms, had not been adequately reported in the literature.

Circos Data Visualization How-to

In Detail: Companies, non-profit organizations, and governments are collecting a large amount of data. Analysts and graphic designers are faced with the challenge of conveying data to a wide audience. This book introduces Circos, a creative program for displaying tables in an engaging visualization. Readers will learn how to install, create, and customize Circos diagrams using real-life examples from the social sciences.

New Challenges for Data Design by David Bihanic

The present work provides a platform for leading data designers whose vision and creativity help us to anticipate major changes occurring in the data design field, and pre-empt the future. Each of them strives to provide new answers to the question, “What challenges await data design?” To avoid falling into too narrow a mindset, each works hard to elucidate the breadth of data design today and to demonstrate its widespread application across a variety of business sectors.

Learning Qlik® Sense: The Official Guide by Christopher Ilacqua, Henric Cronström, James Richardson

Get to grips with the vision of Qlik Sense for next generation business intelligence and data discovery. About This Book: Get insider insight on Qlik Sense and its new approach to business intelligence. Create your own Qlik Sense applications, and administer server architecture. Explore practical demonstrations for utilizing Qlik Sense to discover data for sales, human resources, and more. Who This Book Is For: Learning Qlik® Sense is for anyone seeking to understand and utilize the revolutionary new approach to business intelligence offered by Qlik Sense.
