Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

April 6, 2018 | Author: Anonymous | Category: Technology


Description

1. Combine Apache Hadoop & Elasticsearch to get the most of your big data...
© Hortonworks Inc. 2013

2. Your Presenters
• Steve Mayzak (@smayzak) - Head of Sales Engineering - Seahawks fan!
• Mark Lochbihler (@mlochbihler) - Partner Solutions Engineer - HUGE FC Barcelona Fan!

3. Today’s Topics
• Drivers for the Modern Data Architecture (MDA)
• Elasticsearch’s role in the MDA
• Q&A

4. Hadoop Adoption
“Hadoop’s momentum is unstoppable as its open source roots grow wildly into enterprises. Its refreshingly unique approach to data management is transforming how companies store, process, analyze, and share big data” -- Mike Gualtieri, Forrester

5. A Traditional Approach Under Pressure
[Diagram]
APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
DATA SYSTEM: repositories (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
• 2.8 ZB in 2012
• 85% from New Data Types
• 15x Machine Data by 2020
• 40 ZB by 2020
Source: IDC

6. Emerging Modern Data Architecture
[Diagram]
APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
DEV & DATA TOOLS: Build & Test
OPERATIONAL TOOLS: Manage & Monitor
DATA SYSTEM: repositories (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

7. MDA Driver #1: A New Approach to Insight
Current Approach
• Apply schema on write
• Heavily dependent on IT
Single Query Engine: SQL
Determine list of questions; Design solution; Collect structured data; Ask questions from list; Detect additional questions
Hadoop Approach
• Apply schema on read
• Support range of access patterns to data stored in HDFS: polymorphic access
Right Engine, Right Job: batch, interactive, real-time, in-memory
HADOOP: Iterate over structure; Transform and Analyze

8. MDA Driver #2: Data Warehouse Optimization
Current Reality
• EDW at capacity; some usage from low value workloads
• Older transformed data archived, unavailable for ongoing exploration
• Source data often discarded
Augment with Hadoop
• Free up EDW resources from low value tasks
• Keep 100% of source data and historical data for ongoing exploration
• Mine data for value after loading it because of schema-on-read
Chart labels: Analytics 20%, ETL Process 30%, Analytics 50%, Operations 50%, Operations 50%
HADOOP: Parse, cleanse, apply structure, transform

9. The Common Journey with Hadoop
[Chart] Axes: SCALE, SCOPE
Labels: MDA/Data Lake; More data and analytic apps; Cost, Insight; IT Driven; New Analytic Apps; New Types of Data; LOB Driven

10. Unlock Value in New Types of Data
1. Social: Understand how people are feeling and interacting - right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Diagnose process failures and prevent security breaches
6. Unstructured (txt, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents
+ Online archive: Data that was once purged or moved to tape can be stored in Hadoop to discover long term trends and previously hidden value

11. 20 Business Applications of Hadoop
[Table: Industry / Use Case / Type of Data]
Industries: Financial Services, Telecom, Retail, Manufacturing, Healthcare, Pharmaceuticals, Oil & Gas, Government
Use cases with their type of data:
• Assembly Line Quality Assurance: Sensor
• Crowdsourced Quality Assurance: Social
• Use Genomic Data in Medical Trials: Structured
• Monitor Patient Vitals in Real-Time: Sensor
• Recruit and Retain Patients for Drug Trials: Social, Clickstream
• Improve Prescription Adherence: Social, Unstructured, Geographic
• Unify Exploration & Production Data: Sensor, Geographic & Unstructured
• ETL Offload in Response to Federal Budgetary Pressures: Structured
• Sentiment Analysis for Government Programs: Social
Further use cases: New Account Risk Screens; Trading Risk; Insurance Underwriting; Call Detail Records (CDRs); Infrastructure Investment; Real-time Bandwidth Allocation; 360° View of the Customer; Localized, Personalized Promotions; Website Optimization; Supply Chain and Logistics; Monitor Rig Safety in Real-Time
Further types of data listed: Geographic; Clickstream; Sensor; Clickstream, Text; Server Logs, Text, Social; Machine, Server Logs; Machine, Geographic; Geographic, Sensor, Text; Server Logs; Text, Server Logs; Sensor, Unstructured

12. YARN Unlocks the Data Lake Vision
Store all data in one place, interact in multiple ways
1st Gen of Hadoop: Single Use System (Batch Apps)
• Classic Hadoop Apps
• MapReduce (cluster resource management & data processing)
• HDFS (redundant, reliable storage)
2nd Gen of Hadoop: Multi-Use Data Platform (Batch, Interactive, Online, Streaming, …)
• Batch: MapReduce
• Batch & Interactive: Tez
• Online Data Processing: HBase, Accumulo
• Stream Processing: Storm
• Hive, Pig, others…
• Flexible Data Processing
• YARN: Efficient Cluster Resource Management & Shared Services
• HDFS: Redundant, Reliable Storage

13. The Common Journey with Hadoop
[Chart repeated from slide 9]

14. Example Journey Towards a Data Lake
[Chart] Axes: DATA (TB’s to PB’s), VALUE
• Risk Management, e.g., Fraud Reduction
• New Business, e.g., Data as a Product
• Customer Intimacy, e.g., 360 Degree View of the Customer
• Operational Excellence, e.g., Network Maintenance
DATA LAKE: An architectural shift in the data center that uses Hadoop to deliver deep insight across a large, broad, diverse set of data at efficient scale

15. Enabling Hadoop for the Enterprise
1. Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
2. Integration: Interoperable with existing data center investments
3. Skills: Leverage your existing skills: development, analytics, operations
[Timeline: 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015]

16. Core Capabilities of Enterprise Hadoop
1. Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
[Diagram] Labels: BROAD INSIGHT, EFFICIENT SCALE
• Presentation & Application: Enable both existing and new applications to provide value to the organization
• Operations: Empower current operations and security tools to manage Hadoop
• Data Governance: Integrate with existing systems and move data in/out and within the environment
• Data Access: Access your data simultaneously in multiple ways (batch, interactive, realtime)
• Security: Provide layered approach to security through Authentication, Authorization, Accountability and Data Protection
• Operations: Allow you to deploy and effectively manage the environment
• Data Management: Store and process all of your Corporate Data Assets
• Deployment Model: Provide the efficient deployment option for your organization

17. Enabling Familiar and Existing Tools
1. Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
2. Integration: Interoperable with existing data center investments
3. Skills: Leverage your existing skills: development, analytics, operations
Roles: DEVELOPER, ANALYST, OPERATOR
Activities: COLLECT, PROCESS, BUILD, EXPLORE, QUERY, DELIVER, PROVISION, MANAGE, MONITOR

18. Requirements for Enterprise Hadoop
1. Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
2. Integration: Interoperable with existing data center investments
3. Skills: Leverage your existing skills: development, analytics, operations
Integrate with:
• Applications: Business Intelligence, Developer IDEs, Data Integration
• Systems: Data Systems & Storage, Systems Management
• Platforms: Operating Systems, Virtualization, Cloud, Appliances
[Diagram]
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DEV & DATA TOOLS: Build & Test
OPERATIONAL TOOLS: Manage & Monitor
DATA SYSTEM: repositories (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

19. Elasticsearch in the Modern Data Architecture
[Diagram]
APPLICATIONS
DEV & DATA TOOLS
OPERATIONAL TOOLS
DATA SYSTEM: RDBMS, EDW, HANA, MPP
INFRASTRUCTURE
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

20. Today’s Topics
• Drivers for the Modern Data Architecture (MDA)
• Elasticsearch’s role in the MDA
• Q&A

21. What is Elasticsearch?
Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited

22. Elasticsearch: real time, search and analytics engine
• open-source
• RESTful API
• JSON over HTTP
• scales massively
• high availability
• schema free
• Lucene based
• distributed
• multi tenancy

23. The Elasticsearch ELK Stack
• Logstash: Data From Any Source
• Elasticsearch: Instantly Analyze
• Kibana: Actionable Insights

24. What about Elasticsearch the Company?
• Support 100s of Companies in Production environments
• Training Developers and Ops around the world on ELK
• Drive the ELK Projects forward, great things to come!
• Commercial products: Marvel to monitor and manage ELK
• Backed by the best: Benchmark, Index Ventures

25. Who’s using Elasticsearch?

26. What are people saying about Elasticsearch?

27. Real-time Search
• Europe’s largest professional social network
• Over 14 Million members
• New data available for search immediately vs 50 mins
• “According to the customer survey that we conduct every quarter, search is the most important feature on our platform,” Dr. Daniel Olmedilla, Vice President, Data Science at XING

28. How do they fit together?

29. [Diagram]
Elasticsearch: Index seamlessly; Free Text Search; Analytics
Elasticsearch-Hadoop Library: Integrate Natively
Other labels: Choice; Clean, Enrich; Raw data

30. Elasticsearch-Hadoop Library
• Java Library for integrating Elasticsearch and Hadoop
• Pig, Hive, Cascading, MapReduce
• Search & Real-time Analytics with Elasticsearch, Hadoop as Data Lake
• Scales with Hadoop
• Works with Apache Hadoop, Certified on HDP 1.x and 2.x (YARN compatible binary)

31. Multiple Architectures
• Same Hardware
• 1 for 1

32. Multiple Architectures
[Diagram: three ES nodes]
• Separate Hardware
• Clusters of each
• Scale Independently

33. Show me!
• Hortonworks HDP Sandbox - making Hadoop easy!
• Installed Elasticsearch, Marvel and Kibana on Sandbox
• Upload elasticsearch-hadoop jar as Pig Storage lib
• Index CSV data from Pig to Elasticsearch
• Query Elasticsearch from Pig - best of both
• Kibana to Visualize and Discover

34. Where to find us?
• elasticsearch.com
• elasticsearch.org
• @elasticsearch
• #elasticsearch IRC (webchat.freenode)
• Github: elasticsearch/elasticsearch

35. Try Hadoop Today… Get Involved
More about Elasticsearch & Hortonworks: hortonworks.com/partner/elasticsearch
• Download the Hortonworks Sandbox
• Learn Hadoop
• Build Your Analytic App
• Try Hadoop 2
Contact us: [email protected]
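
The "Show me!" demo indexes CSV data into Elasticsearch from Pig through the elasticsearch-hadoop library. The slides do not include the script, so the following is only a rough stand-in for the indexing step, written in plain Python with the standard library rather than Pig or elasticsearch-hadoop. It turns CSV rows into the newline-delimited JSON body that Elasticsearch's `_bulk` REST endpoint accepts. The function name `csv_to_bulk`, the index name `demo` and the sample columns are hypothetical. The optional `doc_type` argument is there because older Elasticsearch releases expected a `_type` in the action line, while recent ones do not.

```python
import csv
import io
import json


def csv_to_bulk(csv_text, index_name, doc_type=None):
    """Build an Elasticsearch _bulk request body from CSV text with a header row.

    Hypothetical helper for illustration; it only builds the payload and
    does not contact a cluster.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Action line: tells Elasticsearch to index the document that follows.
        meta = {"_index": index_name}
        if doc_type is not None:
            # Assumption: only needed for older releases that still use types.
            meta["_type"] = doc_type
        lines.append(json.dumps({"index": meta}))
        # Source line: the CSV row as a JSON object (all values stay strings).
        lines.append(json.dumps(row))
    # The bulk format requires every line, including the last, to end in a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = "id,name\n1,alpha\n2,beta\n"  # made-up sample data
    print(csv_to_bulk(sample, "demo"), end="")
```

The resulting text would be sent as the body of a POST to the cluster's `_bulk` endpoint. With elasticsearch-hadoop the same kind of batching is handled by the library, so a Pig script only declares where the data should be stored.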

