1. Combine Apache Hadoop & Elasticsearch to get the most of your big data... © Hortonworks Inc. 2013
2. Your Presenters
  Steve Mayzak (@smayzak), Head of Sales Engineering, Seahawks fan!
  Mark Lochbihler (@mlochbihler), Partner Solutions Engineer, HUGE FC Barcelona fan!
3. Today’s Topics
  • Drivers for the Modern Data Architecture (MDA)
  • Elasticsearch’s role in the MDA
  • Q&A
4. Hadoop Adoption
  “Hadoop’s momentum is unstoppable as its open source roots grow wildly into enterprises. Its refreshingly unique approach to data management is transforming how companies store, process, analyze, and share big data.” -- Mike Gualtieri, Forrester
5. A Traditional Approach Under Pressure
  APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
  DATA SYSTEM: RDBMS, EDW, MPP repositories
  SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
  Data growth: 2.8 ZB in 2012; 85% from New Data Types; 15x Machine Data by 2020; 40 ZB by 2020 (Source: IDC)
6. Emerging Modern Data Architecture
  APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
  DEV & DATA TOOLS: Build & Test
  OPERATIONAL TOOLS: Manage & Monitor
  DATA SYSTEM: RDBMS, EDW, MPP repositories
  SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
7. MDA Driver #1: A New Approach to Insight
  Current Approach
  § Apply schema on write
  § Heavily dependent on IT
  Single Query Engine (SQL): Determine list of questions; Design solution; Collect structured data; Ask questions from list; Detect additional questions
  Hadoop Approach
  § Apply schema on read
  § Support range of access patterns to data stored in HDFS: polymorphic access
  Right Engine, Right Job (batch, interactive, real-time, in-memory)
  HADOOP: Iterate over structure; Transform and Analyze
8.
MDA Driver #2: Data Warehouse Optimization
  Current Reality
  § EDW at capacity; some usage from low value workloads
  § Older transformed data archived, unavailable for ongoing exploration
  § Source data often discarded
  Augment with Hadoop
  § Free up EDW resources from low value tasks
  § Keep 100% of source data and historical data for ongoing exploration
  § Mine data for value after loading it because of schema-on-read
  Chart labels: Analytics 20%, ETL Process 30%, Operations 50% (current); Analytics 50%, Operations 50% (augmented)
  HADOOP: Parse, cleanse, apply structure, transform
9. The Common Journey with Hadoop
  Diagram (axes: SCALE, SCOPE): MDA/Data Lake; More data and analytic apps; Cost, Insight; IT Driven; New Analytic Apps; New Types of Data; LOB Driven
10. Unlock Value in New Types of Data
  1. Social: Understand how people are feeling and interacting, right now
  2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
  3. Sensor/Machine: Discover patterns in data streaming from remote sensors and machines
  4. Geographic: Analyze location-based data to manage operations where they occur
  5. Server Logs: Diagnose process failures and prevent security breaches
  6. Unstructured (txt, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents
  + Online archive: Data that was once purged or moved to tape can be stored in Hadoop to discover long term trends and previously hidden value
11. 20 Business Applications of Hadoop
  Industry | Use Case | Type of Data
  Financial Services | New Account Risk Screens | Text, Server Logs
  Financial Services | Trading Risk | Server Logs
  Financial Services | Insurance Underwriting | Geographic, Sensor, Text
  Telecom | Call Detail Records (CDRs) | Machine, Geographic
  Telecom | Infrastructure Investment | Machine, Server Logs
  Telecom | Real-time Bandwidth Allocation | Server Logs, Text, Social
  Retail | 360° View of the Customer | Clickstream, Text
  Retail | Localized, Personalized Promotions | Geographic
  Retail | Website Optimization | Clickstream
  Manufacturing | Supply Chain and Logistics | Sensor
  Manufacturing | Assembly Line Quality Assurance | Sensor
  Manufacturing | Crowdsourced Quality Assurance | Social
  Healthcare | Use Genomic Data in Medical Trials | Structured
  Healthcare | Monitor Patient Vitals in Real-Time | Sensor
  Pharmaceuticals | Recruit and Retain Patients for Drug Trials | Social, Clickstream
  Pharmaceuticals | Improve Prescription Adherence | Social, Unstructured, Geographic
  Oil & Gas | Unify Exploration & Production Data | Sensor, Geographic & Unstructured
  Oil & Gas | Monitor Rig Safety in Real-Time | Sensor, Unstructured
  Government | ETL Offload in Response to Federal Budgetary Pressures | Structured
  Government | Sentiment Analysis for Government Programs | Social
12. YARN Unlocks the Data Lake Vision
  Store all data in one place, interact in multiple ways
  1st Gen of Hadoop: Single Use System (Batch Apps). Classic Hadoop Apps on MapReduce (cluster resource management & data processing) over HDFS (redundant, reliable storage).
  2nd Gen of Hadoop: Multi-Use Data Platform (Batch, Interactive, Online, Streaming, …). Batch: MapReduce; Batch & Interactive: Tez; Online Data Processing: HBase, Accumulo; Stream Processing: Storm; Hive, Pig, others…; Flexible Data Processing; Efficient Cluster Resource Management & Shared Services (YARN); Redundant, Reliable Storage (HDFS).
13. The Common Journey with Hadoop
  Diagram (axes: SCALE, SCOPE): MDA/Data Lake; More data and analytic apps; Cost, Insight; IT Driven; New Analytic Apps; New Types of Data; LOB Driven
14. Example Journey Towards a Data Lake
  Diagram (axes: DATA, from TB’s to PB’s; VALUE): Risk Management (e.g., Fraud Reduction); New Business (e.g., Data as a Product); Customer Intimacy (e.g., 360 Degree View of the Customer); Operational Excellence (e.g., Network Maintenance)
  DATA LAKE: An architectural shift in the data center that uses Hadoop to deliver deep insight across a large, broad, diverse set of data at efficient scale
15.
Enabling Hadoop for the Enterprise
  1. Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
  2. Integration: Interoperable with existing data center investments
  3. Skills: Leverage your existing skills: development, analytics, operations
  Timeline: 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015
16. Core Capabilities of Enterprise Hadoop
  1. Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
  BROAD INSIGHT
  Presentation & Application: Enable both existing and new applications to provide value to the organization
  Operations: Empower current operations and security tools to manage Hadoop
  Data Governance: Integrate with existing systems and move data in/out and within the environment
  Data Access: Access your data simultaneously in multiple ways (batch, interactive, real-time)
  EFFICIENT SCALE
  Security: Provide layered approach to security through Authentication, Authorization, Accountability and Data Protection
  Operations: Allow you to deploy and effectively manage the environment
  Data Management: Store and process all of your Corporate Data Assets
  Deployment Model: Provide the efficient deployment option for your organization
17. Enabling Familiar and Existing Tools
  1. Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
  2. Integration: Interoperable with existing data center investments
  3. Skills: Leverage your existing skills: development, analytics, operations
  Roles: DEVELOPER, ANALYST, OPERATOR
  Activities: COLLECT, PROCESS, BUILD, EXPLORE, QUERY, DELIVER, PROVISION, MANAGE, MONITOR
18.
Requirements for Enterprise Hadoop
  APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
  DEV & DATA TOOLS: Build & Test
  OPERATIONAL TOOLS: Manage & Monitor
  DATA SYSTEM: RDBMS, EDW, MPP repositories
  SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
  1. Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
  2. Integration: Interoperable with existing data center investments. Integrate with Applications (Business Intelligence, Developer IDEs, Data Integration), Systems (Data Systems & Storage, Systems Management), and Platforms (Operating Systems, Virtualization, Cloud, Appliances)
  3. Skills: Leverage your existing skills: development, analytics, operations
19. Elasticsearch in the Modern Data Architecture
  APPLICATIONS; DEV & DATA TOOLS; OPERATIONAL TOOLS
  DATA SYSTEM: RDBMS, EDW, HANA, MPP
  INFRASTRUCTURE
  SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
20. Today’s Topics
  • Drivers for the Modern Data Architecture (MDA)
  • Elasticsearch’s role in the MDA
  • Q&A
21. What is Elasticsearch?
  Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited
22. Elasticsearch: real time, search and analytics engine
  open-source; RESTful API; JSON over HTTP; scales massively; high availability; schema free; Lucene based; distributed; multi tenancy
23. The Elasticsearch ELK Stack
  Logstash: Data From Any Source
  Elasticsearch: Instantly Analyze
  Kibana: Actionable Insights
24. What about Elasticsearch the Company?
  • Support 100s of Companies in Production environments
  • Training Developers and Ops around the world on ELK
  • Drive the ELK Projects forward, great things to come!
  • Commercial products: Marvel to monitor and manage ELK
  • Backed by the best: Benchmark, Index Ventures
25. Who’s using Elasticsearch?
26. What are people saying about Elasticsearch?
27. Real-time Search
  • Europe’s largest professional social network
  • Over 14 Million members
  • New data available for search immediately vs 50 mins
  • “According to the customer survey that we conduct every quarter, search is the most important feature on our platform,” Dr. Daniel Olmedilla, Vice President, Data Science at XING
28. How do they fit together?
29. Diagram labels: Elasticsearch; Index seamlessly; Free Text Search; Analytics; Elasticsearch-Hadoop Library; Integrate Natively; Choice; Clean, Enrich; Raw data
30. Elasticsearch-Hadoop Library
  • Java Library for integrating Elasticsearch and Hadoop
  • Pig, Hive, Cascading, MapReduce
  • Search & Real-time Analytics with Elasticsearch, Hadoop as Data Lake
  • Scales with Hadoop
  • Works with Apache Hadoop, Certified on HDP 1.x and 2.x (YARN-compatible binary)
31.
Multiple Architectures: Same Hardware, 1 for 1
32. Multiple Architectures: Separate Hardware, Clusters of each, Scale Independently (diagram: ES Node, ES Node, ES Node)
33. Show me!
  • Hortonworks HDP Sandbox - making Hadoop easy!
  • Installed Elasticsearch, Marvel and Kibana on Sandbox
  • Upload elasticsearch-hadoop jar as Pig Storage lib
  • Index CSV data from Pig to Elasticsearch
  • Query Elasticsearch from Pig - best of both
  • Kibana to Visualize and Discover
34. Where to find us?
  elasticsearch.com
  elasticsearch.org
  @elasticsearch
  #elasticsearch IRC (webchat.freenode)
  Github: elasticsearch/elasticsearch
35. Try Hadoop Today… Get Involved
  More about Elasticsearch & Hortonworks: hortonworks.com/partner/elasticsearch
  Download the Hortonworks Sandbox: Learn Hadoop, Build Your Analytic App, Try Hadoop 2
  Contact us:
[email protected]
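The demo step "Index CSV data from Pig to Elasticsearch" (slide 33) relies on the "JSON over HTTP" interface named on slide 22. The deck does not show the demo script, so the following is only a rough sketch of what indexing CSV rows amounts to at that level, written in plain Python rather than Pig and without the elasticsearch-hadoop library. It builds a newline-delimited JSON body in the shape Elasticsearch's bulk API accepts. The function name `csv_to_bulk_body`, the index name `demo-index`, and the sample rows are all hypothetical.

```python
import csv
import io
import json


def csv_to_bulk_body(csv_text, index_name):
    """Turn CSV text (with a header row) into a bulk-API style request body."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Action line: asks Elasticsearch to index the document on the next line.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: one JSON object per CSV row; all values stay strings here.
        lines.append(json.dumps(row))
    if not lines:
        return ""
    # The bulk format is newline-delimited JSON and must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample data; the CSV used in the actual demo is not in the slides.
SAMPLE_CSV = "id,city,temp\n1,Seattle,12\n2,Barcelona,21\n"

if __name__ == "__main__":
    print(csv_to_bulk_body(SAMPLE_CSV, "demo-index"), end="")
```

The resulting text would be sent as the body of an HTTP POST to the cluster's `_bulk` endpoint. Elasticsearch releases from the period of this deck also expected a document type alongside the index name, so the action line would likely need adjusting for such a cluster. In the demo itself this conversion is handled by the elasticsearch-hadoop jar loaded into Pig.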