MongoDB Europe 2016
Old Billingsgate, London
15th November
Use my code rubenterceno20 for 20% off tickets
mongodb.com/europe
Back to Basics 2016
Introduction to the Aggregation Framework
Rubén Terceño
Senior Solutions Architect, [email protected]
@rubenTerceno
Course Agenda
Date            Time         Webinar
25 May 2016     16:00 CEST   Introduction to NoSQL
7 June 2016     16:00 CEST   Your First MongoDB Application
21 June 2016    16:00 CEST   Document-Oriented Schema Design
7 July 2016     16:00 CEST   Advanced Indexing, Text and Geospatial Indexes
19 July 2016    16:00 CEST   Introduction to the Aggregation Framework
28 July 2016    16:00 CEST   Production Deployment
Summary of What We Have Covered So Far
• Why does NoSQL exist?
• Types of NoSQL databases
• Key features of MongoDB
• Installing MongoDB and creating databases and collections
• CRUD operations, indexes, and explain()
• Dynamic schema design
• Hierarchy and embedded documents
• Free-text and geospatial search
Aggregation Framework
• A native analytics engine for MongoDB
• But... what does "analytics" mean?
• Classic databases come in two types, OLTP and OLAP
• OLTP: Online Transaction Processing
  • Airline reservations
  • ATM operations
  • Customer management (CRM)
• OLAP: Online Analytical Processing
  • Profitability and overbooking calculations
  • Demand forecasting and optimization of ATM cash replenishment
  • Customer segmentation
What Does It Look Like?
[Figure: OLTP vs. OLAP]
OLAP – Territory of Giants
• OLAP queries normally require full scans of the data
• Results are stored for comparative and future analysis
• Spark and Hadoop are the dominant technologies in this area, but:
  • Complexity is high
  • They are oriented toward algorithmic data analysis (you have to write code)
  • They require knowledge of parallel processing and parallel algorithms
• The Aggregation Framework offers a friendlier approach
  • You can do the same kind of analysis with less effort
• Well suited to real-time analytics and data discovery
Aggregation Framework – A Processing Pipeline
[Diagram: pipeline stages Match, Project, Lookup, Group, Sort]
• Think of a Unix pipeline
• The output of one stage is passed to the input of the next stage
• Each stage performs one job
• Stages can be repeated
• The input is a single collection
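The Unix-pipeline analogy can be sketched in plain JavaScript, with no database involved. The helpers `runPipeline`, `matchStage`, and `limitStage` are hypothetical names invented for this illustration and are not MongoDB APIs; the sketch only shows how each stage receives the previous stage's output.

```javascript
// Each stage is a function from an array of documents to an array of documents.
// These helpers are illustrative stand-ins, not part of MongoDB.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const limitStage = (n) => (docs) => docs.slice(0, n);

// Run the stages left to right, feeding each output into the next stage.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// A tiny in-memory stand-in for the single input collection.
const ships = [
  { Name: "MSC Zoe", Country: "Switzerland" },
  { Name: "MSC Oscar", Country: "Switzerland" },
  { Name: "CSCL Pacific Ocean", Country: "China" },
];

const result = runPipeline(ships, [
  matchStage((d) => d.Country === "Switzerland"),
  limitStage(1),
]);
console.log(result);
```

Because every stage has the same shape (documents in, documents out), stages can be repeated or reordered freely, which is the property the bullets above describe.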
Pipeline Operators
• $match: Filter documents
• $project/$redact: Reshape documents
• $group: Summarize documents
• $out: Create new collections
• $sample: Return random samples
• $sort: Order documents
• $limit/$skip: Paginate documents
• $lookup: Join two collections together
• $unwind: Expand an array
• $geoNear: Return documents by distance
Model of the Aggregation Framework
The Containers dataset
• https://github.com/terce13/geoData/blob/master/Containers.zip
• Unzip it
• mongorestore ./dump
Example ship document
{
    "_id" : ObjectId("56fda36a0a162d0f051f2c6d"),
    "Built" : 2015,
    "Name" : "MSC Zoe",
    "Length overall (m)" : 395.4,
    "Beam (m)" : 59,
    "Maximum TEU" : 19224,
    "GT" : 193000,
    "Owner" : "MSC",
    "Country" : "Switzerland",
    "route" : {
        "origin" : { "Name" : "Tianjin", "Country" : "China" },
        "destination" : { "Name" : "Shanghai", "Country" : "China" }
    },
    "location" : {
        "type" : "Point",
        "coordinates" : [ 129.15693498213182, 18.108558232731916 ]
    },
    "EAT" : ISODate("2016-05-16T10:00:00Z")
}
Example container document
{
    "_id" : ObjectId("5719290546728347c6fbdc4c"),
    "container_id" : "00000001",
    "type" : "40",
    "cargo" : "Whales",
    "Tons" : 38,
    "location" : {
        "type" : "Point",
        "coordinates" : [ 129.15297142372992, 18.108451503053704 ]
    },
    "shipName" : "MSC Zoe"
}
Using the shell
MongoDB Enterprise > db.ships.aggregate()
{ "_id" : ObjectId("56fda36a0a162d0f051f2c6d"), "Built" : 2015, "Name" : "MSC Zoe", "Length overall (m)" : 395.4, "Beam (m)" : 59, "Maximum TEU" : 19224, "GT" : 193000, "Owner" : "MSC", "Country" : "Switzerland", "route" : { "origin" : { "Name" : "Tianjin", "Country" : "China" }, "destination" : { "Name" : "Shanghai", "Country" : "China" } }, "location" : { "type" : "Point", "coordinates" : [ 129.15693498213182, 18.108558232731916 ] }, "EAT" : ISODate("2016-05-16T10:00:00Z") }
[...]
{ "_id" : ObjectId("56fda36a0a162d0f051f2c7f"), "Built" : 2015, "Name" : "CMA CGM Bougainv", "Length overall (m)" : 398, "Beam (m)" : 54, "Maximum TEU" : 17722, "GT" : "", "Owner" : "CMA CGM", "Country" : "France", "route" : { "origin" : { "Name" : "Cartagena", "Country" : "Colombia" }, "destination" : { "Name" : "Antwerp", "Country" : "Belgium" } }, "location" : { "type" : "Point", "coordinates" : [ -80.76828572004653, 0.8313138025242637 ] }, "EAT" : ISODate("2016-05-17T11:00:00Z") }
Type "it" for more
MongoDB Enterprise >
$limit
MongoDB Enterprise > db.ships.aggregate([{$limit : 2}])
{ "_id" : ObjectId("56fda36a0a162d0f051f2c6d"), "Built" : 2015, "Name" : "MSC Zoe", "Length overall (m)" : 395.4, "Beam (m)" : 59, "Maximum TEU" : 19224, "GT" : 193000, "Owner" : "MSC", "Country" : "Switzerland", "route" : { "origin" : { "Name" : "Tianjin", "Country" : "China" }, "destination" : { "Name" : "Shanghai", "Country" : "China" } }, "location" : { "type" : "Point", "coordinates" : [ 129.15693498213182, 18.108558232731916 ] }, "EAT" : ISODate("2016-05-16T10:00:00Z") }
{ "_id" : ObjectId("56fda36a0a162d0f051f2c70"), "Built" : 2015, "Name" : "MSC Oscar", "Length overall (m)" : 395.4, "Beam (m)" : 59, "Maximum TEU" : 19224, "GT" : 192237, "Owner" : "MSC", "Country" : "Switzerland", "route" : { "origin" : { "Name" : "Kaohsiung", "Country" : "Taiwan" }, "destination" : { "Name" : "Shanghai", "Country" : "China" } }, "location" : { "type" : "Point", "coordinates" : [ 153.87348512279215, 44.683039336234614 ] }, "EAT" : ISODate("2016-05-25T09:00:00Z") }
MongoDB Enterprise >
$skip
MongoDB Enterprise > db.ships.aggregate([{$limit : 7}, {$skip: 5}])
{ "_id" : ObjectId("56fda36a0a162d0f051f2c72"), "Built" : 2014, "Name" : "CSCL Pacific Ocean", "Length overall (m)" : 399.67, "Beam (m)" : 58.6, "Maximum TEU" : 19100, "GT" : 187541, "Owner" : "CSCL", "Country" : "China", "route" : { "origin" : { "Name" : "Ningbo-Zhoushan", "Country" : "China" }, "destination" : { "Name" : "Kaohsiung", "Country" : "Taiwan" } }, "location" : { "type" : "Point", "coordinates" : [ -168.88031791679694, 41.72272856411992 ] }, "EAT" : ISODate("2016-05-24T04:00:00Z") }
{ "_id" : ObjectId("56fda36a0a162d0f051f2c73"), "Built" : 2015, "Name" : "CSCL Indian Ocean", "Length overall (m)" : 399.67, "Beam (m)" : 58.6, "Maximum TEU" : 19100, "GT" : 187541, "Owner" : "CSCL", "Country" : "China", "route" : { "origin" : { "Name" : "Jebel Ali (Dubai)", "Country" : "United Arab Emirates" }, "destination" : { "Name" : "Ho Chi Minh City (Saigon)", "Country" : "Vietnam" } }, "location" : { "type" : "Point", "coordinates" : [ -136.48534585524587, 27.322294568378965 ] }, "EAT" : ISODate("2016-05-27T12:00:00Z") }
MongoDB Enterprise >
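The $limit followed by $skip example returns only two documents because stage order matters. A minimal in-memory sketch of that effect, where `limit` and `skip` are hypothetical helper functions that mimic the stages on a plain array (they are not MongoDB calls):

```javascript
// Ten stand-in documents; only their order matters here.
const docs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].map((n) => ({ n }));

// Illustrative stand-ins for the $limit and $skip stages.
const limit = (n) => (input) => input.slice(0, n);
const skip = (n) => (input) => input.slice(n);

// {$limit: 7}, {$skip: 5}: keep the first 7, then drop the first 5 of those.
const limitThenSkip = skip(5)(limit(7)(docs));

// {$skip: 5}, {$limit: 7}: drop the first 5, then keep up to 7 of the rest.
const skipThenLimit = limit(7)(skip(5)(docs));

console.log(limitThenSkip.map((d) => d.n)); // [ 6, 7 ]
console.log(skipThenLimit.map((d) => d.n)); // [ 6, 7, 8, 9, 10 ]
```

For page-style navigation the usual arrangement is skip first and limit second, so that the limit expresses the page size.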
$sample
{$sample: {size : 10}}
$match
MongoDB Enterprise > db.ships.aggregate([{$match :{"Country": "China"}}])
MongoDB Enterprise > db.ships.aggregate([{$match :{"route.origin.Country": "China"}}])
MongoDB Enterprise > db.ships.aggregate([{$match : {location: {$geoWithin: {$geometry : caribe.geometry}}}}])
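The first two $match filters differ only in the field path: a top-level field versus a dotted path into an embedded document. The sketch below imitates that equality matching on an in-memory array. `getPath` and `match` are hypothetical helpers written for this illustration; they cover plain equality only and ignore query operators such as $geoWithin and MongoDB's array traversal rules.

```javascript
// Resolve a dotted path such as "route.origin.Country" against a document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Keep documents whose value at every filter path equals the expected value.
function match(docs, filter) {
  return docs.filter((doc) =>
    Object.entries(filter).every(
      ([path, expected]) => getPath(doc, path) === expected
    )
  );
}

// Two trimmed-down ship documents based on the sample output above.
const ships = [
  {
    Name: "MSC Zoe",
    Country: "Switzerland",
    route: { origin: { Name: "Tianjin", Country: "China" } },
  },
  {
    Name: "CMA CGM Bougainv",
    Country: "France",
    route: { origin: { Name: "Cartagena", Country: "Colombia" } },
  },
];

const byCountry = match(ships, { Country: "China" });
const byOrigin = match(ships, { "route.origin.Country": "China" });
console.log(byCountry.length, byOrigin.map((s) => s.Name));
```

With this reduced data set the top-level filter matches nothing, while the dotted-path filter finds the ship whose route starts in China.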
$geoNear
db.ships.aggregate([
    { $geoNear: {
        near: { type: "Point", coordinates: [ -122.4252, 37.8283 ] },
        distanceField: "dist.calculated",
        maxDistance: 3000000,
        distanceMultiplier: 0.001,
        query: { cargo: "Iron" },
        limit : 1000000,
        includeLocs: "dist.location",
        spherical: true
    } }
])
$lookup
{$lookup : {from: "containers", as: "cargo", localField: "Name", foreignField: "shipName"}}
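$lookup behaves like a left outer join: every input document is kept, and the matching documents from the other collection are attached as an array. A rough in-memory sketch of that behavior follows. The `lookup` helper is hypothetical, takes an array where the real stage takes a collection name, and the second container is made-up sample data.

```javascript
// Illustrative stand-in for $lookup: attach matching foreign documents as an array.
// Here `from` is an in-memory array; the real stage takes a collection name.
function lookup(docs, { from, localField, foreignField, as }) {
  return docs.map((doc) => ({
    ...doc,
    [as]: from.filter((other) => other[foreignField] === doc[localField]),
  }));
}

const ships = [{ Name: "MSC Zoe" }, { Name: "MSC Oscar" }];

const containers = [
  { container_id: "00000001", cargo: "Whales", Tons: 38, shipName: "MSC Zoe" },
  // Made-up second container, added only for this example.
  { container_id: "00000002", cargo: "Iron", Tons: 20, shipName: "MSC Zoe" },
];

const joined = lookup(ships, {
  from: containers,
  as: "cargo",
  localField: "Name",
  foreignField: "shipName",
});
console.log(JSON.stringify(joined, null, 2));
```

The ship with no containers still appears in the result, with an empty `cargo` array.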
$unwind
{$unwind: "$cargo"}
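$unwind emits one output document per array element, copying the other fields. The sketch below imitates the default behavior on an in-memory array. `unwind` is a hypothetical helper that takes the bare field name ("cargo") rather than the "$cargo" path syntax, and it does not model options such as preserveNullAndEmptyArrays.

```javascript
// Illustrative stand-in for $unwind with default options.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const values = doc[field];
    // Missing or null fields produce no output document.
    if (values === undefined || values === null) return [];
    // A non-array value is passed through as if it were a one-element array.
    if (!Array.isArray(values)) return [doc];
    // One copy of the document per element; an empty array yields nothing.
    return values.map((value) => ({ ...doc, [field]: value }));
  });
}

const joined = [
  {
    Name: "MSC Zoe",
    cargo: [
      { cargo: "Whales", Tons: 38 },
      { cargo: "Iron", Tons: 20 }, // made-up sample value
    ],
  },
  { Name: "MSC Oscar", cargo: [] },
];

const unwound = unwind(joined, "cargo");
console.log(unwound);
```

After unwinding, `cargo` is a single embedded document in each result, which is why later stages can refer to paths like "$cargo.Tons".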
$group
{$group : {_id: {ship: "$Name", cargo : "$cargo.cargo", route: "$route", location: "$location"}, sum: {$sum: "$cargo.Tons"}}}
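The $group stage collects documents that share the same composite _id and applies an accumulator to each group. A simplified in-memory sketch of grouping with a $sum-style accumulator, where `groupSum` is a hypothetical helper, the _id is reduced to ship and cargo type for brevity, and the tonnage values other than 38 are made up:

```javascript
// Illustrative stand-in for $group with a single $sum accumulator.
function groupSum(docs, keyOf, valueOf) {
  const groups = new Map();
  for (const doc of docs) {
    const id = keyOf(doc);
    // Serialize the composite key so equal objects land in the same group.
    const key = JSON.stringify(id);
    const entry = groups.get(key) ?? { _id: id, sum: 0 };
    entry.sum += valueOf(doc);
    groups.set(key, entry);
  }
  return [...groups.values()];
}

// Documents shaped like the output of the $unwind stage.
const unwound = [
  { Name: "MSC Zoe", cargo: { cargo: "Whales", Tons: 38 } },
  { Name: "MSC Zoe", cargo: { cargo: "Whales", Tons: 12 } },
  { Name: "MSC Zoe", cargo: { cargo: "Iron", Tons: 20 } },
];

const totals = groupSum(
  unwound,
  (d) => ({ ship: d.Name, cargo: d.cargo.cargo }),
  (d) => d.cargo.Tons
);
console.log(totals);
```

The sketch returns groups in first-seen order; MongoDB makes no such ordering promise for $group output, so add a $sort stage when order matters.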
$project
var project = {$project: {_id : {ship: "$_id.ship", route: "$_id.route", location: "$_id.location"}, cargo : {type : "$_id.cargo", Tons: "$sum"}}}
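This $project stage reshapes each grouped document: the cargo type moves out of the _id and the total moves under a new `cargo` field. The same reshaping can be sketched with a plain `map` over an in-memory array; the input document below is made-up sample data shaped like the output of the $group stage.

```javascript
// A document shaped like the output of the $group stage (values are illustrative).
const grouped = [
  {
    _id: {
      ship: "MSC Zoe",
      cargo: "Whales",
      route: { origin: { Name: "Tianjin" }, destination: { Name: "Shanghai" } },
      location: { type: "Point", coordinates: [129.15, 18.1] },
    },
    sum: 50,
  },
];

// Mirror the $project specification field by field.
const projected = grouped.map((doc) => ({
  _id: {
    ship: doc._id.ship,
    route: doc._id.route,
    location: doc._id.location,
  },
  cargo: { type: doc._id.cargo, Tons: doc.sum },
}));

console.log(JSON.stringify(projected, null, 2));
```

Fields not named in the specification, such as the original `sum`, do not appear in the output.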
$out
{$out : "ships2"}
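The stages from $lookup through $out can be combined into one pipeline. The sketch below assumes they are meant to run in the order the slides present them, which the deck does not state explicitly. It only builds and prints the pipeline array, so it runs without a database; the aggregate call appears as a comment.

```javascript
// Stage definitions copied from the slides.
const lookup = {
  $lookup: { from: "containers", as: "cargo", localField: "Name", foreignField: "shipName" },
};
const unwind = { $unwind: "$cargo" };
const group = {
  $group: {
    _id: { ship: "$Name", cargo: "$cargo.cargo", route: "$route", location: "$location" },
    sum: { $sum: "$cargo.Tons" },
  },
};
const project = {
  $project: {
    _id: { ship: "$_id.ship", route: "$_id.route", location: "$_id.location" },
    cargo: { type: "$_id.cargo", Tons: "$sum" },
  },
};
const out = { $out: "ships2" };

// Assumed order: join, expand, summarize, reshape, then write the result.
const pipeline = [lookup, unwind, group, project, out];

// In the mongo shell, with the Containers dataset restored, this would be run as:
//   db.ships.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Keeping each stage in its own variable, as the $project slide does with `var project`, makes it easy to test a prefix of the pipeline before adding the next stage.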
Summary
• A pipeline of operations
• Select, project, group, sort, lookup
• $out must appear last in an aggregation pipeline
• There is a range of accumulators (see the $group documentation)
• A very powerful way to reshape and analyze data
• Shard-aware, to gain maximum performance on large clusters
Next Webinar
Production Deployment
• 28 July 2016 – 16:00 CEST, 11:00 ART, 9:00
• Register if you have not done so yet!
• What do you need to know to make sure your MongoDB system runs and scales in a production environment?
• In this talk, we will walk through our ten rules for production deployment and look at the basics of some of the automated tools MongoDB offers for managing production systems.
• Register at: https://www.mongodb.com/webinars
• Please give us your feedback: [email protected]
Questions?