BigData
![Page 1: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/1.jpg)
BigData
Svet Ivantchev, eFaberUniEE, March 15, 2011
Wednesday, March 16, 2011
![Page 2: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/2.jpg)
![Page 3: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/3.jpg)
![Page 4: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/4.jpg)
![Page 5: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/5.jpg)
![Page 6: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/6.jpg)
iMac 2000 vs iPhone 2010
![Page 7: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/7.jpg)
http://www.washingtonpost.com/wp-dyn/content/article/2011/02/10/AR2011021004916.html
![Page 8: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/8.jpg)
![Page 9: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/9.jpg)
![Page 10: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/10.jpg)
Types of information
• [Un]structured information
• Internal vs external
• Encyclopedia Britannica vs Wikipedia
![Page 11: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/11.jpg)
BigData
When the old techniques no longer work for us
capture - storage - transformation - analysis - visualization
![Page 12: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/12.jpg)
What we “pay” with
• Concepts
• changes in infrastructure
• NoSQL
• MapReduce
• much more ...
![Page 13: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/13.jpg)
Infrastructure
• “The server” vs disposable VMs
• Backup: super-RAID + super-backup vs multiple copies
• Freedom to experiment with new tools (macroscopes)
• Bandwidth: HDD, Internet or Seur (a courier service)
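The bandwidth bullet can be made concrete with a little arithmetic. The sketch below compares shipping a disk by courier with sending the same data over a network link. All figures (a 2 TB disk, 24-hour delivery, a 100 Mbit/s link) are illustrative assumptions, not numbers from the talk.

```python
def effective_mbit_per_s(payload_bytes: float, seconds: float) -> float:
    """Average throughput, in megabits per second, of moving a payload in a given time."""
    return payload_bytes * 8 / seconds / 1e6


def transfer_hours(payload_bytes: float, link_mbit_per_s: float) -> float:
    """Hours needed to push a payload through a link of the given speed."""
    return payload_bytes * 8 / (link_mbit_per_s * 1e6) / 3600


# Assumed figures, chosen only for illustration.
DISK_BYTES = 2e12            # one 2 TB hard disk
COURIER_SECONDS = 24 * 3600  # next-day delivery
LINK_MBIT_PER_S = 100        # sustained Internet upload speed

courier_rate = effective_mbit_per_s(DISK_BYTES, COURIER_SECONDS)
internet_hours = transfer_hours(DISK_BYTES, LINK_MBIT_PER_S)

print(f"Courier: about {courier_rate:.0f} Mbit/s effective")
print(f"Internet: about {internet_hours:.1f} hours for the same disk")
```

Under these assumptions the courier beats the link, and the gap grows with every extra disk in the box, because delivery time stays roughly constant while transfer time scales with the data size.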
![Page 14: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/14.jpg)
http://blog.jteam.nl/2009/08/04/introduction-to-hadoop/
The idea behind MapReduce
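As a rough illustration of the MapReduce idea, here is a single-process word count in Python. It is a sketch of the programming model only (map, group by key, reduce), not of Hadoop itself; the function names are hypothetical and nothing here is distributed.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run the three phases in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "Big ideas"]))
```

In a real cluster the map calls run in parallel on separate chunks of the input and the shuffle moves data between machines; the structure of the computation is the same.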
![Page 15: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/15.jpg)
| | Relational DB | MapReduce |
|---|---|---|
| Size | Gigabytes (10^9) | Petabytes (10^15) |
| Access | Interactive and batch | Batch |
| Updates | Many reads and writes | Few writes, many reads |
| Structure | Static | Dynamic |
| Integrity | High | Low |
| Scaling | Nonlinear | Linear |

Source: Hadoop: The Definitive Guide, O’Reilly, 2010
![Page 16: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/16.jpg)
![Page 17: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/17.jpg)
![Page 18: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/18.jpg)
![Page 19: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/19.jpg)
![Page 20: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/20.jpg)
![Page 21: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/21.jpg)
![Page 22: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/22.jpg)
![Page 23: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/23.jpg)
![Page 24: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/24.jpg)
CouchDB/Couchbase
http://www.couchbase.com/downloads
![Page 25: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/25.jpg)
![Page 26: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/26.jpg)
Demo
![Page 27: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/27.jpg)
![Page 28: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/28.jpg)
![Page 29: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/29.jpg)
![Page 30: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/30.jpg)
![Page 31: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/31.jpg)
![Page 32: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/32.jpg)
![Page 33: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/33.jpg)
![Page 34: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/34.jpg)
![Page 35: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/35.jpg)
![Page 36: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/36.jpg)
![Page 37: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/37.jpg)
![Page 38: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/38.jpg)
![Page 39: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/39.jpg)
![Page 40: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/40.jpg)
![Page 41: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/41.jpg)
Google Books
• 129,000,000 books published
• 15,000,000 books scanned (1700-2010)
• 5,000,000 analyzed with their metadata
![Page 42: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/42.jpg)
http://ngrams.googlelabs.com/
![Page 43: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/43.jpg)
![Page 44: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/44.jpg)
![Page 45: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/45.jpg)
DIY
• Amazon EC2
• Amazon S3
• Apache Hadoop and Hive
• Amazon Elastic MapReduce
![Page 46: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/46.jpg)
http://ngrams.googlelabs.com/datasets
![Page 47: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/47.jpg)
![Page 48: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/48.jpg)
elastic-mapreduce --create --alive --hive-interactive --hive-versions 0.7
elastic-mapreduce --list mi-flow-id
elastic-mapreduce --ssh mi-flow-id
$ hive
hive> set hive.base.inputformat=org.apache.hadoop.hive.ql.io.HiveInputFormat;
hive> set mapred.min.split.size=134217728;
http://aws.amazon.com/articles/5249664154115844
20th-century trends
![Page 49: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/49.jpg)
CREATE EXTERNAL TABLE english_1grams (
  gram string,
  year int,
  occurrences bigint,
  pages bigint,
  books bigint
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS SEQUENCEFILE
LOCATION 's3://datasets.elasticmapreduce/ngrams/books/20090715/eng-all/1gram/';
![Page 50: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/50.jpg)
CREATE TABLE normalized (
  gram string,
  year int,
  occurrences bigint
);

INSERT OVERWRITE TABLE normalized
SELECT lower(gram), year, occurrences
FROM english_1grams
WHERE year >= 1890
  AND gram REGEXP "^[A-Za-z+'-]+$";
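For readers less familiar with Hive, the normalization step can be mirrored on a handful of in-memory rows in Python. This is a sketch under the assumption that rows arrive as `(gram, year, occurrences)` tuples; the `Row` type and `normalize` function are hypothetical names, and the sample data is made up.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Row(NamedTuple):
    gram: str
    year: int
    occurrences: int


# Same character class as the Hive REGEXP: letters, '+', apostrophe and hyphen.
WORD = re.compile(r"[A-Za-z+'-]+")


def normalize(rows: Iterable[tuple[str, int, int]], min_year: int = 1890) -> Iterator[Row]:
    """Keep word-like grams from min_year onwards and lowercase them."""
    for gram, year, occurrences in rows:
        if year >= min_year and WORD.fullmatch(gram):
            yield Row(gram.lower(), year, occurrences)


sample = [
    ("Radio", 1925, 10),
    ("radio", 1925, 5),
    ("3.14", 1950, 7),    # dropped: not word-like
    ("steam", 1850, 9),   # dropped: before 1890
    ("c++", 1995, 2),
]
for row in normalize(sample):
    print(row)
```

Like the Hive statement, this does not merge rows that become identical after lowercasing ("Radio" and "radio" stay as two rows); the later aggregation sums them.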
![Page 51: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/51.jpg)
CREATE TABLE by_decade (
  gram string,
  decade int,
  ratio double
);

INSERT OVERWRITE TABLE by_decade
SELECT a.gram, b.decade, sum(a.occurrences) / b.total
FROM normalized a
JOIN (
  SELECT substr(year, 0, 3) as decade, sum(occurrences) as total
  FROM normalized
  GROUP BY substr(year, 0, 3)
) b ON substr(a.year, 0, 3) = b.decade
GROUP BY a.gram, b.decade, b.total;
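The per-decade ratio is the share of all word occurrences in a decade that belong to a given word. A small Python sketch of the same aggregation, assuming in-memory `(gram, year, occurrences)` tuples with four-digit years (the function name and sample data are hypothetical):

```python
from collections import defaultdict
from typing import Iterable


def by_decade(rows: Iterable[tuple[str, int, int]]) -> dict[tuple[str, int], float]:
    """Return {(gram, decade): share of that decade's occurrences}."""
    gram_totals: dict[tuple[str, int], int] = defaultdict(int)
    decade_totals: dict[int, int] = defaultdict(int)
    for gram, year, occurrences in rows:
        decade = year // 10  # 1925 -> 192, the first three digits of a four-digit year
        gram_totals[(gram, decade)] += occurrences
        decade_totals[decade] += occurrences
    return {
        (gram, decade): count / decade_totals[decade]
        for (gram, decade), count in gram_totals.items()
    }


sample = [
    ("radio", 1925, 30),
    ("jazz", 1927, 10),
    ("radio", 1931, 20),
    ("jazz", 1935, 20),
]
for key, ratio in sorted(by_decade(sample).items()):
    print(key, ratio)
```

Dividing by the decade total is what makes decades comparable: far more books were published in later decades, so raw counts alone would make almost every word look as if it were growing.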
![Page 52: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/52.jpg)
SELECT
  a.gram as gram,
  a.decade as decade,
  a.ratio as ratio,
  a.ratio / b.ratio as increase
FROM by_decade a
JOIN by_decade b
  ON a.gram = b.gram and a.decade - 1 = b.decade
WHERE a.ratio > 0.000001
  and a.decade >= 190
DISTRIBUTE BY decade
SORT BY decade ASC, increase DESC;
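The final query compares each word's ratio with its ratio in the previous decade. A Python sketch of that self-join, assuming the ratios are held in a dictionary keyed by `(gram, decade)`; the function name is hypothetical, the sample ratios are made up, and the zero-ratio guard is an addition that the Hive query does not have.

```python
def trending(
    ratios: dict[tuple[str, int], float],
    min_ratio: float = 0.000001,
    first_decade: int = 190,
) -> list[tuple[str, int, float, float]]:
    """Return (gram, decade, ratio, increase) rows, largest increase first per decade."""
    rows = []
    for (gram, decade), ratio in ratios.items():
        previous = ratios.get((gram, decade - 1))
        if not previous:
            # No row for the previous decade (inner join), or a zero ratio,
            # which is skipped here to avoid dividing by zero.
            continue
        if ratio > min_ratio and decade >= first_decade:
            rows.append((gram, decade, ratio, ratio / previous))
    # One global sort; Hive's DISTRIBUTE BY / SORT BY sorts within each reducer.
    rows.sort(key=lambda row: (row[1], -row[3]))
    return rows


sample = {
    ("radio", 191): 0.125,
    ("radio", 192): 0.5,
    ("jazz", 191): 0.25,
    ("jazz", 192): 0.5,
    ("tv", 192): 0.5,  # no 191 entry, so it never appears in the result
}
for row in trending(sample):
    print(row)
```

Because the join is an inner join, a word with no occurrences at all in the previous decade never shows up, however popular it becomes.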
![Page 53: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/53.jpg)
1900: radium, ionization, automobiles, petrol, archivo, automobile, electrons, mukden, anopheles, marconi, botha, ladysmith, lhasa, boxers, suprema, aboord, rotor, turkes, wireless, conveyor, manchurian, erythrocytes, shoare, thirtie, kop, tuskegee, thorium, audiencia, bvo, arteriosclerosis
1910: cowperwood, britling, boches, montessori, venizelos, bolsheviki, salvarsan, photoplay, pacifists, joffre, petrograd, pacifist, bolshevism, airmen, kerensky, foch, boche, serbia, serbian, hindenburg, madero, serbians, bombing, ameen, anaphylaxis, aviators, syndicalism, aviator, biplane, taxi
1920: bacteriophage, fascist, mussolini, fascism, sablin, latvia, insulin, peyrol, volstead, czechoslovakia, iraq, vitamin, kenya, curricular, swaraj, reparations, broadcasting, slovakia, vitamins, gandhi, automotive, kemal, zoning, jazz, isotopes, isoelectric, airscrew, shivaji, czechoslovak, stabilization
1930: dollfuss, goebbels, manchukuo, hitler, sudeten, hitler's, rearmament, nazis, wpa, nazi, nra, manchoukuo, totalitarian, pwa, tva, stalin's, peiping, homeroom, kulaks, stalin, devaluation, bta, carotene, broadcasts, corporative, comintern, ergosterol, reichswehr, ussr, businessmen
![Page 54: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/54.jpg)
1940: waveguide, luftwaffe, plutonium, streptomycin, darlan, gaulle, beachhead, lanny, jeeps, penicillin, alamein, radar, bandwidth, psia, thiamine, quisling, sulfathiazole, wpb, airborne, jeep, aftr, bdg, tobruk, pakistan, sulfonamides, evacuees, guadalcanal, airfields, unesco, rommel
1950: qumran, transistors, chlorpromazine, transistor, automation, terramycin, chloramphenicol, khrushchev, reserpine, pradesh, nasser, vietnamese, shri, uttar, madhya, vietnam, adenauer, aureomycin, nato, annexure, dna, edc, rna, biophys, pyarelal, cortisone, semiconductors, rajasthan, minh
1960: tshombe, bhupesh, vietcong, lumumba, ribosomal, lasers, ribosomes, ieee, aerospace, malawi, thant, fortran, zambia, medicare, lysosomes, nlf, laser, tanzania, efta, oecd, astronaut, teilhard, goldwater, programed, uar, software, autoimmune, spacecraft, eec, nasa
![Page 55: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/55.jpg)
1970: biofeedback, sexist, sexism, multinationals, namibia, bangladesh, microprocessor, watergate, chicano, lifestyle, cytosol, medicaid, trh, chicanos, plasmid, jovanovich, ldcs, apg, pediatr, cyclase, isbn, immunotherapy, prostaglandin, opec, prostaglandins, gentamicin, bangla, radioimmunoassay, epa, ophthalmol
1980: htlv, dbase, interleukin, spreadsheet, vlsi, videotex, calmodulin, sandinistas, contras, isdn, gorbachev's, sandinista, gorbachev, workstation, workstations, fsln, captopril, hybridoma, ifn, robotics, kda, fibronectin, khomeini, sql, robotic, oncogenes, rajiv, xiaoping, unix, microsoft
1990: netscape, cyberspace, html, endothelin, toolbar, biodiversity, mpeg, tqm, harpercollins, applet, reengineering, nafta, http, c++, newsgroups, gallopade, belarus, internet, apec, url, yeltsin, adhd, apoptosis, integrin, usenet, hypermedia, globalisation, netware, africanamerican, myanmar
2000: bibliobazaar, itunes, cengage, qaeda, wsdl, aspx, xslt, actionscript, xpath, sharepoint, blogs, easyread, ipod, xhtml, blog, rfid, google, writeline, proteomics, bluetooth, voip, microarray, mysql, microarrays, putin, dreamweaver, dvds, ejb, xml, osama
![Page 56: BigData](https://reader034.fdocuments.mx/reader034/viewer/2022051412/54834d6fb4af9f2e7c8b48dd/html5/thumbnails/56.jpg)
Q & A