Tajinder Presentation6
crimeX: Real-time crime analysis and alert system
Tajinder Singh
Motivation
• How criminals operate
• Dynamics between criminals and the anti-crime squad
Pipeline
Inputs: Crime data (real), User data (real), Crime data (batch)
Layers: Ingestion, Batch Layer, Real Time, Serving Layer
Data flow: Data sources
• Seed: http://us-city.census.okfn.org/dataset/crime-stats
• Engineered data (600 GB)
Data flow
{ "crime_id": "C786", "crimetype": "robbery", "crime_rptd_ts": "2015-05-06 11:34:43",
"crime_occ_ts": "2015-03-05 15:24:49", "lat": "34.5462", "lon": "-118.453", ... etc. ... }
Crime data (batch)
Batch Processing
+ Python Script (Refining)
Data flow
Crime data (batch)
Batch Processing
{ "crime_id": "C786", "crimetype": "robbery", "crime_rptd_ts": "2015-05-06 11:34:43",
"crime_occ_ts": "2015-03-05 15:24:49", "lat": "34.5462", "lon": "-118.453",
"zip": "90007", "city": "los angeles", "state": "california", "country": "usa" }
Index Type: crimes
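The refining step above adds location fields to each raw record before indexing. A minimal Python sketch of that idea follows. The slides do not show the actual script, so the `refine` function, the `reverse_geocode` callback, and the stub lookup are all hypothetical names used only for illustration.

```python
from typing import Callable, Dict

# Hypothetical lookup type: maps (lat, lon) to address fields.
# The slides do not say which geocoding source the real script used.
Geocoder = Callable[[float, float], Dict[str, str]]

ADDRESS_FIELDS = ("zip", "city", "state", "country")


def refine(record: Dict[str, str], reverse_geocode: Geocoder) -> Dict[str, str]:
    """Return a copy of a crime record with zip/city/state/country added."""
    lat, lon = float(record["lat"]), float(record["lon"])
    address = reverse_geocode(lat, lon)
    enriched = dict(record)  # leave the raw record untouched
    for field in ADDRESS_FIELDS:
        enriched[field] = address[field]
    return enriched


def stub_geocoder(lat: float, lon: float) -> Dict[str, str]:
    """Stand-in for a real reverse-geocoding service (assumption)."""
    return {"zip": "90007", "city": "los angeles",
            "state": "california", "country": "usa"}


raw = {"crime_id": "C786", "crimetype": "robbery",
       "lat": "34.5462", "lon": "-118.453"}
refined = refine(raw, stub_geocoder)
print(refined)
```

Passing the geocoder in as a parameter keeps the sketch independent of any particular lookup service.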
Data flow
{ "crimetype": "robbery", "lat": "34.5462", "lon": "-118.453" }
Real Time Processing
Crime data User data
{ "user_id": "user453", "username": "Tajinder", "lat": "34.653356", "lon": "-118.53243" }
Data flow Real Time Processing
{ "crimetype": "robbery", "crime_rptd_ts": "2015-05-06 11:34:43", "lat": "34.5462",
"lon": "-118.453", "zip": "90007", "city": "los angeles", "state": "california",
"country": "usa" }
Crime data User data
{ "user_id": "user453", "username": "Tajinder", "lat": "34.653356", "lon": "-118.53243",
"zip": "90007", "city": "los angeles", "state": "california", "country": "usa" }
Index Type: crimes_realtime and user-subscribe-crime
Data flow use case 1 (batch)
Input { "location": "2611 portland street, los angeles" }
Data flow use case 1 (batch)
Output Fields
Distance Covered (radius)
Total crimes analyzed
Average latency*
Crime Types
Average latency*: the average difference between a crime's occurrence timestamp and its reporting timestamp
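The average latency defined above can be sketched in Python as follows. This assumes the `crime_occ_ts` and `crime_rptd_ts` fields and the timestamp format shown in the sample records. The `average_latency` function name and the two sample records are illustrative only and are not taken from the presentation's data.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # format used in the sample records


def average_latency(crimes):
    """Mean of (reported - occurred) over the given crime records."""
    if not crimes:
        return timedelta(0)
    total = timedelta(0)
    for crime in crimes:
        occurred = datetime.strptime(crime["crime_occ_ts"], TS_FORMAT)
        reported = datetime.strptime(crime["crime_rptd_ts"], TS_FORMAT)
        total += reported - occurred
    return total / len(crimes)


# Illustrative records: reported 2 hours and 4 hours after occurring.
sample = [
    {"crime_occ_ts": "2015-03-05 15:24:49", "crime_rptd_ts": "2015-03-05 17:24:49"},
    {"crime_occ_ts": "2015-03-06 10:00:00", "crime_rptd_ts": "2015-03-06 14:00:00"},
]
print(average_latency(sample))  # 3:00:00
```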
[output]
Data flow use case 2 (real)
Real Time { "crimetype": "robbery", "lat": "34.2353", "lon": "-113.42534" }
Data flow use case 2 (real)
Output Fields
Distance Covered (radius)
Total crimes analyzed
Average latency*
Crime Types
Alert nearby users
User Phone number
User Name
User latitude
User longitude
[output]
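Alerting nearby users comes down to a distance check between the crime location and each subscribed user. The sketch below uses the haversine great-circle formula, which is an assumption: the slides do not say how distance was computed. The `users_to_alert` function and the radius values are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def users_to_alert(crime, users, radius_km):
    """Return the users located within radius_km of the crime."""
    crime_lat, crime_lon = float(crime["lat"]), float(crime["lon"])
    return [
        user for user in users
        if haversine_km(crime_lat, crime_lon,
                        float(user["lat"]), float(user["lon"])) <= radius_km
    ]


# Records shaped like the samples in the slides.
crime = {"crimetype": "robbery", "lat": "34.5462", "lon": "-118.453"}
users = [{"user_id": "user453", "username": "Tajinder",
          "lat": "34.653356", "lon": "-118.53243"}]

print([u["user_id"] for u in users_to_alert(crime, users, radius_km=20)])
```

A linear scan like this is only meant to show the logic; at scale the same check would normally be pushed into the search cluster.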
Challenge: The front end displayed results only after 5 seconds per request
Reason:
• A lot of I/O operations (all crime documents were fetched to the UI)
• Business logic and query execution ran on the front end (Flask)
Solution:
• Query execution moved to the Elasticsearch cluster
• No I/O operations (crime documents are no longer fetched to the UI)
• Dynamic scripting enabled on the ES cluster
• Used Groovy scripts rather than JavaScript, Python, MVEL (built-in), expression (built-in), etc.
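To illustrate what moving query execution onto the cluster can look like, here is a hedged Python sketch that builds a request body asking Elasticsearch to filter by distance and count crime types server-side, so no documents are returned to the UI. It uses the built-in `geo_distance` filter and a `terms` aggregation rather than the Groovy scripts mentioned above. It also assumes a `geo_point` field named `location`, which the slides do not show (they list separate `lat` and `lon` fields), and the exact query syntax varies across Elasticsearch versions.

```python
import json


def nearby_crimes_query(lat, lon, radius_km):
    """Build a search body that aggregates crime types within a radius.

    Assumes a geo_point field named "location" (hypothetical mapping).
    """
    return {
        "size": 0,  # return aggregation results only, no documents
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": f"{radius_km}km",
                        "location": {"lat": lat, "lon": lon},
                    }
                }
            }
        },
        "aggs": {
            # Count of crimes per crime type, computed on the cluster.
            "crime_types": {"terms": {"field": "crimetype"}},
        },
    }


body = nearby_crimes_query(34.5462, -118.453, 5)
print(json.dumps(body, indent=2))
```

Setting `size` to 0 is what removes the document transfer: only the aggregated counts travel back to the front end.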
Challenge: Network latency
Solution: Co-locate Storm and Elasticsearch cluster nodes to reduce network latency
Performance Optimization
Challenges
Caveat: Vulnerable to outside attacks (Security vulnerability)
Reason:
• Enabled dynamic scripting
Solution:
• Don’t run Elasticsearch as root
• Provide read-only access to requisite directories
About me
Tajinder Singh [University of Southern California]
5 years of experience in web development