Talk at the CBM

Past, Present and Future of Scientific Literature Search, Ramón Alonso-Allende

Transcript of Talk at the CBM

Page 1: Talk at the CBM

Past, Present and Future of Scientific Literature Search

Ramón Alonso-Allende

Page 2

Page 3

[Diagram: the science cycle (write, search, read, experiment) across the 1990s, the 2000s, today and the future]

Today: Relevance + Completeness + Ease - Time

Future: Search = Integration + Meaning + Social

Value system

Page 4

Information systems

[Timeline: 1995, 2000, 2005, 2010]

Page 5
Page 6

[Chart: Searches in PubMed, 1997 to 2008; y-axis: searches (thousands), from 0 to 1,000,000]

Page 7

Challenges

Page 8

Challenges

‣ Handling huge amounts of information.

‣ Language ambiguity.

‣ Time.

‣ Keeping up to date.

jordinho_dp

Page 9

[Chart: GB, PDB, Medline and SwissProt, 1992 to 2007; y-axis from 0 to 80,000,000]

Lots of heterogeneous information

Page 10

43% of human genes have ambiguous names

Page 11

Some figures

‣ 5,892 terms can refer to either a gene or a disease.

‣ 3,963 names refer to 2 different genes.

‣ A single term refers to 114 genes.

[Chart: number of terms (y-axis, 0 to 4,000) by number of concepts per term (x-axis, 2 to 9), for Disease, Genes and Drugs]
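Counts like these come from checking, for each term in a lexicon, how many distinct concepts it can name and of which types. A minimal sketch, assuming a tiny hypothetical lexicon whose "type:concept" identifiers are invented for illustration (the figures above come from a much larger terminology):

```python
from collections import Counter

# Hypothetical lexicon: surface term -> set of "type:concept" identifiers.
LEXICON = {
    "pap": {"gene:PAP", "gene:MRPS30", "gene:PAPOLA"},
    "sct": {"procedure:stem_cell_transplant", "gene:secretin", "drug:salmon_calcitonin"},
    "sps": {"disease:stiff_man_syndrome", "drug:polystyrene_sulfonate", "gene:spermine_synthase"},
    "cat": {"gene:catalase", "organism:cat"},
    "brca1": {"gene:BRCA1"},
}


def ambiguity_histogram(lexicon):
    """Map n -> number of terms that name exactly n distinct concepts (only n >= 2)."""
    return Counter(len(ids) for ids in lexicon.values() if len(ids) >= 2)


def terms_spanning(lexicon, type_a, type_b):
    """Sorted terms that can name both a concept of type_a and one of type_b."""
    result = []
    for term, ids in lexicon.items():
        types = {i.split(":", 1)[0] for i in ids}
        if type_a in types and type_b in types:
            result.append(term)
    return sorted(result)


print(ambiguity_histogram(LEXICON))
print(terms_spanning(LEXICON, "gene", "disease"))
```

Here three terms map to 3 concepts and one term to 2, and only "sps" can be read as either a gene or a disease.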

Page 12

Some examples

sps, AAt1

stiff-man syndrome (Diseases or Syndromes)

annuloaortic ectasia (Diseases or Syndromes)

polystyrene sulfonate (Pharmacological substances)

alanine aminotransferase (Genes and Proteins)

systolic blood pressure (Biological functions)

spermine synthase (Genes and Proteins)

Page 13

Language ambiguity

Synonyms: different words for the same biomedical entity.

• In humans there are at least 5,418 genes with synonyms (38% of the total genome).

• Drugs have a commercial name and a chemical name.

Homonyms: the same name for different biomedical entities. The symbol PAP is an alias for:

• PAP (pancreatitis-associated protein)

• MRPS30 (mitochondrial ribosomal protein S30)

• PAPOLA (poly(A) polymerase alpha)

Acronyms: a shortened word representing a biomedical entity. SCT stands for:

• Stem Cell Transplant

• Secretin

• Salmon calcitonin

Page 14

Unmanageable

‣ More than 25 million documents (scientific articles, grants, biomedical patents…) are relevant sources of information for biomedical researchers.

‣ 2,000 new scientific papers are published every day.

‣ It would take 5 years to read the new scientific material produced every 24 hours.

‣ Following a single disease, such as breast cancer, means scanning 130 journals and reading 27 articles per day.
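The second and third figures fit together if a researcher reads roughly one paper a day. Assuming 365-day years, one day's output of 2,000 papers spread over 5 years works out to:

```latex
\frac{2000\ \text{papers}}{5 \times 365\ \text{days}} = \frac{2000}{1825} \approx 1.1\ \text{papers per day}
```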

Page 15

Keeping up to date

‣ Search-engine alerts

‣ Emailed eTOCs (electronic tables of contents)

‣ RSS feeds
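To illustrate the RSS option, here is a minimal Python sketch that keeps only the feed items whose titles mention a topic of interest. The feed is hypothetical and inlined so the example runs offline; downloading a real journal or search-engine feed is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the example runs without network access.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example journal eTOC</title>
    <item><title>BRCA1 variants and breast cancer risk</title><link>http://example.org/1</link></item>
    <item><title>Catalase activity in liver tissue</title><link>http://example.org/2</link></item>
    <item><title>Screening for Breast Cancer in young women</title><link>http://example.org/3</link></item>
  </channel>
</rss>"""


def matching_items(feed_xml, keywords):
    """Return (title, link) pairs for items whose title contains any keyword, ignoring case."""
    root = ET.fromstring(feed_xml)
    wanted = [k.lower() for k in keywords]
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if any(k in title.lower() for k in wanted):
            hits.append((title, link))
    return hits


for title, link in matching_items(FEED, ["breast cancer"]):
    print(title, link)
```

Plain substring matching is deliberately naive: it suffers from exactly the synonym and ambiguity problems described in earlier slides.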

Page 16

[Chart: Search tasks & lab work by discipline (% of time, 0% to 80%) for All, Biochemistry, Mol. & Cell Biol., Genetics, Biotechnology, Bioinformatics, Medicine and Other; series: Searching literature, Searching data from DBs, Working in the lab]

Roos, A., Kumpulainen, S., Järvelin, K. and Hedlund, T. (2008). "The information environment of researchers in molecular medicine". Information Research, 13(3), paper 353. [Available at http://InformationR.net/ir/13-3/paper353.html]

Page 17

How we tackle the challenges

Page 18

We tackle the challenges by:

‣ Integrating information for the user.

‣ Analyzing the text (text mining).

‣ Providing useful functionality.

‣ Technology + Simple interface = Less time

Page 19

Data integration

‣ Sequence DBs: UniProt, GenBank, RefSeq, PIR, EMBL, Entrez Protein, UniSTS

‣ Gene DBs: GDB, Ensembl, Entrez Gene, UniGene, H-InvDB, MGC, HGNC

‣ Pathway DBs: KEGG, EC, Reactome

‣ Domain DBs: Pfam, PROSITE, SMART, ProDom, InterPro

‣ Other DBs: Affymetrix, GO, PDB, MIM, CCDS, HPRD, HGNC

Page 20

Text mining

Gene: GH1, Growth Hormone 1 (GeneID: 2688). Synonyms: GHN, GH

Gene: GGH, Gamma Glutamyl Hydrolase (GeneID: 8836). Synonyms: conjugase, GH

Terms associated with GH1 (score): adenoma (0.300), adipocyte (0.418), adipose (0.324), age-related (0.442), genotropin (19.368)

Terms associated with GGH (score): antifolate (2.850), carboxypeptidase (12.618), folate (0.674), gamma-glu-x (15.452), antifolylpoly-gamma-glutamate (12.054)
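One way to read this slide: each candidate gene for the ambiguous symbol GH has a profile of weighted terms, and a mention is assigned to the gene whose terms best match the surrounding text. A minimal sketch, assuming a simple additive score over the weights shown above; how novo|seek actually combines these scores is not described in the talk.

```python
import re

# Weighted terms per candidate gene for the ambiguous symbol "GH", copied from the
# slide; keys are Entrez Gene IDs. The additive scoring below is an assumption.
PROFILES = {
    "GeneID:2688": {  # GH1, growth hormone 1
        "adenoma": 0.300, "adipocyte": 0.418, "adipose": 0.324,
        "age-related": 0.442, "genotropin": 19.368,
    },
    "GeneID:8836": {  # gamma-glutamyl hydrolase
        "antifolate": 2.850, "carboxypeptidase": 12.618, "folate": 0.674,
        "gamma-glu-x": 15.452, "antifolylpoly-gamma-glutamate": 12.054,
    },
}


def tokenize(text):
    """Lower-case word tokens; hyphenated words such as "age-related" stay whole."""
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()))


def disambiguate(text, profiles=PROFILES):
    """Return (best_gene_or_None, scores): each gene scores the summed weights of its terms found in text."""
    tokens = tokenize(text)
    scores = {
        gene: sum(w for term, w in terms.items() if term in tokens)
        for gene, terms in profiles.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None, scores  # no contextual evidence for any candidate
    return best, scores


best, scores = disambiguate("Serum GH after genotropin treatment in adipose tissue")
print(best)  # GeneID:2688
```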

Page 21

Indexed data

‣ Medline abstracts: 18 M

‣ Open-access full text: 145,000

‣ R&D project abstracts: 1.5 M

‣ > 4 M concepts

‣ > 200 M relations

Page 22

Comparison. Use case: looking for the gene SCT

PubMed: interprets SCT as solid-cystic tumor.

Google Scholar: interprets SCT as an author's name.

novo|seek: lets you choose which meaning of SCT you are looking for: secretin or stem cell transplantation.

Page 23

novo|seek vs. Google Scholar

Google Scholar: no way to focus the search beyond reading; time-consuming.

Page 24

Technology

Semantic Search, Discovery, Knowledge Extraction, Concept Relations

‣ Search more efficiently.

‣ Extract more information.

‣ Relate different sources of information.

‣ Save time.

by L. Cornide

Page 25

e.g. Search for breast cancer:

"Detection of breast carcinoma cells in effusions is associated with rapidly fatal outcome"

"Women who do not receive regular mammograms are more likely than others to have breast cancer diagnosed at an advanced stage"

"[…] thereby providing higher cytotoxicity against the 4T1 mouse mammary carcinoma cell line"

All of these keywords (breast carcinoma, breast cancer, mammary carcinoma) refer to the same biomedical concept, so a search for breast cancer will retrieve these three documents.
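Conceptual search indexes documents by concept rather than by literal keyword. A minimal sketch, assuming a small hand-made synonym table; the concept ID `C_BREAST_CANCER` and the table itself are hypothetical, whereas a production system would draw on a large biomedical terminology and proper entity recognition instead of substring matching.

```python
# Hypothetical synonym table: surface form -> concept ID.
SYNONYMS = {
    "breast cancer": "C_BREAST_CANCER",
    "breast carcinoma": "C_BREAST_CANCER",
    "mammary carcinoma": "C_BREAST_CANCER",
}

DOCS = {
    1: "Detection of breast carcinoma cells in effusions is associated with rapidly fatal outcome",
    2: "Women who do not receive regular mammograms are more likely than others to have breast cancer diagnosed at an advanced stage",
    3: "thereby providing higher cytotoxicity against the 4T1 mouse mammary carcinoma cell line",
    4: "Catalase activity was determined in liver homogenates",
}


def concepts_in(text):
    """Concept IDs whose surface forms occur in the text (plain substring match)."""
    lowered = text.lower()
    return {cid for form, cid in SYNONYMS.items() if form in lowered}


def build_index(docs):
    """Inverted index: concept ID -> set of document IDs."""
    index = {}
    for doc_id, text in docs.items():
        for cid in concepts_in(text):
            index.setdefault(cid, set()).add(doc_id)
    return index


def search(query, index):
    """Map the query to concepts, then return every document indexed under them."""
    hits = set()
    for cid in concepts_in(query):
        hits |= index.get(cid, set())
    return sorted(hits)


index = build_index(DOCS)
print(search("breast cancer", index))  # [1, 2, 3]
```

Any of the three surface forms used as a query returns the same three documents, because matching happens at the concept level.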

‣ Use of context and semantic information to identify the relevant information

e.g. Search for CAT, which could refer to the enzyme catalase or to the animal, "cat":

"[…] activity of antioxidant enzymes (GSH-Px, SOD, CAT) and content of malondialdehyde (MDA) were determined […]"

"26 free-living lynx, 53 domestic cats, 28 dogs, 33 red foxes (Vulpes vulpes) […]"

The same keyword refers to different biomedical concepts. Using the context, we can tell that only the first sentence is about the enzyme.
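Context-based disambiguation can be sketched as a simplified Lesk-style overlap: compare the words around the ambiguous keyword with cue words for each sense and pick the sense with the larger overlap. The cue-word lists are hypothetical and hand-picked for this example; this is not novo|seek's model.

```python
import re

# Hypothetical cue words for each sense of the ambiguous keyword "CAT".
SENSES = {
    "catalase (enzyme)": {"enzyme", "enzymes", "antioxidant", "activity",
                          "sod", "gsh-px", "malondialdehyde"},
    "cat (animal)": {"dogs", "lynx", "foxes", "domestic", "free-living", "animals"},
}
TARGET_FORMS = {"cat", "cats"}  # the keyword itself is not evidence for either sense


def pick_sense(sentence, senses=SENSES):
    """Return the sense whose cue words overlap most with the sentence, or None if no sense wins."""
    words = set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())) - TARGET_FORMS
    overlaps = {sense: len(words & cues) for sense, cues in senses.items()}
    top = sorted(overlaps.values(), reverse=True)
    # Reject sentences with no cue words at all, and ties between the best senses.
    if top[0] == 0 or (len(top) > 1 and top[0] == top[1]):
        return None
    return max(overlaps, key=overlaps.get)


print(pick_sense("activity of antioxidant enzymes (GSH-Px, SOD, CAT) were determined"))
print(pick_sense("26 free-living lynx, 53 domestic cats, 28 dogs, 33 red foxes"))
```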

Semantic Search

‣ Conceptual search

by L. Cornide

Page 26

Concept Relations

e.g. Search for Alzheimer's disease:

"The apolipoprotein E gene (APOE) polymorphism genotyping has an allegedly important predictive value for coronary heart disorders and Alzheimer's disease."

"Apolipoprotein E (apoE), a ligand for the low-density lipoprotein receptor family, has been implicated in modulating glial inflammatory responses and the risk of neurodegeneration associated with Alzheimer's disease."

"Although many genes have been suggested to be associated with AD, with the exception of APOE, most polymorphic variants of potential risk exhibit a very weak association with AD"

The protein apolipoprotein E and Alzheimer's disease are related with a relevance of 36%.

by L. Cornide
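The talk does not say how the 36% relevance is computed. As one plausible, hypothetical measure, the sketch below scores a concept pair by document co-occurrence using the Jaccard coefficient: the share of documents mentioning either concept that mention both. The per-document concept annotations are made up for illustration.

```python
def cooccurrence_relevance(doc_concepts, a, b):
    """Jaccard coefficient: |docs with a and b| / |docs with a or b| (0.0 if neither occurs)."""
    with_a = {d for d, cs in doc_concepts.items() if a in cs}
    with_b = {d for d, cs in doc_concepts.items() if b in cs}
    union = with_a | with_b
    return len(with_a & with_b) / len(union) if union else 0.0


# Hypothetical concept annotations for five abstracts.
DOC_CONCEPTS = {
    "d1": {"APOE", "Alzheimer disease", "coronary heart disease"},
    "d2": {"APOE", "Alzheimer disease", "LDL receptor"},
    "d3": {"Alzheimer disease", "amyloid beta"},
    "d4": {"APOE", "coronary heart disease"},
    "d5": {"Alzheimer disease", "tau"},
}

print(f"{cooccurrence_relevance(DOC_CONCEPTS, 'APOE', 'Alzheimer disease'):.0%}")  # 40%
```

Two of the five documents mentioning either concept mention both, giving 40% in this toy data; a real system would likely weight co-occurrences (for example by sentence proximity) rather than count them flatly.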

Page 27

Knowledge Extraction

‣ Based on the detected relations between concepts, we can automatically extract knowledge from text.

e.g. Obtain knowledge about breast cancer extracted from the literature:

"[…] BRCA1 or BRCA2 […] Information was recorded on prophylactic mastectomy, prophylactic oophorectomy, use of tamoxifen […] had a bilateral prophylactic oophorectomy. […] breast cancer, 248 (18.0%) had had a prophylactic bilateral mastectomy. Among those who did not have a prophylactic mastectomy, only 76 women (5.5%) took tamoxifen and 40 women (2.9%) took raloxifene for breast cancer prevention. […]"

The genes BRCA1 and BRCA2 are related to breast cancer; tamoxifen and raloxifene are drugs used against it, and mastectomy and oophorectomy are common procedures associated with it.

by L. Cornide
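A minimal sketch of this kind of extraction, assuming a small hypothetical typed dictionary: keep the abstracts that mention the target concept, tag the other dictionary terms they contain, and group those terms by semantic type. Real systems add entity recognition and relation classification on top; the abstract text below is abridged from the slide.

```python
import re
from collections import defaultdict

# Hypothetical typed dictionary: term -> semantic type.
DICTIONARY = {
    "brca1": "Gene", "brca2": "Gene",
    "tamoxifen": "Drug", "raloxifene": "Drug",
    "mastectomy": "Procedure", "oophorectomy": "Procedure",
    "breast cancer": "Disease",
}


def extract(abstracts, target="breast cancer"):
    """Group dictionary terms that co-occur with `target` in the same abstract, by type."""
    found = defaultdict(set)
    for text in abstracts:
        lowered = text.lower()
        if target not in lowered:
            continue  # only abstracts that mention the target concept count
        for term, sem_type in DICTIONARY.items():
            if term != target and re.search(r"\b" + re.escape(term) + r"\b", lowered):
                found[sem_type].add(term)
    return {t: sorted(terms) for t, terms in found.items()}


ABSTRACT = ("BRCA1 or BRCA2 ... Information was recorded on prophylactic mastectomy, "
            "prophylactic oophorectomy, use of tamoxifen ... Among those who did not have "
            "a prophylactic mastectomy, only 76 women took tamoxifen and 40 women took "
            "raloxifene for breast cancer prevention.")
print(extract([ABSTRACT]))
```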

Page 28

Make New Discoveries

‣ Discover hidden relations between concepts that have not been described before in the scientific literature.

e.g. Find new knowledge about Raynaud's syndrome in the literature:

"[…] meal fatty acids appear to be an important determinant of vascular reactivity, with fish oils significantly improving postprandial endothelium-independent vasodilation"

"Numerous studies have documented longer bleeding times and decreased platelet aggregation in subjects ingesting omega-3 fatty acids"

"vasomotor pain, in particular the fact of reactional vasodilation during Raynaud's syndrome, inflammation in the region surrounding zones of ischemic necrosis, and infection of ulcers"

"Objective judgement on effects of medicine in patients with Raynaud's phenomenon--measurement of cutaneous blood flow using laser Doppler flowmeter and platelet aggregation activity"

By finding evidence of a relation between fish oils and both vasodilation and platelet aggregation, and evidence linking these two functions to Raynaud's syndrome, we can uncover a relation not previously described in the literature: the possible treatment of Raynaud's syndrome with fish oil.

by L. Cornide
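The reasoning on this slide follows the "ABC" pattern of literature-based discovery: concept A (fish oil) is linked to intermediate concepts B (vasodilation, platelet aggregation), which are linked to concept C (Raynaud's syndrome), while A and C never appear together. A minimal sketch over a hypothetical set of co-occurrence links; requiring at least two bridging concepts is an assumed threshold.

```python
from collections import defaultdict

# Hypothetical links: concept pairs that co-occur in at least one abstract.
LINKS = {
    ("fish oil", "vasodilation"),
    ("fish oil", "platelet aggregation"),
    ("vasodilation", "Raynaud's syndrome"),
    ("platelet aggregation", "Raynaud's syndrome"),
    ("fish oil", "omega-3 fatty acids"),
    ("Raynaud's syndrome", "ulcers"),
}


def neighbours(links):
    """Undirected adjacency map built from the co-occurrence pairs."""
    adj = defaultdict(set)
    for x, y in links:
        adj[x].add(y)
        adj[y].add(x)
    return adj


def hidden_links(a, links, min_bridges=2):
    """Concepts C reached from A through >= min_bridges intermediates B but never linked to A directly."""
    adj = neighbours(links)
    bridges = defaultdict(set)
    for b in adj[a]:
        for c in adj[b]:
            if c != a and c not in adj[a]:
                bridges[c].add(b)
    return {c: sorted(bs) for c, bs in bridges.items() if len(bs) >= min_bridges}


print(hidden_links("fish oil", LINKS))
```

Starting from fish oil, the only candidate is Raynaud's syndrome, supported by the two bridges platelet aggregation and vasodilation; such candidates are hypotheses to test, not established findings.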

Page 29
Page 30
Page 31
Page 32
Page 33
Page 34
Page 35
Page 36
Page 37

The Future

‣ Structured information.

‣ User identifier.

‣ The article of the future.

‣ Social search.

Page 38

http://beta.cell.com/erickson/

Page 39

Social Search

‣ Collective

‣ Collaborative Q&A

‣ Friend-filtered

http://www.readwriteweb.com/archives/3_flavors_of_social_search_what_to_expect.php

Page 40

Beta testers

Take part in the development of one of the leading biomedical search engines on the market.

Get access to the latest updates of our search engine.

A guaranteed gift.

www.novoseek.com/betatesters.html

Page 41

Contact

Ramón Alonso-Allende
Marketing & Business [email protected]: +34 91 141 71 50