20091119 Cbm

Past, present and future of scientific literature search. Ramón Alonso-Allende


Novoseek presentation to the scientists at the Molecular Biology Research Institute Severo Ochoa.

Transcript of 20091119 Cbm

Page 1: 20091119 Cbm

Past, present and future of scientific literature search

Ramón Alonso-Allende

Page 2: 20091119 Cbm

Past, present and future of scientific literature search

Ramón Alonso-Allende

Page 3: 20091119 Cbm

Science Cycle: write, search, read, experiment

1990's, 2000's

Today: Relevance + Complete + Easy - Time

Future: Search = Integration + Meaning + Social

Value system

Page 4: 20091119 Cbm

Information systems

1995 2000 2005 2010

Page 5: 20091119 Cbm
Page 6: 20091119 Cbm

Searches in PubMed

Chart: searches per year in thousands (axis from 0 to 1,000,000), 1997 to 2008

Page 7: 20091119 Cbm

Challenges

Page 8: 20091119 Cbm

Challenges

‣ Handling huge amounts of information.

‣ Ambiguity of language.

‣ Time.

‣ Keeping up to date.

jordinho_dp

Page 9: 20091119 Cbm

A lot of heterogeneous information

Chart: growth of GB, PDB, Medline and SwissProt, 1992 to 2007 (axis from 0 to 80,000,000)

Page 10: 20091119 Cbm

43% of human genes have ambiguous names

Page 11: 20091119 Cbm

Some data

‣ 5,892 terms can be genes or diseases

‣ 3,963 names refer to 2 different genes

‣ One term refers to 114 genes

Chart: number of terms (axis from 0 to 4,000) by number of concepts (2 to 9), for diseases, genes and drugs

Page 12: 20091119 Cbm

Some examples

sps, AAt1

stiff-man syndrome (Diseases or Syndromes)

annuloaortic ectasia (Diseases or Syndromes)

polystyrene sulfonate (Pharmacological substances)

alanine aminotransferase (Genes and Proteins)

systolic blood pressure (Biological functions)

spermine synthase (Genes and Proteins)

Page 13: 20091119 Cbm

Language ambiguity

Synonyms: different words for the same biomedical entity

• In humans there are at least 5,418 genes with synonyms (38% of the total genome)

• Drugs have a commercial name and a chemical name

Homonyms: the same name for different biomedical entities

Symbol PAP is an alias for:

• PAP (pancreatitis-associated protein)

• MRPS30 (mitochondrial ribosomal protein S30)

• PAPOLA (poly(A) polymerase alpha)

Acronyms: a reduced word representing a biomedical entity

SCT stands for:

• Stem cell transplant

• Secretin

• Salmon calcitonin
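The PAP and SCT cases on this slide can be pictured as a name index in which one name points to several entities. The sketch below is only an illustration: the dictionary layout and function names are assumptions, and the entries are limited to the examples quoted on the slide.

```python
from collections import defaultdict

# Entity -> names used for it. Entries come from the PAP and SCT examples
# on this slide; the data structure itself is an illustrative assumption.
ENTITY_NAMES = {
    "pancreatitis-associated protein": ["PAP"],
    "MRPS30": ["MRPS30", "PAP"],
    "PAPOLA": ["PAPOLA", "PAP"],
    "stem cell transplant": ["SCT"],
    "secretin": ["SCT"],
    "salmon calcitonin": ["SCT"],
}


def build_name_index(entity_names):
    """Invert entity -> names into name -> sorted list of entities."""
    index = defaultdict(set)
    for entity, names in entity_names.items():
        for name in names:
            index[name.upper()].add(entity)
    return {name: sorted(entities) for name, entities in index.items()}


def ambiguous_names(index):
    """Keep only the names that point to more than one entity."""
    return {name: ents for name, ents in index.items() if len(ents) > 1}


index = build_name_index(ENTITY_NAMES)
ambiguous = ambiguous_names(index)
for name, entities in sorted(ambiguous.items()):
    print(f"{name}: {len(entities)} candidate entities -> {entities}")
```

Counting the entries of such an index by number of candidate entities would give the kind of distribution quoted earlier in the talk (names that refer to 2, 3 or more concepts).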

Page 14: 20091119 Cbm

Unmanageable

‣ More than 25 million documents, counting scientific articles, grants, biomedical patents…, all relevant sources of information for biomedical researchers.

‣ 2,000 new scientific papers are published every day.

‣ It would take 5 years to read the new scientific material produced every 24 hours.

‣ Following a single disease, such as breast cancer, means scanning 130 journals and reading 27 articles per day.

Page 15: 20091119 Cbm

Keeping up to date

‣ Search engine alerts

‣ Emailed eTOCs

‣ RSS feeds

Page 16: 20091119 Cbm

Search tasks & lab work by discipline

Chart: % of time (axis from 0% to 80%) spent searching literature, searching data from DBs and working in the lab, for All, Biochemistry, Mol. & Cell Biol., Genetics, Biotechnology, Bioinformatics, Medicine and Other

Roos, A., Kumpulainen, S., Järvelin, K. and Hedlund, T. (2008). "The information environment of researchers in molecular medicine". Information Research, 13(3), paper 353. [Available at http://InformationR.net/ir/13-3/paper353.html]

Page 17: 20091119 Cbm

How we tackle the challenges

Page 18: 20091119 Cbm

We tackle the challenges by:

‣ Integrating information for the user.

‣ Analyzing the text (text mining).

‣ Offering useful functionality.

‣ Technology + simple interface = less time

Page 19: 20091119 Cbm

Data integration

Sequence DBs: UniProt, GenBank, RefSeq, PIR, EMBL, Entrez Protein, UniSTS

Gene DBs: GDB, Ensembl, Entrez Gene, UniGene, H-InvDB, MGC, HGNC

Pathway DBs: KEGG, EC, Reactome

Domain DBs: Pfam, PROSITE, SMART, ProDom, InterPro

Other DBs: Affymetrix, GO, PDB, MIM, CCDS, HPRD, HGNC

Page 20: 20091119 Cbm

Text mining

Gene: GH1, Growth Hormone 1, GeneID: 2688

Synonyms: GHN, GH

Associated terms: adenoma (0.300), adipocyte (0.418), adipose (0.324), age-related (0.442), genotropin (19.368)

Gene: GGH, Gamma Glutamyl Hydrolase, GeneID: 8836

Synonyms: conjugase, GH

Associated terms: antifolate (2.850), carboxypeptidase (12.618), folate (0.674), gamma-glu-x (15.452), antifolylpoly-gamma-glutamate (12.054)
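One way to read this slide is that each gene sharing the synonym GH has its own list of weighted terms, and that the terms surrounding a mention of "GH" decide which gene is meant. The sketch below assumes the simplest possible rule (sum the weights of the terms found in the text and pick the highest total). The slide does not state how novo|seek actually combines the weights, so the scoring rule and the function names are assumptions.

```python
import re

# Term weights copied from the slide. Treating them as additive evidence
# for each gene is an assumption made for this sketch.
CONTEXT_WEIGHTS = {
    "growth hormone 1 (GeneID 2688)": {
        "adenoma": 0.300,
        "adipocyte": 0.418,
        "adipose": 0.324,
        "age-related": 0.442,
        "genotropin": 19.368,
    },
    "gamma-glutamyl hydrolase (GeneID 8836)": {
        "antifolate": 2.850,
        "carboxypeptidase": 12.618,
        "folate": 0.674,
        "gamma-glu-x": 15.452,
        "antifolylpoly-gamma-glutamate": 12.054,
    },
}


def score_senses(text):
    """Sum, per candidate gene, the weights of its terms present in the text."""
    tokens = set(re.findall(r"[a-z0-9-]+", text.lower()))
    return {
        sense: sum(w for term, w in weights.items() if term in tokens)
        for sense, weights in CONTEXT_WEIGHTS.items()
    }


def disambiguate(text):
    """Return the best-scoring gene, or None when no context term matches."""
    scores = score_senses(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("GH therapy with genotropin reduced adipose tissue mass"))
print(disambiguate("GH activity changes the antifolate and folate response"))
print(disambiguate("GH levels were measured"))
```

Returning None when no term matches keeps the sketch from guessing; a real system would presumably fall back to other evidence.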

Page 21: 20091119 Cbm

Indexed data

Medline abstracts: 18 M

Open access full text: 145,000

R&D project abstracts: 1.5 M

> 4 M concepts

> 200 M relations

Page 22: 20091119 Cbm

Comparison use case: looking for the gene SCT

PubMed: SCT is solid-cystic tumor

Google Scholar: SCT is the name of an author

novo|seek: SCT is the meaning you are looking for: secretin or stem cell transplantation

Page 23: 20091119 Cbm

novo|seek vs. Google Scholar

Google Scholar: no way to focus the search beyond reading: time-consuming

Page 24: 20091119 Cbm

Technology

Semantic Search

Concept Relations

Knowledge Extraction

Discovery

‣ Search more efficiently.

‣ Extract more information.

‣ Relate different sources of information.

‣ Save time.

by L cornide

Page 25: 20091119 Cbm

Semantic Search

‣ Conceptual search

e.g. Search for breast cancer:

Detection of breast carcinoma cells in effusions is associated with rapidly fatal outcome

Women who do not receive regular mammograms are more likely than others to have breast cancer diagnosed at an advanced stage

[…] thereby providing higher cytotoxicity against the 4T1 mouse mammary carcinoma cell line

All of these keywords refer to the same biomedical concept, so a search for breast cancer will retrieve all three documents.

‣ Use of context and semantic information to identify the relevant information

e.g. Search for CAT, which could refer to the enzyme catalase or to the animal, "cat":

[…] activity of antioxidant enzymes (GSH-Px, SOD, CAT) and content of malondialdehyde (MDA) were determined

[…] 26 free-living lynx, 53 domestic cats, 28 dogs, 33 red foxes (Vulpes vulpes) […]

The same keyword refers to different biomedical concepts. Using the context, we can identify that only the first sentence talks about an enzyme.
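The breast cancer example on this slide can be sketched as a lookup from surface forms to a shared concept, so that a query matches any document mentioning any synonym. This is a minimal illustration: the concept identifier, the synonym list and the substring matching are assumptions, not a description of novo|seek's dictionary or matcher.

```python
# Surface forms mapped to one concept. The identifier "C:breast_cancer" and
# the synonym list are illustrative assumptions.
CONCEPT_SYNONYMS = {
    "C:breast_cancer": ["breast cancer", "breast carcinoma", "mammary carcinoma"],
}

# Sentences quoted on the slide, plus one unrelated sentence as a control.
DOCUMENTS = [
    "Detection of breast carcinoma cells in effusions is associated with rapidly fatal outcome",
    "Women who do not receive regular mammograms are more likely than others "
    "to have breast cancer diagnosed at an advanced stage",
    "thereby providing higher cytotoxicity against the 4T1 mouse mammary carcinoma cell line",
    "26 free-living lynx, 53 domestic cats, 28 dogs, 33 red foxes (Vulpes vulpes)",
]


def concepts_in(text):
    """Return the ids of all concepts with at least one synonym in the text."""
    lowered = text.lower()
    return {
        concept_id
        for concept_id, synonyms in CONCEPT_SYNONYMS.items()
        if any(synonym in lowered for synonym in synonyms)
    }


def conceptual_search(query, documents):
    """Indices of documents sharing at least one concept with the query."""
    query_concepts = concepts_in(query)
    return [i for i, doc in enumerate(documents) if query_concepts & concepts_in(doc)]


def keyword_search(query, documents):
    """Plain substring search, for comparison."""
    return [i for i, doc in enumerate(documents) if query.lower() in doc.lower()]


print(keyword_search("breast cancer", DOCUMENTS))     # literal matches only
print(conceptual_search("breast cancer", DOCUMENTS))  # all synonym matches
```

The keyword search finds only the sentence containing the literal phrase, while the concept lookup also returns the "breast carcinoma" and "mammary carcinoma" sentences.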

by L cornide

Page 26: 20091119 Cbm

Concept Relations

e.g. Search for Alzheimer's disease:

The apolipoprotein E gene (APOE) polymorphism genotyping has an allegedly important predictive value for coronary heart disorders and Alzheimer's disease.

Apolipoprotein E (apoE), a ligand for the low-density lipoprotein receptor family, has been implicated in modulating glial inflammatory responses and the risk of neurodegeneration associated with Alzheimer's disease.

Although many genes have been suggested to be associated with AD, with the exception of APOE, most polymorphic variants of potential risk exhibit a very weak association with AD

The protein apolipoprotein E and Alzheimer's disease are related, with a relevance of 36%.
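The slide does not say how the relevance figure is computed. Purely as an illustration of how a relation between two concepts could be scored from the documents that mention them, the sketch below uses the Jaccard index over co-occurrence; the measure, the toy annotations and the function name are all assumptions, and the result is not meant to reproduce the 36% above.

```python
# Each document is reduced to the set of concepts detected in it.
# These four annotated documents are toy data invented for the sketch.
ANNOTATED_DOCS = [
    {"APOE", "Alzheimer disease", "coronary heart disease"},
    {"APOE", "Alzheimer disease"},
    {"APOE", "coronary heart disease"},
    {"Alzheimer disease"},
]


def cooccurrence_score(concept_a, concept_b, annotated_docs):
    """Jaccard index: documents with both concepts / documents with either."""
    docs_a = {i for i, concepts in enumerate(annotated_docs) if concept_a in concepts}
    docs_b = {i for i, concepts in enumerate(annotated_docs) if concept_b in concepts}
    union = docs_a | docs_b
    if not union:
        return 0.0
    return len(docs_a & docs_b) / len(union)


score = cooccurrence_score("APOE", "Alzheimer disease", ANNOTATED_DOCS)
print(f"APOE / Alzheimer disease: {score:.0%}")
```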

by L cornide

Page 27: 20091119 Cbm

Knowledge Extraction

‣ Based on the detected relations between concepts, we can automatically extract knowledge from text

e.g. Obtain the knowledge about breast cancer, extracted from the literature:

[…] BRCA1 or BRCA2 […] Information was recorded on prophylactic mastectomy, prophylactic oophorectomy, use of tamoxifen […] had a bilateral prophylactic oophorectomy. […] breast cancer, 248 (18.0%) had had a prophylactic bilateral mastectomy. Among those who did not have a prophylactic mastectomy, only 76 women (5.5%) took tamoxifen and 40 women (2.9%) took raloxifene for breast cancer prevention. […].

The genes BRCA1 and BRCA2 are related to breast cancer. Tamoxifen and raloxifene are drugs used in its treatment, and mastectomy and oophorectomy are usual procedures to treat it.

by L cornide

Page 28: 20091119 Cbm

Make new Discoveries

‣ Discover hidden relations between concepts that have not been described before in the scientific literature

e.g. Evidence extracted from the literature on fish oil and Raynaud's syndrome:

[…] meal fatty acids appear to be an important determinant of vascular reactivity, with fish oils significantly improving postprandial endothelium-independent vasodilation

Numerous studies have documented longer bleeding times and decreased platelet aggregation in subjects ingesting omega-3 fatty acids

vasomotor pain, in particular the fact of reactional vasodilation during Raynaud's syndrome, inflammation in the region surrounding zones of ischemic necrosis, and infection of ulcers

Objective judgement on effects of medicine in patients with Raynaud's phenomenon--measurement of cutaneous blood flow using laser Doppler flowmeter and platelet aggregation activity

By finding evidence of a relation between fish oils and both vasodilation and platelet aggregation, and evidence of a link between these two functions and Raynaud's syndrome, we can make a discovery that was not previously described in the literature: the possible treatment of Raynaud's syndrome with fish oil.
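The reasoning on this slide follows what is often called the ABC pattern of literature-based discovery: A is linked to B, B is linked to C, and A and C are never mentioned together. A minimal sketch of that pattern, assuming the relations have already been extracted as concept pairs (the pair list and function names are illustrative, not novo|seek's implementation):

```python
from collections import defaultdict

# Concept pairs with literature evidence, taken from the sentences above.
RELATIONS = [
    ("fish oil", "vasodilation"),
    ("fish oil", "platelet aggregation"),
    ("vasodilation", "Raynaud's syndrome"),
    ("platelet aggregation", "Raynaud's syndrome"),
]


def build_graph(relations):
    """Undirected adjacency map: concept -> set of directly related concepts."""
    graph = defaultdict(set)
    for a, b in relations:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def hidden_links(relations, source):
    """Concepts reachable from source through one intermediate concept,
    excluding those already directly related to it."""
    graph = build_graph(relations)
    direct = graph.get(source, set())
    candidates = defaultdict(set)
    for middle in direct:
        for target in graph[middle]:
            if target != source and target not in direct:
                candidates[target].add(middle)
    return {target: sorted(middles) for target, middles in candidates.items()}


for target, via in hidden_links(RELATIONS, "fish oil").items():
    print(f"fish oil -> {target} (via {', '.join(via)})")
```

Each candidate comes with the intermediate concepts that support it, which is the evidence a researcher would need to check before treating the link as a hypothesis.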

by L cornide

Page 29: 20091119 Cbm
Page 30: 20091119 Cbm
Page 31: 20091119 Cbm
Page 32: 20091119 Cbm
Page 33: 20091119 Cbm
Page 34: 20091119 Cbm
Page 35: 20091119 Cbm
Page 36: 20091119 Cbm
Page 37: 20091119 Cbm

The Future

‣ Structured information.

‣ User identifier.

‣ The article of the future.

‣ Social search.

Page 38: 20091119 Cbm

http://beta.cell.com/erickson/

Page 39: 20091119 Cbm

Social Search

Collective

Collaborative Q&A

Friend-Filtered

http://www.readwriteweb.com/archives/3_flavors_of_social_search_what_to_expect.php

Page 40: 20091119 Cbm

Beta testers

Collaborate in the development of one of the leading biomedical search engines on the market.

Access to the latest updates of our search engine.

A guaranteed gift.

www.novoseek.com/betatesters.html

Page 41: 20091119 Cbm

Contact

Ramón Alonso-Allende

Marketing & Business [email protected]: +34 91 141 71 50