Computer Science > Databases
[Submitted on 5 Jul 2020]
Title:DrugDBEmbed : Semantic Queries on Relational Database using Supervised Column Encodings
View PDFAbstract:Traditional relational databases contain a lot of latent semantic information that have largely remained untapped due to the difficulty involved in automatically extracting such information. Recent works have proposed unsupervised machine learning approaches to extract such hidden information by textifying the database columns and then projecting the text tokens onto a fixed dimensional semantic vector space. However, in certain databases, task-specific class labels may be available, which unsupervised approaches are unable to lever in a principled manner. Also, when embeddings are generated at individual token level, then column encoding of multi-token text column has to be computed by taking the average of the vectors of the tokens present in that column for any given row. Such averaging approach may not produce the best semantic vector representation of the multi-token text column, as observed while encoding paragraphs or documents in natural language processing domain. With these shortcomings in mind, we propose a supervised machine learning approach using a Bi-LSTM based sequence encoder to directly generate column encodings for multi-token text columns of the DrugBank database, which contains gold standard drug-drug interaction (DDI) labels. Our text data driven encoding approach achieves very high Accuracy on the supervised DDI prediction task for some columns and we use those supervised column encodings to simulate and evaluate the Analogy SQL queries on relational data to demonstrate the efficacy of our technique.
Submission history
From: Bortik Bandyopadhyay [view email][v1] Sun, 5 Jul 2020 16:53:17 UTC (259 KB)
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
Connected Papers (What is Connected Papers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.