Salesforce is using AI to democratize SQL so anyone can query databases in natural language

SQL is about as easy as it gets in the world of programming, and yet its learning curve is still steep enough to prevent many people from interacting with relational databases. Salesforce’s AI research team took it upon itself to explore how machine learning might be able to open doors for those without knowledge of SQL.

Their recent paper, “Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning,” builds on the sequence-to-sequence models typically employed in machine translation. A reinforcement learning twist allowed the team to obtain promising results translating natural language database queries into SQL.

In practice this means that you could simply ask who the winningest team in college football is and an appropriate database could be automatically queried to tell you that it is in fact the University of Michigan.

“We don’t actually have just one way of writing a query the correct way,” Victor Zhong, one of the Salesforce researchers who worked on the project, explained to me in an interview. “If I give a natural language question, there might be two or three ways to write the query. We use reinforcement learning to encourage use of queries that obtain same result.”
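To make Zhong's point concrete, here is a minimal sketch of a reward based on query results rather than query text. It is an illustration under stated assumptions, not the team's implementation: the `teams` table, its rows, the helper names `run_query` and `execution_reward`, and the reward values are all hypothetical.

```python
import sqlite3


def run_query(conn, sql):
    """Execute sql and return its rows in a canonical order, or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return sorted(rows, key=repr)


def execution_reward(conn, predicted_sql, reference_sql):
    """Score a predicted query by what it returns (reward values are assumptions)."""
    predicted = run_query(conn, predicted_sql)
    if predicted is None:
        return -2.0  # the query could not be executed at all
    if predicted == run_query(conn, reference_sql):
        return 1.0  # same result as the reference, even if written differently
    return -1.0  # valid SQL, but a different result


# Toy in-memory table with made-up rows (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teams (name TEXT, wins INTEGER)")
conn.executemany(
    "INSERT INTO teams VALUES (?, ?)",
    [("Team A", 10), ("Team B", 7), ("Team C", 4)],
)

reference = "SELECT name FROM teams ORDER BY wins DESC LIMIT 1"
rewrite = "SELECT name FROM teams WHERE wins = (SELECT MAX(wins) FROM teams)"

print(execution_reward(conn, rewrite, reference))
print(execution_reward(conn, "SELECT name FROM teams WHERE wins < 5", reference))
print(execution_reward(conn, "SELEC name FROM teams", reference))
```

The two differently worded queries earn the same positive score because they return the same row, which is the behavior the quote describes.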

You can imagine how machine translation problems can quickly become massively complex with large vocabularies. The more you can limit the number of possible translations for each missing word, the simpler your problem becomes. To this end, Salesforce opted to limit its vocabulary to the words used in database labels, the words in the question being asked, and the words typically used in SQL queries.
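A rough sketch of that restriction might look like the following. The tokenizer, the `SQL_TOKENS` list, and the function names are assumptions made for illustration; the article does not specify how Salesforce builds its vocabulary.

```python
import re

# A small, illustrative set of SQL tokens (an assumption, not the paper's list).
SQL_TOKENS = ["SELECT", "FROM", "WHERE", "AND", "COUNT", "MIN", "MAX", "=", ">", "<"]


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_vocabulary(column_names, question, sql_tokens=SQL_TOKENS):
    """Collect the only tokens the model would be allowed to emit."""
    vocab = set(sql_tokens)
    for name in column_names:
        vocab.update(tokenize(name))
    vocab.update(tokenize(question))
    return vocab


vocab = build_vocabulary(["Team Name", "Wins"], "Which team has the most wins?")
print(sorted(vocab))
```

Any word outside this set, however common in English, is simply not a candidate output, which keeps the number of choices per step small.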

The idea of democratizing SQL isn’t new. Startups like ClearGraph, which was recently acquired by Tableau, have made it their business to open up data with English rather than SQL.

“Some models perform execution on a database itself,” added Zhong. “But there’s potential privacy concerns if you’re asking a question about Social Security numbers.”

Outside of the paper itself, Salesforce’s biggest contribution here comes in the form of the WikiSQL data set it constructed to aid in building its model. First, HTML tables were collected from Wikipedia. These tables became the basis for randomly generated SQL queries. These queries were used to form questions that were then passed off to humans for paraphrasing over Amazon Mechanical Turk. Each paraphrase was verified twice with additional human guidance. The resulting data set is the largest such data set in existence.
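The random query step in that pipeline could be sketched roughly as below. The article gives no details of the generator, so the query shape, the `AGGREGATES` and `OPERATORS` lists, the `random_query` helper, and the toy `results` table are all hypothetical.

```python
import random

# Illustrative choices only; the real generator's options are not described here.
AGGREGATES = ["", "COUNT", "MIN", "MAX"]
OPERATORS = ["=", ">", "<"]


def random_query(table_name, columns, rows, rng):
    """Build one random single-table query from a table's columns and cell values."""
    select_col = rng.choice(columns)
    agg = rng.choice(AGGREGATES)
    cond_col = rng.choice(columns)
    # Take the condition value from a real cell so the query refers to actual data.
    cond_value = rng.choice(rows)[columns.index(cond_col)]
    op = rng.choice(OPERATORS)

    target = f"{agg}({select_col})" if agg else select_col
    if isinstance(cond_value, str):
        literal = "'" + cond_value.replace("'", "''") + "'"  # escape single quotes
    else:
        literal = str(cond_value)
    return f"SELECT {target} FROM {table_name} WHERE {cond_col} {op} {literal}"


# Toy table standing in for one scraped from Wikipedia (made-up rows).
columns = ["team", "wins"]
rows = [("Team A", 10), ("Team B", 7), ("Team C", 4)]

query = random_query("results", columns, rows, random.Random(0))
print(query)
```

A generated query like this would then be turned into a stilted template question and handed to human workers for paraphrasing, as described above.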
