Postgres SAIL question

Stasinos Konstantopoulos
Dear all,

we are trying to have Sesame operate over a modified Postgres, where we
have implemented an index that returns multiple URIs that fit a
constraint. In order to expose this functionality through Sesame, we
are thinking of defining a special prefix. All triple patterns involving
a URI with this prefix, should be handled as follows:

1. Sesame (as it currently does) first looks up the internal id for this
URI in the uri_values table. The difference is that multiple URIs might
fit the constraint, so that this might return multiple IDs.

2. Sesame normally uses the single ID returned while building the SQL
query that retrieves triples from the property-named tables. It is not
obvious how to make this work for multiple IDs from step 1.

We are thinking of hacking the Postgres SAIL so that a single SQL query
is built, which JOINs the results of looking up the internal IDs of URIs
with triple retrieval. We would like to ask if anybody knows why the
SAIL is implemented as it currently is (as opposed to building a single
query in the first place) and point us to places where things might
break.
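
To make the two options concrete, here is a minimal sketch of the current two-step approach next to the single-query JOIN proposed above. It uses an in-memory SQLite database as a stand-in for the modified Postgres, and a LIKE pattern as a stand-in for the custom index. The property table name (knows), its columns (subj, obj), the columns of uri_values (id, value) and the example URIs are all assumptions made for illustration, not the actual Sesame schema.

```python
import sqlite3

# In-memory SQLite stands in for the modified Postgres. Table and column
# names are illustrative assumptions, not the real Sesame RDBMS schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE uri_values (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE knows (subj INTEGER, obj INTEGER);  -- one property-named table
    INSERT INTO uri_values VALUES
        (1, 'http://example.org/alice'),
        (2, 'http://example.org/alan'),
        (3, 'http://example.org/bob');
    INSERT INTO knows VALUES (1, 3), (2, 3), (3, 1);
""")

# Stand-in for the custom index: every URI matching this pattern
# counts as "fitting the constraint".
pattern = "http://example.org/al%"


def two_step(conn, pattern):
    """Step 1 resolves the internal IDs; step 2 feeds them to the triple query."""
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM uri_values WHERE value LIKE ?", (pattern,))]
    if not ids:
        return []
    # Several IDs may come back, so step 2 needs an IN list instead of '='.
    placeholders = ", ".join("?" for _ in ids)
    return conn.execute(
        f"SELECT subj, obj FROM knows WHERE subj IN ({placeholders}) "
        "ORDER BY subj, obj", ids).fetchall()


def single_join(conn, pattern):
    """One query that JOINs the ID lookup with the triple retrieval."""
    return conn.execute(
        "SELECT t.subj, t.obj FROM knows AS t "
        "JOIN uri_values AS u ON u.id = t.subj "
        "WHERE u.value LIKE ? ORDER BY t.subj, t.obj", (pattern,)).fetchall()


two_step_rows = two_step(conn, pattern)
join_rows = single_join(conn, pattern)
```

In this toy setting both functions return the same rows; the difference is only whether the IDs make a round trip through the client before the triple query is built.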

Thanks in advance,
Stasinos

_______________________________________________
Sesame-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/sesame-general

Re: Postgres SAIL question

Jeen Broekstra
Hi Stasinos,

On 2/10/13 3:44 PM, Stasinos Konstantopoulos wrote:

> we are trying to have Sesame operate over a modified Postgres, where we
> have implemented an index that returns multiple URIs that fit a
> constraint. In order to expose this functionality through Sesame, we
> are thinking of defining a special prefix. All triple patterns involving
> a URI with this prefix, should be handled as follows:
>
> 1. Sesame (as it currently does) first looks up the internal id for this
> URI in the uri_values table. The difference is that multiple URIs might
> fit the constraint, so that this might return multiple IDs.
>
> 2. Sesame normally uses the single ID returned while building the SQL
> query that retrieves triples from the property-named tables. It is not
> obvious how to make this work for multiple IDs from step 1.
>
> We are thinking of hacking the Postgres SAIL so that a single SQL query
> is built, which JOINs the results of looking up the internal IDs of URIs
> with triple retrieval. We would like to ask if anybody knows why the
> SAIL is implemented as it currently is (as opposed to building a single
> query in the first place) and point us to places where things might
> break.

First of all: you should know that the Sesame RDBMS/MySQL/PostgreSQL
Sail has been deprecated and is no longer officially supported (the main
reason for this decision is that its ongoing maintenance and development
put too much pressure on the dev team, against very little gain in terms
of a good scalable storage solution). It's of course still shipped with
Sesame and is functional, but new developments such as full SPARQL 1.1
support are not implemented.

Now, to answer your immediate question, if I remember correctly the
reason the RDBMS Sail does id-retrieval and actual query in two separate
steps is that the SAIL has a built-in id-caching mechanism for Value
objects (stored as a property in the Value objects themselves - see
RdbmsValue and its subclasses). The idea therefore is that if the id is
cached the first part can be skipped, and only the second step is
executed. The thinking behind this approach was that Value objects can
be long-lived over the course of a session and repetitive queries using
the same collections of Values can be sped up by such a cache.

Now, to put a caveat on the above: this is off the top of my head and
without having looked at the RDBMS Sail code in quite a while, so I may
have some of the details wrong, but that's the general gist.
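
As a rough illustration of the caching idea described above (not the actual RdbmsValue code), the sketch below keeps the internal id on the value object itself and only performs the lookup when the id is missing. The class names CachedValue and IdResolver, the dict standing in for uri_values, and the example URI are all hypothetical.

```python
class CachedValue:
    """Hypothetical stand-in for a Value object that remembers its internal id."""

    def __init__(self, uri):
        self.uri = uri
        self.internal_id = None  # filled in by the first successful lookup


class IdResolver:
    """Resolves values to internal ids, skipping the lookup when one is cached."""

    def __init__(self, table):
        self.table = table  # a dict playing the role of the uri_values table
        self.lookups = 0    # counts how often the 'database' is actually hit

    def resolve(self, value):
        if value.internal_id is None:
            # First step: id retrieval, needed only on a cache miss.
            self.lookups += 1
            value.internal_id = self.table[value.uri]
        # Second step (the triple query) would use this id directly.
        return value.internal_id


resolver = IdResolver({"http://example.org/alice": 1})
alice = CachedValue("http://example.org/alice")
first = resolver.resolve(alice)   # cache miss: performs the lookup
second = resolver.resolve(alice)  # cache hit: lookup skipped
```

Under these assumptions, a value that stands for several URIs has no single id to store on the object, which is presumably where a multi-ID lookup would collide with this design.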


HTH,

Jeen





Re: Postgres SAIL question

Stasinos Konstantopoulos

Jeen, hi again.

Thank you for your answer. Please see inline for some comments.

On Sun Oct  6 15:02:49 2013 Jeen Broekstra said:

> On 2/10/13 3:44 PM, Stasinos Konstantopoulos wrote:
>
> > we are trying to have Sesame operate over a modified Postgres, where
> > we have implemented an index that returns multiple URIs that fit a
> > constraint. In order to expose this functionality through Sesame, we
> > are thinking of defining a special prefix. All triple patterns
> > involving a URI with this prefix, should be handled as follows:
> >
> > 1. Sesame (as it currently does) first looks up the internal id for
> > this URI in the uri_values table. The difference is that multiple
> > URIs might fit the constraint, so that this might return multiple
> > IDs.
> >
> > 2. Sesame normally uses the single ID returned while building the
> > SQL query that retrieves triples from the property-named tables. It
> > is not obvious how to make this work for multiple IDs from step 1.
> >
> > We are thinking of hacking the Postgres SAIL so that a single SQL
> > query is built, which JOINs the results of looking up the internal
> > IDs of URIs with triple retrieval. We would like to ask if anybody
> > knows why the SAIL is implemented as it currently is (as opposed to
> > building a single query in the first place) and point us to places
> > where things might break.
>
> First of all: you should know that the Sesame RDBMS/MySQL/PostgreSQL
> Sail has been deprecated and is no longer officially supported (the main
> reason for this decision is that its ongoing maintenance and development
> put too much pressure on the dev team, against very little gain in terms
> of a good scalable storage solution). It's of course still shipped with
> Sesame and is functional, but new developments such as full SPARQL 1.1
> support are not implemented.

I noticed that, and was wondering about the reasons. Thanks for the
explanation.

Scarcity of developer resources and prioritization of their usage I
fully understand and sympathise with. Having said that, I find your
assertion that there is little gain (especially for larger stores)
somewhat surprising, although I must admit it has been a couple of years
since the last time I tried filling up a Native Store to the bursting
point. Are there any publicly available datasets and/or published
experimental results to this effect?

> Now, to answer your immediate question, if I remember correctly the
> reason the RDBMS Sail does id-retrieval and actual query in two separate
> steps is that the SAIL has a built-in id-caching mechanism for Value
> objects (stored as a property in the Value objects themselves - see
> RdbmsValue and its subclasses). The idea therefore is that if the id is
> cached the first part can be skipped, and only the second step is
> executed. The thinking behind this approach was that Value objects can
> be long-lived over the course of a session and repetitive queries using
> the same collections of Values can be sped up by such a cache.
>
> Now, to put a caveat on the above: this is off the top of my head and
> without having looked at the RDBMS Sail code in quite a while, so I may
> have some of the details wrong, but that's the general gist.

Thanks, this helped a lot. We have circumvented this mechanism by
changing the query posed to the triples tables. We will keep the list
posted if (when!) we get encouraging results.

Best,
Stasinos
