native store performs much better than inmemory store for update queries

René Verheij
Hi,

I've got a dataset of around 600,000 triples in an in-memory store without any inference.

I noticed some INSERT / DELETE queries are very slow, so I created a native store and inserted the same data.

Then I made a test file with some select, insert and delete queries, which gave these results:

in memory:
select: 1 second
insert data: 10 seconds
delete data: 10 seconds

native:
select: 2.6 seconds
insert data: 0.02 seconds
delete data: 0.02 seconds

So the select is a bit slower for native, as expected, but insert/delete is about 500 times slower in the memory store :-/
Surely this must have to do with some type of inference, right?

But to be clear, the SYSTEM repository tells me the type of this memory store called test1 is "openrdf:MemoryStore", so there shouldn't be any inference, right?

Any idea what causes this?

Cheers, René

_______________________________________________
Sesame-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/sesame-general

Re: native store performs much better than inmemory store for update queries

René Verheij
Bump? I've got to put this system live soon, so any help would be much appreciated!



Re: native store performs much better than inmemory store for update queries

Jeen Broekstra
In reply to this post by René Verheij
On 9/10/13 11:53 PM, René Verheij wrote:

> But to be clear, the SYSTEM repository tells me the type of this memory
> store called test1 is "openrdf:MemoryStore", so there shouldn't be any
> inference, right?

That's correct.

> any idea what causes this?

It's hard to say. It certainly doesn't look right but I haven't seen
this before. Can you provide the actual test code and queries so we can
try and reproduce your results? Easiest way is to open a ticket in JIRA
and add your test code as an attachment there.

Cheers,

Jeen


Re: native store performs much better than inmemory store for update queries

James Leigh
In reply to this post by René Verheij
On Wed, 2013-10-09 at 12:53 +0200, René Verheij wrote:

> so the select is a bit slower for native as expected, but
> insert/delete is about 500 times slower in the memory store :-/
> surely this must have to do with some type of inference right?

How did you set up the two stores? If you pass a dataDir to the memory
store, it will slow down write operations significantly.

The memory store uses a hash-based approach, while the native store uses
a B-tree-based approach. Due to OS disk caching, both are likely
operating in physical memory.

Can you share some statistics on the machine you are using? Try using
jconsole or visualvm to monitor memory usage.

Here is what you should watch out for:
      * I/O operations.
              * Memory store has an option to serialize to disk on
                commit; this is a complete data dump and can take a
                significant amount of time.
      * GC overhead
              * The native store is much more conservative on memory
                usage and reduces the need for system garbage collection
                (gc).
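
To make the cost of a complete dump on commit concrete, here is a minimal Java sketch. It is a toy model and not Sesame's MemoryStore code: the class FullDumpModel and its methods are hypothetical names, and it only counts how many statements a dump-everything commit would have to write.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a store that rewrites all of its statements on every commit. */
final class FullDumpModel {
    private final List<String> statements = new ArrayList<>();
    private long statementsWritten = 0;

    /** Adds one statement to the in-memory list (hypothetical API). */
    void add(String statement) {
        statements.add(statement);
    }

    /** Simulates persist-on-commit: the cost is the store size, not the change size. */
    void commit() {
        statementsWritten += statements.size();
    }

    long statementsWritten() {
        return statementsWritten;
    }

    public static void main(String[] args) {
        FullDumpModel store = new FullDumpModel();
        for (int i = 0; i < 600_000; i++) {
            store.add("statement " + i); // preload, roughly the dataset size in this thread
        }
        for (int i = 0; i < 10; i++) {
            store.add("new statement " + i); // a one-statement update...
            store.commit();                  // ...still dumps the whole store
        }
        System.out.println(store.statementsWritten()); // prints 6000055
    }
}
```

In this model, ten one-statement updates end up writing about six million statements, which is consistent with a tiny INSERT DATA taking seconds when every commit triggers a dump.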

James

Re: native store performs much better than inmemory store for update queries

René Verheij
Thanks for your replies!

> * Memory store has an option to serialize to disk on
>   commit, this is a complete data dump and can take a
>   significant amount of time.

Aha! ... Yes, this would be the ms:persist "true" setting in the SYSTEM repository, right? I hadn't thought about that yet.

I just did a test with ms:persist "false" and that solves it.

in memory (without persist):
select: 1 second
insert data: 0.015 seconds
delete data: 0.015 seconds

Great! Thanks!

However, that leads to another question: if I turn "persist" off in my production environment, and the server loses power for whatever reason, I will lose all the data, right?
So it means I will need to make backups from time to time (just saw your sesame-backup tool Jeen, great!) and if the server ever gets reset I need to recreate the stores and re-insert the last backup? Am I thinking correctly here?


Just to complete the info: I'm doing everything in PHP, so I'm only using Sesame over HTTP, and I was automatically setting up the stores with this code:

// Register the new repository by appending its configuration to the SYSTEM repository.
// Note: ms:persist "true" makes the memory store write its contents to disk on commit.
$systemStore = new SesameStore($this->dsn, 'SYSTEM');
$context = ROOT.'repositories/'.$this->repository;
$systemStore->append('

<'.$context.'> a rep:RepositoryContext .
[] a rep:Repository ;
  rep:repositoryID "'.$this->repository.'" ;
  rdfs:label "'.$this->repository.'" ;
  rep:repositoryImpl [
    rep:repositoryType "openrdf:SailRepository" ;
    sr:sailImpl [
      sail:sailType "openrdf:MemoryStore" ;
      ms:persist "true"
    ]
  ] .', $context, self::TURTLE);







Re: native store performs much better than inmemory store for update queries

Jeen Broekstra
On 14/10/13 10:35 PM, René Verheij wrote:

> However, that leads to another question: if I turn "persist" off in my
> production environment, and the server loses power for whatever reason,
> I will lose all the data, right?

Yes.

However, you may also consider an alternative: instead of turning
persist off completely, set the 'syncDelay' to a larger value. The
syncDelay is the number of milliseconds the memory store waits between
transactions before serializing to disk. If a new transaction comes in
before the delay ends, the serialization to disk is postponed. So, in
your scenario, with a number of consecutive updates in rapid succession,
the contents would only be serialized to disk at the end, which should
significantly improve performance.

> So it means I will need to make backups from time to time (just saw your
> sesame-backup <https://bitbucket.org/insightng-ondemand/sesame-backup>
> tool Jeen, great!) and if the server ever gets reset I need to recreate
> the stores and re-insert the last backup? Am I thinking correctly here?

Yes, and I'd say that making backups is a good idea regardless of
whether you've disabled the memory store's persistence or not.

The memory store's persistence mechanism is not secure or in any way
guaranteed failsafe - if a failure occurs during serialization the data
can easily be corrupted. So making backups is a good idea in any case.

Jeen


Re: native store performs much better than inmemory store for update queries

René Verheij
Agreed, making backups is a good idea :) Though I will increase the frequency to daily backups if I turn persist off completely.

As I understand from your description, setting a syncDelay would still mean serialisation to disk would happen within one HTTP request, right?
That is, no HTTP response would come until the serialisation - at the end of a series of queries - is complete?

Because in that case, I will still have to turn persistence off completely, since I actually make many small update requests, where sometimes the user will have to wait for the response. And then any serialisation to disk (which happens to take 10 seconds) is just too much.

Thanks for your help Jeen, much appreciated.



Re: native store performs much better than inmemory store for update queries

Arjohn Kampman-2
On 14/10/2013 13:41, René Verheij wrote:
> As I understand from your description, setting a syncDelay would still mean serialisation to disk would happen within one HTTP request, right?
> That is, no HTTP response would come until the serialisation - at the end of a series of queries - is complete?

Not exactly. Upon commit, the memory store starts a timer to save to disk after <syncDelay> milliseconds. When that time has passed, a background thread will take care of the save-to-file operation. However, if a new transaction is started before time runs out, then the timer is stopped and restarted upon the next commit. So the file store operation will be postponed as long as new transactions are started within the specified time after the last commit.
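
The timer behaviour described above can be sketched in Java as follows. This is a toy model built on java.util.concurrent, not the MemoryStore implementation: DelayedSyncModel, begin(), commit() and pause() are hypothetical names, and the "sync" is just a counter standing in for the save-to-file operation.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a delayed sync to disk; not Sesame's MemoryStore code. */
final class DelayedSyncModel {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "sync-timer");
                t.setDaemon(true); // do not keep the JVM alive for the timer
                return t;
            });
    private final long syncDelayMillis;
    private final AtomicInteger syncCount = new AtomicInteger();
    private ScheduledFuture<?> pendingSync; // guarded by "this"

    DelayedSyncModel(long syncDelayMillis) {
        this.syncDelayMillis = syncDelayMillis;
    }

    /** Starting a transaction stops a timer that has not fired yet. */
    synchronized void begin() {
        if (pendingSync != null) {
            pendingSync.cancel(false);
            pendingSync = null;
        }
    }

    /** Commit returns at once; the save is scheduled on the background thread. */
    synchronized void commit() {
        if (pendingSync != null) {
            pendingSync.cancel(false);
        }
        Runnable save = this::syncToFile;
        pendingSync = timer.schedule(save, syncDelayMillis, TimeUnit.MILLISECONDS);
    }

    private void syncToFile() {
        syncCount.incrementAndGet(); // stand-in for the full dump to disk
    }

    int syncCount() {
        return syncCount.get();
    }

    /** Demo helper: sleep without a checked exception. */
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        DelayedSyncModel store = new DelayedSyncModel(300);
        for (int i = 0; i < 5; i++) { // five transactions in rapid succession
            store.begin();
            store.commit();
            pause(10);
        }
        pause(1200); // let the timer expire after the last commit
        // One sync is expected as long as the pauses stay well under the delay.
        System.out.println("syncs performed: " + store.syncCount());
    }
}
```

In this model commit() never waits for the save, so a caller would only be blocked if the real store behaved differently from the description.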

--
Arjohn Kampman
Vound, LLC
[hidden email]
http://www.vound-software.com



Re: native store performs much better than inmemory store for update queries

René Verheij
I just got around to testing this, and it actually seems that it does: the serialisation still happens within the HTTP request.

(Please remember: I'm talking about HTTP requests only; I'm using Sesame as a triple store that I access only through HTTP from PHP.)

It seems that a store with persist = true AND syncDelay = 500 still finishes the syncing before returning the HTTP response, as my update requests went up to over 10 seconds again with this setting (and they take about 0.1 second with persist = false).

Can anyone confirm that this is expected behaviour?


Re: native store performs much better than inmemory store for update queries

Arjohn Kampman-2
AFAIR, that's not how it is supposed to work. I implemented this a long, long time ago though. Did you try with a larger delay? Also, if you enable debug logging, you should see "syncing data to file..." messages in the logs.
