Configuring Sesame for a benchmark


Saud Aljaloud-2
Dear Sesame team,

We are investigating how efficiently different triple stores, including Sesame, handle literal strings within SPARQL.
To this end, we are now benchmarking these triple stores against a set of specific queries, using the Berlin SPARQL Benchmark (BSBM) test driver [1], dataset and metrics [2].

We are using the latest Sesame releases: openrdf-sesame-2.7.6 and luceneSail-1.1.0.

To get the best out of Sesame, we would like to ask for your feedback and for any optimisations that could boost its performance.
I can provide more information, but we would prefer non-public communication with a person or group from the Sesame team who is willing to be contacted directly.

Below is the general setup we used for a test run.

=======================
Tomcat/Sesame Configurations:
1- edit catalina.sh :
export JAVA_OPTS='-Xmx20G '

2- edit: console.sh 
JAVA_OPT=-Xmx20G

=======================
Sesame Test procedure: for BSBM1M (one million triples): using a machine with specs [3,4]

1- Start server:
./tomcat/bin/startup.sh
create lucenesail.
open BSBM1M. 

2- load data using console.sh:
load /home/path/bsbmtools-0.2/dataset_1M.ttl.

3- Shutdown:
./tomcat/bin/shutdown.sh


4- Flush OS memory and swap.

5- Restart server:
./tomcat/bin/startup.sh

6- Run test using BSBM driver:
./testdriver -ucf usecases/literalSearch/sesame.txt -w 1000 -o Sesame_1Client_BSBM1M.xml http://localhost:7788/openrdf-workbench/repositories/BSBM1M/query?limit=0

=======================
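
Step 4 above does not say how the OS memory and swap are flushed. A minimal sketch for a Linux host such as the RHEL 6.4 machine in [4], assuming root access; the helper name flush_os_caches is hypothetical and the exact commands used in the original test run are not known:

```shell
#!/bin/sh
# Sketch of step 4 (flush OS memory and swap) on Linux.
# Assumption: run as root. flush_os_caches is a hypothetical helper name.
flush_os_caches() {
    sync                                # write dirty pages to disk first
    echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes
    swapoff -a && swapon -a             # move swapped pages back to RAM and clear swap
}

# Call it between shutdown.sh and startup.sh:
# flush_os_caches
```

Defining the step as a function keeps it from running by accident when the file is sourced; it only takes effect when called explicitly between the shutdown and the restart.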


Kind Regards,

Saud

[3] 2x AMD Opteron 4280 Processor (2.8GHz, 8C, 8M L2/8M L3 Cache, 95W); DDR3-1600 MHz 128GB Memory for 2CPU (8x16GB Quad Rank LV RDIMMs) 1066MHz; 2x 300GB, SAS 6Gbps, 3.5-in, 15K RPM Hard Drive (Hot-plug); SAS 6/iR Controller, For Hot Plug HDD Chassis; No Optical Drive; Redundant Power Supply (2 PSU) 500W; 2M Rack Power Cord C13/C14 12A; iDRAC6 Enterprise; Sliding Ready Rack Rails; C11 Hot-Swap - R0 for SAS 6iR, Min. 2 Max. 4 SAS/SATA Hot Plug Drives
[4] Red Hat Enterprise Linux Server release 6.4 (Santiago), java version "1.7.0_51"




_______________________________________________
Sesame-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/sesame-general

Re: Configuring Sesame for a benchmark

Jeen Broekstra
On 22/04/14 4:21, Saud Aljaloud wrote:

> We are investigating how efficient different triple stores, including
> Sesame, handle literal strings within SPARQL.

Interesting. Are you planning to publish these results?

As a small nitpick: "Sesame" is a framework, not a particular
triplestore. I assume you will be testing multiple triplestores,
including the Sesame native and memory stores, but also third-party
Sesame stores, like for example OWLIM or Bigdata?

> To this end, We are now working on benchmarking these triple stores
> against a set of specific queries, using the Berlin Benchmark (BSBM)
> test driver [1], dataset and matrices[2].
>
> We are using the latest Sesame releases: openrdf-sesame-2.7.6 and
> luceneSail-1.1.0.

That's not actually the latest release. Sesame is currently at release
2.7.11, with a 2.8.0 release coming out in a few weeks time hopefully
(though we have no definite schedule for this yet).

As for the LuceneSail, there are two things to note: first of all, it's not
a Sesame core component but a third-party library, and as far as I know
it hasn't seen active development for a while (though we are actually
working on updating and reintegrating it into Sesame - this is planned
for release 2.8).

Secondly, you're aware that the LuceneSail works with specific query
patterns (using certain built-in properties) only? It will not have any
influence on the performance of standard SPARQL functions like REGEX or
STRSTARTS.
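
To make the distinction concrete, here is a minimal sketch of the two query styles, held in shell variables so they can be handed to any SPARQL client. The `search:` namespace and the `search:matches`, `search:query` and `search:score` properties are recalled from the LuceneSail documentation and should be checked against the version in use; the `rdfs:label` predicate and the search term "widget" are placeholders.

```shell
#!/bin/sh
# Standard SPARQL string matching: evaluated by the underlying store,
# so the Lucene index is not consulted.
REGEX_QUERY='
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s WHERE {
  ?s rdfs:label ?label .
  FILTER regex(?label, "^widget", "i")
}'

# LuceneSail-specific pattern (assumed vocabulary, see the note above):
# only queries written this way are answered from the Lucene index.
LUCENE_QUERY='
PREFIX search: <http://www.openrdf.org/contrib/lucenesail#>
SELECT ?s ?score WHERE {
  ?s search:matches [
       search:query "widget*" ;
       search:score ?score ] .
}'

printf '%s\n' "$REGEX_QUERY" "$LUCENE_QUERY"
```

A benchmark that wants to measure the full-text index therefore needs a second version of each query, rewritten in the second style.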

> To get the best out of Sesame, we would like to ask your valuable
> feedback and other optimisations that can boost the performance of Sesame.
> I should provide more info, but non-public communication with
> someone/group from Sesame who are willing to be directly contacted is
> preferable.

I can't currently commit to private consulting unfortunately, but I am
happy to try and help out as best I can on this public forum.

> Below are the general setup we did on a test run.
>
> =======================
> *Tomcat/Sesame Configurations*:
> 1- edit catalina.sh :
> export JAVA_OPTS='-Xmx20G '
>
> 2- edit: console.sh
> JAVA_OPT=-Xmx20G

Assuming you are running on a 64-bit platform with a recent 64-bit jvm
this will work, though I am not sure that you will ever need such a huge
heap size - most SPARQL querying on the native store at least is
I/O-constrained rather than memory constrained (unless you start doing
things like grouping, aggregating, or sorting). But if you can spare it,
it can't hurt either.

I should also point out that obviously the console and the Tomcat server
are running in different JREs, and if you're using them at the same time
you will need a system with 40G of RAM to spare, or the setting becomes
pointless. But looking at your machine specs, that part of it seems
taken care of.

Last but not least I'd also advise increasing the PermGen space on at
least the (Tomcat) server side. Sesame's SPARQL engine uses a
(recursive) visitor pattern for its query evaluation, which may run into
StackOverflowErrors on huge or very complex datasets/queries, otherwise.
Increasing PermGen space mitigates that. Not sure what size you'll need
but if you can spare it, make it 512M or more and see how far it gets
you :)
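
A minimal sketch of how these settings could look in catalina.sh, assuming a HotSpot Java 7 JVM as listed in [4]. The heap and PermGen values come from this thread. The `-Xss` entry is an addition: on HotSpot the depth of recursion is bounded by the per-thread stack size rather than by PermGen, so it may be worth setting as well, and the value 4m is only illustrative. `-XX:MaxPermSize` applies to Java 7 and earlier, since PermGen was removed in Java 8.

```shell
#!/bin/sh
# Sketch of JVM settings for the Tomcat side (values are examples).
HEAP_MAX=20G        # heap size from the original post
PERMGEN_MAX=512m    # the "512M or more" suggested above (Java 7 and earlier)
STACK_SIZE=4m       # assumption: illustrative per-thread stack size for deep recursion

JAVA_OPTS="-Xmx${HEAP_MAX} -XX:MaxPermSize=${PERMGEN_MAX} -Xss${STACK_SIZE}"
export JAVA_OPTS
echo "$JAVA_OPTS"
```

The same flags can be put in the console's JAVA_OPT variable, keeping in mind that the two JVMs each claim their own heap.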

> =======================
> *Sesame Test procedure*: for BSBM1M (one million triples): using a
> machine with specs [3,4]
>
> 1- Start server:
> ./tomcat/bin/startup.sh
> create lucenesail.
> open BSBM1M.

I am slightly out of touch with the lucenesail's inner workings but IIRC
it is not a completely independent store, but works in combination with
another store (e.g. a Sesame native or in-memory store). If so, you will
have to pick one and configure it properly.

For example, in the case of the native store, you should configure the
indexes - given that the dataset is not all that large I'd just go all
out and create a full index (spoc, posc, ospc, cspo, sopc, pcso). This
will negatively impact data upload speed, but it should give you the
best possible query speed.
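
As a rough sketch, the index list above can be kept in one variable, sanity-checked, and placed in a native store configuration fragment. The `sail:` and `ns:` namespaces and the `ns:tripleIndexes` property are recalled from the Sesame 2.x repository configuration templates and should be verified against the release in use; the fragment covers only the native store and leaves out the LuceneSail wrapper.

```shell
#!/bin/sh
# Index set suggested above for the native store.
TRIPLE_INDEXES="spoc,posc,ospc,cspo,sopc,pcso"

# Sanity check: every index must use each of s, p, o, c exactly once.
for idx in $(echo "$TRIPLE_INDEXES" | tr ',' ' '); do
    sorted=$(echo "$idx" | fold -w1 | sort | tr -d '\n')
    [ "$sorted" = "cops" ] || { echo "bad index: $idx" >&2; exit 1; }
done

# Configuration fragment (assumed vocabulary, see the note above).
NATIVE_SAIL_CONFIG=$(cat <<EOF
@prefix sail: <http://www.openrdf.org/config/sail#> .
@prefix ns:   <http://www.openrdf.org/config/sail/native#> .

[] sail:sailType "openrdf:NativeStore" ;
   ns:tripleIndexes "${TRIPLE_INDEXES}" .
EOF
)
echo "$NATIVE_SAIL_CONFIG"
```

When the repository is created interactively in the console instead, the same comma-separated string can be entered at the triple indexes prompt.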

> 2- load data using console.sh:
> load /home/path/bsbmtools-0.2/dataset_1M.ttl.

For the given platform and dataset size, this should be fine.

> 3- Shutdown:
> ./tomcat/bin/shutdown.sh
>
>
> 4- Flush OS memory and swap.
>
> 5- Restart server:
> ./tomcat/bin/startup.sh
>
> 6- Run test using BSBM driver:
> ./testdriver -ucf usecases/literalSearch/sesame.txt -w 1000 -o
> Sesame_1Client_BSBM1M.xml
> http://localhost:7788/openrdf-workbench/repositories/BSBM1M/query?limit=0

Ah, I see now why you're using Sesame on Tomcat: you're relying on the
BSBM test driver. That's fine. Looking at your execution parameters:
don't you think that 1000 warmup runs is slightly excessive? I would
expect in the order of 10-20 to be more than enough to get our indexes
warmed up nicely.

HTH. Let us know how you get on and if anything is unclear.

Cheers,

Jeen



Re: Configuring Sesame for a benchmark

Saud Aljaloud-2
Hi Jeen,


On 23 Apr 2014, at 01:39, Jeen Broekstra <[hidden email]> wrote:
> Interesting. Are you planning to publish these results?

Yes, and once this is done, I'll drop a link here as well.


> I assume you will be testing multiple triplestores,
> including the Sesame native and memory stores, but also third-party
> Sesame stores, like for example OWLIM or Bigdata?

We are mainly looking at disk-based stores. The Sesame native store is our target within OpenRDF, along with the stores you mention above and others :)


> That's not actually the latest release. Sesame is currently at release
> 2.7.11, with a 2.8.0 release coming out in a few weeks time hopefully
> (though we have no definite schedule for this yet).

Thanks for pointing this out.

> As for the LuceneSail, there two things to note: first of all, it's not
> a Sesame core component but a third-party library, and as far as I know
> it hasn't seen active development for a while (though we are actually
> working on updating and reintegrating it into Sesame - this is planned
> for release 2.8).

One part of the test covers the full-text capabilities and performance of the different stores; without the LuceneSail, the Sesame native store may not be a candidate.
It's good to hear that there will be official support for full text in Sesame.
I'd be interested to see what would change.


> Secondly, you're aware that the LuceneSail works with specific query
> patterns (using certain built-in properties) only?

This is the case for all other stores, as full text is not standardised yet :(


> Last but not least I'd also advise to increase the PermGen space on at
> least the (Tomcat) server side. Sesame's SPARQL engine uses a
> (recursive) visitor pattern for its query evaluation, which may run into
> StackOverflowErrors on huge or very complex datasets/queries, otherwise.
> Increasing PermGen space mitigates that. Not sure what size you'll need
> but if you can spare it, make it 512M or more and see how far it gets
> you :)

Nice, I'll keep that in mind.

> Looking at your execution parameters:
> don't you think that 1000 warmup runs is slightly excessive?

BSBM generates different resources for each run/query, which should challenge the caching strategies. The original BSBM even uses 2000 warm-up runs, which is what we plan to use for the final runs.

> HTH. Let us know how you get on and if anything is unclear.


Many thanks for your comments,

Saud






Re: Configuring Sesame for a benchmark

Saud Aljaloud-2
Hi Muhammad,

On 23 Apr 2014, at 02:31, Muhammad Saleem <[hidden email]> wrote:

> Hi Saud,
>
> This could be a very nice piece of evaluation work. Just a small suggestion: since you are planning to use BSBM (a synthetic data benchmark) for the comparison, I would be more interested to see a performance evaluation on a real-data benchmark with real queries. You may consider the DBpedia SPARQL benchmark [1], which contains real data and queries selected from a query log.

Good suggestion. That could be the next stage.

Many thanks,
Saud

> Best,
> Muhammad Saleem
> PhD student AKSW, University of Leipzig, Germany
> Skype: saleem.muhammd




