TurtleWriter and blank nodes

Simon Rainer
Dear list,

I'm using Rio to serialize RDF to Turtle as shown in the user docs: http://openrdf.callimachus.net/sesame/2.7/docs/users.docbook?view#chapter-rio3

I have data which includes blank nodes. The Turtle serialization automatically creates labels for those blank nodes, and then serializes the nodes "flat", i.e. as separate resources in the output, so technically the RDF is fine.

But is there a way to produce Turtle that has those blank nodes "inlined", like so:

:a      a           foaf:Person ;
        foaf:knows  [ foaf:name  "Bob" ] ;
        foaf:name   "Alice" .

Best regards,
Rainer



------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk
_______________________________________________
Sesame-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/sesame-general

Re: TurtleWriter and blank nodes

James Leigh
Hi Rainer,

The Rio API (used in Sesame) is designed for streaming unordered triples
and is not well suited for pretty serialization. Before blank nodes can
be serialized like this, the triples need to be sorted. Sorting can
become a problem for large datasets, because nothing in RDF indicates
when all properties of a blank node have been read.
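To make the sorting point concrete, here is a minimal Java sketch (an illustration, not Rio code) of the buffering step that pretty-printing needs: the entire input is held in memory and regrouped by subject before anything is written. The `Triple` record and the `_:` prefix for blank nodes are assumptions standing in for real Sesame statements; holding everything in memory is exactly the cost that hurts on large datasets.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only, not Sesame/Rio code. Terms are plain strings and
// a "_:" prefix marks a blank node; both are assumptions for this example.
public class SubjectGrouping {

    /** Toy stand-in for an RDF statement. */
    public record Triple(String subject, String predicate, String object) {}

    /**
     * Buffers the whole stream and regroups it so that all triples with the
     * same subject are adjacent, keeping subjects in first-seen order.
     * Nothing can be emitted before the stream ends, because no triple says
     * "this was the last property of that blank node".
     */
    public static Map<String, List<Triple>> groupBySubject(Iterable<Triple> stream) {
        Map<String, List<Triple>> groups = new LinkedHashMap<>();
        for (Triple t : stream) {
            groups.computeIfAbsent(t.subject(), k -> new ArrayList<>()).add(t);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Unordered input: the statements about :a are split by the blank node's.
        List<Triple> stream = List.of(
                new Triple(":a", "foaf:knows", "_:b0"),
                new Triple("_:b0", "foaf:name", "\"Bob\""),
                new Triple(":a", "foaf:name", "\"Alice\""));
        groupBySubject(stream).forEach(
                (subject, triples) -> System.out.println(subject + " -> " + triples));
    }
}
```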

I wrote a Turtle writer that pretty-serializes Turtle using the compact
blank node syntax, as long as the order of the triples being serialized
closely matches the order in which they are to be written. This
implementation also writes URIs as relative URIs when possible.

The DescribeResult[1] groups the triples to be written by blank node
(all triples with the same blank node appear together). The
ArrangedWriter[2] preserves the blank node grouping, but also (to a
degree) arranges the properties alphabetically (by URI). Finally, the
TurtleStreamWriter[3] serializes them to disk using the compact blank
node syntax when possible.
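A rough sketch of the last two stages described above, assuming the triples have already been buffered: predicates within each subject are sorted alphabetically (approximating the ArrangedWriter behaviour), and a blank node that is the object of exactly one triple is written inline with `[ ... ]`. `InlineTurtleSketch` is a hypothetical class, not the Callimachus TurtleStreamWriter; it works on pre-abbreviated string terms and assumes the blank nodes form no cycles.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch, not the Callimachus TurtleStreamWriter. Terms are
// pre-abbreviated strings, "_:" marks a blank node, and the blank nodes are
// assumed to be acyclic (a cycle would need labelled _: nodes instead).
public class InlineTurtleSketch {

    /** Toy stand-in for an RDF statement. */
    public record Triple(String subject, String predicate, String object) {}

    static boolean isBlank(String term) {
        return term.startsWith("_:");
    }

    /** Writes one Turtle statement block per top-level subject. */
    public static String write(List<Triple> triples) {
        Map<String, List<Triple>> bySubject = new LinkedHashMap<>();
        Map<String, Integer> objectRefs = new HashMap<>();
        for (Triple t : triples) {
            bySubject.computeIfAbsent(t.subject(), k -> new ArrayList<>()).add(t);
            if (isBlank(t.object())) {
                objectRefs.merge(t.object(), 1, Integer::sum);
            }
        }
        StringBuilder out = new StringBuilder();
        for (String subject : bySubject.keySet()) {
            // Blank nodes referenced exactly once are written inline at their
            // single point of use, so they get no top-level block.
            if (isBlank(subject) && objectRefs.getOrDefault(subject, 0) == 1) {
                continue;
            }
            out.append(subject).append(' ')
               .append(body(subject, bySubject, objectRefs)).append(" .\n");
        }
        return out.toString();
    }

    /** Renders "p1 o1 ; p2 o2" for one subject, with predicates sorted. */
    static String body(String subject, Map<String, List<Triple>> bySubject,
                       Map<String, Integer> objectRefs) {
        List<Triple> group = new ArrayList<>(bySubject.getOrDefault(subject, List.of()));
        group.sort(Comparator.comparing(Triple::predicate)); // stable sort
        StringJoiner joiner = new StringJoiner(" ; ");
        for (Triple t : group) {
            String o = t.object();
            if (isBlank(o) && objectRefs.get(o) == 1) {
                String inner = body(o, bySubject, objectRefs);
                o = inner.isEmpty() ? "[]" : "[ " + inner + " ]";
            }
            joiner.add(t.predicate() + " " + o);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<Triple> triples = List.of(
                new Triple(":a", "foaf:name", "\"Alice\""),
                new Triple(":a", "foaf:knows", "_:b0"),
                new Triple("_:b0", "foaf:name", "\"Bob\""),
                new Triple(":a", "a", "foaf:Person"));
        System.out.print(write(triples));
    }
}
```

Run on the example data from the original question, `main` prints `:a a foaf:Person ; foaf:knows [ foaf:name "Bob" ] ; foaf:name "Alice" .` on one line. A blank node referenced more than once keeps its `_:` label and is written as its own subject, since the bracket syntax can only describe a node at one position.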

Regards,
James

[1] https://code.google.com/p/callimachus/source/browse/src/org/callimachusproject/io/DescribeResult.java?name=1.2.x
[2] https://code.google.com/p/callimachus/source/browse/src/org/callimachusproject/io/ArrangedWriter.java?name=1.2.x
[3] https://code.google.com/p/callimachus/source/browse/src/org/callimachusproject/io/TurtleStreamWriter.java?name=1.2.x



On Tue, 2013-10-08 at 10:12 +0000, Simon Rainer wrote:
> [original message snipped]

Re: TurtleWriter and blank nodes

Simon Rainer
Hi James,

Fantastic - exactly what I was looking for, and it works like a charm. Many, many thanks!

Cheers,
Rainer


________________________________________
From: James Leigh [[hidden email]]
Sent: Tuesday, 08 October 2013 16:20
To: Sesame discussion list
Subject: Re: [Sesame] TurtleWriter and blank nodes

[quoted message snipped]

Re: TurtleWriter and blank nodes

Peter Ansell-2
In reply to this post by James Leigh
Jeen implemented a concept similar to DescribeResult generically in the Sesame codebase, as BufferedGroupingRDFHandler (SES-1080).
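The general idea of a bounded grouping buffer can be sketched as follows. This is a hypothetical illustration, not the actual BufferedGroupingRDFHandler implementation: statements are grouped by subject until a fixed number has been buffered, then each group is passed on as one contiguous batch. With a buffer larger than the dataset, every subject comes out as a single group; with a small buffer, one subject's statements can be split across flushes, which is why grouping alone cannot guarantee inlinable blank nodes on arbitrary input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of a bounded "group by subject" buffer; not Sesame code.
public class BoundedGroupingBuffer {

    /** Toy stand-in for an RDF statement. */
    public record Triple(String subject, String predicate, String object) {}

    private final int capacity;
    private final Consumer<List<Triple>> downstream;
    private final Map<String, List<Triple>> groups = new LinkedHashMap<>();
    private int buffered = 0;

    public BoundedGroupingBuffer(int capacity, Consumer<List<Triple>> downstream) {
        this.capacity = capacity;
        this.downstream = downstream;
    }

    /** Adds one statement; flushes once the buffer holds `capacity` statements. */
    public void handle(Triple t) {
        groups.computeIfAbsent(t.subject(), k -> new ArrayList<>()).add(t);
        if (++buffered >= capacity) {
            flush();
        }
    }

    /** Passes each buffered subject group downstream as one contiguous batch. */
    public void flush() {
        groups.values().forEach(downstream);
        groups.clear();
        buffered = 0;
    }

    /** Feeds a whole input through the buffer and collects the emitted batches. */
    public static List<List<Triple>> run(int capacity, List<Triple> input) {
        List<List<Triple>> batches = new ArrayList<>();
        BoundedGroupingBuffer buffer = new BoundedGroupingBuffer(capacity, batches::add);
        input.forEach(buffer::handle);
        buffer.flush(); // end of stream: emit whatever is still buffered
        return batches;
    }

    public static void main(String[] args) {
        List<Triple> input = List.of(
                new Triple(":a", "foaf:knows", "_:b0"),
                new Triple("_:b0", "foaf:name", "\"Bob\""),
                new Triple(":a", "foaf:name", "\"Alice\""));
        System.out.println("large buffer: " + run(10, input).size() + " batches");
        System.out.println("small buffer: " + run(2, input).size() + " batches");
    }
}
```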

I have also been working on relative URI serialisation in SES-1871, and have a branch open on Bitbucket, although there is no pull request yet.

It may be useful to implement a WriterConfig setting for the anonymous blank node syntax. As you mention, there is a basic reason why no streaming API can support anonymous blank nodes on arbitrary collections, but if users want the feature, particularly for small in-memory datasets, we could support it without writing a new serialiser just for that feature.

--
Peter

On 09/10/2013, at 1:16 AM, James Leigh <[hidden email]> wrote:

> [quoted message snipped]