Fatal Error - Content is not allowed in prolog - Concurrent parsing of RDF

Jerry George
Hi,

We are trying to benchmark a set of pre-processing steps (parsing, validation, etc.) that run before we actually commit data to a Sesame SAIL.

We are using an RDF client with a Sesame 2.6.9 backend. We issue over 40 simultaneous parse/validate requests using the Java Concurrency API. Later, from the multiple spawned threads, we add the RDF files to the SAIL (via the Sesame REST API). However, we encounter the following error:

[Fatal Error] www.xyz.com:1:1: Content is not allowed in prolog.

The code works perfectly if I issue fewer requests, or if it runs more slowly with thread sleeps. Please note that I have some fairly detailed custom validation/processing steps, implemented with RDFHandlerBase, that run before the file is posted to the repository. Could the problem be related to the open issue SES-1066?

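Also, could anyone give me some tips on using Sesame in an environment where we expect up to 100-200 requests/sec? For SPARQL requests, Jeen proposed an answer here:
http://answers.semanticweb.com/questions/17914/openrdf-sesame-multiple-and-parallel-connections-to-httprepository
I am not sure whether it is still relevant for Sesame 2.6.9.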

Any help would be much appreciated.

Thanks.

------------------------------------------------------------------------------
Symantec Endpoint Protection 12 positioned as A LEADER in The Forrester  
Wave(TM): Endpoint Security, Q1 2013 and "remains a good choice" in the  
endpoint security space. For insight on selecting the right partner to
tackle endpoint security challenges, access the full report.
http://p.sf.net/sfu/symantec-dev2dev
_______________________________________________
Sesame-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/sesame-general

Re: Fatal Error - Content is not allowed in prolog - Concurrent parsing of RDF

Peter Ansell-2
On 11 March 2013 08:07, Jerry George <[hidden email]> wrote:

> Hi,
>
> We are trying to benchmark a set of pre-processing steps
> (parsing,validation,etc) before actually committing to Sesame SAIL.
>
> We are using an RDF client (with Sesame 2.6.9) backend. We have over 40
> simultaneous requests to parse/validate using Java Concurrency API. Later,
> using the multiple spawned threads, we add the RDF files to SAIL (via Sesame
> REST API). However, we seems to encounter the following,
>
> [Fatal Error] www.xyz.com:1:1: Content is not allowed in prolog.

This error indicates that the RDF/XML parser was sent invalid XML. The
RDF/XML parser in Sesame is not thread-safe, so a new instance must be
created for each invocation. There may be a bug in the SAIL/REST API
that causes a parser instance to be reused between threads.
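The "new instance for each invocation" rule can be sketched without a Sesame dependency. The example below uses the JDK's own SAX parser as a stand-in (Sesame's RDF/XML parser is built on top of SAX); in Sesame the per-task call would be Rio.createParser(RDFFormat.RDFXML), as in the snippet Jerry posts further down. It assumes each document is already available as a String, and the names PerTaskParsing, countElements and parseAll are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: every task builds its own parser; nothing parser-related is shared between threads.
class PerTaskParsing {

    // Counts start tags in one document, using a parser created just for this call.
    static int countElements(String xml) throws Exception {
        // SAXParserFactory is not guaranteed to be thread-safe either, so it is created per call too.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }

    // Parses all documents concurrently; results come back in submission order.
    static List<Integer> parseAll(List<String> docs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (final String doc : docs) {
                Callable<Integer> task = () -> countElements(doc);
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // rethrows any parse error raised in the worker thread
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Since no parser is stored in a field, there is nothing for two threads to share; the price is one factory and parser allocation per document.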

> The code seems to work perfectly, if I do just fewer requests or if the code
> is run slower, with thread sleeps. Please note that I do have some pretty
> detailed and custom validation/processing steps, using RDFHandlerBase before
> posting the file to repository. Would the problem be related to open issue
> SES-1066.

Do you see the error message before you submit the file to the REST
API? Are you reusing a parser instance anywhere across threads in your
client?

> Besides, could anyone give me some tips as to usage of Sesame in environment
> were we expect upto 100-200 requests/sec. For SPARQL requests, Jeen had
> proposed an answer here,
> http://answers.semanticweb.com/questions/17914/openrdf-sesame-multiple-and-parallel-connections-to-httprepository
> Not sure if it is still relevant for Sesame 2.6.9.

Re: Fatal Error - Content is not allowed in prolog - Concurrent parsing of RDF

Jerry George
Hi Peter,
Thank you for the insight into the parsing. I receive the error before I submit the file to Sesame; to be precise, during the parsing and validation phase. I am also receiving:

RDFHandlerException: RDFParseException: The encoding declaration is required in the text declaration.

Is there any way around this?

Regards,
Jerry

Re: Fatal Error - Content is not allowed in prolog - Concurrent parsing of RDF

Peter Ansell-2
On 13 March 2013 10:50, Jerry George <[hidden email]> wrote:

> Hi Peter,
> Thank you for insight on the parsing. It is before I submit the file to
> Sesame, that I receive the error. To be precise, during the parsing and
> validation phase. Besides that, I am also receiving,
>
> RDFHandlerException: RDFParseException: The encoding declaration is required
> in the text declaration.
>
> Thanks, but is there any way around this?
>
> Regards,
> Jerry
>

Hi Jerry,

Both of the error messages you quoted sound like they come from the SAX parser.
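The first message can be reproduced with the JDK's SAX parser alone by putting non-markup content in front of the XML declaration: the parse then fails at the very start of the input. The helper below (PrologCheck.firstError is a hypothetical name) returns the parser's location and message, or null for well-formed input. With the Xerces implementation bundled in the JDK the message typically reads "Content is not allowed in prolog.", but the exact wording depends on the parser in use.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: checks whether a string is well-formed XML and reports the SAX error if it is not.
class PrologCheck {

    // Returns null if the document parses, otherwise "line:column message" from the parser.
    static String firstError(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // DefaultHandler.fatalError rethrows, so a malformed prolog ends up here.
            return e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage();
        }
    }
}
```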

I am not sure how to diagnose these issues further without test
files that reliably (or semi-reliably) reproduce them. Could you send
me samples of the files that cause the problem, along with the lines of
code you use to create and run the parser?

Thanks,

Peter

Re: Fatal Error - Content is not allowed in prolog - Concurrent parsing of RDF

Jerry George
Hi Peter, 

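Please find the sample code below:

import org.openrdf.model.Statement;
import org.openrdf.rio.RDFFormat;
import org.openrdf.rio.RDFParser;
import org.openrdf.rio.Rio;
import org.openrdf.rio.helpers.RDFHandlerBase;

// Create the validating handler and an RDF/XML parser, then parse the incoming stream.
ValidatorHandler myCustomRDFValidator = new ValidatorHandler();
RDFParser rdfParser = Rio.createParser(RDFFormat.RDFXML);
rdfParser.setRDFHandler(myCustomRDFValidator);
rdfParser.parse(iStream, "http://www.xyz.com");

public class ValidatorHandler extends RDFHandlerBase {
    @Override
    public void handleStatement(Statement currentStmt) {
        // custom validation logic
    }
    @Override
    public void handleNamespace(final String prefix, String uri) {
        // custom namespace checks
    }
}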

Inside the validator handler, we have some custom logic to verify namespaces, etc.

The RDF/XML files are standard (validated with Apache Any23) but a little heavy, around 420 lines each. I receive each one as an RDF/XML string via a REST interface (Jersey) hosted on a Tomcat server.

To reiterate, it works without problems if we do not run several threads at once.

Thanks

Re: Fatal Error - Content is not allowed in prolog - Concurrent parsing of RDF

Peter Ansell-2
On 15 March 2013 11:00, Jerry George <[hidden email]> wrote:

> Hi Peter,
>
> Please find the sample code,
>
> ValidatorHandler myCustomRDFValidator = new ValidatorHandler();
> RDFParser rdfParser = Rio.createParser(RDFFormat.RDFXML);
> rdfParser.setRDFHandler(myCustomRDFValidator);
> rdfParser.parse(iStream, "http://www.xyz.com");
>
> public class ValidatorHandler extends RDFHandlerBase {
> @Override
> public void handleStatement(Statement currentStmt) {}
> @Override
> public void handleNamespace(final String prefix, String uri) {}
> }
>
> Inside the validator handler, we have some custom logic to verify
> namespaces, etc...

That all looks fairly standard.

> RDF XML is standard file (validated with apache any23), but a little heavy,
> around 420 lines each. I am receiving it as rdf/xml string via a REST
> interface (Jersey) hosted on Tomcat server.

Any23 uses the Sesame Rio RDFXML parser.

> Just to re-iterate, it works without problems if we are not running several
> threads at once.

So if you make multiple requests to the server *and there is a new
rdfParser instance created for each thread*, the Rio parser starts
throwing the SAX exceptions?

The next step is for you to verify manually that the contents of each
InputStream are not being corrupted by Jersey/Tomcat on the way in. To
do this, dump each input stream to a log file before running the parser
over it (you will likely need to create a fresh input stream/reader
afterwards, or Sesame will complain that the input stream is empty). If
you can confirm that the files are all intact, including in the cases
where the Sesame RDFXMLParser throws a parse exception, then we will
need to figure out what to do next.
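The dump-and-replay step could look roughly like this, assuming the request body is available as an InputStream. The body is read once into memory, written to a log line tagged with a request id (so interleaved output from concurrent requests can be matched to the request that failed), and a new stream over the same bytes is handed to the parser. RequestDump, logAndReplay and the Consumer-based logger are hypothetical stand-ins for whatever logging the application already uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Sketch: read the request body once, log it, and return a fresh stream over the same bytes.
class RequestDump {

    static InputStream logAndReplay(String requestId, InputStream in, Consumer<String> log)
            throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        byte[] body = buffer.toByteArray();
        // Decoding as UTF-8 is only for the log line; the parser still receives the raw bytes.
        log.accept("[" + requestId + "] " + body.length + " bytes: "
                + new String(body, StandardCharsets.UTF_8));
        // The original stream is now exhausted, so hand back a new one over the copy.
        return new ByteArrayInputStream(body);
    }
}
```

The returned stream would then take the place of iStream in rdfParser.parse(iStream, "http://www.xyz.com").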

There are various options after that step, including changing which SAX
XMLReader is used if it turns out that the one you are using has
thread-safety issues.
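If it does come down to the SAX layer, one way to make the reader choice explicit is to build readers through JAXP and confine each one to a single thread. The sketch below keeps one namespace-aware XMLReader per thread via ThreadLocal; PerThreadReader is a hypothetical name, and whether Sesame 2.6.9's RDFXMLParser can be handed such a reader directly is not established here. The implementation returned by SAXParserFactory.newInstance() can be switched with the standard javax.xml.parsers.SAXParserFactory system property, for example to a separately bundled Xerces.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Sketch: one XMLReader per thread, never shared between threads.
class PerThreadReader {

    private static final ThreadLocal<XMLReader> READER = ThreadLocal.withInitial(() -> {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // RDF/XML relies on XML namespaces
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("could not create XMLReader", e);
        }
    });

    // Returns the calling thread's reader, creating it on first use.
    static XMLReader get() {
        return READER.get();
    }
}
```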

Peter

Re: Fatal Error - Content is not allowed in prolog - Concurrent parsing of RDF

S.L
This post has NOT been accepted by the mailing list yet.
In reply to this post by Jerry George

Peter,

I am facing exactly the same issue: this error shows up when the parser is used in a concurrent, multi-threaded fashion. Can you please let me know how you overcame it?

Thanks.