Streaming extensions for JsonPath#93
Conversation
|
Could you please provide a unit test that demonstrates usage. |
|
A unit test has been added. It doesn't cover all use cases however. |
|
I did this a week ago. Could you please provide a unit test that demonstrates usage.— |
|
I've seen it. Haven't had time to check it out in detail. |
|
Oh this would be lovely with streaming support. |
|
Just waiting for it to be accepted... Oh this would be lovely with streaming support.— |
There was a problem hiding this comment.
I want Predicate to be a functional interface. This will not work well with java 8.
There was a problem hiding this comment.
What do you mean by functional interface?
Hunter
On Monday, July 27, 2015 12:49 AM, kallestenflo <notifications@github.com> wrote:
In json-path/src/main/java/com/jayway/jsonpath/Predicate.java:> @@ -23,6 +24,8 @@
boolean apply(PredicateContext ctx);
- boolean check(TokenStack stack, int idx);
I want Predicate to be a functional interface. This will not work well with java 8.—
Reply to this email directly or view it on GitHub.
There was a problem hiding this comment.
I think he means that the interface should only have 1 method, which is what allows the use with lambdas in the Java 8 onwards. eg so you cannot add a new check method.
There was a problem hiding this comment.
So it seems there are a couple of options:
- no streaming -- but there are JsonPath users who do want streaming2) split the predicate interface into two interfaces3) add a small class that takes a Predicate and calls check() ie this would be a "proxy"/"facade" that is functional and the Predicate interface remains as the logical interface to uniary and binary operators4) give up on lambdas
Not sure I understand why you care about lambdas so much in a non-streaming context. Lambdas are just syntactic sugar unless you are doing something that needs streaming ie the DOM is too big for memory. In the non-streaming case I doubt the difference in performance between a lambda and an anonymous inner class would matter. I guess it all about Scala probably though huh...
Its probably bad design to tie the Predicate interface to both the way to call predicates on a "parsing context" and as the way to extend the logical operators JsonPath supports (AND, NOT, OR, etc).
Either way, I would feel more comfortable if someone else made this decision for several reasons including the fact that working on this project is a low priority task for me. My company doesn't make parsers, we make streaming DB engines. We just needed a streaming XPath like thing for JSON and this project seems to be the closest to that today. If something else was available, we would use that instead and this pull request wouldn't exist.
Hunter
PS On a side note, its extra asks like this that make me less likely to donate code to open projects in the future. I'm trying to do the right thing and give back to the project and all I get for that decision is extra work to do. There was no way I could know you wanted to functional interfaces in JsonPath when I was writing this bit of code. Adding comments to that effect in JsonPath would be very helpful. I apologize if you did that and I overlooked it.PSS I've been working with open source projects for 20 years and donated 100,000s LOC to various projects so I feel like I've already paid that debt so to speak for all the libraries I've included in my commercial work over the years.
On Tuesday, July 28, 2015 12:58 AM, Claus Ibsen <notifications@github.com> wrote:
In json-path/src/main/java/com/jayway/jsonpath/Predicate.java:> @@ -23,6 +24,8 @@
boolean apply(PredicateContext ctx);
- boolean check(TokenStack stack, int idx);
I think he means that the interface should only have 1 method, which is what allows the use with lambdas in the Java 8 onwards. eg so you cannot add a new check method.—
Reply to this email directly or view it on GitHub.
There was a problem hiding this comment.
FYI, this is a pet project of mine that I work with on my free time, there is no one else. I appreciate all contributions but I also need to feel comfortable supporting them in the future. I have have made some breaking changes to the API over the years and streaming support is defiantly a feature that I would like.
Sorry about the slow pace and lack of comments, I hope to have more time for this project soon.
There was a problem hiding this comment.
Appreciated, I manage my own open source project (vjdbc) which is thankfully mature at this point, my day job as an architect (we make a streaming DB) and a side project (www.biovj.com) which is a traditional desktop application for video and projection mapping. Plus, I'm actively contributing to another 3 opensource projects or so. So I understand how finding time for JsonPath is hard. However, JsonPath isn't my project and if the streaming extension requires an architectural change (ie changing/specifying what exactly the Predicate interface is) then I need guidance from you about how to do that change. So I need you to answer my question. Should we split the Predicate interface into two things, one that represents logical operators (AND, OR, etc) and one that represents how to evaluate a predicate? Or should you should be creating facades to support Java8 lambdas instead of overloading the Predicate interface. For example:
public class NonStreamingPredicateEval { private Predicate pred;
public NonStreamingPredicateEval(Predicate pred) { this.pred = pred; }
public boolean apply(PredicateContext ctx) { return pred.apply(ctx); }}
Either way, its not enough to simply point out the problem. You have to decide upon a solution (or at least a direction) even if you don't personally implement the solution. What I'm not going to do is implement a solution myself just to have you reject it again later on. Instead I'm going to wait for you to tell me what you want.
Hunter
PS If lambdas are such an important feature, you should have something in the docs or javadocs that spells out which interfaces are meant to support lambdas so devs don't add methods to those interfaces. Apologies if you did this already and I just didn't see it.
On Monday, August 3, 2015 12:02 AM, kallestenflo <notifications@github.com> wrote:
In json-path/src/main/java/com/jayway/jsonpath/Predicate.java:> @@ -23,6 +24,8 @@
boolean apply(PredicateContext ctx);
- boolean check(TokenStack stack, int idx);
FYI, this is a pet project of mine that I work with on my free time, there is no one else. I appreciate all contributions but I also need to feel comfortable supporting them in the future. I have have made some breaking changes to the API over the years and streaming support is defiantly a feature that I would like. Sorry about the slow pace and lack of comments, I hope to have more time for this project soon. —
Reply to this email directly or view it on GitHub.
There was a problem hiding this comment.
So funny to read this after all those years, because his project 'vjdbc' has been forked here on GitHub... :D
https://github.com/daitangio/vjdbc
|
When is this going to be merged in? |
|
I'm waiting for an executive decision on what to do with the Predicate interface and how to support lambdas. There are two alternatives I can think of:1) We split the Predicate interface into two things, one that represents logical operators (AND, OR, etc) and one that represents how to evaluate a predicate? 2) Creating facades to support Java8 lambdas instead of overloading the Predicate interface. For example:public class NonStreamingPredicateEval { private Predicate pred; public NonStreamingPredicateEval(Predicate pred) { this.pred = pred; } public boolean apply(PredicateContext ctx) { return pred.apply(ctx); }} When is this going to be merged in?— |
|
Why not just make a base Predicate interface and extend it to support the DBK
|
|
Ok, kallestenflo I made the change you requested in the way that Dave suggested. You may want to look at what operators(criteria) handle streaming correctly. There might be cases where a criteria matches more than once in cases where it doesn't make any sense (ie the SIZE criteria). Many of the comparison criteria won't work in the streaming case where "history" (ie values from previous matches) isn't being saved. It might be better to just not compile some criteria for streaming cases. |
|
Is there a way to make this API pull back the tokens it's matched rather than just the path? |
|
// how to get tokens from the streaming APITokenStackElement elem = stack.getStack().peek();switch (elem.getType()) {case STRING_TOKEN: String val = ((StringToken)elem).value; break;case FLOAT_TOKEN: float val = ((FloatToken)elem).value; break;case INTEGER_TOKEN: int val = ((IntToken)elem).value; break;} Is there a way to make this API pull back the tokens it's matched rather than just the path?— |
…y basic expressions supported.
|
Yep we've been getting OOM issues with this when used in conjunction with Apache Camel streaming and hadn't realised it was loading the entire InputStream under the covers. I've forked this and made some changes @ https://github.com/NetNow/JsonPath which is rough around the edges but we needed a quick fix to split large payloads. The read method now generates a bunch of JSONObjects for a registered path which in turn get passed to the resultFound callback. A basic unit test is in json-path -> JacksonTest_Split. |
|
Can you provide me a diff of your version and my pull request so I can review the changes and move them back into my pull request? Yep we've been getting OOM issues with this when used in conjunction with Apache Camel streaming and hadn't realised it was loading the entire InputStream under the covers. I've forked this and made some changes @ https://github.com/NetNow/JsonPath which is rough around the edges but we needed a quick fix to split large payloads. The read method now generates a bunch of JSONObjects for a registered path which in turn get passed to the resultFound callback. A basic unit test is in json-path -> JacksonTest_Split.— |
|
@hunterpayne: I've created a pull request on your repo: hunterpayne#1 |
|
That's nice. I'm really glad you are adding to this code. Do I pull up your changes into my pull request? Also, I want my pull request into the mainline as sooner rather than later. Why is that taking so long? @hunterpayne: I've created a pull request on your repo: hunterpayne#1— |
|
OK, I have not had a lot of time but here are some thoughts... The JsonPath APIThe streaming API should be available via the JsonPath class. I think it should be more similar to the current API. Access to classes located in the internal packages is not allowed. Is this the the intended usage in it's simplest form? Path path = PathCompiler.compile("store.book[*].title");
InputStream stream = getClass().getClassLoader().getResourceAsStream("goessner.json");
final TokenStack stack = new TokenStack(JACKSON_CONFIGURATION);
stack.registerPath(path);
JsonFactory factory = new JsonFactory();
stack.read(factory.createParser(stream), new EvaluationCallback() {
@Override
public void resultFound(Path path) {
System.out.println(path);
TokenStackElement elem = stack.getStack().peek();
ObjectToken token = (ObjectToken) elem;
String stringToken = ((StringToken) token.getValue()).value;
System.out.println(stringToken);
}
@Override
public void resultFoundExit(Path path) {
}
});What about this? InputStream inputStream = someInputStream();
JsonPath.using(JACKSON_CONFIGURATION).stream(inputStream, "store.book[*].title", StreamingCallback(){
@Override
public void onResult(String path, StreamingResult result) {
Object o = result.value();
String s = result.value(String.class);
}
});
// or using defaults
JsonPath.stream(inputStream, "store.book[*].title", StreamingCallback(){
@Override
public void onResult(String path, StreamingResult result) {
if("$['store']['book'][0]['title']".equals(path)) {
Object o = result.value();
String s = result.value(String.class);
}
}
});SreamingJsonProvider SPIWouldn't it be valuable to have a streaming SPI? Error handlingIf a path is not acceptable for streaming a InvalidPathException (or some other JsonPathException subclass) should be thrown not NullPointerExceptions. |
|
When doing streaming, you setup several paths beforehand and an event handler (or possibly an event handler for each path) and then you start parsing a stream which might take a very long time. The entire point of a streaming API is that only one pass over the document is ever done. There is a clear division between setup (registering paths and event handlers) and inner loop (reading InputStream). You can't confuse those two concepts in streaming and expect it to work for most use cases. OK, I have not had a lot of time but here are some thoughts... InputStream stream = getClass().getClassLoader().getResourceAsStream("goessner.json"); final TokenStack stack = new TokenStack(JACKSON_CONFIGURATION); JsonFactory factory = new JsonFactory(); stack.read(factory.createParser(stream), new EvaluationCallback() { });I was more thinking something like thisInputStream inputStream = someInputStream(); JsonPath.using(JACKSON_CONFIGURATION).stream(inputStream, "store.book[*].title", StreamingCallback(){ // or using defaults assert (jacksonJsonProvider instanceof StreamingJsonProvider); Configuration JACKSON_CONFIGURATION = Configuration.builder() Error handling |
|
Yes, it makes sense to restrict the Paths that are generated to Paths that streaming can implement correctly. But that implies that either a special path compiler is used or the path compiler knows if its in the streaming use case or not. final TokenStack stack = new TokenStack(conf);
final Path path1 = stack.registerPath("$.someExp.someOtherExp");
final Path path2 = stack.registerPath("$.someExp.someOtherExp2");
stack.readParser(input, new EvaluationCallback() {
@Override
public void resultFound(Path path) {
System.out.println(path);
if (path == path1) {
TokenStackElement elem = stack.getStack().peek();
String stringToken = ((StringToken)elem).value;
System.out.println(stringToken);
} else if (path == path2) {
// do something else
TokenStackElement elem = stack.getStack().peek();
String stringToken = ((StringToken) elem).value;
System.out.println(stringToken);
}
}
@Override
public void resultFoundExit(Path path) {
}
});And yes it also makes sense to pin the streaming API to Jackson as that API is currently hard-coded in TokenStack. It would be nice to have a generic token style API that the streaming code uses instead of using Jackson directly. When doing streaming, you setup several paths beforehand and an event handler (or possibly an event handler for each path) and then you start parsing a stream which might take a very long time. The entire point of a streaming API is that only one pass over the document is ever done. There is a clear division between setup (registering paths and event handlers) and inner loop (reading InputStream). You can't confuse those two concepts in streaming and expect it to work for most use cases. OK, I have not had a lot of time but here are some thoughts... InputStream stream = getClass().getClassLoader().getResourceAsStream("goessner.json"); final TokenStack stack = new TokenStack(JACKSON_CONFIGURATION); JsonFactory factory = new JsonFactory(); stack.read(factory.createParser(stream), new EvaluationCallback() { });I was more thinking something like thisInputStream inputStream = someInputStream(); JsonPath.using(JACKSON_CONFIGURATION).stream(inputStream, "store.book[*].title", StreamingCallback(){ // or using defaults assert (jacksonJsonProvider instanceof StreamingJsonProvider); Configuration JACKSON_CONFIGURATION = Configuration.builder() Error handling |
Split a large JSON payload using streaming API, return results.
|
This feature would be really interesting. Btw I agree with @kallestenflo comment #93 (comment) and to go further it could be even something like these: JsonPath.stream(json)
.match("$.store.book[*].author" , t -> {})
.match(path("$..book[?(@.isbn)]").or(path("$..book[?(@.title)]")) , t -> {})
.consume();The callback can allow nested streaming: JsonPath.stream(json)
.match("$.store.book[*].author" , t -> {
Author author = new Author();
JsonPath.stream(t).match("name" , t -> { author.setName(...);});
})
.match("$..book[?(@.isbn)]" , t -> {})
.consume();I have followed the same approach for a streaming XML API (https://github.com/nithril/stax-css-matcher) |
|
Any updates on this? |
|
I've done everything asked of me in the PR. If can be merged anytime as far as I know. I've changed jobs and don't actively work on JsonPath anymore so someone else will have to take this on. Any updates on this?— |
|
This would be really nice to have. Any updates on the merge? |
|
Thought I submitted a PR years ago. I didn't even realize it hadn't been merged. I don't have privs to do a merge on it I don't think.
Hunter
On Tuesday, December 27, 2016 12:52 PM, Austin Heyne <notifications@github.com> wrote:
This would be really nice to have. Any updates on the merge?—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
|
I gave this a shot but the conflicts are not trivial. Anyone more familiar with the codebase can give a helping hand? |
|
Any update on this ? This would be a really nice feature and improvment for large JSON data. |
|
Hi, I am using currently JayWay in conjunction with the Jackson API. |
|
You could have look at my project https://github.com/nithril/json-stream. It is stable and used in production. Should released on maven central if it make sense. |
|
I'm did the json-path work for a company I no longer work for. However, if you want to continue this work, that would be great. I will give you whatever privs you need to continue the project.
Hunter
On Wednesday, June 21, 2017 12:57 PM, Nicolas Labrot <notifications@github.com> wrote:
You could have look at my project https://github.com/nithril/json-stream. It is stable and used in production.Should released on maven central if it make sense.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Adding a set of methods for high-performance streaming use. For each Path implementation, we've added a method with the following signature:
boolean checkForMatch(TokenStack stack);
Some extra classes that use the Jackson streaming API are also added to cut down on unnecessary object creation and allow for multiple paths to be evaluated at once and notified via a callback. These classes are also for handling large JSON objects which can't be parsed into one tree.
Here is the callback API:
public interface EvaluationCallback {
}
and TokenStack has the following method:
void registerPath(Path path);
To use this streaming API: