Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Q&A

Welcome to Codidact Meta!

Codidact Meta is the meta-discussion site for the Codidact community network and the Codidact software. Whether you have bug reports or feature requests, support questions or rule discussions that touch the whole network – this is the site for you.

Comments on How should a Codidact public API work?

Post

How should a Codidact public API work?

+9
−0

I'm planning to add a public API to Codidact so people can make applications that read Codidact data automatically.

Features and purposes

I have my own purposes in mind for this[1], but there are many other potential uses. Before I start building this, I'd like to hear what features you would like the API to have, and what purposes you might use it for.

General thoughts

Even if you are not planning anything yet, I would still like to hear your thoughts on what might be useful for the future. I'm hoping this discussion will help avoid designing an API that is too specific, or that needs breaking changes later.

Whatever your thoughts, please share them. This could be any of the following or anything else you want to bring up:

  • Ideas for projects you might make one day
  • Thoughts on the structure of an API
  • Things to bear in mind / avoid (I'm new to this, so things that are obvious to you may not be obvious to me)
  • Preferred data format(s)
  • What to include from the start and what can wait until later
  • Limits, such as how many results to permit in a single response (like pagination) and how to handle requesting more

  1. I'll add my own personal API requirements in an answer. Please comment on the answer if there's a better approach I should consider. ↩︎

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.
Why should this post be closed?

4 comment threads

Can be it used with "Approach zero" math-aware search engine? (1 comment)
Licencing (19 comments)
What kind of API? (3 comments)
Reference (1 comment)
Licencing
mr Tsjolder‭ wrote over 1 year ago

Does access through an API comply with all possible licenses for questions and answers? If yes, shouldn't the licenses be included when this content is programmatically accessed, so that end-users know what they are allowed (not) to do with the retrieved content?

trichoplax‭ wrote over 1 year ago

Good point. Thanks for raising this.

I don't know the best approach for this, but the options that come to mind are:

  1. Include the licence with all licensed content (so questions, answers, articles, but not comments).
  2. Only allow programmatically accessing content that does not require a licence (comments but not licensed content like questions, answers, articles; only metadata about licensed content and not the content itself).
  3. Request by licence: every request for licensed content must specify exactly 1 licence, so all the results are implicitly under that licence (or the licence can be explicitly returned but only once per response rather than duplicated for every post in the response).

I'm interested to hear whether there are any other options, and any problems with the options I've suggested.

Monica Cellio‭ wrote over 1 year ago

I think if you're retrieving content we should include the license in the returned values, and if you're retrieving content metadata like scores or authors or tags, that's not licensed content so no need. Edits are an interesting case as they aren't explicitly licensed but carry the post license.

trichoplax‭ wrote over 1 year ago

Since the licence isn't editable I guess it applies to all edits regardless of who does the editing.

trichoplax‭ wrote over 1 year ago

So I guess we need to include the licence with any response that contains all or part of the content of any revision.

I'm not sure if I see a need not to include the license with all content; this can be as simple (and small) as a single byte, which is an ID of a predefined license, or an index into a table of licenses. So, this data is very small, and can be transmitted regardless of how the content is requested.

Is there any reason why one would prefer option 2?

Also, all edits are tied to a specific post, so no need to tie a license to them; they are already tied to a post, so if the license is needed from the point of an edit, that's just one extra step of indirection.

Andreas is speechless at the number of bird deaths‭ wrote over 1 year ago · edited over 1 year ago

Simple idea:

class Post {
    long id;
    byte license;
    List<Comment> comments;
    List<Revision> revisions;
}

class Revision {
    long authorID;
    String text;
    List<String> tags;
    Date date;
    int index; // in case certain revisions are omitted
}

class Comment {
    long authorID;
    String text;
    Date date;
}

class Question: Post {
    List<Post> answers;
}

Question getQuestion(long id, dict options);

(I likely have no idea what I'm talking about here).

options = { "answer-licenses": [1, 2] }
question = getQuestion(options);
trichoplax‭ wrote over 1 year ago · edited over 1 year ago

Interesting thoughts. I get 2 questions from this:

  1. In those places where a licence is required, how much of it is required?
    • a licence id?
    • a licence short name such CC BY-SA 4.0?
    • a URL for the licence text available online?
    • the full licence text?
  2. Where is a licence required?
    • every time any licenced content is included in a response?
    • only in responses that do not depend on a previous response that already provided the licence?

For example, if the revision history can only be requested using the list of revision ids, and those are only available from the post content response, which already included the licence, do we still need to include the licence again in the revision history response? That is, is the licence required per interaction (which may extend to several responses), or for every response that contains licensed content?

trichoplax‭ wrote over 1 year ago

As for preferring option 2 of 3 from my first comment: No, I can't imagine anyone preferring that. I only included it as an approach that definitely doesn't cause any legal worries. If we can't agree on what is required for licensing, option 2 is still available.

Andreas is speechless at the number of bird deaths‭ wrote over 1 year ago · edited over 1 year ago

I don’t think you need to address question 1 upfront, if you go with the idea of using an ID or index. Doesn’t Codidact only permit a predefined set of licenses anyway? So each time Codidact accepts a new license, assign a new ID to it, or add it to the list of licenses, and use its index.

If you ever need more information about a license, other than its ID/index, make a request to Codidact’s server (or bundle it in a library), and use that ID/index to retrieve all the attributes for that license.

For instance:

// dictionary if using IDs; an array if using indices
licenses = {
    436312: {
        "name": "CC BY-SA",
        "version": "4.0",
        "url": "https://....."
        "text": "...",
    }
    592343: {
        // etc
    }
}

If the API requires downloading this upon requests, you can for instance specify which fields to include; some are included by default; some are excluded by default.

licenses = get_licenses({ "url": true, "text": true })
  1. Where is a license required?
  • every time any licensed content is included in a response?
    • No; document that certain content is under a license (that said, isn't everything under a license (does Codidact not have a license for its content?). Why would you want to process license data every single time you make these requests? All you need to ensure, is that the license is available when required; meaning, that the API user can always get hold of the license for the content they request. If that requires a new request to be made, that's fine.

I think the question of when to bundle license information, will depend too much on the overall design of the API. I suggest just postponing this concern until later.

trichoplax‭ wrote over 1 year ago

I like this approach of only being required to make the licence available on request, rather than having to include it everywhere. I'd want to know what the legal situation is, and to what extent it varies by country, but my rough understanding of international copyright law is that if a licence is not included, the content is automatically copyright. That is, including the licence gives the recipient additional rights, for their benefit. In the absence of the licence, they must regard the content as copyright protected.

If we can confirm that this is the case, then the licence and its various licence metadata can just be another API endpoint, as you describe.

trichoplax‭ wrote over 1 year ago

Possibly relevant aside:

My impression of the problem people had with SE retrospectively changing all content to CC BY-SA 4.0 is that they objected to SE adding a new licence that they did not have the legal authority to decide on.

I see it as very important that we never accidentally provide an incorrect licence for a post. The user provided a licence, and the user still owns the copyright. We can never provide the content under any other licence. (Technically there could be a licence that is legally compatible with all of the rights and constraints provided by the user's licence, but I wouldn't want to make such judgement calls.)

With this as background, the problem would be with adding an incorrect licence, not with omitting the correct licence from any particular response. The correct licence must be available, but it seems an API endpoint would cover that.

mr Tsjolder‭ wrote over 1 year ago

trichoplax‭ I am afraid that these licences are typically stricter than copyright law. E.g. from the CC-by-SA license (4a):

[...] You must include a copy of, or the Uniform Resource Identifier for, this License with every copy or phonorecord of the Work You distribute [or] publicly display [...].

@mrTsjolder My understanding is that this can be satisfied by first downloading all the licenses with one request, then let subsequent requests just refer to one of the licenses already downloaded. Is that wrong to assume?

mr Tsjolder‭ wrote over 1 year ago

Andreas is speechless at the number of bird deaths‭ I am afraid I am too little of a legal person to judge that. The main point that I wanted to bring across is that it probably will not suffice to only provide the licenses on request as suggested by trichoplax‭.

trichoplax‭ wrote over 1 year ago

It's good to hear that a URL is sufficient rather than the full licence text.

Even if it turns out that we need the URL in every response that contains licensed content, that doesn't seem too bad as it will be small in proportion to the length of most posts.

Monica Cellio‭ wrote over 1 year ago

I'm not a legal expert, but "you must include the license with content" seems to be part of most licenses, so I'd prefer we do that. A URL is fine and sounds lightweight. Licenses can be added, so IDs shouldn't be baked into the code -- the URL is a unique identifier already, so we can use that.

Are APIs usually stateful or stateless? The comment about sending the license with the post and thus not needing it with history entries surprised me because my mental model was stateless, but I don't have an actual good basis or informed opinion here.

trichoplax‭ wrote over 1 year ago

In response to the stateful/stateless query:

My earlier comment was a contrived example imagining that we had a history API that could not be accessed without info only found in the response to the post API. I don't think this is realistic, but that's how I was imagining we could know that the request for history had already been preceded by a request for the post. It would not be a stateful API, there would just be no way to request history until you had found out the required ids from the request for the post.

I'm not suggesting we use such a model. It was deliberately restrictive for the purpose of finding out if there could be a way to avoid needing the licence to be included everywhere. Now that I understand that we can just include the licence URL I would rather just include it everywhere, and not make the API needlessly convoluted.

It was the thought of having to include the (often long) full text of each licence that had me looking for artificial ways to reduce how often.

Skipping 1 deleted comment.