Welcome to Codidact Meta!

Codidact Meta is the meta-discussion site for the Codidact community network and the Codidact software. Whether you have bug reports or feature requests, support questions or rule discussions that touch the whole network – this is the site for you.

Comments on How should a Codidact public API work?

Post

How should a Codidact public API work?

+11

−0

I'm planning to add a public API to Codidact so people can make applications that read Codidact data automatically.

Features and purposes

I have my own purposes in mind for this^[1], but there are many other potential uses. Before I start building this, I'd like to hear what features you would like the API to have, and what purposes you might use it for.

General thoughts

Even if you are not planning anything yet, I would still like to hear your thoughts on what might be useful for the future. I'm hoping this discussion will help avoid designing an API that is too specific, or that needs breaking changes later.

Whatever your thoughts, please share them. This could be any of the following or anything else you want to bring up:

Ideas for projects you might make one day
Thoughts on the structure of an API
Things to bear in mind / avoid (I'm new to this, so things that are obvious to you may not be obvious to me)
Preferred data format(s)
What to include from the start and what can wait until later
Limits, such as how many results to permit in a single response (like pagination) and how to handle requesting more

I'll add my own personal API requirements in an answer. Please comment on the answer if there's a better approach I should consider. ↩︎

discussion api

posted almost 2 years ago

CC BY-SA 4.0

2y ago

trichoplax‭

245 89 829 355

Raw

Markdown

History

is a duplicate

This question has been asked before and has already been answered. It should be marked as a duplicate.

Please enter the URL of the proposed duplicate in the details field below.

not constructive

This question cannot be answered in a way that is helpful to anyone. It's not possible to learn something from possible answers, except for the solution for the specific problem of the asker.

−0

I'd like to host contests that read code directly from answers Here's what I'd like to see in an API to allow me to h …

2y ago

−0

I'd like to make charts of user and community stats These could involve a large amount of data as the site grows, so …

1y ago

−0

Rewriting the Code Golf leaderboard script Code Golf Codidact has an automated leaderboard to show all the answers at …

1mo ago

−1

Another use case: creating a data dump of the website. Any format is fine, but that'd be good to include as much informa …

2mo ago

4 comment threads

Can be it used with "Approach zero" math-aware search engine? (1 comment)

Licencing (19 comments)

What kind of API? (3 comments)

Reference (1 comment)

trichoplax‭ wrote almost 2 years ago

copy link

Good point. Thanks for raising this.

I don't know the best approach for this, but the options that come to mind are:

Include the licence with all licensed content (so questions, answers, articles, but not comments).
Only allow programmatically accessing content that does not require a licence (comments but not licensed content like questions, answers, articles; only metadata about licensed content and not the content itself).
Request by licence: every request for licensed content must specify exactly 1 licence, so all the results are implicitly under that licence (or the licence can be explicitly returned but only once per response rather than duplicated for every post in the response).

I'm interested to hear whether there are any other options, and any problems with the options I've suggested.

Andreas demands justice for humanity‭ wrote almost 2 years ago

copy link

I'm not sure if I see a need not to include the license with all content; this can be as simple (and small) as a single byte, which is an ID of a predefined license, or an index into a table of licenses. So, this data is very small, and can be transmitted regardless of how the content is requested.

Is there any reason why one would prefer option 2?

Also, all edits are tied to a specific post, so no need to tie a license to them; they are already tied to a post, so if the license is needed from the point of an edit, that's just one extra step of indirection.

Andreas demands justice for humanity‭ wrote almost 2 years ago · edited almost 2 years ago

copy link

Simple idea:

class Post {
    long id;
    byte license;
    List<Comment> comments;
    List<Revision> revisions;
}

class Revision {
    long authorID;
    String text;
    List<String> tags;
    Date date;
    int index; // in case certain revisions are omitted
}

class Comment {
    long authorID;
    String text;
    Date date;
}

class Question: Post {
    List<Post> answers;
}

Question getQuestion(long id, dict options);

(I likely have no idea what I'm talking about here).

options = { "answer-licenses": [1, 2] }
question = getQuestion(options);

trichoplax‭ wrote almost 2 years ago · edited almost 2 years ago

copy link

Interesting thoughts. I get 2 questions from this:

In those places where a licence is required, how much of it is required?
- a licence id?
- a licence short name such CC BY-SA 4.0?
- a URL for the licence text available online?
- the full licence text?
Where is a licence required?
- every time any licenced content is included in a response?
- only in responses that do not depend on a previous response that already provided the licence?

For example, if the revision history can only be requested using the list of revision ids, and those are only available from the post content response, which already included the licence, do we still need to include the licence again in the revision history response? That is, is the licence required per interaction (which may extend to several responses), or for every response that contains licensed content?

Andreas demands justice for humanity‭ wrote almost 2 years ago · edited almost 2 years ago

copy link

I don’t think you need to address question 1 upfront, if you go with the idea of using an ID or index. Doesn’t Codidact only permit a predefined set of licenses anyway? So each time Codidact accepts a new license, assign a new ID to it, or add it to the list of licenses, and use its index.

If you ever need more information about a license, other than its ID/index, make a request to Codidact’s server (or bundle it in a library), and use that ID/index to retrieve all the attributes for that license.

For instance:

// dictionary if using IDs; an array if using indices
licenses = {
    436312: {
        "name": "CC BY-SA",
        "version": "4.0",
        "url": "https://....."
        "text": "...",
    }
    592343: {
        // etc
    }
}

If the API requires downloading this upon requests, you can for instance specify which fields to include; some are included by default; some are excluded by default.

licenses = get_licenses({ "url": true, "text": true })

Andreas demands justice for humanity‭ wrote almost 2 years ago

copy link

Where is a license required?

every time any licensed content is included in a response?
- No; document that certain content is under a license (that said, isn't everything under a license (does Codidact not have a license for its content?). Why would you want to process license data every single time you make these requests? All you need to ensure, is that the license is available when required; meaning, that the API user can always get hold of the license for the content they request. If that requires a new request to be made, that's fine.

I think the question of when to bundle license information, will depend too much on the overall design of the API. I suggest just postponing this concern until later.

trichoplax‭ wrote almost 2 years ago

copy link

I like this approach of only being required to make the licence available on request, rather than having to include it everywhere. I'd want to know what the legal situation is, and to what extent it varies by country, but my rough understanding of international copyright law is that if a licence is not included, the content is automatically copyright. That is, including the licence gives the recipient additional rights, for their benefit. In the absence of the licence, they must regard the content as copyright protected.

If we can confirm that this is the case, then the licence and its various licence metadata can just be another API endpoint, as you describe.

trichoplax‭ wrote almost 2 years ago

copy link

Possibly relevant aside:

My impression of the problem people had with SE retrospectively changing all content to CC BY-SA 4.0 is that they objected to SE adding a new licence that they did not have the legal authority to decide on.

I see it as very important that we never accidentally provide an incorrect licence for a post. The user provided a licence, and the user still owns the copyright. We can never provide the content under any other licence. (Technically there could be a licence that is legally compatible with all of the rights and constraints provided by the user's licence, but I wouldn't want to make such judgement calls.)

With this as background, the problem would be with adding an incorrect licence, not with omitting the correct licence from any particular response. The correct licence must be available, but it seems an API endpoint would cover that.

Monica Cellio‭ wrote almost 2 years ago

copy link

I'm not a legal expert, but "you must include the license with content" seems to be part of most licenses, so I'd prefer we do that. A URL is fine and sounds lightweight. Licenses can be added, so IDs shouldn't be baked into the code -- the URL is a unique identifier already, so we can use that.

Are APIs usually stateful or stateless? The comment about sending the license with the post and thus not needing it with history entries surprised me because my mental model was stateless, but I don't have an actual good basis or informed opinion here.

trichoplax‭ wrote almost 2 years ago

copy link

In response to the stateful/stateless query:

My earlier comment was a contrived example imagining that we had a history API that could not be accessed without info only found in the response to the post API. I don't think this is realistic, but that's how I was imagining we could know that the request for history had already been preceded by a request for the post. It would not be a stateful API, there would just be no way to request history until you had found out the required ids from the request for the post.

I'm not suggesting we use such a model. It was deliberately restrictive for the purpose of finding out if there could be a way to avoid needing the licence to be included everywhere. Now that I understand that we can just include the licence URL I would rather just include it everywhere, and not make the API needlessly convoluted.

It was the thought of having to include the (often long) full text of each licence that had me looking for artificial ways to reduce how often.

Communities

Comments on How should a Codidact public API work?

How should a Codidact public API work?

Features and purposes

General thoughts

4 comment threads