

We should delete all old imported content.

+8
−4

Early on for the Codidact sites, it seemed like importing content from SE might be a good way to get a new site going quickly. Now, with the clarity of hindsight, we can see that this didn't work. The few sites that mass-imported content are doing very poorly. As far as I can tell, these are Writing, Outdoors, and Scientific Speculation, which are three of the four least-active sites we have. There is a strong correlation between sites that imported content and sites that are doing poorly.

OK, so importing content doesn't work, and we're not doing that anymore. However, the imported content is still hurting us. Even today, it has negative value.

It seems that search engines, particularly Google, are penalizing us for having lots of duplicate content.

I did some tests, searching for the titles of posts by copying them verbatim into the Google and Bing search bars. I tried to pick questions with reasonably generic titles so that there would be lots of content out there on the web for the search engines to match against. I considered a site "not listed" by a search engine if there was no reference to it in the first two pages of search results. Here is what I found:

Search for title of imported post

"Is it safe to carry propane gas cylinder in minivan?" from Outdoors

    Google: Stack Exchange #1, Codidact not listed.

    Bing: Stack Exchange #1, Codidact not listed.

Search for title of home-grown post from site with imported content

"In the United States, where would I find how a geological feature got its name?" from Outdoors

    Google: Codidact not listed.

    Bing: Codidact not listed.

Search for title of home-grown post from a non-import site

"Is ESD Overhyped" from Electrical Engineering

    Google: Codidact #3.

    Bing: Codidact #1.
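
The "not listed" rule used in these tests can be written down as a small sketch. This is only an illustration of the scoring, not a tool that was used for the results above: the function name `rank_of_domain`, the assumption of 10 results per page, and the example URLs are all hypothetical, and the result list is assumed to have been collected by hand from the search page.

```python
from urllib.parse import urlparse

# Assumptions for illustration only: 10 results per page, 2 pages checked.
RESULTS_PER_PAGE = 10
PAGES_CHECKED = 2


def rank_of_domain(result_urls, domain, pages=PAGES_CHECKED, per_page=RESULTS_PER_PAGE):
    """Return the 1-based position of the first result hosted on `domain`
    (or one of its subdomains), or None if it is "not listed" within the
    first `pages` pages of results."""
    limit = pages * per_page
    for position, url in enumerate(result_urls[:limit], start=1):
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return position
    return None


# Hypothetical, hand-collected result list for one query.
example_results = [
    "https://electronics.stackexchange.com/questions/1",
    "https://example.org/esd-article",
    "https://electrical.codidact.com/posts/1",
]

print(rank_of_domain(example_results, "codidact.com"))      # position of the Codidact hit
print(rank_of_domain(example_results[:2], "codidact.com"))  # None means "not listed"
```

Matching on the host name rather than on a substring of the URL avoids counting pages that merely mention codidact.com in their path or query string.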

Conclusion

The imported content isn't being presented by search engines, so it does us no good. But even worse, it seems to cause search engines to "black list" us so that non-imported content isn't shown either.

Some of this damaging effect seems to be spilling over between Codidact sites, particularly on Google. In other words, the whole codidact.com domain is at least somewhat affected because some sites have duplicate content. Note that in the last test case, Google showed two looser matches ahead of the exact match on Codidact.

I therefore propose that we mass-delete all imported content. We may lose a few stray locally-grown answers to old questions, but those are very minor in the scheme of things. Those few answers effectively don't exist according to the search engines anyway. Meanwhile, old imported content is hurting all Codidact sites, not just the ones with the imported content. This is therefore a Codidact-wide issue, and should be dealt with as such. It's time to get past the sunk cost fallacy and cut our losses.


Data about limited imports

Mithrandir pointed out in a comment:

selectively importing posts seems to have worked quite well on the one site [Judaism] that did so

I ran the same kind of tests on the Judaism site, copying the titles of questions directly into the Google and Bing search bars. I tried to pick generic-looking questions so that there would be other material out there for the search engines to find, but I may have gotten this wrong because I don't know that much about Judaism. Anyway, here are the results:

Search for title of imported post

Why is Tzaara'as considered a Sakana?

    Google: Stack Exchange #1, Codidact #5.

    Bing: Stack Exchange #1, Codidact #2.

Experience-based advice for focusing and slowing down prayers?

    Google: Stack Exchange #1, Codidact #2.

    Bing: Stack Exchange #1, Codidact #14.

Search for title of home-grown post

Are flowers muktzah on Shabbat?

    Google: Codidact not listed.

    Bing: Codidact #21.

What are the flaws in the ten kal vachomer arguments in the torah?

    Google: Codidact #1.

    Bing: Codidact #2.

Updated Conclusions

  1. A small number of imports doesn't seem to hurt a site as much as mass-imports.

  2. The search engine ranking still isn't "great" for home-grown posts. It is hard to say whether that is due to the limited duplicates on the particular Codidact site, or the many duplicates in the whole codidact.com domain.

  3. There is a clear case for deleting all mass-imported content that hasn't been touched. These posts are absolutely hurting the sites they are in, and they are also hurting everything in the codidact.com domain to some extent. All these posts must be deleted Codidact-wide. It's not just up to the individual sites because everyone is getting hurt.

  4. There is no clear case at this time for deleting the small number of selectively-imported posts, or those that have been modified here. This should be re-evaluated once the mass-imported posts have been deleted and some time (a month, maybe?) has passed to let the search engines adjust to the new conditions. Of course if individual sites wish to delete all their imported content, then this should be supported.

Are you testing in an incognito browser?

No, I didn't think of that. I'm not sure that matters, though. I thought those modes don't so much hide who you are, but delete all temporary stored data (like cookies) when you end the session. I don't see how deleting cookies after the search should matter.

Maybe someone who actually understands this stuff (unlike me) can weigh in here.



2 answers


+12
−0

I broadly agree, with two main caveats, one about what to delete, and one about how we go about it.

By "what to delete", I'm looking at your point that:

We may lose a few stray locally-grown answers to old questions

I haven't done the queries to confirm this, but I suspect that number is relatively small. I see no harm in identifying the posts where this has happened and keeping just those - deleting everything else.
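
A minimal sketch of that selection, under stated assumptions: the field names `id`, `parent_id` (None for a question, the question's id for an answer), and `imported` are hypothetical and are not taken from the actual Codidact database schema. A real query would have to be written against whatever the schema provides.

```python
def questions_to_keep(posts):
    """Return the ids of imported questions that have at least one
    locally written (non-imported) answer."""
    imported_questions = {
        p["id"] for p in posts if p["parent_id"] is None and p["imported"]
    }
    keep = set()
    for p in posts:
        if p["parent_id"] in imported_questions and not p["imported"]:
            keep.add(p["parent_id"])
    return keep


def deletion_candidates(posts):
    """Return the ids of imported posts whose whole thread contains no
    locally written answer. Threads in `questions_to_keep` are left intact."""
    keep = questions_to_keep(posts)
    candidates = []
    for p in posts:
        if not p["imported"]:
            continue
        thread_id = p["id"] if p["parent_id"] is None else p["parent_id"]
        if thread_id not in keep:
            candidates.append(p["id"])
    return candidates


# Hypothetical sample data: thread 1 has a native answer, thread 4 does not,
# and post 6 is a home-grown question.
sample_posts = [
    {"id": 1, "parent_id": None, "imported": True},
    {"id": 2, "parent_id": 1, "imported": True},
    {"id": 3, "parent_id": 1, "imported": False},
    {"id": 4, "parent_id": None, "imported": True},
    {"id": 5, "parent_id": 4, "imported": True},
    {"id": 6, "parent_id": None, "imported": False},
]

print(questions_to_keep(sample_posts))
print(deletion_candidates(sample_posts))
```

This version keeps the entire thread, including imported answers, whenever a native answer exists, so the local answer still has its question and context. Whether to keep those imported siblings is a choice each community could make differently.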

However, more importantly, this isn't our (Codidact's) decision to make. We've always said our communities would be run by the people who use them, and we've stuck by that principle. I'm happy to delete imported posts, if the communities that would be affected want that. We'd be looking for consensus on meta for each community before we start any deletion, whether or not we keep natively-answered posts too.


+6
−0

Writing already has consensus to delete unclaimed, untouched, imported posts. We should see how that changes things, and then the other two communities with large-scale imports can decide how they want to proceed.

Writing, at least, is unlikely to be interested in throwing out work that people here put effort into, like answers to imported questions or improvements to their imported posts. If people want to make better contributions here than exist Somewhere Else, I'm all for that. It's hard enough to get people to migrate; inertia and the sunk-cost fallacy are powerful forces.


