Welcome to Codidact Meta!

Codidact Meta is the meta-discussion site for the Codidact community network and the Codidact software. Whether you have bug reports or feature requests, support questions or rule discussions that touch the whole network – this is the site for you.

We should delete all old imported content.

+7
−4

Early on for the Codidact sites, it seemed like importing content from SE might be a good way to get a new site going quickly. Now, with the clarity of hindsight, we can see that this didn't work. The few sites that did mass-import content are doing very poorly. As far as I can tell, these are Writing, Outdoors, and Scientific Speculation. These are three of the four least-active sites we have. There is a strong correlation between sites that imported content and sites that are doing poorly.

OK, so importing content doesn't work, and we're not doing that anymore. However, the imported content is still hurting us. Even today, it has negative value.

It seems that search engines, particularly Google, are penalizing us for having lots of duplicate content.

I did some tests searching for titles of posts by copying them verbatim into the Google and Bing search bars. I tried to pick questions with reasonably generic titles so that there would be lots of content out there on the web for the search engines to match against. I considered a site as "not listed" by a search engine if there was no reference to it in the first two pages of search results. Here is what I found:

Search for title of imported post

"Is it safe to carry propane gas cylinder in minivan?" from Outdoors

    Google: Stack Exchange #1, Codidact not listed.

    Bing: Stack Exchange #1, Codidact not listed.

Search for title of home-grown post from site with imported content

"In the United States, where would I find how a geological feature got its name?" from Outdoors

    Google: Codidact not listed.

    Bing: Codidact not listed.

Search for title of home-grown post from a non-import site

"Is ESD Overhyped" from Electrical Engineering

    Google: Codidact #3.

    Bing: Codidact #1.
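
The "listed / not listed" check used in these tests can be sketched in a few lines of Python. This is only an illustration of the method, not how the numbers above were produced; those were read off the result pages by hand. The function names are made up for this sketch, the limit of 20 results assumes ten results per page over two pages, and the result URLs have to be collected manually because no search engine API is called.

```python
from typing import Optional, Sequence
from urllib.parse import urlsplit

RESULTS_PER_PAGE = 10  # assumption: a search engine shows ten results per page
PAGES_CHECKED = 2      # "first two pages", as in the tests above


def rank_of_domain(result_urls: Sequence[str], domain: str,
                   limit: int = RESULTS_PER_PAGE * PAGES_CHECKED) -> Optional[int]:
    """Return the 1-based position of the first result hosted on `domain`
    (or one of its subdomains), or None if it is absent from the first
    `limit` results."""
    domain = domain.lower()
    for position, url in enumerate(result_urls[:limit], start=1):
        host = (urlsplit(url).hostname or "").lower()
        # Match the domain itself or any subdomain, but not look-alike hosts.
        if host == domain or host.endswith("." + domain):
            return position
    return None


def describe(rank: Optional[int]) -> str:
    """Format a rank the way the results above are written."""
    return "not listed" if rank is None else f"#{rank}"


if __name__ == "__main__":
    # Placeholder URLs only; paste in the real result links in page order.
    results = [
        "https://example.org/a",
        "https://sub.example.com/q/1",
        "https://example.net/b",
    ]
    print("example.com:", describe(rank_of_domain(results, "example.com")))
    print("codidact.com:", describe(rank_of_domain(results, "codidact.com")))
```

Matching on the host name rather than on a substring of the URL keeps a result on an unrelated site from counting just because its path happens to mention the domain.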

Conclusion

The imported content isn't being presented by search engines, so it does us no good. But even worse, it seems to cause search engines to "black list" us so that non-imported content isn't shown either.

Some of this damaging effect seems to be spilling over between Codidact sites, particularly on Google. In other words, the whole codidact.com domain is at least somewhat affected because some sites have duplicate content. Note that in the last test case, Google showed two looser matches ahead of the exact match on Codidact.

I therefore propose that we mass-delete all imported content. We may lose a few stray locally-grown answers to old questions, but those are very minor in the scheme of things. Those few answers effectively don't exist according to the search engines anyway. Meanwhile, old imported content is hurting all Codidact sites, not just the ones with the imported content. This is therefore a Codidact-wide issue, and should be dealt with as such. It's time to get past the sunk cost fallacy and cut our losses.


Data about limited imports

Mithrandir pointed out in a comment:

selectively importing posts seems to have worked quite well on the one site [Judaism] that did so

I ran tests similar to those above on the Judaism site, copying the titles of questions directly into the Google and Bing search bars. I tried to pick generic-looking questions so that there would be other stuff out there for the search engines to find, but I may have gotten this wrong because I don't know that much about Judaism. Anyway, here are the results:

Search for title of imported post

Why is Tzaara'as considered a Sakana?

    Google: Stack Exchange #1, Codidact #5.

    Bing: Stack Exchange #1, Codidact #2.

Experience-based advice for focusing and slowing down prayers?

    Google: Stack Exchange #1, Codidact #2.

    Bing: Stack Exchange #1, Codidact #14.

Search for title of home-grown post

Are flowers muktzah on Shabbat?

    Google: Codidact not listed.

    Bing: Codidact #21.

What are the flaws in the ten kal vachomer arguments in the torah?

    Google: Codidact #1.

    Bing: Codidact #2.

Updated Conclusions

  1. A small number of imports doesn't seem to hurt a site as much as mass-imports.

  2. The search engine ranking still isn't "great" for home-grown posts. It is hard to say whether that is due to the limited duplicates on the particular Codidact site, or the many duplicates in the whole codidact.com domain.

  3. There is a clear case for deleting all mass-imported content that hasn't been touched. These posts are absolutely hurting the sites they are in, and they are also hurting everything in the codidact.com domain to some extent. All these posts must be deleted Codidact-wide. It's not just up to the individual sites because everyone is getting hurt.

  4. There is no clear case at this time for deleting the small number of selectively-imported posts, or those that have been modified here. This should be re-evaluated once the mass-imported posts have been deleted and some time (a month, maybe?) has passed to let the search engines adjust to the new conditions. Of course if individual sites wish to delete all their imported content, then this should be supported.

Are you testing in an incognito browser?

No, I didn't think of that. I'm not sure that matters, though. I thought those modes don't so much hide who you are as delete all temporarily stored data (like cookies) when you end the session. I don't see how deleting cookies after the search should matter.

Maybe someone who actually understands this stuff (unlike me) can weigh in here.

7 comments

As far as I'm personally concerned, I agree that mass-imports were a bad idea and didn't help. I would add a caveat to my agreement: mass-imports are the problem, and selectively importing posts seems to have worked quite well on the one site that did so. I'm not really involved in Outdoors at the minute, but I personally would like to see this happen on both Writing and Scientific Speculation. Mithrandir24601 5 months ago

@Mith: What sites did selective imports? Olin Lathrop‭ 5 months ago

@OlinLathrop I know Judaism did, although it was maybe only ~20 questions Mithrandir24601‭ 5 months ago

@Mith: It would be interesting to try the first two experiments on the Judaism site to see how much limited imports affect things. Someone who knows the topic should do that so that they can pick generic questions with lots of stuff already out there on the web. Olin Lathrop 5 months ago

@Mith: See update to the question. Olin Lathrop‭ 5 months ago

2 answers

+9
−0

I broadly agree, with two main caveats, one about what to delete, and one about how we go about it.

By "what to delete", I'm looking at your point that:

We may lose a few stray locally-grown answers to old questions

I haven't done the queries to confirm this, but I suspect that number is relatively small. I see no harm in identifying the posts where this has happened and keeping just those while deleting everything else.

However, more importantly, this isn't our (Codidact's) decision to make. We've always said our communities would be run by the people who use them, and we've stuck by that principle. I'm happy to delete imported posts, if the communities that would be affected want that. We'd be looking for consensus on meta for each community before we start any deletion, whether or not we keep natively-answered posts too.

9 comments

It is a global Codidact issue because apparently imported content in one site affects all the others. We should work with the affected communities as best as possible, but deleting imported content is for the health of Codidact overall. Olin Lathrop 5 months ago

@OlinLathrop We're not going to override what each community wants here. If a community wants their imported content, they can have it. ArtOfCode‭ 5 months ago

I suspect that you're right and the number of imported questions with local answers is small. Keeping those probably (hopefully?) won't hurt us much with the search engines. We could delete all but those, then see what that does to search results a month later, if the sites really want to keep those questions. Olin Lathrop‭ 5 months ago

What if Codidact gets dismissed by search engines as a "Stack Exchange scraper" site though? Not sure if that's even a valid concern, but it would affect the whole network. Lundin 5 months ago

There have to be some over-arching rules for any site in the codidact.com domain. How many users are affected Codidact-wide, versus how many are on sites with imported content and actually want to keep that content? At some point there is a threshold where something needs to be done to protect codidact.com as a whole. I don't know where that threshold is, or how close this issue comes to it, but there should be a discussion. This affects us all, so we all have some rights. Olin Lathrop 5 months ago

+4
−0

Writing already has consensus to delete unclaimed, untouched, imported posts. We should see how that changes things, and then the other two communities with large-scale imports can decide how they want to proceed.

Writing, at least, is unlikely to be interested in throwing out work that people here put effort into, like answers to imported questions or improvements to their imported posts. If people want to make better contributions here than exist Somewhere Else, I'm all for that. It's hard enough to get people to migrate; inertia and the sunk-costs fallacy are powerful forces.

1 comment

Please take some measurements before and after so that we can all learn from Writing's experiment. Olin Lathrop‭ 5 months ago
