We should delete all old imported content.
Early on for the Codidact sites, it seemed like importing content from SE might be a good way to get a new site going quickly. Now, with the clarity of hindsight, we can see that this didn't work. The few sites that did mass-import content are doing very poorly. As far as I can tell, these are Writing, Outdoors, and Scientific Speculation. These are three of the four least-active sites we have. There is a strong correlation between sites that imported content and sites that are doing poorly.
OK, so importing content doesn't work, and we're not doing that anymore. However, the imported content is still hurting us. Even today, it has negative value.
It seems that search engines, particularly Google, are penalizing us for having lots of duplicate content.
I did some tests, searching for the titles of posts by copying them verbatim into the Google and Bing search bars. I tried to pick questions with reasonably generic titles so that there would be lots of content out there on the web for the search engines to match against. I considered a site as "not listed" by a search engine if there was no reference to it in the first two pages of search results. Here is what I found:
Search for title of imported post
"Is it safe to carry propane gas cylinder in minivan?" from Outdoors
Google: Stack Exchange #1, Codidact not listed.
Bing: Stack Exchange #1, Codidact not listed.
Search for title of home-grown post from site with imported content
"In the United States, where would I find how a geological feature got its name?" from Outdoors
Google: Codidact not listed.
Bing: Codidact not listed.
Search for title of home-grown post from a non-import site
"Is ESD Overhyped" from Electrical Engineering
Google: Codidact #3.
Bing: Codidact #1.
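For anyone who wants to repeat these checks, the "not listed" rule used above can be sketched in a few lines of Python. This is only an illustration of the rule, not the procedure I actually used (I read the results by eye). The function name `site_rank`, the `page_size` parameter, and the example URLs are all made up for the sketch; how many results make up a "page" depends on the search engine and its settings, so that number has to be supplied by whoever runs the test.

```python
from typing import Optional, Sequence
from urllib.parse import urlparse


def site_rank(result_urls: Sequence[str], domain: str,
              page_size: int, max_pages: int = 2) -> Optional[int]:
    """Return the 1-based position of the first result hosted on `domain`.

    Returns None ("not listed") if no such result appears within the
    first `max_pages` pages of `page_size` results each.
    `page_size` is an assumption the caller must supply.
    """
    cutoff = page_size * max_pages
    for position, url in enumerate(result_urls[:cutoff], start=1):
        host = urlparse(url).hostname or ""
        # Match the bare domain and any subdomain, e.g. outdoors.codidact.com
        if host == domain or host.endswith("." + domain):
            return position
    return None


# Hypothetical result list, in the order a search engine returned it.
results = [
    "https://electronics.stackexchange.com/questions/1",
    "https://example.org/esd-article",
    "https://electrical.codidact.com/posts/1",
]
print(site_rank(results, "codidact.com", page_size=10))       # 3
print(site_rank(results[:2], "codidact.com", page_size=10))   # None
```

Matching on the hostname rather than on a substring of the URL avoids counting pages that merely mention codidact.com in their path or query string.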
Conclusion
The imported content isn't being presented by search engines, so it does us no good. But even worse, it seems to cause search engines to "black list" us so that non-imported content isn't shown either.
Some of this damaging effect seems to be spilling over between Codidact sites, particularly on Google. In other words, the whole codidact.com domain is at least somewhat affected because some sites have duplicate content. Note that in the last test case, Google showed two looser matches ahead of the exact match on Codidact.
I therefore propose that we mass-delete all imported content. We may lose a few stray locally-grown answers to old questions, but those are very minor in the scheme of things. Those few answers effectively don't exist according to the search engines anyway. Meanwhile, old imported content is hurting all Codidact sites, not just the ones with the imported content. This is therefore a Codidact-wide issue, and should be dealt with as such. It's time to get past the sunk cost fallacy and cut our losses.
Data about limited imports
Mithrandir pointed out in a comment:
selectively importing posts seems to have worked quite well on the one site [Judaism] that did so
I ran tests similar to the ones above on the Judaism site, copying the titles of questions directly into the Google and Bing search bars. I tried to pick generic-looking questions so that there would be other stuff out there for the search engines to find, but I may have gotten this wrong because I don't know that much about Judaism. Anyway, here are the results:
Search for title of imported post
Why is Tzaara'as considered a Sakana?
Google: Stack Exchange #1, Codidact #5.
Bing: Stack Exchange #1, Codidact #2.
Experience-based advice for focusing and slowing down prayers?
Google: Stack Exchange #1, Codidact #2.
Bing: Stack Exchange #1, Codidact #14.
Search for title of home-grown post
Are flowers muktzah on Shabbat?
Google: Codidact not listed.
Bing: Codidact #21.
What are the flaws in the ten kal vachomer arguments in the torah?
Google: Codidact #1.
Bing: Codidact #2.
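To see the pattern at a glance, the results reported in this post can be tallied by how each site handled imports. The sketch below only counts the numbers already listed above; the policy labels ("mass import", "selective import", "no import") and the function name `listed_counts` are my own shorthand. With so few searches, this is a rough indication, not a statistic.

```python
from collections import defaultdict

# (import policy of the site, search engine, reported Codidact rank or None)
# None means "Codidact not listed". Values are copied from the results above.
REPORTED = [
    ("mass import", "Google", None),     # Outdoors, imported post
    ("mass import", "Bing", None),
    ("mass import", "Google", None),     # Outdoors, home-grown post
    ("mass import", "Bing", None),
    ("no import", "Google", 3),          # Electrical Engineering, home-grown
    ("no import", "Bing", 1),
    ("selective import", "Google", 5),   # Judaism, imported posts
    ("selective import", "Bing", 2),
    ("selective import", "Google", 2),
    ("selective import", "Bing", 14),
    ("selective import", "Google", None),  # Judaism, home-grown posts
    ("selective import", "Bing", 21),
    ("selective import", "Google", 1),
    ("selective import", "Bing", 2),
]


def listed_counts(rows):
    """Map each policy to (searches where Codidact was listed, total searches)."""
    counts = defaultdict(lambda: [0, 0])
    for policy, _engine, rank in rows:
        counts[policy][1] += 1
        if rank is not None:
            counts[policy][0] += 1
    return {policy: tuple(pair) for policy, pair in counts.items()}


for policy, (listed, total) in listed_counts(REPORTED).items():
    print(f"{policy}: listed in {listed} of {total} searches")
```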
Updated Conclusions
- A small number of imports doesn't seem to hurt a site as much as mass-imports.
- The search engine ranking still isn't "great" for home-grown posts. It is hard to say whether that is due to the limited duplicates on the particular Codidact site, or the many duplicates in the whole codidact.com domain.
- There is a clear case for deleting all mass-imported content that hasn't been touched. These posts are absolutely hurting the sites they are in, and they are also hurting everything in the codidact.com domain to some extent. All these posts must be deleted Codidact-wide. It's not just up to the individual sites because everyone is getting hurt.
- There is no clear case at this time for deleting the small number of selectively-imported posts, or those that have been modified here. This should be re-evaluated once the mass-imported posts have been deleted and some time (a month, maybe?) has passed to let the search engines adjust to the new conditions. Of course if individual sites wish to delete all their imported content, then this should be supported.
Are you testing in an incognito browser?
No, I didn't think of that. I'm not sure that matters, though. I thought those modes don't so much hide who you are as delete all temporarily stored data (like cookies) when you end the session. I don't see how deleting cookies after the search should matter.
Maybe someone who actually understands this stuff (unlike me) can weigh in here.