Bug 23581 - bugs not showing up on Google
Summary: bugs not showing up on Google
Status: RESOLVED INVALID
Alias: None
Product: sourceware
Classification: Unclassified
Component: spam
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-29 06:45 UTC by Frederick Eaton
Modified: 2020-01-02 21:41 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Description Frederick Eaton 2018-08-29 06:45:22 UTC
I first commented on this here: https://sourceware.org/bugzilla/show_bug.cgi?id=22479

but thought it would be more appropriate to create a separate bug.

joseph@codesourcery.com said:

> > Also, I noticed that this bug doesn't appear on Google, is that
> > intentional? I think collaboration would be easier if search
> > engines were allowed to index the bugs in your software.
> 
> I see nothing in robots.txt or any robots meta tags to exclude
> indexing of either Bugzilla or the glibc-bugs list archives. You
> would have to ask Google about why it's not included (or maybe
> someone on overseers has access to or could set up Google Webmaster
> Tools for sourceware).

I don't think it is a problem with robots.txt or a misconfigured
Google Webmaster Tools setup. For example, if I Google "FAIL:
misc/tst-preadvwritev2", it brings up a result from this website.
I think that's because that particular bug is linked to from
bugs.launchpad.net. I imagine that the reason that bug 22479 doesn't
show up anywhere is because no one links to it. But there should be
some kind of listing of all bugs so that even bugs which are not
linked from other websites can be seen by Google. Is this not a
problem that other Bugzilla installations run into?
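One common way to give crawlers the "listing of all bugs" asked for above is an XML sitemap in the sitemaps.org format. The Python sketch below is illustrative only: it assumes a list of bug IDs is available from somewhere (for example a database export), and nothing in this thread says sourceware.org publishes such a file. The URL pattern is copied from the bug links quoted in this report.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# URL pattern taken from the bug links in this thread.
BUG_URL = "https://sourceware.org/bugzilla/show_bug.cgi?id={}"
# The sitemaps.org protocol caps a single sitemap file at 50,000 URLs.
MAX_URLS_PER_SITEMAP = 50_000


def build_sitemap(bug_ids):
    """Return sitemap XML (as a str) with one show_bug.cgi URL per bug ID.

    bug_ids is a hypothetical input: any iterable of integer bug numbers.
    """
    ids = list(bug_ids)
    if len(ids) > MAX_URLS_PER_SITEMAP:
        # Larger sites split the URLs across several files plus a sitemap index.
        raise ValueError("too many URLs for one sitemap; use a sitemap index")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for bug_id in ids:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = BUG_URL.format(bug_id)
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([22479, 23581]))
```

A site would then typically advertise the file with a `Sitemap:` line in robots.txt or submit it through Google's webmaster tools. Whether that would have changed the indexing behaviour described in this bug is not established here.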
Comment 1 jsm-csl@polyomino.org.uk 2018-08-29 12:54:33 UTC
You have to *email* overseers@sourceware.org, the sysadmin mailing list 
for sourceware (the mailing list doesn't have an account on Bugzilla).  
However, as all bugs get posted to the relevant mailing lists they should 
be reachable from the list archives.  So, again, ask Google, or get 
someone with Google Webmaster Tools access for sourceware (who you'll need 
to find on the overseers mailing list, again) to check if that gives any 
relevant information about what is / is not indexed and why.  But I think 
this is an issue for Google, so you need to find support contacts there, 
not an issue for sourceware admins at all, and so this bug should be 
closed as INVALID.
Comment 2 Frédéric Buclin 2018-09-04 00:47:34 UTC
(In reply to joseph@codesourcery.com from comment #1)
> this is an issue for Google, so you need to find support contacts there, 
> not an issue for sourceware admins at all, and so this bug should be 
> closed as INVALID.

So INVALID.
Comment 3 Frederick Eaton 2018-09-18 12:16:37 UTC
I just wanted to update this with the discussion from the overseers
list, I'm assuming that is OK.

----------------

Dear Overseers,

joseph@codesourcery.com suggested that I email you about my
observation that most of your bugs are not showing up on Google. I
opened a bug against Bugzilla but he suggests that is invalid and that
I need to email instead.

https://sourceware.org/bugzilla/show_bug.cgi?id=23581

[...]

----------------

Hi -

> joseph@codesourcery.com suggested that I email you about my
> observation that most of your bugs are not showing up on Google.
> [...]

I don't know about "most"; undoubtedly many appear and some do not.
It may be relevant that we have had to throttle googlebot from
full access to the sourceware web servers because it was repeatedly
found ignoring robots.txt and saturating the server with traffic.
So we have reluctantly slowed its access down.  I expect it to
get around to all the bugzilla entries over time, just maybe not as
fast as you expect.

- FChE

----------------

Thanks Frank for your reply. The entry I was looking at was over a
year old. I don't know what you mean by "over time" but I would
consider that too long. Also I don't think it would take that long for
even a throttled Googlebot to crawl your site.

I'm not sure how a crawler is supposed to see all the bugs. Is there a
way of listing them all without going through a search form?

Apparently there are ways to enforce robots.txt using mod_rewrite: as
long as Googlebot doesn't change its user agent, I think you can more
or less easily prevent it from accessing a given URL:

https://perishablepress.com/eight-ways-to-blacklist-with-apaches-mod_rewrite/comment-page-4/

That seems easier to me than QoS tuning.
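The mod_rewrite idea above amounts to this rule: if the User-Agent header says Googlebot and robots.txt disallows the requested path, refuse the request. What follows is not a mod_rewrite configuration. It is a minimal Python sketch of the same check using the standard library's `urllib.robotparser`. The robots.txt rules are hypothetical, because the thread does not show sourceware's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; the thread does not show
# sourceware.org's real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bugzilla/buglist.cgi
Disallow: /bugzilla/query.cgi
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def response_status(rules: RobotFileParser, user_agent: str, path: str) -> int:
    """Return 403 if a client claiming to be Googlebot requests a path that
    robots.txt disallows, otherwise 200.

    Like the mod_rewrite approach, this trusts the User-Agent header, so a
    crawler that changes its user agent is not caught. Ordinary browsers are
    never blocked, because robots.txt is only enforced for the crawler.
    """
    is_googlebot = "googlebot" in user_agent.lower()
    if is_googlebot and not rules.can_fetch("Googlebot", path):
        return 403
    return 200


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(response_status(rules, bot, "/bugzilla/buglist.cgi?product=glibc"))
    print(response_status(rules, bot, "/bugzilla/show_bug.cgi?id=23581"))
```

As the email notes, this only holds as long as Googlebot keeps its user agent; a client that spoofs the header passes through unchanged.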

Even better would be if we could report bugs to Google [...]

By the way, I couldn't find a public archive of this mailing list,
should we be discussing this on Bugzilla in case other Bugzilla
maintainers want to benefit from your experience?

https://sourceware.org/bugzilla/show_bug.cgi?id=23581

Maybe I can paste these messages into a comment on that bug and then
add overseers to the Cc list?

[...]

----------------

On Wed, Sep 05, 2018 at 11:01:17PM -0700, frederik@ofb.net wrote:
>By the way, I couldn't find a public archive of this mailing list,

That's because this is a restricted mailing list.  There is a public
archive but it's not advertised.

cgf