[PATCH] make setup mirror list more like web page not just urls

Brian Inglis Brian.Inglis@SystematicSw.ab.ca
Mon Nov 20 21:30:00 GMT 2017


On 2017-11-17 06:53, Jon Turney wrote:
On 11/17/2017 8:48 AM, Ken Brown wrote:
> On 15/11/2017 21:35, Brian Inglis wrote:

Thanks for your comments, I understand all your points, and agree with them.
I merged the posts and responses as they addressed the same issues.
I also applied Ken's setup patch and rejigged mine where required.

KB> Ken Brown
JT> Jon Turney

KB> I think this is a good idea in principle, but it still needs some work.

JT> use a grid control to present this information.

Don't think I want to learn that much about Windows controls!
The width of the longest fields in each column would need to adjust the grid
and dialogue box widths, and Add button position, dynamically, which I did
manually, and the internal blank space may be a bit harder to scan for users
who are interested only in their geographic region, country, or territory
- these are pretty well aligned as strings.

KB> Adding the protocol to the key instead of the full url isn't enough to
KB> guarantee uniqueness of the key.
KB> Users can add their own sites for which that fails.
KB> Also, you shouldn't be relying on the servername field being filled in.
KB> For example, the function SiteSetting::registerSavedSite() explicitly
KB> constructs a site 'tempSite' that has an empty servername.

JT> This code needs to handle manually added URLs with non-FQDNs.
JT> - - url...'s going to need fixing.

If host name provided, use as servername, or default to localhost.

If at least two domain components:
if CC TLD, use as area in uppercase, or
if gTLD, use as area with init cap;
if 2-3 char TLD and 2-3 char 2nd level domain, may be subTLD, so if 3rd level
domain exists and does not start with (or contain?) "cygwin", "ftp", "mirror",
"www", add space plus 2-3 char 2nd level domain with init cap to area.
Use next level domain as location with init cap and uppercase after any "-".
Otherwise default area to Local, and use servername as location with init cap.
This adhoc approach seems to work well enough and avoids having to drag in and
process the rules in the Public Suffix List (libpsl5 and
publicsuffix-list-dafsa).

Examples:

Input	->
    List
    Displayed
    Key

http://mirror.cpsc.ucalgary.ca/mirror/cygwin.com		->
http://mirror.cpsc.ucalgary.ca/mirror/cygwin.com/;mirror.cpsc.ucalgary.ca;CA;Ucalgary
CA - Ucalgary - http://mirror.cpsc.ucalgary.ca/mirror/cygwin.com
CA Ucalgary ca ucalgary cpsc mirror
http://mirror.cpsc.ucalgary.ca/mirror/cygwin.com/

[cyg]file://server/c$/usr/local/cygwin/var/cache/setup/packages	->
[cyg]file://server/c$/usr/local/cygwin/var/cache/setup/packages/;server;Local;Server
Local - Server - file://server/c$/usr/local/cygwin/var/cache/setup/packages
Local Server server file://server/c$/usr/local/cygwin/var/cache/setup/packages/

cygfile:///var/cache/setup/packages		->
cygfile:///var/cache/setup/packages/;localhost;Local;Localhost
Local - Localhost - cygfile://var/cache/setup/packages
Local Localhost localhost cygfile://var/cache/setup/packages/

file:///c|/usr/local/cygwin/var/cache/setup/packages		->
file:///c|/usr/local/cygwin/var/cache/setup/packages/;localhost;Local;Localhost
Local - Localhost - file:///c|/usr/local/cygwin/var/cache/setup/packages
Local Localhost localhost file:///c|/usr/local/cygwin/var/cache/setup/packages/

Cygwin file:// does not appear to use /// or support anything but local files,
so should always be treated as localhost; or does it support servers, but they
can be treated as if directories under Windows, so we need to do a server or
directory check, or that check is done at a lower level?

JT> you can also just add a path.

If UNC path with server, use as above, but include share name in display,
otherwise default as above.

Examples:

\\server\c$\usr\local\cygwin\var\cache\setup\packages		->
\\server\c$\usr\local\cygwin\var\cache\setup\packages/;server;Local;Server
Local - Server - \\server\c$
Local Server server \\server\c$\usr\local\cygwin\var\cache\setup\packages/

c:\usr\local\cygwin\var\cache\setup\packages			->
c:\usr\local\cygwin\var\cache\setup\packages/;localhost;Local;Localhost
Local - Localhost - c:\usr\local\cygwin\var\cache\setup\packages
Local Localhost localhost c:\usr\local\cygwin\var\cache\setup\packages/

Where are the cached sites stashed - setup keeps showing my old test entries?

JT> if you add two URLs with the same protocol/host, they aren't distinguishable

Check for and eliminate duplicates after sort if same site_list fields.
Handled by KB patch.
Note on flag added by that patch - would is_current, is_good, or is_known be a
better reflection of what that flag means?
Should we add ==(string) to compare only the url member?
Should we add a constructor(string,is_official) to refactor string to site
conversions, and the url additions?

KB> If the saved url corresponds to a site in the list, I think setup should
KB> recognize this rather than listing the same site again as " - - url".

JT> If the URL matches one in the list, that item should be selected.

Check for URL match with site list and use match as site:
added for tempSite in RegisterSavedSites and newsite on Add;
required as key now includes area and location, not just URL.
Could we add a check to a string contructor to return only a reference to an
existing entry, if the url matches, to refactor the above changes?

JT> Store area and location so we can populate those fields if it doesn't.

See above.

JT> Please, please use git-format-patch in future.

I can't tell from looking, does it differ from diff output, and is there a
reason to use git-format-patch for review?
I use diffs for feedback until it DTRT.
I can manage one commit and one format-patch on a good day.
I should just stick to using what Tortoise Git supports.
My record with git includes a lot of rm -rf and re-clone, as the files, trees,
and indices get out of whack when I forget some incantation and try to fix it!
Very brittle and unfriendly - if it's going to bork itself, just don't do that!
I never had a single problem with CVS or Hg in many years ;^>

JT> The patch also has some spurious changes to the mode of the manifest files.

Realized I shouldn't tidy up unnecessary x modes under git control - habit
- gets me into trouble and I learn new things - checked out masters.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
-------------- next part --------------
diff --git a/res.rc b/res.rc
index 80d1bf1..fa90a65 100644
--- a/res.rc
+++ b/res.rc
@@ -135,10 +135,10 @@ CAPTION "Cygwin Setup - Choose Download Site(s)"
 FONT 8, "MS Shell Dlg"
 BEGIN
     ICON            IDI_CYGWIN,IDC_HEADICON,SETUP_HEADICON_X,0,21,20
-    LISTBOX         IDC_URL_LIST,66,45,185,110,LBS_NOINTEGRALHEIGHT | 
+    LISTBOX         IDC_URL_LIST,66,45,216,110,LBS_NOINTEGRALHEIGHT | 
                     LBS_EXTENDEDSEL | WS_VSCROLL | WS_HSCROLL | WS_GROUP | 
                     WS_TABSTOP
-    LTEXT           "Available Download Sites:",IDC_STATIC,66,34,183,8,NOT 
+    LTEXT           "Available Download Sites:",IDC_STATIC,66,34,216,8,NOT 
                     WS_GROUP
     CONTROL         "",IDC_HEADSEPARATOR,"Static",SS_BLACKFRAME | SS_SUNKEN,0,28,
                     SETUP_STANDARD_DIALOG_W,1
@@ -146,10 +146,10 @@ BEGIN
                     IDC_STATIC,21,9,239,16,NOT WS_GROUP
     LTEXT           "Choose A Download Site",IDC_STATIC_HEADER_TITLE,7,0,258,
                     8,NOT WS_GROUP
-    EDITTEXT        IDC_EDIT_USER_URL,65,160,185,14,ES_AUTOHSCROLL | 
+    EDITTEXT        IDC_EDIT_USER_URL,65,160,216,14,ES_AUTOHSCROLL | 
                     WS_GROUP
     LTEXT           "User URL:",IDC_SITE_USERURL,15,162,45,8,NOT WS_GROUP
-    PUSHBUTTON      "Add",IDC_BUTTON_ADD_URL,255,160,50,14
+    PUSHBUTTON      "Add",IDC_BUTTON_ADD_URL,288,160,50,14
 END
 
 IDD_NET DIALOG DISCARDABLE  0, 0, SETUP_STANDARD_DIALOG_DIMS
diff --git a/site.cc b/site.cc
index c9fac12..a2f9406 100644
--- a/site.cc
+++ b/site.cc
@@ -18,6 +18,7 @@
 
 #include <string>
 #include <algorithm>
+#include <cctype>
 
 #include "site.h"
 #include "win32.h"
@@ -141,6 +142,130 @@ SiteSetting::~SiteSetting ()
     save ();
 }
 
+static bool
+common (const string &domain)
+{
+    const string prefixes[] = { "cygwin", "ftp", "mirror", "www" };
+
+    for (auto& s: prefixes)
+    {
+      if (s.length() <= domain.length() &&
+	  mismatch(s.begin(), s.end(), domain.begin()).first == s.end())
+	return true;
+    }
+
+    return false;
+}
+
+static string
+area_default (const string &servername)
+{
+  string::size_type last_idx = servername.length () - 1;
+  string::size_type idx = servername.find_last_of (".", last_idx);
+
+  /* no dots => local server */
+  if (idx == string::npos)
+  {
+    return "Local";
+  }
+
+  string area = servername.substr (idx + 1, last_idx - idx);
+  /* gTLD - initial cap. */
+  area[0] = toupper (area[0]);
+
+  /* length 2 TLD => CC TLD - uppercase. */
+  if (last_idx - idx == 2)
+  {
+    area[1] = toupper (area[1]);
+  }
+
+  /* length 2-3 TLD => may have subTLD - check. */
+  if (last_idx - idx <= 3)
+  {
+    last_idx = idx - 1;
+    idx = servername.find_last_of (".", last_idx);
+
+    /* domain below length 2-3 subTLD? */
+    if (idx != string::npos && last_idx - idx <= 3)
+    {
+      string::size_type next_idx = servername.find_last_of (".", idx - 1);
+
+      /* no more dots - begins at start */
+      if (next_idx == string::npos)
+      {
+	next_idx = -1;
+      }
+
+      /* check for common domain prefixes to ignore as location */
+      if (!common (servername.substr (next_idx + 1, idx - next_idx - 1)))
+      {
+	/* not common domain - add subTLD to area with initial cap. */
+	area += " " + toupper (servername[idx + 1]) +
+			servername.substr (idx + 2, last_idx - idx - 1);
+      }
+    }
+  }
+
+  return area;
+}
+
+static string
+location_default (const string &servername)
+{
+  string::size_type last_idx = servername.length () - 1;
+  string::size_type idx = servername.find_last_of (".", last_idx);
+  string location = servername;
+  /* initial cap. */
+  location[0] = toupper (location[0]);
+
+  /* no dots => local server */
+  if (idx == string::npos)
+  {
+    return location;
+  }
+
+  /* length 2-3 TLD => may have subTLD - check. */
+  if (last_idx - idx <= 3)
+  {
+    last_idx = idx - 1;
+    idx = servername.find_last_of (".", last_idx);
+
+    /* domain below length 2-3 subTLD? */
+    if (idx != string::npos && last_idx - idx <= 3)
+    {
+      string::size_type next_idx = servername.find_last_of (".", idx - 1);
+
+      /* no more dots - begins at start */
+      if (next_idx == string::npos)
+      {
+	next_idx = -1;
+      }
+
+      /* check for common domain prefixes to ignore as location */
+      if (!common (servername.substr (next_idx + 1, idx - next_idx - 1)))
+      {
+	/* not common domain - use that as location. */
+	last_idx = idx - 1;
+	idx = next_idx;
+      }
+    }
+  }
+
+  location = servername.substr (idx + 1, last_idx - idx);
+
+  /* initial cap */
+  location[0] = toupper (location[0]);
+
+  idx = 0;
+  while ((idx = location.find( '-', idx)) != string::npos)
+  {
+    ++idx;
+    location[idx] = toupper (location[idx]);
+  }
+
+  return location;
+}
+
 site_list_type::site_list_type (const string &_url,
 				const string &_servername,
 				const string &_area,
@@ -157,30 +282,72 @@ site_list_type::site_list_type (const string &_url,
   if (url.at(url.length()-1) != '/')
     url.append("/");
 
-  /* displayed_url is protocol and site name part of url */
-  string::size_type path_offset = url.find ("/", url.find ("//") + 2);
-  displayed_url = url.substr(0, path_offset);
+  /* get protocol, host flag, and site name parts of url */
+  string::size_type host_flag_posn = url.find ("//");
+  string::size_type path_offset = (host_flag_posn == string::npos)
+					? string::npos
+					: url.find ("/", host_flag_posn + 2);
+  /* not url-like, check for Windows UNC paths */
+  if (host_flag_posn == string::npos)
+  {
+    if ('\\' == url[0] && '\\' == url[1])
+    {
+      /* UNC path - get server */
+      host_flag_posn = 0;
+      /* terminal '/' ensures result */
+      path_offset = url.find_first_of ("\\/", host_flag_posn + 2);
+    }
+  }
+  /* default servername to url host or localhost */
+  if (servername.length() == 0) {
+      servername = (path_offset != string::npos &&
+		    host_flag_posn + 2 < path_offset)
+			? url.substr (host_flag_posn + 2,
+					path_offset - (host_flag_posn + 2))
+			: "localhost";
+  }
+  /* default area to TLD and maybe short 2nd level domain */
+  if (area.length() == 0) {
+      area = area_default (servername);
+  }
+  /* default location to next level domain */
+  if (location.length() == 0) {
+      location = location_default (servername);
+  }
+
+  /* the sorting key is area, location, domain name components in reverse order
+   * (to group by country code) plus url (to ensure uniqueness) */
+  key = area + " " + location + " ";
+  string::size_type last_idx = path_offset - 1;
+  string::size_type idx = url.find_last_of ("./", last_idx);
 
-  /* the sorting key is hostname components in reverse order (to sort by country code)
-     plus the url (to ensure uniqueness) */
-  key = string();
-  string::size_type last_idx = displayed_url.length () - 1;
-  string::size_type idx = url.find_last_of("./", last_idx);
-  if (last_idx - idx == 3)
+  if (3 <= last_idx - idx)
   {
-    /* Sort non-country TLDs (.com, .net, ...) together. */
+    /* Sort gTLDs (.com, .net, .info, ...) together. */
     key += " ";
   }
-  do
+
+  while (idx < last_idx)
   {
-    key += url.substr(idx + 1, last_idx - idx);
+    key += url.substr (idx + 1, last_idx - idx);
     key += " ";
     last_idx = idx - 1;
-    idx = url.find_last_of("./", last_idx);
-    if (idx == string::npos)
-      idx = 0;
-  } while (idx > 0);
+    idx = url.find_last_of ("./\\", last_idx);
+  }
+
   key += url;
+
+  /* if double (back)slashes at start of url */
+  if (host_flag_posn == 0)
+  {
+    /* check for next path delimiter */
+    string::size_type share_end = url.find_first_of ("\\/", path_offset + 1);
+    /* include UNC server share name */
+    if (share_end != string::npos)
+      path_offset = share_end;
+  }
+  /* displayed_url is area, location, protocol and site name part of url */
+  displayed_url = area + " - " + location + " - " + url.substr (0, path_offset);
 }
 
 site_list_type::site_list_type (site_list_type const &rhs)
@@ -379,6 +546,15 @@ void
 SiteSetting::registerSavedSite (const char * site)
 {
   site_list_type tempSite(site, "", "", "", false);
+
+  /* use site in list if matching url */
+  for (auto& s: all_site_list)
+    if (site == s.url)
+    {
+      tempSite = s;
+      break;
+    }
+
   SiteList::iterator i = find (all_site_list.begin(),
 			       all_site_list.end(), tempSite);
   if (i == all_site_list.end())
@@ -699,6 +875,15 @@ bool SitePage::OnMessageCmd (int id, HWND hwndctl, UINT code)
 	    if (other_url.size())
 	    {
 	    site_list_type newsite (other_url, "", "", "", false);
+
+	    /* use site in list if matching url */
+	    for (auto& s: all_site_list)
+	      if (other_url == s.url)
+	      {
+		newsite = s;
+		break;
+	      }
+
 	    SiteList::iterator i = find (all_site_list.begin(),
 					 all_site_list.end(), newsite);
 	    if (i == all_site_list.end())


More information about the Cygwin-apps mailing list