Justin Goldberg's Blog

Justin Goldberg

Compare this:

http://www.google.com/search?rlz=1C1GGLS_enUS291&sourceid=chrome&ie=UTF-8&q=CL.TechSupport.Jobs@gmail.com&pws=0

with this:

http://www.google.com/search?rlz=1C1GGLS_enUS291&sourceid=chrome&ie=UTF-8&q=CL.TechSupport.Jobs@gmail.com&pws=0

I thought it was curious that, for this obvious Craigslist spam, Google showed me only the result from Atlanta, a word I've been searching for a lot lately (e.g. atlanta jobs, employment, etc.). Is this a secret? No. But it's another piece of the puzzle. By the way, HitTail is still the easiest way to SEO your site!

Justin Goldberg

So what exactly does it do? Does it really enable searching and sorting of your MySpace inbox? I never installed or used it because it has too many privacy implications. Never use it on a computer other than your own (it stores your MySpace inbox on your hard drive!).

Justin Goldberg

I've created a new custom Google search engine here, which limits searches to sites that offer free downloadable service manuals for common laptops, such as eserviceinfo.com and sonyrecovery.co.uk. As an example, here is a search for Acer 1300.

If anyone knows the URLs of electronics manufacturers' internal, partner-only, or ASP (authorized service provider) support portals (and hopefully passwords too ;-), such as csd.acer.com.tw, please leave a comment below, email me, or give me a phone call.

Also, if you need your computer, laptop, printer, or cell phone repaired, or need home or business computer networking or data recovery, and you're in the New Orleans area, I am available for business. My contact information is below.

Here's an RSS feed that may interest you ;-)

Justin Goldberg

This message was sent through plugoo. I'm pretty amazed it took so long to get my first plugoo spam:
4:19 PM plugoobuddy: [OFFLINE MESSAGE] by [xixo1951] ERROR > "102 my spans dont del"

Sweet.

Justin Goldberg

I just tried logging in to the MSN Webmessenger and I get this lovely message:

We can't sign any more people in right now. Please try signing in again later.

Perhaps Microsoft should consider using Google App Engine.

Justin Goldberg

I've been working as a PC technician, and to make it easier to identify which Windows processes are legitimate and which ones are bad, I've created a custom Google search engine that currently searches four Windows process information sites: file.net, bleepingcomputer.com, neuber.com, and auditmypc.com. Of course, this alone won't fix everything, but I find it useful. If you find another site that offers real information, please leave a comment below.

Another search engine I've created with Google CSE searches anti-scam sites such as Ripoff Report, etc.

Justin Goldberg

It looks like the "query deserves freshness" knob has been tweaked a little too far. A query for "details" on El Google shows the USGS data for the recent quake in Sichuan, China as the second result. The only curious thing I can see in the code is that the "details" link is the first link after the first H1 and H2 HTML tags.

Justin Goldberg

Although HyperTerminal no longer comes with Windows Vista, you can still download Hilgraeve's HyperTerminal Private Edition 6.3 (HTPE), the last free version, from the Internet Archive here. It should install and work fine under Vista/7/Server 2003/2008/2012. You may need to disable UAC or run the installer as an administrator.

Update: 7.0 is here but it is trialware.

There is an old FAQ from 1999 here.

The source code for HyperTerminal (HTPE version 4) is included in the leaked Windows 2000 source code.

If you are working with binary data or other difficult data streams, I strongly suggest using Realterm instead. To quote their page:

Realterm is a terminal program specially designed for capturing, controlling and debugging binary and other difficult data streams. It is far better for debugging comms than Hyperterminal. It has no support for dialing modems, BBS etc - that is what hyperterminal does.

Justin Goldberg

Is mint.com really as secure as they purport it to be? The blogosphere tends to disagree, except for wilkinsonlaw and a few others. But is it really?

At least you are logged out automatically after ten minutes. But if you've ever submitted a password reset request on the Mint.com web site, the link stays active for a long time, much too long. It was still active a month after the request. I emailed the webmaster as I couldn't find any other contact address on the site, and got back a boilerplate response, naturally:

Please do the following to recover your password:

1. Go to the login page at: https://wwws.mint.com/recovery.event
2. Click on the “recover it” link next to “forgot your password”.
3. Enter in the email address you used to create your Mint account.
4. An email will be sent to the email address you specified (note: the link is valid for only two hours).
5. If you don’t see the information in your inbox, please be sure to check your spam and bulk mail folders as well (ISPs sometimes route emails to these folders).

At least the email got through to a person and didn't sit around forever in unread email lalaland. You have to give them credit on that, in this age of email inundation. On a tangent, is knowledge management the solution? Back to the topic, I emailed them saying that email can be captured and snooped. All I ever got back was the standard "a highly trained team of monkeys is feverishly working on the situation" automatic reply email.

Today I reset my password again, and the same thing happened: the reset link stays alive even after it has been used. It's not a big deal if you use the link yourself, because you'll notice if someone snooped and reset it. You'd think Mint would send an email alerting you that your password has changed.
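For comparison, here is a minimal sketch of how a reset link could behave instead: a random token that expires after the two hours the email promises and is invalidated the moment it is used. It's Python; `ResetTokenStore`, `issue` and `redeem` are hypothetical names, the store is an in-memory dict rather than a database, and this is not how Mint actually implements anything.

```python
import hashlib
import hmac
import secrets
import time

TOKEN_TTL_SECONDS = 2 * 60 * 60  # the two-hour lifetime the reset email promises


class ResetTokenStore:
    """Hypothetical in-memory store of password-reset tokens (illustration only)."""

    def __init__(self, ttl=TOKEN_TTL_SECONDS, clock=time.time):
        self.ttl = ttl
        self.clock = clock
        self._pending = {}  # username -> (SHA-256 hex digest of token, expiry time)

    def issue(self, username):
        """Create a fresh token; any older pending token for this user is replaced."""
        token = secrets.token_urlsafe(32)
        digest = hashlib.sha256(token.encode()).hexdigest()
        self._pending[username] = (digest, self.clock() + self.ttl)
        return token  # this is what would be embedded in the emailed link

    def redeem(self, username, token):
        """Return True exactly once for a valid, unexpired token."""
        entry = self._pending.get(username)
        if entry is None:
            return False
        digest, expires_at = entry
        if self.clock() > expires_at:
            del self._pending[username]  # expired tokens are discarded
            return False
        candidate = hashlib.sha256(token.encode()).hexdigest()
        if not hmac.compare_digest(candidate, digest):  # constant-time comparison
            return False
        del self._pending[username]  # single use: the link dies once it works
        return True
```

Storing only a digest of the token means a leaked database doesn't hand out working links. A real service would also email the user whenever the password actually changes.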

Here's the reset email:

This email was sent in response to your request to recover your password. To reset your password and access your account, click on the link below.

Reset your password [https://wwws.mint.com/recovery.event?username=email@example.com&token=xxxxxxxxxxxxxxxxxxxx&utm_source=xxx&utm_medium=xxx&utm_content=xxx]

The link will reset your forgotten password, and let you create a new one. For security purposes, this link will remain active only for the next 2 hours.

If you did not request that we send this Forgotten Password email to you, please report this email to us at: support@mint.com

Thank you for using Mint.com!

Cheers,
The Mint Team

Also, they are using a Google Analytics (Urchin) tracking link, which is kind of irksome for paranoid borderline-schizo types like me.
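If the tracking parameters bother you, they can be stripped before you click or bookmark a link. A minimal Python sketch using only `urllib.parse`; the function name `strip_utm` is made up for this example. Note that re-encoding the query string turns `@` into `%40`, which the server decodes back to the same value.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_utm(url):
    """Return url with every utm_* (Google Analytics campaign) query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```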

One final thing, I also get a "Connection Partially Encrypted" message in the Firefox "Page Info" window.

Justin Goldberg

I just read the article "Search engines warned over data", and it really makes me mad. Why can't the search engines just use a unique ID for each user, computed with a one-way hash function that cannot be reversed to the original IP address unless it's brute-forced, which would take years just to get one IP address unless you're the NSA with their alien technology?

Or is the real privacy problem relating different searches together, and not IP addresses? They could merely be removing IP addresses like they say while keeping the GUID that links your searches together for their relational data.
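Here is roughly what such a one-way ID could look like, sketched in Python. One caveat: there are only about 4.3 billion IPv4 addresses, so a plain unkeyed hash can be reversed simply by hashing every address. A keyed hash (HMAC) with a secret that never leaves the search engine avoids that. `pseudonymize_ip` and `LOG_KEY` are hypothetical names, and because the same IP always maps to the same ID under one key, this still links searches together (the relational concern in the previous paragraph) unless the key is rotated.

```python
import hashlib
import hmac
import ipaddress
import secrets

# Hypothetical secret key, kept apart from the logs; rotating it (e.g. daily)
# makes IDs from different periods unlinkable to each other.
LOG_KEY = secrets.token_bytes(32)


def pseudonymize_ip(ip, key=LOG_KEY):
    """Map an IP address to a stable pseudonymous ID with HMAC-SHA256."""
    packed = ipaddress.ip_address(ip).packed  # validates and normalizes the address
    return hmac.new(key, packed, hashlib.sha256).hexdigest()[:16]
```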

I hope someone at Google reads this.

Also, I doubt Yahoo! has done anything like the article says, and if they have, then why did they give up data on the Chinese dissident blogger who is now sitting in jail? It's hypocrisy, and Yahoo!'s privacy reputation is now ruined forever. They are the Micro$oft of search engines.

Justin Goldberg

Enter it today?

unzip, strip, touch, finger, grep, mount, fsck, more, yes, fsck, fsck, fsck, umount, sleep

Justin Goldberg

Bow down in my eliteness for this nugget of knowledge:

http://web.archive.org/web/*/people.netscape.com/*

I found a really amazing quote on Jamie Zawinski's old page:

"We all enter this world in the same way: naked; screaming; soaked in blood. But if you live your life right, that kind of thing doesn't have to stop there."

-- Dana Gould

Justin Goldberg

Google's newest April Fools' joke, Gmail Custom Time™, brings up a great idea for Gmail, or any email service or architecture: what about sending an email in the future? If my eight-year-old Nokia phone can schedule an SMS text message for a certain point in the future, why can't Gmail?
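A send-later feature is mostly a priority queue ordered by delivery time. A minimal Python sketch with `heapq`; `SendLaterQueue` and `ScheduledEmail` are hypothetical names, and actually delivering the due messages (for example handing them to an SMTP client) is left out.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class ScheduledEmail:
    send_at: float  # delivery time (e.g. a Unix timestamp)
    seq: int        # tie-breaker: equal send times keep their scheduling order
    to: str = field(compare=False)
    subject: str = field(compare=False)
    body: str = field(compare=False)


class SendLaterQueue:
    """Hypothetical 'send later' outbox ordered by delivery time."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def schedule(self, send_at, to, subject, body):
        item = ScheduledEmail(send_at, next(self._counter), to, subject, body)
        heapq.heappush(self._heap, item)

    def due(self, now):
        """Pop and return every message whose send time has arrived."""
        ready = []
        while self._heap and self._heap[0].send_at <= now:
            ready.append(heapq.heappop(self._heap))
        return ready
```

A scheduler process would call `due(time.time())` periodically and send whatever comes back.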


Justin Goldberg

Looking through my Google Alert for "site:arxiv.org pagerank", I found this relatively new paper on arxiv.org, "Maximizing PageRank via outlinks". The PDF has been uploaded to Scribd.

Here's the abstract:

We analyze linkage strategies for a set I of webpages for which the webmaster wants to maximize the sum of Google's PageRank scores. The webmaster can only choose the hyperlinks starting from the webpages of I and has no control on the hyperlinks from other webpages. We provide an optimal linkage strategy under some reasonable assumptions.

And the conclusion:

Conclusions
In this paper we provide the general shape of an optimal link structure for a website in order to maximize its PageRank. This structure, with a forward chain and every possible backward link, may not be intuitive. To our knowledge, it has never been mentioned, while topologies like a clique, a ring or a star are considered in the literature on collusion and alliance between pages [3, 8]. Moreover, this optimal structure gives new insight into the affirmation of Bianchini et al. [5] that, in order to maximize the PageRank of a website, hyperlinks to the rest of the webgraph “should be in pages with a small PageRank and that have many internal hyperlinks”. More precisely, we have seen that the leaking pages must be chosen with respect to the mean number of visits before zapping they give to the website, rather than their PageRank.

I have no clue whatsoever as to what that means. I've posted the full text below:

arXiv:0711.2867v1 [cs.IR] 19 Nov 2007
Maximizing PageRank via outlinks
Cristobald de Kerchove, Laure Ninove, Paul Van Dooren
CESAME, Université catholique de Louvain,
Avenue Georges Lemaître 4–6, B-1348 Louvain-la-Neuve, Belgium
{c.dekerchove, laure.ninove, paul.vandooren}@uclouvain.be
Abstract
We analyze linkage strategies for a set I of webpages for which the
webmaster wants to maximize the sum of Google’s PageRank scores.
The webmaster can only choose the hyperlinks starting from the webpages
of I and has no control on the hyperlinks from other webpages.
We provide an optimal linkage strategy under some reasonable assumptions.
Keywords: PageRank, Google matrix, Markov chain, Perron vector,
Optimal linkage strategy
AMS classification: 15A18, 15A48, 15A51, 60J15, 68U35
1 Introduction
PageRank, a measure of webpages’ relevance introduced by Brin and Page, is
at the heart of the well known search engine Google [6, 15]. Google classifies
the webpages according to the pertinence scores given by PageRank, which
are computed from the graph structure of the Web. A page with a high
PageRank will appear among the first items in the list of pages corresponding
to a particular query.
If we look at the popularity of Google, it is not surprising that some
webmasters want to increase the PageRank of their webpages in order to
get more visits from websurfers to their website. Since PageRank is based
on the link structure of the Web, it is therefore useful to understand how
addition or deletion of hyperlinks influences it.
Mathematical analysis of PageRank’s sensitivity with respect to perturbations
of the matrix describing the webgraph is a topical subject of interest
(see for instance [2, 5, 11, 12, 13, 14] and the references therein). Normwise
and componentwise conditioning bounds [11] as well as the derivative [12, 13]
are used to understand the sensitivity of the PageRank vector. It appears
that the PageRank vector is relatively insensitive to small changes in the
graph structure, at least when these changes concern webpages with a low
PageRank score [5, 12]. One could think therefore that trying to modify
its PageRank via changes in the link structure of the Web is a waste of
time. However, what is important for webmasters is not the values of the
PageRank vector but the ranking that ensues from it. Lempel and Morel [14]
showed that PageRank is not rank-stable, i.e. small modifications in the link
structure of the webgraph may cause dramatic changes in the ranking of the
webpages. Therefore, the question of how the PageRank of a particular page
or set of pages could be increased, even slightly, by adding or removing links
to the webgraph remains of interest.
Preliminary version – February 2, 2008
As is well known [1, 9], if a hyperlink from a page i to a page j is
added, without any other modification in the Web, then the PageRank of j
will increase. But in general, you do not have control over the inlinks of your
webpage unless you pay another webmaster to add a hyperlink from his/her
page to yours or you make an alliance with him/her by trading a link for a
link [3, 8]. But it is natural to ask how you could modify your PageRank by
yourself. This leads us to analyze how the choice of the outlinks of a page can
influence its own PageRank. Sydow [17] showed via numerical simulations
that adding well-chosen outlinks to a webpage may significantly increase its
PageRank ranking. Avrachenkov and Litvak [2] analyzed theoretically the
possible effect of new outlinks on the PageRank of a page and its neighbors.
Supposing that a webpage has control only on its outlinks, they gave the
optimal linkage strategy for this single page. Bianchini et al. [5] as well as
Avrachenkov and Litvak in [1] consider the impact of links between web
communities (websites or sets of related webpages), respectively on the sum
of the PageRanks and on the individual PageRank scores of the pages of
some community. They give general rules in order to have a PageRank as
high as possible but they do not provide an optimal link structure for a
website.
Our aim in this paper is to find a generalization of Avrachenkov–Litvak’s
optimal linkage strategy [2] to the case of a website with several pages. We
consider a given set of pages and suppose we only have control over the outlinks
of these pages. We are interested in the problem of maximizing the sum of
the PageRanks of these pages.
Let G = (N, E) be the webgraph, with a set of nodes $N = \{1, \dots, n\}$
and a set of links $E \subseteq N \times N$. For a subset of nodes $I \subseteq N$, we define
$E_I = \{(i, j) \in E : i, j \in I\}$ the set of internal links,
$E_{\mathrm{out}(I)} = \{(i, j) \in E : i \in I,\ j \notin I\}$ the set of external outlinks,
$E_{\mathrm{in}(I)} = \{(i, j) \in E : i \notin I,\ j \in I\}$ the set of external inlinks,
$E_{\bar I} = \{(i, j) \in E : i, j \notin I\}$ the set of external links.
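To make the four link sets concrete, here is a small Python sketch (not from the paper) that partitions a set of directed links (i, j) given a node set I. The function name `partition_links` and the dictionary keys are my own labels for E_I, E_out(I), E_in(I) and the external links of the complement of I.

```python
def partition_links(edges, I):
    """Split directed links (i, j) into the four disjoint sets defined above."""
    I = set(I)
    parts = {"internal": set(), "out": set(), "in": set(), "external": set()}
    for i, j in edges:
        if i in I and j in I:
            parts["internal"].add((i, j))  # E_I
        elif i in I:
            parts["out"].add((i, j))       # E_out(I): leaves the website
        elif j in I:
            parts["in"].add((i, j))        # E_in(I): enters the website
        else:
            parts["external"].add((i, j))  # both endpoints outside I
    return parts
```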
If we do not impose any condition on $E_I$ and $E_{\mathrm{out}(I)}$, the problem of
maximizing the sum of the PageRanks of pages of I is quite trivial and does
not have much interest (see the discussion in Section 4). Therefore, when
characterizing optimal link structures, we will make the following accessibility
assumption: every page of the website must have an access to the rest
of the Web.
Our first main result concerns the optimal outlink structure for a given
website. In the case where the subgraph corresponding to the website is
strongly connected, Theorem 10 can be particularized as follows.
Theorem. Let $E_I$, $E_{\mathrm{in}(I)}$ and $E_{\bar I}$ be given. Suppose that the subgraph $(I, E_I)$
is strongly connected and $E_I \neq \emptyset$. Then every optimal outlink structure
$E_{\mathrm{out}(I)}$ is to have only one outlink to a particular page outside of I.
We are also interested in the optimal internal link structure for a website.
In the case where there is a unique leaking node in the website, that is only
one node linking to the rest of the web, Theorem 11 can be particularized
as follows.
Theorem. Let $E_{\mathrm{out}(I)}$, $E_{\mathrm{in}(I)}$ and $E_{\bar I}$ be given. Suppose that there is only one
leaking node in I. Then every optimal internal link structure $E_I$ is composed
of a forward chain of links together with every possible backward link.
Putting together Theorems 10 and 11, we get in Theorem 12 the optimal
link structure for a website. This optimal structure is illustrated in Figure 1.
Theorem. Let $E_{\mathrm{in}(I)}$ and $E_{\bar I}$ be given. Then, for every optimal link structure,
$E_I$ is composed of a forward chain of links together with every possible
backward link, and $E_{\mathrm{out}(I)}$ consists of a unique outlink, starting from the last
node of the chain.
[Figure 1: Every optimal linkage strategy for a set I of five pages must have this structure.]
This paper is organized as follows. In the following preliminary section,
we recall some graph concepts as well as the definition of the PageRank, and
we introduce some notations. In Section 3, we develop tools for analysing the
PageRank of a set of pages I. Then we come to the main part of this paper:
in Section 4 we provide the optimal linkage strategy for a set of nodes. In
Section 5, we give some extensions and variants of the main theorems. We
end this paper with some concluding remarks.
2 Graphs and PageRank
Let G = (N, E) be a directed graph representing the Web. The webpages
are represented by the set of nodes N = {1, . . . , n} and the hyperlinks are
represented by the set of directed links E ⊆ N × N. That means that
(i, j) ∈ E if and only if there exists a hyperlink linking page i to page j.
Let us first briefly recall some usual concepts about directed graphs (see
for instance [4]). A link (i, j) is said to be an outlink for node i and an
inlink for node j. If $(i, j) \in E$, node i is called a parent of node j. By
$j \leftarrow i$, we mean that j belongs to the set of children of i, that is
$j \in \{k \in N : (i, k) \in E\}$. The outdegree $d_i$ of a node i is its number of children, that is
$$d_i = |\{j \in N : (i, j) \in E\}|.$$
A path from $i_0$ to $i_s$ is a sequence of nodes $\langle i_0, i_1, \dots, i_s \rangle$ such that $(i_k, i_{k+1}) \in E$
for every $k = 0, 1, \dots, s - 1$. A node i has an access to a node j if there
exists a path from i to j. In this paper, we will also say that a node i has an
access to a set J if i has an access to at least one node $j \in J$. The graph G
is strongly connected if every node of N has an access to every other node
of N. A set of nodes $F \subseteq N$ is a final class of the graph G = (N, E) if the
subgraph $(F, E_F)$ is strongly connected and moreover $E_{\mathrm{out}(F)} = \emptyset$ (i.e. nodes
of F do not have an access to $N \setminus F$).
Let us now briefly introduce the PageRank score (see [5, 6, 12, 13, 15]
for background). Without loss of generality (please refer to the book of
Langville and Meyer [13] or the survey of Bianchini et al. [5] for details),
we can make the assumption that each node has at least one outlink, i.e.
$d_i \neq 0$ for every $i \in N$. Therefore the $n \times n$ stochastic matrix $P = [P_{ij}]_{i,j \in N}$
given by
$$P_{ij} = \begin{cases} d_i^{-1} & \text{if } (i, j) \in E, \\ 0 & \text{otherwise,} \end{cases}$$
is well defined and is a scaling of the adjacency matrix of G. Let also
$0 < c < 1$ be a damping factor and z be a positive stochastic personalization
vector, i.e. $z_i > 0$ for all $i = 1, \dots, n$ and $z^T \mathbf{1} = 1$, where $\mathbf{1}$ denotes the
vector of all ones. The Google matrix is then defined as
$$G = cP + (1 - c)\,\mathbf{1} z^T.$$
Since $z > 0$ and $c < 1$, this stochastic matrix is positive, i.e. $G_{ij} > 0$ for all
i, j. The PageRank vector π is then defined as the unique invariant measure
of the matrix G, that is the unique left Perron vector of G,
$$\pi^T = \pi^T G, \qquad \pi^T \mathbf{1} = 1. \tag{1}$$
The PageRank of a node i is the ith entry $\pi_i = \pi^T e_i$ of the PageRank
vector.
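A minimal pure-Python sketch of these definitions, assuming nodes numbered 0..n−1 (rather than 1..n) and a uniform personalization vector by default: it builds the dense Google matrix G = cP + (1 − c)1zᵀ and finds the left Perron vector by power iteration. The function names are hypothetical, and a dense matrix is only practical for toy graphs.

```python
def google_matrix(n, edges, c=0.85, z=None):
    """Dense Google matrix G = c P + (1 - c) 1 z^T for nodes 0..n-1."""
    if z is None:
        z = [1.0 / n] * n  # uniform personalization vector
    children = [sorted({j for (i, j) in edges if i == k}) for k in range(n)]
    G = [[(1 - c) * z[j] for j in range(n)] for _ in range(n)]  # zapping term
    for i in range(n):
        d_i = len(children[i])
        if d_i == 0:
            raise ValueError(f"node {i} has no outlink (the paper assumes d_i != 0)")
        for j in children[i]:
            G[i][j] += c / d_i  # P_ij = 1 / d_i for each link (i, j)
    return G


def pagerank(G, tol=1e-13, max_iter=10_000):
    """Left Perron vector of G (pi^T = pi^T G, pi^T 1 = 1) by power iteration."""
    n = len(G)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi
```

Because G is positive and row-stochastic, each iteration keeps the entries summing to 1, and the error shrinks roughly by a factor c per step.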
The PageRank vector is usually interpreted as the stationary distribution
of the following Markov chain (see for instance [13]): a random surfer moves
on the webgraph, using hyperlinks between pages with a probability c and
zapping to some new page according to the personalization vector with a
probability (1−c). The Google matrix G is the probability transition matrix
of this random walk. In this stochastic interpretation, the PageRank of a
node is equal to the inverse of its mean return time, that is $\pi_i^{-1}$ is the mean
number of steps a random surfer starting in node i will take for coming back
to i (see [7, 10]).
3 PageRank of a website
We are interested in characterizing the PageRank of a set I. We define this
as the sum
$$\pi^T e_I = \sum_{i \in I} \pi_i,$$
where $e_I$ denotes the vector with a 1 in the entries of I and 0 elsewhere.
Note that the PageRank of a set corresponds to the notion of energy of a
community in [5].
Let $I \subseteq N$ be a subset of the nodes of the graph. The PageRank of I can
be expressed as $\pi^T e_I = (1 - c)\,z^T (I - cP)^{-1} e_I$ from PageRank equations (1).
Let us then define the vector
$$v = (I - cP)^{-1} e_I. \tag{2}$$
With this, we have the following expression for the PageRank of the set I:
$$\pi^T e_I = (1 - c)\,z^T v. \tag{3}$$
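Equation (3) is easy to check numerically on a toy graph. This sketch (my own, not the authors' code) solves (I − cP)v = e_I for v, computes π directly from πᵀ(I − cP) = (1 − c)zᵀ, which follows from (1) because πᵀ1 = 1, and compares the two expressions for the PageRank of I. It uses a hand-written Gaussian elimination to stay within the standard library; the helper names are hypothetical and nodes are numbered from 0.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[r])] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def scaled_adjacency(n, edges):
    """P_ij = 1/d_i if (i, j) is a link, 0 otherwise (every node needs an outlink)."""
    children = [{j for (i, j) in edges if i == k} for k in range(n)]
    return [[1.0 / len(children[i]) if j in children[i] else 0.0 for j in range(n)]
            for i in range(n)]


def visits_vector(P, I, c):
    """v = (I - cP)^{-1} e_I, computed by solving (I - cP) v = e_I."""
    n = len(P)
    A = [[(1.0 if r == k else 0.0) - c * P[r][k] for k in range(n)] for r in range(n)]
    return solve(A, [1.0 if r in I else 0.0 for r in range(n)])


def pagerank_direct(P, c, z):
    """pi solves (I - cP)^T pi = (1 - c) z, i.e. pi^T = (1 - c) z^T (I - cP)^{-1}."""
    n = len(P)
    AT = [[(1.0 if r == k else 0.0) - c * P[k][r] for k in range(n)] for r in range(n)]
    return solve(AT, [(1 - c) * zi for zi in z])
```

Comparing `sum(pi[i] for i in I)` with `(1 - c) * sum(z_i * v_i)` should agree up to rounding, and the same `v` can be used to spot-check Lemma 1 below.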
The vector v will play a crucial role throughout this paper. In this
section, we will first present a probabilistic interpretation for this vector
and prove some of its properties. We will then show how it can be used in
order to analyze the influence of some page i ∈ I on the PageRank of the
set I. We will end this section by briefly introducing the concept of basic
absorbing graph, which will be useful in order to analyze optimal linkage
strategies under some assumptions.
3.1 Mean number of visits before zapping
Let us first see how the entries of the vector $v = (I - cP)^{-1} e_I$ can be
interpreted. Let us consider a random surfer on the webgraph G that,
as described in Section 2, follows the hyperlinks of the webgraph with a
probability c. But, instead of zapping to some page of G with a probability
(1 − c), he stops his walk with probability (1 − c) at each step of
time. This is equivalent to considering a random walk on the extended graph
$G_e = (N \cup \{n + 1\}, E \cup \{(i, n + 1) : i \in N\})$ with a transition probability
matrix
$$P_e = \begin{pmatrix} cP & (1 - c)\,\mathbf{1} \\ 0 & 1 \end{pmatrix}.$$
At each step of time, with probability 1 − c, the random surfer can disappear
from the original graph, that is he can reach the absorbing node n + 1.
The nonnegative matrix $(I - cP)^{-1}$ is commonly called the fundamental
matrix of the absorbing Markov chain defined by $P_e$ (see for instance [10,
16]). In the extended graph $G_e$, the entry $[(I - cP)^{-1}]_{ij}$ is the expected
number of visits to node j before reaching the absorbing node n + 1 when
starting from node i. From the point of view of the standard random surfer
described in Section 2, the entry $[(I - cP)^{-1}]_{ij}$ is the expected number of
visits to node j before zapping for the first time when starting from node i.
Therefore, the vector v defined in equation (2) has the following probabilistic
interpretation. The entry $v_i$ is the expected number of visits to the
set I before zapping for the first time when the random surfer starts his
walk in node i.
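This interpretation of v can be checked by simulation. A Monte Carlo sketch in Python, assuming every node has at least one outlink and that the starting node counts as the first visit (consistent with v = e_I + cPv); `simulate_visits` is a hypothetical name. For the two-node cycle 0 ⇄ 1 with I = {0} and c = 0.5, solving v = e_I + cPv by hand gives v_0 = 4/3 and v_1 = 2/3, which the estimates should approach.

```python
import random


def simulate_visits(children, I, start, c, walks=100_000, rng=None):
    """Monte Carlo estimate of v_start: mean number of visits to I before the surfer stops."""
    rng = rng or random.Random()
    I = set(I)
    total = 0
    for _ in range(walks):
        node = start
        while True:
            if node in I:
                total += 1               # count every visit to the set I
            if rng.random() >= c:        # with probability 1 - c the walk ends (zapping)
                break
            node = rng.choice(children[node])  # follow a uniformly chosen outlink
    return total / walks
```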
Now, let us first prove some simple properties about this vector.
Lemma 1. Let $v \in \mathbb{R}^n_{\ge 0}$ be defined by $v = cPv + e_I$. Then,
(a) $\max_{i \notin I} v_i \le c \max_{i \in I} v_i$,
(b) $v_i \le \frac{1}{1-c}$ for all $i \in N$; with equality if and only if the node i does
not have an access to $\bar I$,
(c) $v_i \ge \min_{j \leftarrow i} v_j$ for all $i \in I$; with equality if and only if the node i
does not have an access to $\bar I$.
Proof. (a) For all $i \notin I$ we have $v_i = c \sum_{j \leftarrow i} v_j / d_i$, so
$$\max_{i \notin I} v_i = \max_{i \notin I} \Big( c \sum_{j \leftarrow i} \frac{v_j}{d_i} \Big) \le c \max_j v_j.$$
Since c < 1, it then follows that $\max_j v_j = \max_{i \in I} v_i$.
(b) The inequality $v_i \le \frac{1}{1-c}$ follows directly from
$$\max_i v_i \le \max_i \Big( 1 + c \sum_{j \leftarrow i} \frac{v_j}{d_i} \Big) \le 1 + c \max_j v_j.$$
From (a) it then also follows that $v_i \le \frac{c}{1-c}$ for all $i \notin I$. Now, let
$i \in N$ such that $v_i = \frac{1}{1-c}$. Then $i \in I$. Moreover,
$$1 + c\,v_i = v_i = 1 + c \sum_{j \leftarrow i} \frac{v_j}{d_i},$$
that is $v_j = \frac{1}{1-c}$ for every $j \leftarrow i$. Hence node j must also belong to I.
By induction, every node k such that i has an access to k must belong
to I.
(c) Let $i \in I$. Then, by (b),
$$1 + c\,v_i \ge v_i = 1 + c \sum_{j \leftarrow i} \frac{v_j}{d_i} \ge 1 + c \min_{j \leftarrow i} v_j,$$
so $v_i \ge \min_{j \leftarrow i} v_j$ for all $i \in I$. If $v_i = \min_{j \leftarrow i} v_j$ then also $1 + c\,v_i = v_i$
and hence, by (b), the node i does not have an access to $\bar I$.
Let us denote the set of nodes of $\bar I$ which on average give the most visits
to I before zapping by
$$V = \operatorname{argmax}_{j \in \bar I} v_j.$$
Then the following lemma is quite intuitive. It says that, among the nodes
of $\bar I$, those which provide the highest mean number of visits to I are parents
of I, i.e. parents of some node of I.
Lemma 2 (Parents of I). If $E_{\mathrm{in}(I)} \neq \emptyset$, then
$$V \subseteq \{j \in \bar I : \text{there exists } \ell \in I \text{ such that } (j, \ell) \in E_{\mathrm{in}(I)}\}.$$
If $E_{\mathrm{in}(I)} = \emptyset$, then $v_j = 0$ for every $j \in \bar I$.
Proof. Suppose first that $E_{\mathrm{in}(I)} \neq \emptyset$. Let $k \in V$ with $v = (I - cP)^{-1} e_I$. If
we supposed that there does not exist $\ell \in I$ such that $(k, \ell) \in E_{\mathrm{in}(I)}$, then
we would have, since $v_k > 0$,
$$v_k = c \sum_{j \leftarrow k} \frac{v_j}{d_k} \le c \max_{j \notin I} v_j = c\,v_k < v_k,$$
which is a contradiction. Now, if $E_{\mathrm{in}(I)} = \emptyset$, then there is no access to I
from $\bar I$, so clearly $v_j = 0$ for every $j \in \bar I$.
Lemma 2 shows that the nodes $j \in \bar I$ which provide the highest value
of $v_j$ must belong to the set of parents of I. The converse is not true, as
we will see in the following example: some parents of I can provide a lower
mean number of visits to I than other nodes which are not parents of I. In
other words, Lemma 2 gives a necessary but not sufficient condition in order
to maximize the entry $v_j$ for some $j \in \bar I$.
[Figure 2: The node 6 ∉ V and yet it is a parent of I = {1} (see Example 1).]
Example 1. Let us see on an example that having $(j, i) \in E_{\mathrm{in}(I)}$ for some
$i \in I$ is not sufficient to have $j \in V$. Consider the graph in Figure 2. Let
I = {1} and take a damping factor c = 0.85. For $v = (I - cP)^{-1} e_1$, we have
$$v_2 = v_3 = v_4 = 4.359 > v_5 = 3.521 > v_6 = 3.492 > v_7 > \dots > v_{11},$$
so V = {2, 3, 4}. As ensured by Lemma 2, every node of the set V is a parent
of node 1. But here, V does not contain all parents of node 1. Indeed, the
node 6 ∉ V while it is a parent of 1 and is moreover its parent with the
lowest outdegree. Moreover, we see in this example that node 5, which is
not a parent of node 1 but a parent of node 6, gives a higher value of the
expected number of visits to I before zapping than node 6, parent of 1.
Let us try to get some intuition about that. When starting from node 6,
a random surfer has probability one half to reach node 1 in only one step.
But he also has probability one half to move to node 11 and to be sent
far away from node 1. On the other hand, when starting from node 5, the
random surfer cannot reach node 1 in only one step. But with probability
3/4 he will reach one of the nodes 2, 3 or 4 in one step. And from these
nodes, the websurfer stays very near to node 1 and cannot be sent far away
from it.
In the next lemma, we show that from some node $i \in I$ which has an
access to $\bar I$, there always exists what we call a decreasing path to $\bar I$. That is,
we can find a path such that the mean number of visits to I is higher when
starting from some node of the path than when starting from the successor
of this node in the path.
Lemma 3 (Decreasing paths to $\bar I$). For every $i_0 \in I$ which has an access
to $\bar I$, there exists a path $\langle i_0, i_1, \dots, i_s \rangle$ with $i_1, \dots, i_{s-1} \in I$ and $i_s \in \bar I$ such
that
$$v_{i_0} > v_{i_1} > \dots > v_{i_s}.$$
Proof. Let us simply construct a decreasing path recursively by
$$i_{k+1} \in \operatorname{argmin}_{j \leftarrow i_k} v_j,$$
as long as $i_k \in I$. If $i_k$ has an access to $\bar I$, then $v_{i_{k+1}} < v_{i_k} < \frac{1}{1-c}$ by
Lemma 1(b) and (c), so the node $i_{k+1}$ also has an access to $\bar I$. By assumption,
$i_0$ has an access to $\bar I$. Moreover, the set I has a finite number of elements,
so there must exist an s such that $i_s \in \bar I$.
3.2 Influence of the outlinks of a node
We will now see how a modification of the outlinks of some node $i \in N$ can
change the PageRank of a subset of nodes $I \subseteq N$. So we will compare two
graphs on N defined by their set of links, $E$ and $\tilde E$ respectively.
Every item corresponding to the graph defined by the set of links $\tilde E$ will
be written with a tilde symbol. So $\tilde P$ denotes its scaled adjacency matrix,
$\tilde \pi$ the corresponding PageRank vector, $\tilde d_i = |\{j : (i, j) \in \tilde E\}|$ the outdegree
of some node i in this graph, $\tilde v = (I - c\tilde P)^{-1} e_I$ and $\tilde V = \operatorname{argmax}_{j \in \bar I} \tilde v_j$.
Finally, by $j \mathrel{\tilde\leftarrow} i$ we mean $j \in \{k : (i, k) \in \tilde E\}$.
So, let us consider two graphs defined respectively by their set of links $E$
and $\tilde E$. Suppose that they differ only in the links starting from some given
node i, that is $\{j : (k, j) \in E\} = \{j : (k, j) \in \tilde E\}$ for all $k \neq i$. Then their
scaled adjacency matrices $P$ and $\tilde P$ are linked by a rank one correction. Let
us then define the vector
$$\delta = \sum_{j \mathrel{\tilde\leftarrow} i} \frac{e_j}{\tilde d_i} - \sum_{j \leftarrow i} \frac{e_j}{d_i},$$
which gives the correction to apply to row i of the matrix P in order to
get $\tilde P$.
Now let us first express the difference between the PageRank of I for two
configurations differing only in the links starting from some node i. Note
that in the following lemma the personalization vector z does not appear
explicitly in the expression of $\tilde\pi$.
Lemma 4. Let two graphs be defined respectively by $E$ and $\tilde E$ and let $i \in N$ be
such that for all $k \neq i$, $\{j : (k, j) \in E\} = \{j : (k, j) \in \tilde E\}$. Then
$$\tilde\pi^T e_I = \pi^T e_I + c\,\pi_i \, \frac{\delta^T v}{1 - c\,\delta^T (I - cP)^{-1} e_i}.$$
Proof. Clearly, the scaled adjacency matrices are linked by $\tilde P = P + e_i \delta^T$.
Since c < 1, the matrix $(I - cP)^{-1}$ exists and the PageRank vectors can be
expressed as
$$\pi^T = (1 - c)\,z^T (I - cP)^{-1}, \qquad \tilde\pi^T = (1 - c)\,z^T \big(I - c(P + e_i \delta^T)\big)^{-1}.$$
Applying the Sherman–Morrison formula to $\big((I - cP) - c\,e_i \delta^T\big)^{-1}$, we get
$$\tilde\pi^T = (1 - c)\,z^T (I - cP)^{-1} + (1 - c)\,z^T (I - cP)^{-1} e_i \, \frac{c\,\delta^T (I - cP)^{-1}}{1 - c\,\delta^T (I - cP)^{-1} e_i},$$
and the result follows immediately.
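Lemma 4 can be checked numerically: change the outlinks of one node, recompute the PageRank of I directly, and compare with the Sherman–Morrison expression. A pure-Python sketch with hypothetical helper names and nodes numbered from 0; the graph in the check below is an arbitrary toy example, not one from the paper.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[r])] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def scaled_adjacency(n, children):
    """P_ij = 1/d_i if j is a child of i, else 0; `children` maps node -> set of children."""
    return [[1.0 / len(children[i]) if j in children[i] else 0.0 for j in range(n)]
            for i in range(n)]


def identity_minus(P, c, transpose=False):
    """I - cP, or its transpose."""
    n = len(P)
    return [[(1.0 if r == k else 0.0) - c * (P[k][r] if transpose else P[r][k])
             for k in range(n)] for r in range(n)]


def set_pagerank(P, I, c, z):
    """Return (pi^T e_I, pi), with pi solving (I - cP)^T pi = (1 - c) z."""
    pi = solve(identity_minus(P, c, transpose=True), [(1 - c) * zi for zi in z])
    return sum(pi[k] for k in I), pi


def lemma4_prediction(P, P_new, i, I, c, z):
    """Right-hand side of Lemma 4 (and delta^T v) for graphs differing only in row i."""
    n = len(P)
    old, pi = set_pagerank(P, I, c, z)
    delta = [P_new[i][k] - P[i][k] for k in range(n)]
    v = solve(identity_minus(P, c), [1.0 if k in I else 0.0 for k in range(n)])
    u = solve(identity_minus(P, c), [1.0 if k == i else 0.0 for k in range(n)])
    dv = sum(d * x for d, x in zip(delta, v))
    du = sum(d * x for d, x in zip(delta, u))
    return old + c * pi[i] * dv / (1 - c * du), dv
```

The returned `dv` is δᵀv, whose sign Theorem 5 below ties to whether the PageRank of I goes up or down.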
Let us now give an equivalent condition in order to increase the PageRank
of I by changing outlinks of some node i. The PageRank of I increases
essentially when the new set of links favors nodes giving a higher mean
number of visits to I before zapping.
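Theorem 5 (PageRank and mean number of visits before zapping). Let
two graphs be defined respectively by $E$ and $\tilde E$ and let $i \in N$ be such that for all
$k \neq i$, $\{j : (k, j) \in E\} = \{j : (k, j) \in \tilde E\}$. Then
$$\tilde\pi^T e_I > \pi^T e_I \quad \text{if and only if} \quad \delta^T v > 0$$
and
$$\tilde\pi^T e_I = \pi^T e_I \quad \text{if and only if} \quad \delta^T v = 0.$$
Proof. Let us first show that $\delta^T (I - cP)^{-1} e_i \le 1$ is always verified. Let
$u = (I - cP)^{-1} e_i$. Then $u - cPu = e_i$ and, by Lemma 1(a), $u_j \le u_i$ for all
j. So
$$\delta^T u = \sum_{j \mathrel{\tilde\leftarrow} i} \frac{u_j}{\tilde d_i} - \sum_{j \leftarrow i} \frac{u_j}{d_i} \le u_i - \sum_{j \leftarrow i} \frac{u_j}{d_i} \le u_i - c \sum_{j \leftarrow i} \frac{u_j}{d_i} = 1.$$
Now, since c < 1 and π > 0, the conclusion follows by Lemma 4.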
The following Proposition 6 shows how to add a new link (i, j) starting
from a given node i in order to increase the PageRank of the set I. The
PageRank of I increases as soon as a node i ∈ I adds a link to a node j
with a larger or equal expected number of visits to I before zapping.
Proposition 6 (Adding a link). Let $i \in I$ and let $j \in N$ be such that
$(i, j) \notin E$ and $v_i \le v_j$. Let $\tilde E = E \cup \{(i, j)\}$. Then
$$\tilde\pi^T e_I \ge \pi^T e_I$$
with equality if and only if the node i does not have an access to $\bar I$.
Proof. Let $i \in I$ and let $j \in N$ be such that $(i, j) \notin E$ and $v_i \le v_j$. Then
$$1 + c \sum_{k \leftarrow i} \frac{v_k}{d_i} = v_i \le 1 + c\,v_i \le 1 + c\,v_j,$$
with equality if and only if i does not have an access to $\bar I$ by Lemma 1(b).
Let $\tilde E = E \cup \{(i, j)\}$. Then
$$\delta^T v = \frac{1}{d_i + 1} \Big( v_j - \sum_{k \leftarrow i} \frac{v_k}{d_i} \Big) \ge 0,$$
with equality if and only if i does not have an access to $\bar I$. The conclusion
follows from Theorem 5.
Now let us see how to remove a link (i, j) starting from a given node i in
order to increase the PageRank of the set I. If a node i ∈ N removes a link
to its worst child from the point of view of the expected number of visits to
I before zapping, then the PageRank of I increases.
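Proposition 7 (Removing a link). Let $i \in N$ and let $j \in \operatorname{argmin}_{k \leftarrow i} v_k$.
Let $\tilde E = E \setminus \{(i, j)\}$. Then
$$\tilde\pi^T e_I \ge \pi^T e_I$$
with equality if and only if $v_k = v_j$ for every k such that $(i, k) \in E$.
Proof. Let $i \in N$ and let $j \in \operatorname{argmin}_{k \leftarrow i} v_k$. Let $\tilde E = E \setminus \{(i, j)\}$. Then
$$\delta^T v = \sum_{k \leftarrow i} \frac{v_k - v_j}{d_i (d_i - 1)} \ge 0,$$
with equality if and only if $v_k = v_j$ for all $k \leftarrow i$. The conclusion follows by
Theorem 5.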
In order to increase the PageRank of I with a new link (i, j), Proposition
6 only requires that $v_j \ge v_i$. On the other hand, Proposition 7 requires
that $v_j = \min_{k \leftarrow i} v_k$ in order to increase the PageRank of I by deleting link
(i, j). One could wonder whether or not this condition could be weakened
to $v_j < v_i$, so as to have symmetric conditions for the addition or deletion
of links. In fact, this cannot be done, as shown in the following example.
Example 2. Let us see by an example that the condition $j \in \operatorname{argmin}_{k \leftarrow i} v_k$
in Proposition 7 cannot be weakened to $v_j < v_i$. Consider the graph in
Figure 3 and take a damping factor c = 0.85. Let I = {1, 2, 3}. We have
$$v_1 = 2.63 > v_2 = 2.303 > v_3 = 1.533.$$
As ensured by Proposition 7, if we remove the link (1, 3), the PageRank of
I increases (e.g. from 0.199 to 0.22 with a uniform personalization vector
$z = \frac{1}{n}\mathbf{1}$), since $3 \in \operatorname{argmin}_{k \leftarrow 1} v_k$. But, if we remove instead the link (1, 2),
the PageRank of I decreases (from 0.199 to 0.179 with z uniform) even if
$v_2 < v_1$.
Figure 3: For $I = \{1, 2, 3\}$, removing the link $(1, 2)$ gives $\tilde\pi^T e_I < \pi^T e_I$, even if $v_1 > v_2$ (see Example 2). [Diagram: graph on nodes 1 to 7 with the set $I$ marked.]
Remark. Let us note that, if the node $i$ does not have an access to the set $\bar I$, then no deletion of a link starting from $i$ modifies the PageRank of $I$. Indeed, in this case $\delta^T v = 0$ since, by Lemma 1(b), $v_j = \frac{1}{1-c}$ for every $j \leftarrow i$.
3.3 Basic absorbing graph
Now, let us briefly introduce the notion of basic absorbing graph (see Chapter III about absorbing Markov chains in Kemeny and Snell's book [10]).
For a given graph $(N, E)$ and a specified subset of nodes $I \subseteq N$, the basic absorbing graph is the graph $(N, E^0)$ defined by $E^0_{\mathrm{out}(I)} = \emptyset$, $E^0_I = \{(i, i) : i \in I\}$, $E^0_{\mathrm{in}(I)} = E_{\mathrm{in}(I)}$ and $E^0_{\bar I} = E_{\bar I}$. In other words, the basic absorbing graph $(N, E^0)$ is constructed from $(N, E)$ by keeping the same sets of external inlinks $E_{\mathrm{in}(I)}$ and external links $E_{\bar I}$, removing the external outlinks $E_{\mathrm{out}(I)}$, and changing the internal link structure $E_I$ so that the nodes of $I$ have only self-links.
As in the previous subsection, every item corresponding to the basic absorbing graph will carry a superscript $0$. For instance, we will write $\pi^0$ for the PageRank vector corresponding to the basic absorbing graph and $V^0 = \operatorname{argmax}_{j \in \bar I} [(I - cP^0)^{-1} e_I]_j$.
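Proposition 8 (PageRank for a basic absorbing graph). Let a graph be defined by a set of links $E$ and let $I \subseteq N$. Then
$$\pi^T e_I \le \pi^{0T} e_I,$$
with equality if and only if $E_{\mathrm{out}(I)} = \emptyset$.
Proof. Up to a permutation of the indices, equation (2) can be written as
$$\begin{pmatrix} I - cP_I & -cP_{\mathrm{out}(I)} \\ -cP_{\mathrm{in}(I)} & I - cP_{\bar I} \end{pmatrix} \begin{pmatrix} v_I \\ v_{\bar I} \end{pmatrix} = \begin{pmatrix} \mathbf{1} \\ 0 \end{pmatrix},$$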
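so we get
$$v = \begin{pmatrix} v_I \\ c (I - cP_{\bar I})^{-1} P_{\mathrm{in}(I)} v_I \end{pmatrix}. \tag{4}$$
By Lemma 1(b), and since $(I - cP_{\bar I})^{-1}$ is a nonnegative matrix (see for instance the chapter on M-matrices in Berman and Plemmons's book [4]), we then have
$$v \le \begin{pmatrix} \frac{1}{1-c} \mathbf{1} \\ \frac{c}{1-c} (I - cP_{\bar I})^{-1} P_{\mathrm{in}(I)} \mathbf{1} \end{pmatrix} = v^0,$$
with equality if and only if no node of $I$ has an access to $\bar I$, that is, $E_{\mathrm{out}(I)} = \emptyset$. The conclusion now follows from equation (3) and $z > 0$.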
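Proposition 8 can be illustrated by building the basic absorbing graph explicitly. The pure-Python sketch below is our own illustration and rests on these assumptions:

- the 5-node graph is hypothetical, with 0-based labels and $I = \{0, 1\}$;
- $z$ is uniform;
- `basic_absorbing` is a made-up helper name.

The helper keeps every link leaving a node of $\bar I$ and gives each node of $I$ a single self-link. The sketch then compares $v$ with $v^0$ componentwise, checks that $v^0_I = \frac{1}{1-c}\mathbf{1}$, and compares $\pi^T e_I$ with $\pi^{0T} e_I$.

```python
# Sketch only: hypothetical graph, 0-based labels, uniform personalization z.

def transition(n, edges):
    """Row-stochastic P with P[i][j] = 1/d_i for every link (i, j)."""
    out = [[] for _ in range(n)]
    for i, j in edges:
        out[i].append(j)
    P = [[0.0] * n for _ in range(n)]
    for i, targets in enumerate(out):
        if not targets:
            raise ValueError(f"node {i} has no outlink")
        for j in targets:
            P[i][j] += 1.0 / len(targets)
    return P


def visits(P, S, c=0.85, iters=500):
    """v = (I - cP)^{-1} e_S via the fixed-point iteration v <- e_S + c P v."""
    n = len(P)
    e = [1.0 if k in S else 0.0 for k in range(n)]
    v = [0.0] * n
    for _ in range(iters):
        v = [e[i] + c * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v


def set_pagerank(P, S, c=0.85):
    """pi^T e_S = (1 - c) z^T v with z uniform (equation (3))."""
    v = visits(P, S, c)
    return (1 - c) * sum(v) / len(v)


def basic_absorbing(edges, I):
    """E^0: keep links leaving nodes outside I; each node of I gets only a self-link."""
    return [(a, b) for (a, b) in edges if a not in I] + [(i, i) for i in sorted(I)]


def compare_with_absorbing(n, edges, I, c=0.85):
    """Return v, v^0, pi^T e_I and pi^{0T} e_I."""
    P = transition(n, edges)
    P0 = transition(n, basic_absorbing(edges, I))
    v, v0 = visits(P, I, c), visits(P0, I, c)
    return v, v0, set_pagerank(P, I, c), set_pagerank(P0, I, c)


EDGES = [(0, 1), (0, 2), (1, 0), (1, 3), (2, 0), (2, 4),
         (3, 4), (3, 1), (4, 2), (4, 3)]

if __name__ == "__main__":
    v, v0, pr, pr0 = compare_with_absorbing(5, EDGES, {0, 1})
    print(v, v0, pr, pr0)
```

Since this graph has external outlinks $(0, 2)$ and $(1, 3)$, the inequality is strict here.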
Let us finally prove a nice property of the set $V$ when $I = \{i\}$ is a singleton: it is independent of the outlinks of $i$. In particular, it can be found from the basic absorbing graph.
Lemma 9. Let a graph be defined by a set of links $E$ and let $I = \{i\}$. Then there exists an $\alpha \ne 0$ such that $(I - cP)^{-1} e_i = \alpha (I - cP^0)^{-1} e_i$. As a consequence, $V = V^0$.
Proof. Let $I = \{i\}$. Since $v_I = v_i$ is a scalar, it follows from equation (4) that the direction of the vector $v$ does not depend on $E_I$ and $E_{\mathrm{out}(I)}$, but only on $E_{\mathrm{in}(I)}$ and $E_{\bar I}$.
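A quick numerical check of Lemma 9 can be written as follows. This is our own sketch and rests on these assumptions:

- the 5-node graph is hypothetical, with 0-based labels;
- each node is tried in turn as the singleton $I = \{i\}$;
- `check_lemma9` is a made-up helper name;
- ties in the argmax are resolved with a relative tolerance.

For each singleton, the sketch sets $\alpha = v_i / v^0_i$ and checks that $v = \alpha v^0$ componentwise. It then checks that the maximisers of $v$ and $v^0$ over $\bar I$ coincide, i.e. $V = V^0$.

```python
# Sketch only: hypothetical graph, 0-based labels.

def transition(n, edges):
    """Row-stochastic P with P[i][j] = 1/d_i for every link (i, j)."""
    out = [[] for _ in range(n)]
    for i, j in edges:
        out[i].append(j)
    P = [[0.0] * n for _ in range(n)]
    for i, targets in enumerate(out):
        if not targets:
            raise ValueError(f"node {i} has no outlink")
        for j in targets:
            P[i][j] += 1.0 / len(targets)
    return P


def visits(P, S, c=0.85, iters=500):
    """v = (I - cP)^{-1} e_S via the fixed-point iteration v <- e_S + c P v."""
    n = len(P)
    e = [1.0 if k in S else 0.0 for k in range(n)]
    v = [0.0] * n
    for _ in range(iters):
        v = [e[i] + c * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v


def basic_absorbing(edges, I):
    """E^0: keep links leaving nodes outside I; each node of I gets only a self-link."""
    return [(a, b) for (a, b) in edges if a not in I] + [(i, i) for i in sorted(I)]


def argmax_outside(w, i, rel_tol=1e-9):
    """Nodes k != i whose w[k] is maximal, up to a relative tolerance."""
    outside = [k for k in range(len(w)) if k != i]
    best = max(w[k] for k in outside)
    return {k for k in outside if w[k] >= best * (1 - rel_tol)}


def check_lemma9(n, edges, i, c=0.85, tol=1e-9):
    """Return (alpha, v == alpha * v0 componentwise, V == V0)."""
    v = visits(transition(n, edges), {i}, c)
    v0 = visits(transition(n, basic_absorbing(edges, {i})), {i}, c)
    alpha = v[i] / v0[i]
    proportional = all(abs(v[k] - alpha * v0[k]) < tol for k in range(n))
    return alpha, proportional, argmax_outside(v, i) == argmax_outside(v0, i)


EDGES = [(0, 1), (0, 2), (1, 0), (1, 3), (2, 0), (2, 4),
         (3, 4), (3, 1), (4, 2), (4, 3)]

if __name__ == "__main__":
    for i in range(5):
        print(i, check_lemma9(5, EDGES, i))
```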
4 Optimal linkage strategy for a website
In this section, we consider a set of nodes $I$. For this set, we want to choose the set of internal links $E_I \subseteq I \times I$ and the set of external outlinks $E_{\mathrm{out}(I)} \subseteq I \times \bar I$ in order to maximize the PageRank score of $I$, that is, $\pi^T e_I$.
Let us first discuss the constraints on $E$ we will consider. If we do not impose any condition on $E$, the problem of maximizing $\pi^T e_I$ is quite trivial. As shown by Proposition 8, you should in this case take $E_{\mathrm{out}(I)} = \emptyset$ and $E_I$ an arbitrary subset of $I \times I$ such that each node has at least one outlink. You just try to lure the random walker to your pages, not allowing him to leave $I$ except by zapping according to the preference vector. Therefore, it seems sensible to require that $E_{\mathrm{out}(I)}$ be nonempty.
Now, let us show that, in order to avoid trivial solutions to our maximization problem, it is not enough to require that $E_{\mathrm{out}(I)}$ be nonempty. Indeed, with this single constraint, in order to lose as few visits as possible from the random walker, you should take a unique leaking node $k \in I$ (i.e. $E_{\mathrm{out}(I)} = \{(k, \ell)\}$ for some $\ell \in \bar I$) and isolate it from the rest of the set $I$ (i.e. $\{i \in I : (i, k) \in E_I\} = \emptyset$).
Moreover, it seems reasonable to imagine that Google penalizes (or at least tries to penalize) such behavior in the context of spam alliances [8].
All this discussion leads us to make the following assumption.
Assumption A (Accessibility). Every node of $I$ has an access to at least one node of $\bar I$.
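Assumption A is a plain reachability condition, so it can be tested with a backward breadth-first search started from $\bar I$. The sketch below is a hypothetical helper, not code from the paper. It uses 0-based labels, and links are `(source, target)` pairs. The graph `ISOLATED_LEAK` is a small instance of the isolated-leak configuration described above: only the leaking node 2 reaches $\bar I$, so the assumption fails.

```python
from collections import deque


def satisfies_assumption_a(n, edges, I):
    """True if every node of I has a directed path to some node outside I."""
    parents = [[] for _ in range(n)]
    for a, b in edges:
        parents[b].append(a)
    # Backward BFS from the complement of I collects every node that can reach it.
    reach = {k for k in range(n) if k not in I}
    queue = deque(reach)
    while queue:
        b = queue.popleft()
        for a in parents[b]:
            if a not in reach:
                reach.add(a)
                queue.append(a)
    return all(i in reach for i in I)


# Hypothetical graphs, 0-based labels, I = {0, 1, 2} and one outside node 3.
CHAIN = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
ISOLATED_LEAK = [(0, 0), (0, 1), (1, 0), (2, 3), (3, 3), (3, 0)]

if __name__ == "__main__":
    print(satisfies_assumption_a(4, CHAIN, {0, 1, 2}))
    print(satisfies_assumption_a(4, ISOLATED_LEAK, {0, 1, 2}))
```

A backward search is used so that a single traversal answers the question for all nodes of $I$ at once.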
Let us now explain the basic ideas we will use in order to determine an optimal linkage strategy for a set of webpages $I$. We determine some forbidden patterns for an optimal linkage strategy and deduce the only possible structure an optimal strategy can have. In other words, we assume that we have a configuration which gives an optimal PageRank $\pi^T e_I$. Then we prove that, if some particular pattern appeared in this optimal structure, we could construct another graph for which the PageRank $\tilde\pi^T e_I$ is strictly higher than $\pi^T e_I$.
We will first determine, in Theorem 10, the shape of an optimal external outlink structure $E_{\mathrm{out}(I)}$ when the internal link structure $E_I$ is given. Then, given the external outlink structure $E_{\mathrm{out}(I)}$, we will determine the possible optimal internal link structures $E_I$ in Theorem 11. Finally, we will put both results together in Theorem 12 in order to get the general shape of an optimal linkage strategy for a set $I$ when $E_{\mathrm{in}(I)}$ and $E_{\bar I}$ are given.
Proofs of this section will be illustrated by several figures, for which we adopt the following drawing convention.
Convention. When nodes are drawn from left to right on the same horizontal line, they are arranged by decreasing value of $v_j$. Links are represented by continuous arrows and paths by dashed arrows.
The first result of this section concerns the optimal outlink structure $E_{\mathrm{out}(I)}$ for the set $I$, while its internal structure $E_I$ is given. An example of an optimal outlink structure is given after the theorem.
Theorem 10 (Optimal outlink structure). Let $E_I$, $E_{\mathrm{in}(I)}$ and $E_{\bar I}$ be given. Let $F_1, \dots, F_r$ be the final classes of the subgraph $(I, E_I)$. Let $E_{\mathrm{out}(I)}$ be such that the PageRank $\pi^T e_I$ is maximal under Assumption A. Then $E_{\mathrm{out}(I)}$ has the following structure:
$$E_{\mathrm{out}(I)} = E_{\mathrm{out}(F_1)} \cup \dots \cup E_{\mathrm{out}(F_r)},$$
where for every $s = 1, \dots, r$,
$$E_{\mathrm{out}(F_s)} \subseteq \{(i, j) : i \in \operatorname{argmin}_{k \in F_s} v_k \text{ and } j \in V\}.$$
Moreover, for every $s = 1, \dots, r$, if $E_{F_s} \ne \emptyset$, then $|E_{\mathrm{out}(F_s)}| = 1$.
Proof. Let $E_I$, $E_{\mathrm{in}(I)}$ and $E_{\bar I}$ be given. Suppose $E_{\mathrm{out}(I)}$ is such that $\pi^T e_I$ is maximal under Assumption A.
We will determine the possible leaking nodes of $I$ by analyzing three different cases.
Firstly, let us consider some node $i \in I$ such that $i$ does not have children in $I$, i.e. $\{k \in I : (i, k) \in E_I\} = \emptyset$. Then clearly we have $\{i\} = F_s$ for some $s = 1, \dots, r$, with $i \in \operatorname{argmin}_{k \in F_s} v_k$ and $E_{F_s} = \emptyset$. From Assumption A, we have $E_{\mathrm{out}(F_s)} \ne \emptyset$, and from Theorem 5 and the optimality assumption, we have $E_{\mathrm{out}(F_s)} \subseteq \{(i, j) : j \in V\}$ (see Figure 4).
Figure 4: If $v_j < v_\ell$, then $\tilde\pi^T e_I > \pi^T e_I$ with $\tilde E_{\mathrm{out}(I)} = E_{\mathrm{out}(I)} \cup \{(i, \ell)\} \setminus \{(i, j)\}$. [Diagram: nodes $i$, $\ell$, $j$ and the set $I$.]
Secondly, let us consider some $i \in I$ such that $i$ has children in $I$, i.e. $\{k \in I : (i, k) \in E_I\} \ne \emptyset$, and
$$v_i \le \min_{k \leftarrow i,\; k \in I} v_k.$$
Let $j \in \operatorname{argmin}_{k \leftarrow i} v_k$. Then $j \in \bar I$ and $v_j < v_i$ by Lemma 1(c). Suppose by contradiction that the node $i$ would keep an access to $\bar I$ if we took $\tilde E_{\mathrm{out}(I)} = E_{\mathrm{out}(I)} \setminus \{(i, j)\}$ instead of $E_{\mathrm{out}(I)}$. Then, by Proposition 7, taking $\tilde E_{\mathrm{out}(I)}$ instead of $E_{\mathrm{out}(I)}$ would strictly increase the PageRank of $I$ while Assumption A remains satisfied (see Figure 5). This would contradict the optimality assumption for $E_{\mathrm{out}(I)}$. From this, we conclude that
• the node $i$ belongs to a final class $F_s$ of the subgraph $(I, E_I)$ with $E_{F_s} \ne \emptyset$, for some $s = 1, \dots, r$;
• there does not exist another $\ell \in \bar I$, $\ell \ne j$, such that $(i, \ell) \in E_{\mathrm{out}(I)}$;
• there does not exist another $k$ in the same final class $F_s$, $k \ne i$, such that $(k, \ell) \in E_{\mathrm{out}(I)}$ for some $\ell \in \bar I$.
Again, by Theorem 5 and the optimality assumption, we have $j \in V$ (see Figure 4).
Figure 5: If $v_j = \min_{k \leftarrow i} v_k$ and $i$ has another access to $\bar I$, then $\tilde\pi^T e_I > \pi^T e_I$ with $\tilde E_{\mathrm{out}(I)} = E_{\mathrm{out}(I)} \setminus \{(i, j)\}$. [Diagram: nodes $i$, $j$ and the set $I$.]
Let us now notice that
$$\max_{k \in \bar I} v_k < \min_{k \in I} v_k. \tag{5}$$
Indeed, with $i \in \operatorname{argmin}_{k \in I} v_k$, we are in one of the two cases analyzed above, for which we have seen that $v_i > v_j = \max_{k \in \bar I} v_k$.
Finally, consider a node $i \in I$ that does not belong to any of the final classes of the subgraph $(I, E_I)$. Suppose by contradiction that there exists $j \in \bar I$ such that $(i, j) \in E_{\mathrm{out}(I)}$. Let $\ell \in \operatorname{argmin}_{k \leftarrow i} v_k$. Then it follows from inequality (5) that $\ell \in \bar I$. But the same argument as above shows that the link $(i, \ell) \in E_{\mathrm{out}(I)}$ must be removed, since $E_{\mathrm{out}(I)}$ is supposed to be optimal (see Figure 5 again). So there does not exist $j \in \bar I$ such that $(i, j) \in E_{\mathrm{out}(I)}$ for a node $i \in I$ which does not belong to any of the final classes $F_1, \dots, F_r$.
Example 3. Let us consider the graph given in Figure 6. The internal link structure $E_I$, as well as $E_{\mathrm{in}(I)}$ and $E_{\bar I}$, are given. The subgraph $(I, E_I)$ has two final classes $F_1$ and $F_2$. With $c = 0.85$ and $z$ the uniform probability vector, this configuration has six optimal outlink structures (one of these solutions is represented by bold arrows in Figure 6). Each one can be written as $E_{\mathrm{out}(I)} = E_{\mathrm{out}(F_1)} \cup E_{\mathrm{out}(F_2)}$, with $E_{\mathrm{out}(F_1)} = \{(4, 6)\}$ or $E_{\mathrm{out}(F_1)} = \{(4, 7)\}$, and $\emptyset \ne E_{\mathrm{out}(F_2)} \subseteq \{(5, 6), (5, 7)\}$. Indeed, since $E_{F_1} \ne \emptyset$, as stated by Theorem 10, the final class $F_1$ has exactly one external outlink in every optimal outlink structure. On the other hand, the final class $F_2$ may have several external outlinks, since it is composed of a unique node and, moreover, this node does not have a self-link. Note that $V = \{6, 7\}$ in each of these six optimal configurations, but this set $V$ cannot be determined a priori, since it depends on the chosen outlink structure.
Now, let us determine the optimal internal link structure $E_I$ for the set $I$, while its outlink structure $E_{\mathrm{out}(I)}$ is given. Examples of optimal internal structures are given after the proof of the theorem.
Theorem 11 (Optimal internal link structure). Let $E_{\mathrm{out}(I)}$, $E_{\mathrm{in}(I)}$ and $E_{\bar I}$ be given. Let $L = \{i \in I : (i, j) \in E_{\mathrm{out}(I)} \text{ for some } j \in \bar I\}$ be the set of leaking nodes of $I$, and let $n_L = |L|$ be the number of leaking nodes. Let $E_I$ be such that the PageRank $\pi^T e_I$ is maximal under Assumption A. Then there exists a permutation of the indices such that $I = \{1, 2, \dots, n_I\}$, $L = \{n_I - n_L + 1, \dots, n_I\}$,
$$v_1 > \dots > v_{n_I - n_L} > v_{n_I - n_L + 1} \ge \dots \ge v_{n_I},$$
and $E_I$ has the following structure:
$$E_I^L \subseteq E_I \subseteq E_I^U,$$
where
$$E_I^L = \{(i, j) \in I \times I : j \le i\} \cup \{(i, j) \in (I \setminus L) \times I : j = i + 1\},$$
$$E_I^U = E_I^L \cup \{(i, j) \in L \times L : i < j\}.$$
Figure 6: Bold arrows represent one of the six optimal outlink structures for this configuration with two final classes (see Example 3). [Diagram: graph on nodes 1 to 8 with the set $I$ and its final classes $F_1$, $F_2$ marked.]
Proof. Let $E_{\mathrm{out}(I)}$, $E_{\mathrm{in}(I)}$ and $E_{\bar I}$ be given. Suppose $E_I$ is such that $\pi^T e_I$ is maximal under Assumption A.
Firstly, by Proposition 6, and since every node of $I$ has an access to $\bar I$, every node $i \in I$ links to every node $j \in I$ such that $v_j \ge v_i$ (see Figure 7), that is,
$$\{(i, j) \in E_I : v_i \le v_j\} = \{(i, j) \in I \times I : v_i \le v_j\}. \tag{6}$$
Figure 7: Every $i \in I$ must link to every $j \in I$ with $v_j \ge v_i$. [Diagram: node $i$ and the set $I$.]
Secondly, let $(k, i) \in E_I$ be such that $k \ne i$ and $k \in I \setminus L$. Let us prove that, if the node $i$ has an access to $\bar I$ by a path $\langle i, i_1, \dots, i_s \rangle$ such that $i_j \ne k$ for all $j = 1, \dots, s$ and $i_s \in \bar I$, then $v_i < v_k$ (see Figure 8). Indeed, if we had $v_k \le v_i$ then, by Lemma 1(c), there would exist $\ell \in I$ such that $(k, \ell) \in E_I$ and $v_\ell = \min_{j \leftarrow k} v_j < v_i \le v_k$. But, with $\tilde E_I = E_I \setminus \{(k, \ell)\}$, we would have $\tilde\pi^T e_I > \pi^T e_I$ by Proposition 7, while Assumption A remains satisfied since the node $k$ would keep access to $\bar I$ via the node $i$ (see Figure 9). That contradicts the optimality assumption. This leads us to the conclusion that $v_k > v_i$ for every $k \in I \setminus L$ and $i \in L$. Moreover, $v_i \ne v_k$ for every $i, k \in I \setminus L$ with $i \ne k$. Indeed, if we had $v_i = v_k$, then $(k, i) \in E_I$ by (6) while, by Lemma 3, the node $i$ would have an access to $\bar I$ by a path independent of $k$. So we should have $v_i < v_k$.
Figure 8: The node $i$ cannot have an access to $\bar I$ without crossing $k$, since in that case we would have $v_i < v_k$. [Diagram: nodes $i$, $k$, $j$ and the set $I$.]
Figure 9: If $v_\ell = \min_{j \leftarrow k} v_j$, then $\tilde\pi^T e_I > \pi^T e_I$ with $\tilde E_I = E_I \setminus \{(k, \ell)\}$. [Diagram: nodes $i$, $k$, $\ell$ and the set $I$.]
We conclude from this that we can relabel the nodes of $N$ such that $I = \{1, 2, \dots, n_I\}$, $L = \{n_I - n_L + 1, \dots, n_I\}$ and
$$v_1 > v_2 > \dots > v_{n_I - n_L} > v_{n_I - n_L + 1} \ge \dots \ge v_{n_I}. \tag{7}$$
It also follows that, for $i \in I \setminus L$ and $j > i$, $(i, j) \in E_I$ if and only if $j = i + 1$. Indeed, suppose first that $i < n_I - n_L$. Then we cannot have $(i, j) \in E_I$ with $j > i + 1$, since in this case we would contradict the ordering of the nodes given by equation (7) (see Figure 8 again with $k = i + 1$, and remember that, by Lemma 3, node $j$ has an access to $\bar I$ by a decreasing path). Moreover, node $i$ must link to some node $j > i$ in order to satisfy Assumption A, so $(i, i + 1)$ must belong to $E_I$. Now, consider the case $i = n_I - n_L$. Suppose we had $(i, j) \in E_I$ with $j > i + 1$. Let us first note that there cannot exist two or more different links $(i, \ell)$ with $\ell \in L$, since in this case we could remove one of these links and strictly increase the PageRank of the set $I$. If $v_j = v_{i+1}$, we could relabel the nodes by permuting these two indices. If $v_j < v_{i+1}$, then with $\tilde E_I = E_I \cup \{(i, i + 1)\} \setminus \{(i, j)\}$ we would have $\tilde\pi^T e_I > \pi^T e_I$ by Theorem 5, while Assumption A remains satisfied since the node $i$ would keep access to $\bar I$ via node $i + 1$. That contradicts the optimality assumption. So we have proved that
$$\{(i, j) \in E_I : i < j \text{ and } i \in I \setminus L\} = \{(i, i + 1) : i \in I \setminus L\}. \tag{8}$$
Thirdly, it is obvious that
$$\{(i, j) \in E_I : i < j \text{ and } i \in L\} \subseteq \{(i, j) \in L \times L : i < j\}. \tag{9}$$
The announced structure for a set $E_I$ giving a maximal PageRank score $\pi^T e_I$ under Assumption A now follows directly from equations (6), (8) and (9).
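The two bounding sets of Theorem 11 are easy to enumerate for given $n_I$ and $n_L$. The sketch below is a hypothetical helper of our own and follows the text's conventions:

- labels are 1-based, $I = \{1, \dots, n_I\}$;
- the leaking nodes are the last $n_L$ labels.

It only builds $E_I^L$ and $E_I^U$. Which set between them is actually optimal depends on the rest of the graph, as in Example 4.

```python
def theorem11_bounds(n_I, n_L):
    """Return (E_I^L, E_I^U) of Theorem 11.

    Labels are 1-based as in the text: I = {1, ..., n_I} and the leaking
    nodes are L = {n_I - n_L + 1, ..., n_I}.
    """
    I = range(1, n_I + 1)
    L = set(range(n_I - n_L + 1, n_I + 1))
    # Every backward link and self-link (j <= i) ...
    lower = {(i, j) for i in I for j in I if j <= i}
    # ... plus the forward link i -> i + 1 from every non-leaking node.
    lower |= {(i, i + 1) for i in I if i not in L and i + 1 <= n_I}
    # E_I^U additionally allows forward links between leaking nodes.
    upper = lower | {(i, j) for i in L for j in L if i < j}
    return lower, upper


if __name__ == "__main__":
    lower, upper = theorem11_bounds(3, 2)
    print(sorted(lower))
    print(sorted(upper - lower))
```

With a single leaking node ($n_L = 1$) the two bounds coincide, so Theorem 11 then pins down $E_I$ completely.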
Example 4. Let us consider the graphs given in Figure 10. In both cases, the external outlink structure $E_{\mathrm{out}(I)}$ with two leaking nodes, as well as $E_{\mathrm{in}(I)}$ and $E_{\bar I}$, are given. With $c = 0.85$ and $z$ the uniform probability vector, the optimal internal link structure for configuration (a) is given by $E_I = E_I^L$, while in configuration (b) we have $E_I = E_I^U$ (bold arrows), with $E_I^L$ and $E_I^U$ defined in Theorem 11.
Figure 10: Bold arrows represent optimal internal link structures. In (a) we have $E_I = E_I^L$, while $E_I = E_I^U$ in (b). [Diagram: panels (a) and (b), each showing the set $I$ and its leaking nodes $L$.]
Finally, combining the optimal outlink structure and the optimal internal link structure described in Theorems 10 and 11, we find the optimal linkage strategy for a set of webpages. Let us note that, since we here have control over both $E_I$ and $E_{\mathrm{out}(I)}$, there are no more cases of several final classes or several leaking nodes to consider. For an example of an optimal link structure, see Figure 1.
Theorem 12 (Optimal link structure). Let $E_{\mathrm{in}(I)}$ and $E_{\bar I}$ be given. Let $E_I$ and $E_{\mathrm{out}(I)}$ be such that $\pi^T e_I$ is maximal under Assumption A. Then there exists a permutation of the indices such that $I = \{1, 2, \dots, n_I\}$,
$$v_1 > \dots > v_{n_I} > v_{n_I + 1} \ge \dots \ge v_n,$$
and $E_I$ and $E_{\mathrm{out}(I)}$ have the following structure:
$$E_I = \{(i, j) \in I \times I : j \le i \text{ or } j = i + 1\},$$
$$E_{\mathrm{out}(I)} = \{(n_I, n_I + 1)\}.$$
Proof. Let $E_{\mathrm{in}(I)}$ and $E_{\bar I}$ be given, and suppose $E_I$ and $E_{\mathrm{out}(I)}$ are such that $\pi^T e_I$ is maximal under Assumption A. Let us relabel the nodes of $N$ such that $I = \{1, 2, \dots, n_I\}$ and $v_1 \ge \dots \ge v_{n_I} > v_{n_I + 1} = \max_{j \in \bar I} v_j$. By Theorem 11, $(i, j) \in E_I$ for all nodes $i, j \in I$ such that $j \le i$. In particular, every node of $I$ has an access to node 1. Therefore, there is a unique final class $F_1 \subseteq I$ in the subgraph $(I, E_I)$. So, by Theorem 10, $E_{\mathrm{out}(I)} = \{(k, \ell)\}$ for some $k \in F_1$ and $\ell \in \bar I$. Without loss of generality, we can suppose that $\ell = n_I + 1$. By Theorem 11 again, the leaking node is $k = n_I$, and therefore $(i, i + 1) \in E_I$ for every node $i \in \{1, \dots, n_I - 1\}$.
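The link sets of Theorem 12 (a forward chain plus every backward link and self-link) can be generated mechanically once the nodes are ordered. The sketch below is a hypothetical helper of our own. It assumes:

- 1-based labels as in the text;
- node $n_I + 1$ stands for a node of $V$;
- a flag `self_links=False` that drops the self-links, giving the variant stated later in Corollary 16.

It only builds the structure. As Example 5 shows, having this shape is necessary but not sufficient for optimality.

```python
def optimal_structure(n_I, self_links=True):
    """Return (E_I, E_out(I)) shaped as in Theorem 12.

    Labels are 1-based: I = {1, ..., n_I}, and node n_I + 1 stands for a node
    of V (a node outside I with maximal v). With self_links=False, the links
    (i, i) are omitted, as in the no-self-link variant.
    """
    I = range(1, n_I + 1)
    E_I = {(i, j) for i in I for j in I
           if (j <= i if self_links else j < i) or j == i + 1}
    E_out = {(n_I, n_I + 1)}    # the single leaking link, from the last node
    return E_I, E_out


if __name__ == "__main__":
    E_I, E_out = optimal_structure(3)
    print(sorted(E_I), E_out)
```

The number of internal links is $n_I(n_I + 1)/2 + n_I - 1$ with self-links, and $n_I(n_I - 1)/2 + n_I - 1$ without.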
Let us note that having a structure as described in Theorem 12 is a necessary but not sufficient condition for a maximal PageRank.
Example 5. Let us show by an example that the graph structure given in Theorem 12 is not sufficient to have a maximal PageRank. Consider for instance the graphs in Figure 11. Let $c = 0.85$ and take a uniform personalization vector $z = \frac{1}{n}\mathbf{1}$. Both graphs have the link structure required by Theorem 12 in order to have a maximal PageRank, with
$$v^{(a)} = \begin{pmatrix} 6.484 & 6.42 & 6.224 & 5.457 \end{pmatrix}^T \quad \text{and} \quad v^{(b)} = \begin{pmatrix} 6.432 & 6.494 & 6.247 & 5.52 \end{pmatrix}^T.$$
But configuration (a) is not optimal since, in this case, the PageRank $\pi^{(a)T} e_I = 0.922$ is strictly less than the PageRank $\pi^{(b)T} e_I = 0.926$ obtained by configuration (b).
Let us nevertheless note that, with a non-uniform personalization vector $z = \begin{pmatrix} 0.7 & 0.1 & 0.1 & 0.1 \end{pmatrix}^T$, the link structure (a) would be optimal.
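Figure 11: For $I = \{1, 2, 3\}$, $c = 0.85$ and $z$ uniform, the link structure in (a) is not optimal, and yet it satisfies the necessary conditions of Theorem 12 (see Example 5). [Diagram: panel (a) with nodes ordered 1, 2, 3, 4; panel (b) with nodes ordered 2, 1, 3, 4; the set $I$ is marked in both.]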
5 Extensions and variants
Let us now present some extensions and variants of the results of the previous section. We will first emphasize the role of parents of $I$. Secondly, we will briefly discuss Avrachenkov and Litvak's optimal link structure for the case where $I$ is a singleton. Then we will give variants of Theorem 12 for when self-links are forbidden or when a minimal number of external outlinks is required. Finally, we will make some comments on the influence of external inlinks on the PageRank of $I$.
5.1 Linking to parents
If some node of $I$ has at least one parent in $\bar I$, then the optimal linkage strategy for $I$ is to have an internal link structure as described in Theorem 12, together with a single link to one of the parents of $I$.
Corollary 13 (Necessity of linking to parents). Let $E_{\mathrm{in}(I)} \ne \emptyset$ and $E_{\bar I}$ be given. Let $E_I$ and $E_{\mathrm{out}(I)}$ be such that $\pi^T e_I$ is maximal under Assumption A. Then $E_{\mathrm{out}(I)} = \{(i, j)\}$ for some $i \in I$ and $j \in \bar I$ such that $(j, k) \in E_{\mathrm{in}(I)}$ for some $k \in I$.
Proof. This is a direct consequence of Lemma 2 and Theorem 12.
Let us nevertheless remember that not every parent of nodes of $I$ will give an optimal link structure, as we already discussed in Example 1 and now develop further.
Example 6. Let us continue Example 1. We consider the graph in Figure 2 as the basic absorbing graph for $I = \{1\}$, that is, $E_{\mathrm{in}(I)}$ and $E_{\bar I}$ are given. We take $c = 0.85$ as damping factor and a uniform personalization vector $z = \frac{1}{n}\mathbf{1}$. We have seen in Example 1 that $V^0 = \{2, 3, 4\}$. Let us consider the value of the PageRank $\pi_1$ for different sets $E_I$ and $E_{\mathrm{out}(I)}$:

E_I \ E_out(I)    ∅         {(1,2)}   {(1,5)}   {(1,6)}   {(1,2),(1,3)}
∅                 –         0.1739    0.1402    0.1392    0.1739
{(1,1)}           0.5150    0.2600    0.2204    0.2192    0.2231

As expected from Corollary 15, the optimal linkage strategy for $I = \{1\}$ is to have a self-link and a link to one of the nodes 2, 3 or 4. We also note that a link to node 6, which is a parent of node 1, provides a lower PageRank than a link to node 5, which is not a parent of 1. Finally, if we suppose that self-links are forbidden (see below), then the optimal linkage strategy is to link to one or more of the nodes 2, 3, 4.
In the case where no node of $I$ has a parent in $\bar I$, every structure as described in Theorem 12 gives an optimal link structure.
Proposition 14 (No external parent). Let $E_{\mathrm{in}(I)}$ and $E_{\bar I}$ be given. Suppose that $E_{\mathrm{in}(I)} = \emptyset$. Then the PageRank $\pi^T e_I$ is maximal under Assumption A if and only if
$$E_I = \{(i, j) \in I \times I : j \le i \text{ or } j = i + 1\},$$
$$E_{\mathrm{out}(I)} = \{(n_I, n_I + 1)\},$$
for some permutation of the indices such that $I = \{1, 2, \dots, n_I\}$.
Proof. This follows directly from $\pi^T e_I = (1 - c) z^T v$ and the fact that, if $E_{\mathrm{in}(I)} = \emptyset$,
$$v = (I - cP)^{-1} e_I = \begin{pmatrix} (I - cP_I)^{-1} \mathbf{1} \\ 0 \end{pmatrix},$$
up to a permutation of the indices.
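The block form used in this proof can be checked on a toy graph without external parents. The sketch below is our own illustration and rests on these assumptions:

- the graph is hypothetical, with 0-based labels, $I = \{0, 1\}$, and no link from $\{2, 3\}$ into $I$;
- `check_no_external_parent` is a made-up helper name.

It computes $v$ on the whole graph by fixed-point iteration. It then computes $(I - cP_I)^{-1}\mathbf{1}$ by iterating $w \leftarrow \mathbf{1} + cP_I w$ on the $I \times I$ block of $P$ alone, and compares the two.

```python
# Sketch only: hypothetical graph with E_in(I) empty, 0-based labels.

def transition(n, edges):
    """Row-stochastic P with P[i][j] = 1/d_i for every link (i, j)."""
    out = [[] for _ in range(n)]
    for i, j in edges:
        out[i].append(j)
    P = [[0.0] * n for _ in range(n)]
    for i, targets in enumerate(out):
        if not targets:
            raise ValueError(f"node {i} has no outlink")
        for j in targets:
            P[i][j] += 1.0 / len(targets)
    return P


def visits(P, S, c=0.85, iters=500):
    """v = (I - cP)^{-1} e_S via the fixed-point iteration v <- e_S + c P v."""
    n = len(P)
    e = [1.0 if k in S else 0.0 for k in range(n)]
    v = [0.0] * n
    for _ in range(iters):
        v = [e[i] + c * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v


def check_no_external_parent(n, edges, I, c=0.85, iters=500):
    """Return v on the full graph and (I - c P_I)^{-1} 1 from the I x I block only."""
    P = transition(n, edges)
    v = visits(P, I, c, iters)
    idx = sorted(I)
    w = [0.0] * len(idx)
    for _ in range(iters):
        w = [1.0 + c * sum(P[a][b] * w[q] for q, b in enumerate(idx)) for a in idx]
    return v, dict(zip(idx, w))


# Nodes 2 and 3 only link among themselves, so E_in(I) is empty.
EDGES_NO_IN = [(0, 0), (0, 1), (1, 0), (1, 2), (2, 3), (3, 2)]

if __name__ == "__main__":
    v, w = check_no_external_parent(4, EDGES_NO_IN, {0, 1})
    print(v, w)
```

The entries of $v$ on $\bar I$ stay exactly zero, because a walker started outside $I$ can never enter it.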
5.2 Optimal linkage strategy for a singleton
The optimal outlink structure for a single webpage has already been given by Avrachenkov and Litvak in [2]. Their result becomes a particular case of Theorem 12. Note that in the case of a single node, the possible choices for $E_{\mathrm{out}(I)}$ can be found a priori by considering the basic absorbing graph, since $V = V^0$.
Corollary 15 (Optimal link structure for a single node). Let $I = \{i\}$ and let $E_{\mathrm{in}(I)}$ and $E_{\bar I}$ be given. Then the PageRank $\pi_i$ is maximal under Assumption A if and only if $E_I = \{(i, i)\}$ and $E_{\mathrm{out}(I)} = \{(i, j)\}$ for some $j \in V^0$.
Proof. This follows directly from Lemma 9 and Theorem 12.
5.3 Optimal linkage strategy under additional assumptions
Let us consider the problem of maximizing the PageRank $\pi^T e_I$ when self-links are forbidden. Indeed, it seems to be commonly assumed that Google's PageRank algorithm does not take self-links into account. In this case, Theorem 12 can be readily adapted for the case where $|I| \ge 2$. When $I$ is a singleton, we must have $E_I = \emptyset$, so $E_{\mathrm{out}(I)}$ can contain several links, as stated in Theorem 10.
Corollary 16 (Optimal link structure with no self-links). Suppose $|I| \ge 2$. Let $E_{\mathrm{in}(I)}$ and $E_{\bar I}$ be given. Let $E_I$ and $E_{\mathrm{out}(I)}$ be such that $\pi^T e_I$ is maximal under Assumption A and under the assumption that there does not exist $i \in I$ such that $(i, i) \in E_I$. Then there exists a permutation of the indices such that $I = \{1, 2, \dots, n_I\}$, $v_1 > \dots > v_{n_I} > v_{n_I + 1} \ge \dots \ge v_n$, and $E_I$ and $E_{\mathrm{out}(I)}$ have the following structure:
$$E_I = \{(i, j) \in I \times I : j < i \text{ or } j = i + 1\},$$
$$E_{\mathrm{out}(I)} = \{(n_I, n_I + 1)\}.$$
Corollary 17 (Optimal link structure for a single node with no self-link). Suppose $I = \{i\}$. Let $E_{\mathrm{in}(I)}$ and $E_{\bar I}$ be given. Suppose $E_I = \emptyset$. Then the PageRank $\pi_i$ is maximal under Assumption A if and only if $\emptyset \ne E_{\mathrm{out}(I)} \subseteq \{(i, j) : j \in V^0\}$.
Let us now consider the problem of maximizing the PageRank $\pi^T e_I$ when several external outlinks are required. Then the proof of Theorem 10 can be readily adapted in order to obtain the following variant of Theorem 12.
Corollary 18 (Optimal link structure with several external outlinks). Let $E_{\mathrm{in}(I)}$ and $E_{\bar I}$ be given. Let $E_I$ and $E_{\mathrm{out}(I)}$ be such that $\pi^T e_I$ is maximal under Assumption A and under the assumption that $|E_{\mathrm{out}(I)}| \ge r$. Then there exists a permutation of the indices such that $I = \{1, 2, \dots, n_I\}$, $v_1 > \dots > v_{n_I} > v_{n_I + 1} \ge \dots \ge v_n$, and $E_I$ and $E_{\mathrm{out}(I)}$ have the following structure:
$$E_I = \{(i, j) \in I \times I : j < i \text{ or } j = i + 1\},$$
$$E_{\mathrm{out}(I)} = \{(n_I, j_k) : j_k \in V \text{ for } k = 1, \dots, r\}.$$
5.4 External inlinks
Finally, let us make some comments about the addition of external inlinks to the set $I$. It is well known that adding an inlink to a particular page always increases the PageRank of this page [1, 9]. This can be viewed as a direct consequence of Theorem 5 and Lemma 1. The case of a set $I$ of several pages is not so simple. We prove in the following theorem that, if the set $I$ has a link structure as described in Theorem 12, then adding an inlink to a page of $I$ from a page $j \in \bar I$ which is not a parent of some node of $I$ will increase the PageRank of $I$. But in general, adding an inlink to some page of $I$ from $\bar I$ may decrease the PageRank of the set $I$, as shown in Examples 7 and 8.
Theorem 19 (External inlinks). Let $I \subseteq N$ and a graph defined by a set of links $E$. If
$$\min_{i \in I} v_i > \max_{j \notin I} v_j,$$
then, for every $j \in \bar I$ which is not a parent of $I$ and for every $i \in I$, the graph defined by $\tilde E = E \cup \{(j, i)\}$ gives $\tilde\pi^T e_I > \pi^T e_I$.
Proof. This follows directly from Theorem 5.
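Theorem 19 can be illustrated on a toy graph chosen so that its hypothesis holds. The sketch below is our own illustration and rests on these assumptions:

- the graph `EDGES19` is hypothetical, with 0-based labels and $I = \{0, 1\}$;
- $z$ is uniform;
- `check_theorem19` is a made-up helper name.

In this graph node 3 already links into $I$, while nodes 2 and 4 do not. The sketch first checks that $\min_{i \in I} v_i > \max_{j \notin I} v_j$. It then adds, one at a time, every inlink $(j, i)$ from a page $j \in \bar I$ that is not yet a parent of $I$, and records the change in $\pi^T e_I$.

```python
# Sketch only: hypothetical graph, 0-based labels, uniform personalization z.

def transition(n, edges):
    """Row-stochastic P with P[i][j] = 1/d_i for every link (i, j)."""
    out = [[] for _ in range(n)]
    for i, j in edges:
        out[i].append(j)
    P = [[0.0] * n for _ in range(n)]
    for i, targets in enumerate(out):
        if not targets:
            raise ValueError(f"node {i} has no outlink")
        for j in targets:
            P[i][j] += 1.0 / len(targets)
    return P


def visits(P, S, c=0.85, iters=500):
    """v = (I - cP)^{-1} e_S via the fixed-point iteration v <- e_S + c P v."""
    n = len(P)
    e = [1.0 if k in S else 0.0 for k in range(n)]
    v = [0.0] * n
    for _ in range(iters):
        v = [e[i] + c * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v


def set_pagerank(P, S, c=0.85):
    """pi^T e_S = (1 - c) z^T v with z uniform (equation (3))."""
    v = visits(P, S, c)
    return (1 - c) * sum(v) / len(v)


def check_theorem19(n, edges, I, c=0.85):
    """Return (hypothesis of Theorem 19 holds, gain from each new inlink (j, i))."""
    P = transition(n, edges)
    v = visits(P, I, c)
    outside = [k for k in range(n) if k not in I]
    hypothesis = min(v[i] for i in I) > max(v[k] for k in outside)
    parents_of_I = {a for (a, b) in edges if b in I}
    base = set_pagerank(P, I, c)
    gains = {}
    for j in outside:
        if j in parents_of_I:
            continue    # the theorem only covers pages that are not yet parents of I
        for i in sorted(I):
            P_new = transition(n, edges + [(j, i)])
            gains[(j, i)] = set_pagerank(P_new, I, c) - base
    return hypothesis, gains


EDGES19 = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2),
           (2, 3), (3, 2), (3, 0), (4, 3), (4, 2)]

if __name__ == "__main__":
    print(check_theorem19(5, EDGES19, {0, 1}))
```

When the hypothesis fails, the sketch still reports the gains, and these may be negative, in line with Examples 7 and 8.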
Example 7. Let us show by an example that a new external inlink does not always help a set $I$ improve its PageRank, even if $I$ has an optimal linkage strategy. Consider for instance the graph in Figure 12. With $c = 0.85$ and $z$ uniform, we have $\pi^T e_I = 0.8481$. But if we consider the graph defined by $\tilde E_{\mathrm{in}(I)} = E_{\mathrm{in}(I)} \cup \{(3, 2)\}$, then we have $\tilde\pi^T e_I = 0.8321 < \pi^T e_I$.
Figure 12: For $I = \{1, 2\}$, adding the external inlink $(3, 2)$ gives $\tilde\pi^T e_I < \pi^T e_I$ (see Example 7). [Diagram: nodes 1, 2, 3 with the set $I$ marked.]
Example 8. A new external inlink does not always increase the PageRank of a set $I$, even if this new inlink comes from a page which is not already a parent of some node of $I$. Consider for instance the graph in Figure 13. With $c = 0.85$ and $z$ uniform, we have $\pi^T e_I = 0.6$. But if we consider the graph defined by $\tilde E_{\mathrm{in}(I)} = E_{\mathrm{in}(I)} \cup \{(4, 3)\}$, then we have $\tilde\pi^T e_I = 0.5897 < \pi^T e_I$.
Figure 13: For $I = \{1, 2, 3\}$, adding the external inlink $(4, 3)$ gives $\tilde\pi^T e_I < \pi^T e_I$ (see Example 8). [Diagram: nodes 1 to 5 with the set $I$ marked.]
6 Conclusions
In this paper we provide the general shape of an optimal link structure for a website in order to maximize its PageRank. This structure, with a forward chain and every possible backward link, may not be intuitive. To our knowledge, it has never been mentioned, while topologies like a clique, a ring or a star are considered in the literature on collusion and alliances between pages [3, 8]. Moreover, this optimal structure gives new insight into the claim of Bianchini et al. [5] that, in order to maximize the PageRank of a website, hyperlinks to the rest of the webgraph “should be in pages with a small PageRank and that have many internal hyperlinks”. More precisely, we have seen that the leaking pages must be chosen with respect to the mean number of visits before zapping they give to the website, rather than their PageRank.
Let us now present some possible directions for future work.
We have noticed in Example 5 that the first node of $I$ in the forward chain of an optimal link structure is not necessarily a child of some node of $\bar I$. In the example we gave, the personalization vector was not uniform. We wonder whether this could occur with a uniform personalization vector, and we make the following conjecture.
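Conjecture. Let $E_{\mathrm{in}(I)} \ne \emptyset$ and $E_{\bar I}$ be given. Let $E_I$ and $E_{\mathrm{out}(I)}$ be such that $\pi^T e_I$ is maximal under Assumption A. If $z = \frac{1}{n}\mathbf{1}$, then there exists $j \in \bar I$ such that $(j, i) \in E_{\mathrm{in}(I)}$, where $i \in \operatorname{argmax}_k v_k$.
If this conjecture were true, we could also ask whether the node $j \in \bar I$ such that $(j, i) \in E_{\mathrm{in}(I)}$, where $i \in \operatorname{argmax}_k v_k$, belongs to $V$.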
Another question concerns the optimal linkage strategy for maximizing an arbitrary linear combination of the PageRanks of the nodes of $I$. In particular, we might want to maximize the PageRank $\pi^T e_S$ of a target subset $S \subseteq I$ by choosing $E_I$ and $E_{\mathrm{out}(I)}$ as usual. A general shape for an optimal link structure seems difficult to find, as shown in the following example.
Example 9. Consider the graphs in Figure 14. In both cases, let $c = 0.85$ and $z = \frac{1}{n}\mathbf{1}$. Let $I = \{1, 2, 3\}$ and let $S = \{1, 2\}$ be the target set. In configuration (a), the optimal sets of links $E_I$ and $E_{\mathrm{out}(I)}$ for maximizing $\pi^T e_S$ have the link structure described in Theorem 12. But in (b), the optimal $E_I$ and $E_{\mathrm{out}(I)}$ do not have this structure. Let us note nevertheless that, by Theorem 12 applied to the set $S$, the subsets $E_S$ and $E_{\mathrm{out}(S)}$ must have the link structure described in that theorem.
Figure 14: In (a) and (b), bold arrows represent optimal link structures for $I = \{1, 2, 3\}$ with respect to a target set $S = \{1, 2\}$ (see Example 9). [Diagram: panel (a) with nodes 1 to 8, panel (b) with nodes 1 to 4; the sets $S$ and $I$ are marked in both.]
Acknowledgements
This paper presents research supported by a grant “Actions de recherche concertées – Large Graphs and Networks” of the “Communauté Française de Belgique” and by the Belgian Network DYSCO (Dynamical Systems, Control, and Optimization), funded by the Interuniversity Attraction Poles Programme, initiated by the Belgian State, Science Policy Office. The second author was supported by a research fellow grant of the “Fonds de la Recherche Scientifique – FNRS” (Belgium). The scientific responsibility rests with the authors.
References
[1] Konstantin Avrachenkov and Nelly Litvak, Decomposition of the Google PageRank and optimal linking strategy, Tech. report, INRIA, 2004, http://www.inria.fr/rrrt/rr-5101.html.
[2] Konstantin Avrachenkov and Nelly Litvak, The effect of new links on Google PageRank, Stoch. Models 22 (2006), no. 2, 319–331.
[3] Ricardo Baeza-Yates, Carlos Castillo, and Vicente López, PageRank increase under different collusion topologies, First International Workshop on Adversarial Information Retrieval on the Web, 2005, http://airweb.cse.lehigh.edu/2005/baeza-yates.pdf.
[4] Abraham Berman and Robert J. Plemmons, Nonnegative matrices in the mathematical sciences, Classics in Applied Mathematics, vol. 9, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1994.
[5] Monica Bianchini, Marco Gori, and Franco Scarselli, Inside PageRank, ACM Trans. Inter. Tech. 5 (2005), no. 1, 92–128.
[6] Sergey Brin and Lawrence Page, The anatomy of a large-scale hypertextual web search engine, Computer Networks and ISDN Systems 30 (1998), no. 1–7, 107–117, Proceedings of the Seventh International World Wide Web Conference, April 1998.
[7] Grace E. Cho and Carl D. Meyer, Markov chain sensitivity measured by mean first passage times, Linear Algebra Appl. 316 (2000), no. 1–3, 21–28, Conference Celebrating the 60th Birthday of Robert J. Plemmons (Winston-Salem, NC, 1999).
[8] Zoltán Gyöngyi and Hector Garcia-Molina, Link spam alliances, VLDB ’05: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB Endowment, 2005, http://portal.acm.org/citation.cfm?id=1083654, pp. 517–528.
[9] Ilse C. F. Ipsen and Rebecca S. Wills, Mathematical properties and analysis of Google’s PageRank, Bol. Soc. Esp. Mat. Apl. 34 (2006), 191–196.
[10] John G. Kemeny and J. Laurie Snell, Finite Markov chains, The University Series in Undergraduate Mathematics, D. Van Nostrand Co., Inc., Princeton, N.J.-Toronto-London-New York, 1960.
[11] Steve Kirkland, Conditioning of the entries in the stationary vector of a Google-type matrix, Linear Algebra Appl. 418 (2006), no. 2–3, 665–681.
[12] Amy N. Langville and Carl D. Meyer, Deeper inside PageRank, Internet Math. 1 (2004), no. 3, 335–380.
[13] Amy N. Langville and Carl D. Meyer, Google’s PageRank and beyond: the science of search engine rankings, Princeton University Press, Princeton, NJ, 2006.
[14] Ronny Lempel and Shlomo Moran, Rank-stability and rank-similarity of link-based web ranking algorithms in authority-connected graphs, Inf. Retr. 8 (2005), no. 2, 245–264.
[15] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd, The PageRank citation ranking: Bringing order to the Web, Tech. report, Computer Science Department, Stanford University, 1998, http://dbpubs.stanford.edu:8090/pub/1999-66.
[16] Eugene Seneta, Nonnegative matrices and Markov chains, second ed., Springer Series in Statistics, Springer-Verlag, New York, 1981.
[17] Marcin Sydow, Can one out-link change your PageRank?, Advances in Web Intelligence (AWIC 2005), Lecture Notes in Computer Science, vol. 3528, Springer Berlin Heidelberg, 2005, pp. 408–414.

Justin Goldberg

Tired of ugly Japanese, Chinese, or other CJK fonts? I hate reading English text on web pages that use CJK character sets, because the fonts are so ugly and unhinted.

Here's how to fix this problem.

Open Mozilla Firefox's about:config page and search for font.na. Then find your language's character set (it's usually declared in the HTML meta tags: press Ctrl-U to view the page source, then Ctrl-F to search it).

For Japanese text the preference to edit is font.name-list.sans-serif.ja. The default is "MS PGothic, ＭＳＰゴシック, MS Gothic, MS PMincho, MS Mincho". Simply add your Unicode fonts to the beginning of the list, separated by commas, in the order you want Firefox to try them. On Windows you can add Arial Unicode MS, so your list will look like this: "Arial Unicode MS, MS PGothic, ＭＳＰゴシック, MS Gothic, MS PMincho, MS Mincho". The Code2000 typeface supposedly supports the most characters, but is ugly as hell. Lucida Sans Unicode or Bitstream Cyberbit is probably much better. There's a great list of Unicode fonts on Wikipedia.

Which looks better, the first text or the second? (Note: these fonts are scaled up in size.)
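If you'd rather not click through about:config, Firefox also reads preferences from a user.js file in your profile folder, one user_pref("name", "value"); line per setting. Here's a rough Python sketch that puts your preferred Unicode fonts at the front of the Japanese sans-serif list, keeps the default fonts after them without duplicates, and prints a line you could paste into user.js. The helper names prepend_fonts and user_pref_line are made up for this example, and it assumes the default list quoted above, so check yours in about:config first.

```python
# Hypothetical helpers: build a Firefox user.js line that puts preferred
# Unicode fonts ahead of the existing defaults for a font.name-list pref.


def prepend_fonts(default_list: str, preferred: list) -> str:
    """Return a comma-separated font list with `preferred` fonts first.

    Fonts that are already in `preferred` are not repeated from the defaults.
    """
    defaults = [f.strip() for f in default_list.split(",") if f.strip()]
    ordered = list(preferred)
    for font in defaults:
        if font not in ordered:
            ordered.append(font)
    return ", ".join(ordered)


def user_pref_line(pref: str, value: str) -> str:
    """Format one preference in the user_pref("name", "value"); style of user.js."""
    return f'user_pref("{pref}", "{value}");'


if __name__ == "__main__":
    # Default Japanese sans-serif list, as quoted in the post (assumed unchanged).
    japanese_default = "MS PGothic, ＭＳＰゴシック, MS Gothic, MS PMincho, MS Mincho"
    fonts = prepend_fonts(japanese_default, ["Arial Unicode MS"])
    print(user_pref_line("font.name-list.sans-serif.ja", fonts))
```

Running it prints the same "Arial Unicode MS, MS PGothic, ..." list shown above, wrapped in a user_pref call. Since user.js is reapplied every time Firefox starts, it's handy if you want the setting to survive profile tinkering.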

Justin Goldberg

I think I know the secret sauce of hittail. It might be the distance between words on the page. It might also be the distance within a paragraph, or between other delimiters.

I suspect this because searching for the exact phrase "Rockbox is an open source firmware replacement for a growing number of digital audio players" shows 5 results.

This one shows 7:
"Rockbox is an open source firmware replacement for a growing number of digital"

This one shows 1020:
"Rockbox is an open source firmware replacement for a growing number"

This one shows 1330:
"Rockbox is an open source firmware replacement"

The greater the distance between the words, the more under-performing potential energy the phrase has.

Discussion continues here.

This is my intuitive hunch, based on looking at a lot of SERPs for entirely unrelated reasons. What do you think?
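To make "distance between words" a bit more concrete, here's a rough Python sketch of one standard way to measure it: the shortest run of words on a page that contains every word of a phrase. A page that uses the phrase verbatim scores the phrase's own word count; a page that scatters the words scores higher. This is only my illustration of the hunch above, not HitTail's (or Google's) actual method, and the tokenizer and the function names tokenize and min_window_span are my own assumptions.

```python
from __future__ import annotations

import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def min_window_span(page: str, phrase: str) -> int | None:
    """Length, in words, of the shortest stretch of `page` that contains
    every word of `phrase` (a hypothetical proximity measure).

    Returns None if some word of the phrase never appears on the page.
    """
    need = Counter(tokenize(phrase))
    if not need:
        return 0  # an empty phrase trivially fits in a zero-length window
    tokens = tokenize(page)
    have = Counter()
    satisfied = 0  # distinct phrase words whose required count is covered
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in need:
            have[tok] += 1
            if have[tok] == need[tok]:
                satisfied += 1
        # While the window covers the whole phrase, record its width and
        # shrink it from the left to look for a tighter one.
        while satisfied == len(need):
            width = right - left + 1
            if best is None or width < best:
                best = width
            dropped = tokens[left]
            if dropped in need:
                have[dropped] -= 1
                if have[dropped] < need[dropped]:
                    satisfied -= 1
            left += 1
    return best


if __name__ == "__main__":
    page = "Rockbox is an open source firmware replacement"
    print(min_window_span(page, "open source firmware"))
```

For example, min_window_span("Rockbox is an open source firmware replacement", "open source firmware") returns 3, while the same phrase on "open a b c source d firmware" returns 7. The sliding window makes a single pass over the page, so it stays cheap even on long pages.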

Justin Goldberg

This text had never been posted anywhere on the 'net before, so I'm posting it here. It's what's displayed whenever you unblock someone from AIM or Google Talk.

will show in your chat list depending on how often you email them. It's Magic! Undo

Justin Goldberg

I translated this email "NEW Wordze Google Database Offer!" with this site's jive translator.

Great news! Today we's iz happy ta announce da launch o' our free Google keyword download service at Wordze.

For da last month we's gots been downloading Google top terms an' storing dem into uh database fo' anyone ta download in an excel format.

As an affiliate o' Wordze ya can use dis here ta yo' advantage by posting da link ta da download form, an' picking up uh sale or two when dey join da newsletter ta download da database.

The database iz in real tyme, so it’s always updating an' getting mo' data. As I type dis here, we's gots over 90,000 top Google terms fo' yo' readers ta download. All dey gots ta do iz provide they first name, an' uh valid email address.

This offer will allow us ta he`p ya sell yo' customers into Wordze, as we's will be able ta send dem updates on other offers, an' deals dat we's run each month, including uh seminar dat we's will be holding online in uh couple months.

To tell ya readers about dis here offer, all ya gots ta do iz send dem to:
http://www.wordze.com/Gtrends?roia=!YzUxMgBVAAAU7EEAAY_m

Please remember ta change da roia= value ta yo' own affiliate roia code. We gots also added dis here option ta da affiliate link list, as we's plan ta continue ta support dis here service fo' uh while.

As always, we's wish ya da bomb o' luck, an' iz here ta he`p ya make money wiff Wordze!

The Wordze Team all ye damn hood ratz..

Justin Goldberg

Has anyone else seen this "add to cart to see price" thing on Amazon?

Here's an example.

Perhaps they're trying to stop people from comparison shopping, or from downloading price lists with automated tools. Either way, it still seems fraudulent, or just plain BS.

Amazon's explanation says an additional discount is in effect, and that this discount is calculated in the shopping cart. It still seems extremely shoddy, as if they're giving logged-in customers different prices based on what they've bought in the past. Apparently this is called GROI, or gross return on inventory. It's still BS.

Justin Goldberg

Sorry, everyone: the original Hulu invites are all gone. If you set up a Google Alert for site:hulu.com/beta, the newest invites will show up as they're provisioned. If you want one sooner, you may find one here if more are being given away. There are 30 available through this link: http://www.hulu.com/beta/megaleecher
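A Google Alert watches for new results matching a search query, so you can preview what an alert on site:hulu.com/beta would catch by running the same query as an ordinary Google search. Here's a tiny Python sketch that builds that search URL with the standard library; google_search_url is a made-up helper name, and the only parameter it sets is q.

```python
from urllib.parse import urlencode


def google_search_url(query: str) -> str:
    """Return a Google web-search URL for `query`, with the query URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # The same site: query suggested for the Google Alert above.
    print(google_search_url("site:hulu.com/beta"))
```

urlencode escapes the colon and slash (site%3Ahulu.com%2Fbeta), which Google decodes back into the normal site: operator.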

My original post is below:

Found these Hulu invitations in Google:

1528 Hulu invites:
http://www.hulu.com/beta/wired

1508
http://www.hulu.com/beta/gigaom

226
http://www.hulu.com/beta/techcrunch

It's amazing what you can find in Google just from poking around.


Justin Goldberg

Google Docs now supports HTTPS. It's enabled for spreadsheets, presentations (the PowerPoint work-alike), and documents (Writely). Previously you could use HTTPS, but only on the docs.google.com domain (i.e., only in Writely), and only by typing the https URL directly. This is great news for anyone editing documents online. See the screenshot for proof.

Monday, September 22, 2008

Thursday, August 21, 2008

Thursday, July 10, 2008

Wednesday, June 11, 2008

Sunday, June 8, 2008

Thursday, May 29, 2008

Monday, May 19, 2008

Sunday, April 20, 2008

Wednesday, April 9, 2008

Friday, April 4, 2008

Tuesday, April 1, 2008

Tuesday, March 18, 2008

Wednesday, March 12, 2008

Saturday, March 8, 2008

Friday, March 7, 2008

Friday, February 15, 2008

Wednesday, February 6, 2008

Thursday, January 31, 2008


Contact Me Below

Blog Archive